
Hi there

I'm Alexander Swerdlow

Welcome to my site!

I'm a Master's student in Robotics at CMU, advised by Katerina Fragkiadaki. Previously, I did my undergrad in Computer Science at UCLA, where I was fortunate to be advised by Bolei Zhou.

This site mainly exists to hold my CV, contact info, and my recent projects. If you want to take a look at some of my code, you can also check out my GitHub profile. Outside of research, I love to spend time outdoors.

Contact Me
Unified Multimodal Discrete Diffusion
Alexander Swerdlow*, Mihir Prabhudesai*, Siddharth Gandhi, Deepak Pathak, Katerina Fragkiadaki
Preprint, Under review, 2024
project page / preprint
Unifying 2D and 3D Vision-Language Understanding
Ayush Jain*, Alexander Swerdlow*, Yuzhou Wang, Alexander Sax, Franziska Meier, Katerina Fragkiadaki
Preprint, Under review, 2024
preprint
Street-View Image Generation from a Bird's-Eye View Layout
Alexander Swerdlow, Runsheng Xu, Bolei Zhou
IEEE Robotics and Automation Letters (RA-L) & IEEE International Conference on Intelligent Robots and Systems (IROS), 2024
project page / arxiv / code / slides / supplementary
SCALER: A Tough Versatile Quadruped Free-Climber Robot
Yusuke Tanaka, Yuki Shirai, Xuan Lin, Alexander Schperberg, Hayato Kato, Alexander Swerdlow, Naoya Kumagai, Dennis Hong
IEEE International Conference on Intelligent Robots and Systems (IROS), 2022
arxiv / blog post / vision code

Below are some past experiments and class projects.

Dynamic Dust3r
Large-scale generation of dynamic scenes in Blender, based on a combination of PointOdyssey and Kubric. This provides pixel-perfect ground-truth 3D point tracks, which we use to fine-tune DUSt3R to predict the static pointmap in one camera frame and a dynamic pointmap delta in the second camera frame, enabling tracking and reconstruction of dynamic scenes.
model code / datagen code
Interpretable image editing with latent objects
We develop an image editing pipeline trained on an unlabeled image collection. We train a diffusion model to decode a set of region features obtained from an off-the-shelf encoder and segmentation model. By heavily augmenting the denoising target and using contrastive losses, we learn a latent bottleneck that allows positional and semantic control over individual objects in a given image.
code and visualizations

Design taken from Jon Barron's website.   |   To see some 🔮, load this page while disconnected from the internet.