Hi there

I'm Alexander Swerdlow

Welcome to my site!

I'm an incoming PhD student at Stanford, starting in Fall 2025! Before then, I'll be interning at Physical Intelligence for the summer.

Currently, I am a Master's of Robotics student at CMU, advised by Katerina Fragkiadaki and working with Deepak Pathak. Previously, I did my undergrad in Computer Science at UCLA, where I was fortunate to be advised by Bolei Zhou.
This site mainly exists to hold my CV, contact info, and my recent projects. If you want to take a look at some of my code, please see my GitHub profile. Outside of research, I love to spend time outdoors.

Contact Me
Unified Multimodal Discrete Diffusion
Alexander Swerdlow*, Mihir Prabhudesai*, Siddharth Gandhi, Deepak Pathak, Katerina Fragkiadaki
Preprint, Under review, 2025
project page / preprint
Unifying 2D and 3D Vision-Language Understanding
Ayush Jain*, Alexander Swerdlow*, Yuzhou Wang, Alexander Sax, Franziska Meier, Katerina Fragkiadaki
Preprint, Under review, 2025
project page / arXiv / code
Street-View Image Generation from a Bird's-Eye View Layout
Alexander Swerdlow, Runsheng Xu, Bolei Zhou
IEEE Robotics and Automation Letters (RA-L) & IEEE International Conference on Intelligent Robots and Systems (IROS), 2024
project page / arXiv / code / slides / supplementary
SCALER: A Tough Versatile Quadruped Free-Climber Robot
Yusuke Tanaka, Yuki Shirai, Xuan Lin, Alexander Schperberg, Hayato Kato, Alexander Swerdlow, Naoya Kumagai, Dennis Hong
IEEE International Conference on Intelligent Robots and Systems (IROS), 2022
arXiv / blog post / vision code

Below are some past personal and class projects.

Dynamic Dust3r
Large-scale data generation of dynamic scenes in Blender, based on a combination of PointOdyssey and Kubrics. This provides pixel-perfect ground-truth 3D point tracks, which are used to fine-tune dust3r to predict the static pointmap in one camera frame and a dynamic pointmap delta in the second camera frame, enabling tracking and reconstruction of dynamic scenes.
model code / datagen code
Interpretable image editing with latent objects
We develop an image editing pipeline trained on an unlabeled image collection. We train a diffusion model to decode a set of region features obtained from an off-the-shelf encoder and segmentation model. By heavily augmenting the denoising target and using contrastive losses, we learn a latent bottleneck that allows positional and semantic control over individual objects in a given image.
code and visualizations

Design taken from Jon Barron's website.   |   To see some 🔮, load this page while disconnected from the internet.