Modern astronomy faces an unprecedented data challenge: both theoretical simulations and observational surveys now produce data volumes that push the limits of traditional computational approaches. I develop machine learning and computational tools designed to address these challenges across the astronomical pipeline, from replacing months of supercomputer time with millisecond neural-network predictions to deploying real-time systems on active telescopes.
What ties my work together is a fascination with representation and information. The right mathematical encoding of a problem, whether a wavelet basis for quasar spectra or a learned compression of simulation data, can unlock orders-of-magnitude improvements. I keep finding that understanding where information lives in data is often more important than the specific algorithm you run on it.
When LSST begins its 10-year survey, one of the most ambitious scientific instruments ever built will need to maintain optical alignment every 36 seconds as it scans the sky. I developed AIDonut, a CNN-based system that predicts and corrects optical aberrations in real time, delivering 30× faster inference than the existing physics-based solver while meeting stringent image-quality requirements.
The core challenge was sim-to-real transfer: training data came from perfect physics simulations, but real telescopes are messy — atmospheric turbulence, mechanical vibrations, thermal deformations. Standard domain adaptation failed because the distribution shift was too severe. What worked was treating this as a physics-informed learning problem: I designed a two-stage architecture that learns robust features from 256K synthetic images, then fine-tunes on 2.1M real observations with a loss function that penalizes predictions outside physically plausible bounds. The system deployed to production in September 2025.
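To give a flavor of the second-stage loss, here is a minimal sketch rather than the deployed AIDonut code; the function name, bounds, and penalty weight are placeholders. The idea is a standard regression loss plus a hinge-style penalty on predictions that leave the physically allowed range:

```python
import torch

def physics_informed_loss(pred, target, lower, upper, penalty_weight=1.0):
    """Regression loss on predicted aberration coefficients plus a hinge
    penalty for predictions outside physically plausible bounds.
    Names, bounds, and weights here are illustrative, not production values."""
    mse = torch.mean((pred - target) ** 2)
    # Penalize only the portion of each prediction lying outside [lower, upper]
    below = torch.clamp(lower - pred, min=0.0)
    above = torch.clamp(pred - upper, min=0.0)
    return mse + penalty_weight * torch.mean(below ** 2 + above ** 2)
```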
KBMOD searches for faint moving objects (distant trans-Neptunian objects) by shifting and stacking images along candidate trajectories. The existing classifier worked well on bright detections but failed on the fainter 5σ candidates where the real discoveries live. I found that the previous model had never learned any astronomy at all; it had defaulted to a trivial pixel threshold that gave perfect accuracy on high-SNR data.
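The shift-and-stack step itself is conceptually simple; a toy numpy version, with invented names and none of the sub-pixel care or GPU acceleration of KBMOD's real implementation, looks like this:

```python
import numpy as np

def shift_and_stack(images, times, vx, vy):
    """Co-add a time series of exposures along one candidate trajectory.

    images : (N, H, W) array of sky-subtracted exposures
    times  : (N,) observation times relative to the first exposure
    vx, vy : candidate on-sky velocity in pixels per unit time
    """
    stacked = np.zeros_like(images[0])
    for img, t in zip(images, times):
        # Shift each exposure back along the trajectory so a moving source
        # lands on the same pixel, then sum. np.roll is a crude stand-in
        # for proper sub-pixel interpolation.
        dx, dy = int(round(vx * t)), int(round(vy * t))
        stacked += np.roll(np.roll(img, -dy, axis=0), -dx, axis=1)
    return stacked
```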
I fixed the data pipeline and designed a ResNet with squeeze-and-excitation attention, which lets the network reweight features across different signal-to-noise levels. The new model identifies thousands of high-confidence candidates, both matches to the existing cross-match catalog and entirely new objects, and we are now fitting orbits to confirm the new detections.
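For reference, a squeeze-and-excitation block in its standard form; the channel count and reduction ratio below are placeholders, not the production configuration:

```python
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-excitation: global-average-pool each channel, pass the
    channel vector through a small bottleneck MLP, and rescale the feature
    maps by the learned per-channel weights."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w  # re-weight feature maps channel-wise
```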
Understanding how the first stars ionized the universe requires simulations that resolve kiloparsec-scale gas clumping inside hundred-megaparsec cosmic volumes, a computational problem that is fundamentally intractable at full resolution. I built a deep-learning emulator that takes simulation parameters (redshift, radiation field strength, local density, reionization timing) as input and directly predicts the ionizing photon mean free path, bypassing the need for individual N-body simulations entirely.
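At its core the emulator is a regression from a few physical parameters to the (log) mean free path; a minimal stand-in, with invented layer widths rather than the published architecture, could look like:

```python
import torch.nn as nn

# Toy stand-in for the emulator: maps (redshift, radiation field strength,
# local density, reionization timing) to log10 of the mean free path.
# Layer widths are illustrative, not the published architecture.
emulator = nn.Sequential(
    nn.Linear(4, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 1),   # predicted log10(mean free path)
)
```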
The emulator achieves 1.6% median error across nearly four orders of magnitude in mean free path and runs in milliseconds on standard hardware. I used it to constrain the reionization midpoint from observed mean-free-path data, a constraint consistent with Planck CMB measurements but derived from completely independent observations. The paper also reveals a 2–3× decline in ionizing emissivity that standard power-law models of mean-free-path evolution can't explain.
Rubin will generate about 20 TB of data every night for ten years. Lossless compression barely helps on simulation data (~3×). I'm developing adaptive algorithms that learn to identify and preserve scientifically critical features without supervision — the network discovers importance patterns from data statistics alone, allocating bits non-uniformly across features. It turns out cosmological structure lives on a low-dimensional manifold in data space, even though no one told the network what to look for.
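One way to frame the idea, as a sketch of the general approach rather than the actual pipeline, is an autoencoder trained with a reconstruction term plus a sparsity penalty on the latent code as a crude proxy for code length, so representational capacity concentrates on the features the data statistics say matter:

```python
import torch
import torch.nn as nn

class CompressionAE(nn.Module):
    """Toy autoencoder for simulation fields. All sizes are illustrative."""
    def __init__(self, dim=64, latent=8):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(), nn.Linear(256, latent))
        self.dec = nn.Sequential(nn.Linear(latent, 256), nn.ReLU(), nn.Linear(256, dim))

    def forward(self, x):
        z = self.enc(x)
        return self.dec(z), z

def compression_loss(x, x_hat, z, rate_weight=1e-3):
    # Reconstruction error plus an L1 "rate" proxy: the network learns to
    # spend latent capacity non-uniformly, only where it reduces error.
    return torch.mean((x - x_hat) ** 2) + rate_weight * torch.mean(z.abs())
```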
Standard exoplanet detection methods can't uniquely determine both mass and orbital eccentricity — you get a family of degenerate solutions. Periodic orbits, configurations where a planetary system returns to its exact initial state, exist at discrete points in parameter space and break this degeneracy. I co-developed a Julia package that uses automatic differentiation to find these solutions, validated it on HR 8799 (stable over 10,000 years), and applied it to TRAPPIST-1 — leading to a Co-I role on an accepted JWST Cycle 4 proposal to confirm a candidate planet.
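The underlying trick is differentiating through the orbital integration itself. A toy single-planet version in Python, nothing like the Julia package's API, with the integrator, initial guesses, and step counts all chosen for illustration, searches for an initial speed and period that close the orbit by pushing the end-minus-start mismatch toward zero with autodiff gradients:

```python
import torch

def integrate(v0, T, n_steps=600):
    """Leapfrog-integrate a test planet around a unit-mass star (G*M = 1),
    starting at (1, 0) with velocity (0, v0), for a total time T.
    Everything stays in torch so gradients flow through the integration."""
    dt = T / n_steps
    pos = torch.tensor([1.0, 0.0])
    vel = torch.stack([torch.tensor(0.0), v0])
    acc = -pos / pos.norm() ** 3
    for _ in range(n_steps):
        vel = vel + 0.5 * dt * acc
        pos = pos + dt * vel
        acc = -pos / pos.norm() ** 3
        vel = vel + 0.5 * dt * acc
    return pos, vel

# Unknowns: initial tangential speed and period, started near (but not at)
# the circular solution v0 = 1, T = 2*pi. Gradient descent on the closure
# error pushes the end state back onto the initial state.
v0 = torch.tensor(0.9, requires_grad=True)
T = torch.tensor(5.8, requires_grad=True)
opt = torch.optim.Adam([v0, T], lr=1e-2)
pos0 = torch.tensor([1.0, 0.0])

for step in range(500):
    opt.zero_grad()
    pos, vel = integrate(v0, T)
    vel0 = torch.stack([torch.tensor(0.0), v0])
    closure_error = ((pos - pos0) ** 2).sum() + ((vel - vel0) ** 2).sum()
    closure_error.backward()
    opt.step()
```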
The relative velocity between dark matter and baryons after recombination is a subtle effect from the early universe that turns out to have significant consequences for halo mass functions and the timing of first structure formation. I am implementing the physics in the MP-Gadget simulation framework to quantify its impact on high-redshift predictions.
The standard approach in Lyman-α forest cosmology uses power spectra — essentially just the Fourier transform squared. It captures Gaussian information but throws away all phase relationships and non-Gaussian structure. Everyone knew this was lossy, but quantifying the loss was hard. I applied the wavelet scattering transform — a technique from signal processing that preserves both second-order statistics and higher-order moments through a cascade of wavelet transforms and nonlinearities — and showed it improves cosmological parameter constraints by orders of magnitude using Fisher information analysis.
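Stripped to its essentials, a first-order scattering coefficient is a band-pass, a modulus, and an average. A bare-bones numpy illustration, using a Gaussian band-pass as a stand-in for a proper Morlet wavelet and with made-up scales:

```python
import numpy as np

def bandpass(n, xi, sigma):
    """Gaussian band-pass filter in Fourier space, centered at frequency xi;
    a crude stand-in for a proper Morlet wavelet."""
    freqs = np.fft.fftfreq(n)
    return np.exp(-((freqs - xi) ** 2) / (2 * sigma ** 2))

def first_order_scattering(x, centers=(0.25, 0.125, 0.0625, 0.03125)):
    """Band-pass the signal at several scales, take the modulus (the
    nonlinearity), then average: one first-order coefficient per scale.
    Second order would repeat the cascade on each modulus signal."""
    X = np.fft.fft(x)
    coeffs = []
    for xi in centers:
        band = np.fft.ifft(X * bandpass(len(x), xi, xi / 4))
        coeffs.append(np.abs(band).mean())   # averaged |x * psi|
    return np.array(coeffs)
```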
The deeper insight: representation matters more than algorithm. The right mathematical encoding of the data unlocked information that was always there but invisible to traditional methods. This idea — that finding where information lives is the real bottleneck — has shaped everything I've done since.
My undergraduate thesis used Planck CMB data to test whether the fine-structure constant has changed over 13.8 billion years, solving the dynamics of scalar-field models and constraining their free parameters with principal component analysis and Monte Carlo simulations. We didn't find evidence of variation (probably good news for physics), but set some of the tightest constraints to date.
Worked on Project Premonition, which tracks disease vectors by analyzing blood collected by mosquitoes. Built a time-series LSTM to identify mosquito species from wingbeat interference patterns, outperforming the existing baseline. The main challenge was separating species-specific wingbeat patterns from environmental noise, a signal-processing problem that connected nicely to my later work on information extraction.
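The model itself was nothing exotic; a toy version of the idea, with placeholder sizes and species count, is just an LSTM over the time series with a classification head:

```python
import torch.nn as nn

class WingbeatLSTM(nn.Module):
    """Toy sketch: an LSTM over a wingbeat time series, with the final
    hidden state fed to a species classifier. Sizes are placeholders."""
    def __init__(self, n_features=1, hidden=64, n_species=10):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_species)

    def forward(self, x):           # x: (batch, time, n_features)
        _, (h, _) = self.lstm(x)    # h: (num_layers, batch, hidden)
        return self.head(h[-1])     # species logits
```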
Galaxy clusters should be cooling catastrophically, but they're not — supermassive black holes create "bubbles" that heat the gas. I ran simulations testing whether the correlations we observe are actually specific to black hole feedback or just generic properties of hot cluster gas. Some of what we thought were real cavities might just be noise.
Used AdS/CFT correspondence to compute correlation functions for Lifshitz field theories — hard quantum field theory calculations that become tractable gravity problems in higher dimensions. Separately, wrote bispectral analysis code for studying energy transfer in plasmas, now being used to study spiral arm stability in galaxies.
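The bispectrum at the heart of that code measures phase coupling among frequency triads (k1, k2, k1+k2), which is what signals energy transfer between modes. A bare-bones segment-averaged estimator for a 1D series, purely for illustration:

```python
import numpy as np

def bispectrum(x, k1, k2, seg_len=256):
    """Segment-averaged estimate of B(k1, k2) = <X(k1) X(k2) conj(X(k1+k2))>
    for a 1D series; k1 and k2 are FFT bin indices. A nonzero value indicates
    phase coupling among the triad (k1, k2, k1 + k2)."""
    nseg = len(x) // seg_len
    segs = x[: nseg * seg_len].reshape(nseg, seg_len)
    X = np.fft.fft(segs, axis=1)
    return np.mean(X[:, k1] * X[:, k2] * np.conj(X[:, k1 + k2]))
```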