- Presenting Reverse Distillation at ICLR 2026. See you in Rio!
- Presenting USHER and DeltaNMF at RECOMB 2026. See you in Greece!
- PRECISE: structure-aware language for small-molecule binding (preprint)
- Welcoming Anish Karpurapu, Jiazheng Miao, Mun Hong Fong, Tom Pan and Diane Zhang to the lab!
- Honey, I shrunk the proteins! Raygun was featured in Duke News!
- Rohit was nominated for the Packard Fellowship by Duke
I am an Assistant Professor at Duke University, in the Departments of Biostatistics & Bioinformatics and Cell Biology, with appointments in Computer Science and Electrical & Computer Engineering. Our lab develops methods to explain and control biological systems across scales, from individual proteins to cellular networks. Foundation models (large neural networks trained on millions of protein sequences or single-cell transcriptomes) are emerging as powerful tools for this effort. We treat them not just as prediction engines but as scientific instruments whose internal representations encode organizing principles that traditional analyses miss.
At the molecular scale, we leverage protein language models to decode how protein sequence maps to function. This has led to methods for predicting drug-target interactions, uncovering kinase substrate specificities, identifying allosteric sites from sequence alone, and designing smaller, better-functioning proteins for gene therapy and molecular sensing. At the systems scale, we develop algorithms for single-cell and spatial omics that reveal how genome architecture, chromatin organization, and intercellular signaling give rise to cellular decisions. Foundation models trained on millions of transcriptomes allow us to connect these layers, capturing tissue-level organization and multimodal structure that piecemeal analyses obscure.
Methodologically, we draw on a blend of classical algorithmics (combinatorial optimization, graph theory), machine learning, and statistics, with an emphasis on methods that yield insight rather than just prediction. We have a deep interest in representation learning: understanding what makes a good embedding, how to build one, and what biological knowledge is encoded within it.
Before academia, I spent several years in quantitative finance, building models to extract signal from noisy, stochastic systems. I remain interested in problems at the intersection of machine learning, stochastic modeling, and high-dimensional data.
Positions available: If you're interested in a postdoc, or if you're a Duke undergraduate or graduate student interested in the topics above, please reach out!