cat about.md
I'm a senior AI research scientist at Recursion, where I build machine learning models for drug discovery. My work spans molecular property prediction, uncertainty quantification, and active learning, with a focus on tools that help drug designers find new medicines more efficiently.
Before industry, I studied mathematics at Oxford (MMath, 1st class) and completed a DPhil in Systems Biology, applying Bayesian inference and stochastic modelling to problems in cell biology. I bring a rigorous quantitative foundation to applied ML problems in pharma.
cat experience.yml
- Training and fine-tuning foundation models for molecular property prediction
- Analysis cited in Recursion 10-K: ">2.5x increased efficiency in detecting new bioactive scaffolds with >40% reduction in flagging of likely cytotoxic compounds"
- Uncertainty quantification for ML models with applications in Bayesian Optimization and Active Learning
- Led product team for molecular property prediction
- Core contributor to Molflux, an open-source ML ecosystem for chemistry
- ML models and active learning software used day-to-day by dozens of drug designers
- Modelled chromosome dynamics using Bayesian statistics and stochastic differential equations
- Developed computational tools providing insight into how and why cell division goes wrong
- Supervised MSc rotation projects for 2 PhD students
- Event detection in football matches using deep learning
ls ~/projects/
Molflux
Core contributor to an open-source ML ecosystem for chemistry that provides tools for molecular featurisation, model training, and deployment. Used in production by drug design teams.
Polaris Competition
Achieved 2nd place in a public molecular property prediction benchmark competition (2025). Demonstrated state-of-the-art predictive modelling on real-world drug discovery datasets.
cat skills.json
"expert": ["Python", "Julia"],
"proficient": ["C++", "R", "Stan"],
"tools": ["Git", "Linux", "Docker"],
// domains
"ml": ["Molecular Property Prediction", "Foundation Models", "Uncertainty Quantification"],
"drug_discovery": ["Active Learning", "Bayesian Optimization", "Cheminformatics"],
"statistics": ["Bayesian Inference", "Stochastic Modelling", "MCMC"]
git log --oneline
- d4e5f6a
- c9d0e1f
- a1b2c3d
- b7c8d9e
- f0a1b2c
- e3d4c5b
- a6b7c8d
echo $CONTACT
- github: github.com/shug3502
- scholar: Google Scholar