Richard Diehl Martinez
I am a fourth-year Computer Science Ph.D. student and Gates Scholar at the University of Cambridge. Previously, I worked as an Applied Research Scientist at Amazon Alexa, focusing on language modeling research. I have an M.S. in Computer Science and a B.S. in Management Science from Stanford University.
Currently, I research pre-training techniques to improve the performance of small language models relative to large models. More broadly, my interests lie at the intersection of machine learning, linguistics, and neuroscience. If you're curious, check out some of my papers.
I also publish a bi-weekly NLP newsletter on Substack.
Select Publications
Mitigating Frequency Bias and Anisotropy in Language Model Pre-Training with Syntactic Smoothing
Richard Diehl Martinez, Zebulon Goriely, Andrew Caines, Paula Buttery, Lisa Beinborn
Conference: EMNLP 2024
Language models over-rely on token frequency during pre-training, leading to poor generalization for infrequent tokens and anisotropic representations. Our method, Syntactic Smoothing, induces a syntactic prior to improve model performance on rare tokens and reduce anisotropy.
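For context, anisotropy refers to token representations collapsing into a narrow cone of the embedding space. A common way to quantify it is the average cosine similarity between randomly sampled pairs of hidden representations; the rough sketch below uses that measure, with a function name and sampling scheme of my own choosing rather than the paper's exact protocol.

```python
import numpy as np

def anisotropy(hidden_states: np.ndarray, n_pairs: int = 10_000, seed: int = 0) -> float:
    """Estimate anisotropy as the mean cosine similarity between randomly
    sampled pairs of token representations (shape: num_tokens x hidden_dim)."""
    rng = np.random.default_rng(seed)
    n = hidden_states.shape[0]
    i = rng.integers(0, n, size=n_pairs)
    j = rng.integers(0, n, size=n_pairs)
    a, b = hidden_states[i], hidden_states[j]
    cos = np.sum(a * b, axis=1) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + 1e-12
    )
    return float(cos.mean())
```

Values close to 1 indicate highly anisotropic representations; isotropic embeddings average near 0.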
Tending Towards Stability: Convergence Challenges in Small Language Models
Richard Diehl Martinez, Pietro Lesci, Paula Buttery
Conference: EMNLP Findings 2024
Smaller language models struggle to converge as efficiently as larger ones, especially in later training stages. Our analysis of the Pythia model suite investigates how the effective rank of parameters impacts convergence dynamics across model sizes.
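The effective rank mentioned above is a standard spectral quantity. A minimal sketch of the usual definition, the exponential of the entropy of the normalized singular value spectrum (after Roy & Vetterli), is below; the helper name is mine, and the paper's exact computation may differ.

```python
import torch

def effective_rank(weight: torch.Tensor, eps: float = 1e-12) -> float:
    """Effective rank of a parameter matrix: exp of the Shannon entropy of
    its normalized singular value distribution."""
    s = torch.linalg.svdvals(weight.float())
    p = s / (s.sum() + eps)                      # normalize singular values into a distribution
    entropy = -(p * torch.log(p + eps)).sum()
    return float(torch.exp(entropy))

# Example: a product of thin Gaussian matrices has rank at most 64,
# so the effective rank is bounded above by 64.
W = torch.randn(512, 64) @ torch.randn(64, 512)
print(effective_rank(W))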
SumTablets: A Transliteration Dataset of Sumerian Tablets
Cole Simmons, Richard Diehl Martinez, Dan Jurafsky
Workshop: ACL Workshop 2024
SumTablets offers the largest collection of Unicode glyph–transliteration pairs for Sumerian cuneiform tablets, enabling NLP techniques for transliteration with nearly 7 million glyphs across 91,606 tablets. Released as a Hugging Face Dataset.
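Because the dataset lives on the Hugging Face Hub, it can be loaded with the datasets library as sketched below. The repository id shown is a placeholder, not the actual identifier; check the paper or the Hub for the real name.

```python
from datasets import load_dataset

# Placeholder repository id -- look up the actual SumTablets dataset
# name on the Hugging Face Hub before running.
tablets = load_dataset("sumtablets/SumTablets")

# Inspect splits and columns; each row is expected to pair Unicode
# cuneiform glyphs with a transliteration.
print(tablets)
first_split = next(iter(tablets))
print(tablets[first_split][0])
```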
CLIMB: Curriculum Learning for Infant-inspired Model Building
Richard Diehl Martinez, Hope McGovern, Zebulon Goriely, Christopher Davis, Andrew Caines, Paula Buttery, Lisa Beinborn
Conference: CoNLL 2023 (Best Paper)
Our CLIMB model, built for the BabyLM Challenge, uses cognitively-inspired curriculum learning to improve small language model training. We explore vocabulary, data, and objective curricula to enhance linguistic generalization capabilities.
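As a rough illustration of a data curriculum (one of the three curriculum types explored), the sketch below orders training sentences by a crude difficulty proxy and grows the available pool over stages. It is a generic toy, not CLIMB's actual setup; the function and its parameters are invented for illustration.

```python
import math

def length_curriculum(examples, tokenizer, num_stages: int = 4):
    """Toy data curriculum: order sentences by token length (a crude
    difficulty proxy) and expose a growing, easiest-first pool each stage."""
    ordered = sorted(examples, key=lambda text: len(tokenizer(text)))
    stage_size = math.ceil(len(ordered) / num_stages)
    for stage in range(1, num_stages + 1):
        yield ordered[: stage * stage_size]   # pool grows until it covers everything

# Usage, with whitespace splitting standing in for a real tokenizer:
corpus = ["a dog", "the cat sat", "a much longer and rarer construction appears here"]
for pool in length_curriculum(corpus, str.split, num_stages=2):
    print(f"{len(pool)} examples available this stage")
```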
Attention-based Contextual Language Model Adaptation for Speech Recognition
Richard Diehl Martinez, Scott Novotney, Ivan Bulyko, Ariya Rastrow, Andreas Stolcke, Ankur Gandhe
Conference: ACL 2021
We introduce a contextual attention mechanism for language models in speech recognition, incorporating non-linguistic data such as utterance time. Our model outperforms conventional LMs, reducing perplexity by 9.0% on long-tail utterances.
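The toy PyTorch module below gestures at the idea: an LM hidden state attends over an embedding of a non-linguistic signal (here, an hour-of-day bucket) and the attended context is folded back in via a residual connection. All names, dimensions, and the residual combination are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class ContextAttention(nn.Module):
    """Toy contextual adapter: LM hidden states attend over an embedding of a
    non-linguistic signal (an hour-of-day bucket), and the attended context is
    added back to the hidden states."""

    def __init__(self, hidden_dim: int, num_time_buckets: int = 24):
        super().__init__()
        self.time_emb = nn.Embedding(num_time_buckets, hidden_dim)
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads=4, batch_first=True)

    def forward(self, hidden: torch.Tensor, hour: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, hidden_dim); hour: (batch,) integer bucket ids
        context = self.time_emb(hour).unsqueeze(1)         # (batch, 1, hidden_dim)
        attended, _ = self.attn(hidden, context, context)  # LM states act as queries
        return hidden + attended                           # residual combination

# Example forward pass
model = ContextAttention(hidden_dim=64)
h = torch.randn(2, 10, 64)
hour = torch.tensor([9, 22])
print(model(h, hour).shape)  # torch.Size([2, 10, 64])
```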
Automatically Neutralizing Subjective Bias in Text
Reid Pryzant, Richard Diehl Martinez, Nathan Dass, Sadao Kurohashi, Dan Jurafsky, Diyi Yang
Conference: AAAI 2020
We present the first dataset for automatically neutralizing subjective bias in text, sourced from Wikipedia edits. Our BERT-based models achieve strong performance in identifying and neutralizing biased language across four domains.
Projects
Grapevine
An AI-powered recommendation tool that analyzes user preferences to suggest the perfect wine.
Ignition
An end-to-end supervised model for training simulated self-driving vehicles.
Via: Illuminating Academic Pathways at Scale
A graph neural network framework that helps students create personalized academic journeys.
Optimizing Airbnb Listings with GANs
A GAN-based model that generates optimized Airbnb listing descriptions to boost booking rates.