I consult on NLP and geospatial problems. Please see below for a list of my projects. If my interests align with a problem you have, please inquire!

Text classification and information extraction (Apixio, 2017–present)

I work with the fantastic data science team at Apixio (now Datavant) to build models that scan unstructured medical text for items of clinical or administrative salience. Projects include:

ICD detection: PyTorch, Transformers, extreme multi-label classification

Vitals and lab data extraction: TensorFlow, BiLSTM, attention

Face-to-face classification: logistic regression, feature engineering

Date-of-service extraction: logistic regression, feature engineering, custom loss

Keyword relevance (Vynca, 2025)

For Vynca I created LLM prompts to decide if keywords in medical text actually describe the present condition of a patient.

LLM, Prompt engineering, HIPAA

Client report generation (Stealth startup, 2024)

This startup does research on behalf of clients seeking reputable medical practitioners who meet specific criteria. I created LLM prompts to summarize research results into fluent, accurate, and tone-appropriate client letters spanning several pages.

LLM, Prompt engineering, Few-shotting, HIPAA

Customer feedback modeling (Solvvy, 2020)

For Solvvy I applied unsupervised admixture-clustering models to customer-generated content as a way to understand customer feedback.

Topic model, Latent Dirichlet allocation

Kind words from the CTO:

Will has been one of the most thorough, diligent, honest, and intelligent professionals I have ever worked with. Not only does he have a fantastic command of advanced ML techniques and algorithms, but he wields that knowledge with all the prudence and practicality required by industrial research applications. Will was an absolute pleasure to work with and I look forward to collaborating many more times in the future! —Justin Betteridge, CTO at Solvvy

Oilfield groundwater monitoring (USGS, 2016–2024)

I assist the California Oil, Gas, and Groundwater Program at the US Geological Survey in its ongoing effort to monitor groundwater resources in and around California oilfields. My teammates and I combine petrophysical models and Gaussian processes to jointly model related quantities such as rock conductivity, rock porosity, temperature, and groundwater composition.

Gaussian process, Archie's law
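
As a rough illustration of the petrophysical side, Archie's law relates the resistivity of a water-saturated rock to its porosity and the resistivity of its pore water, which in turn tracks salinity. The sketch below is a minimal Python example and not the program's actual model: the function name, the default constants (a = 1, m = 2, common textbook values), and the sample inputs are all assumptions made for illustration.

```python
def archie_water_resistivity(rt_ohm_m: float, porosity: float,
                             a: float = 1.0, m: float = 2.0) -> float:
    """Estimate pore-water resistivity R_w (ohm-m) from Archie's law.

    Assumes a fully water-saturated rock (S_w = 1), so that
        R_t = a * R_w / porosity**m
    where R_t is the measured formation resistivity, a is the
    tortuosity factor, and m is the cementation exponent.
    """
    if rt_ohm_m <= 0.0:
        raise ValueError("formation resistivity must be positive")
    if not 0.0 < porosity <= 1.0:
        raise ValueError("porosity must be a fraction in (0, 1]")
    # Formation factor F = R_t / R_w = a / porosity**m
    formation_factor = a / porosity ** m
    return rt_ohm_m / formation_factor


# Hypothetical log reading: 10 ohm-m formation resistivity at 25% porosity.
rw = archie_water_resistivity(rt_ohm_m=10.0, porosity=0.25)
print(f"Estimated pore-water resistivity: {rw:.3f} ohm-m")
```

Lower pore-water resistivity corresponds to saltier water. In practice the conversion from resistivity to salinity also depends on temperature, which is one reason these quantities are modeled jointly.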

Papers

Groundwater salinity mapping using geophysical log analysis within the Fruitvale and Rosedale Ranch oil fields, Kern County, California, USA. Michael J. Stephens, David H. Shimabukuro, Janice M. Gillespie, and Will Chang. Hydrogeology Journal. 2018.
Stratigraphic and structural controls on groundwater salinity variations in the Poso Creek Oil Field, Kern County, California, USA. Michael J. Stephens, David H. Shimabukuro, Will Chang, Janice M. Gillespie, and Zack Levinson. Hydrogeology Journal. 2021.
Mapping aquifer salinity gradients and effects of oil field produced water disposal using geophysical logs: Elk Hills, Buena Vista and Coles Levee Oil Fields, San Joaquin Valley, California. Janice M. Gillespie, Michael J. Stephens, Will Chang, and John G. Warden. PLOS ONE. 2022.
Groundwater elevation data and models in and around select California oil fields. Michael J. Stephens, Will Chang, Janice M. Gillespie, Peter B. McMahon, Tracy A. Davis, and John G. Warden. U.S. Geological Survey data release. 2023.

Linguistic phylogenetics (Graduate Linguistics, 2007–2015)

It was linguistics that turned me into a statistician. As a first-year grad student I was astonished by a statistical analysis that inferred the shape and chronology of the family tree of Indo-European languages. How can these matters of human judgment be quantified, and how can any amount of math capture the relevant phenomena? However, as much as I admired the paper, I resisted its conclusion that the Indo-European language family is 9,000 years old. Almost all linguists believe 6,000 years to be more accurate. So this paper simultaneously gave me something to strive for and against, and shaped the rest of my career. Seven years and countless stats classes later, I coauthored a response. Now I use math to model human judgment every day.

Papers

Ancestry-constrained phylogenetic analysis supports the Indo-European steppe hypothesis. Will Chang, Chundra Cathcart, David Hall, and Andrew Garrett. Language 91.1:194-244. 2015.
Press coverage: Science/AAAS News, New York Times.
Awards: Best Paper in Language.
A relaxed admixture model of language contact. Will Chang and Lev Michael. Language Dynamics and Change 4:1-26. 2014.
Exploring phonological areality in the circum-Andean region using a naive Bayes classifier. Lev Michael, Will Chang, and Tammy Stark. Language Dynamics and Change 4:27-86. 2014.

Websites

2013–2015. I helped to maintain the South American Phonological Inventory Database.

Talks

Notes on Bayesian Lexicostatistics. LING 230. April 2016.
A vanishing, multiple-gain lexical trait model. Workshop Towards a Global Language Phylogeny. Max Planck Institute for the Science of Human History, Jena. September 2014.
Linguistic mirages and lexical borrowing between Tongan and Samoan. 9th International Conference On Oceanic Linguistics (COOL9). University of Newcastle, Australia. February 2013.
The distribution of Polynesian words. 39th Annual Meeting of the Berkeley Linguistics Society. University of California, Berkeley. February 2013.
Probabilistic generative models of language contact. Workshop on Quantitative Approaches to Areal Linguistic Typology. Koninklijke Nederlandse Akademie van Wetenschappen, Amsterdam. December 2012.

Education

M.A. Linguistics, U.C. Berkeley. 2009.
M.S. Computer Science, U.C. Berkeley. 1998.
B.S. Electrical Engineering / Computer Science, U.C. Berkeley. 1994.

Employment

Sr Research Scientist, Semantic Machines, 2014–2016.
Sr Software Engineer, Cadence Design Systems, 1998–2005.