Will Chang
Machine learning freelancer
will@hypergradient.ai
I consult on NLP and geospatial problems. Please see below for a
list of my projects. If my interests align with a problem you
have, please inquire!
Text classification and information extraction (Apixio, 2017–present)
I work with the fantastic data science team at Apixio (now Datavant) to build models that
scan unstructured medical text for items of clinical or administrative
salience. Projects include:
ICD detection (2022–2024). I was on a team that built a classifier
that detects mentions of any of 15,000 disease conditions in unstructured
medical text and looks for evidence of whether a mentioned disease is being
treated.
PyTorch
Transformers
Extreme multi-label classification
Vitals and lab data extraction (2019–2021). I built a model
to extract the names, dates, quantities, and units of vitals and lab
measurements from unstructured medical text. The model could be
trained with page-level labels but returned token-level predictions.
TensorFlow
BiLSTM
Attention
Face-to-face classification (2018). I built a binary
classifier to decide if a medical note describes a face-to-face
encounter between a provider and a patient.
Logistic regression
Feature engineering
Date-of-service extraction (2018). I built a model
to extract the date of service from patient encounter notes.
It worked with page-level, type-level labels and tolerated
the fact that the date of service for one page was
often found only on an adjacent page.
Logistic regression
Feature engineering
Custom loss
Keyword relevance (Vynca, 2025)
For Vynca, I created
LLM prompts to decide whether keywords in medical text actually
describe a patient's present condition.
LLM
Prompt engineering
HIPAA
Client report generation (Stealth startup, 2024)
This startup does research on behalf of clients seeking
reputable medical practitioners who meet specific criteria.
I created LLM prompts to summarize research results into
fluent, accurate, and tone-appropriate client letters spanning
several pages.
LLM
Prompt engineering
Few-shotting
HIPAA
Customer feedback modeling (Solvvy, 2020)
For Solvvy,
I applied unsupervised admixture-clustering models to customer-generated content
to understand customer feedback.
Topic model
Latent Dirichlet allocation
Kind words from the CTO:
Will has been one of the most thorough, diligent, honest, and
intelligent professionals I have ever worked with. Not only does
he have a fantastic command of advanced ML techniques and algorithms,
but he wields that knowledge with all the prudence and practicality
required by industrial research applications. Will was an absolute
pleasure to work with and I look forward to collaborating many
more times in the future! —Justin Betteridge, CTO at
Solvvy
Oilfield groundwater monitoring (USGS, 2016–2024)
I assist the California
Oil, Gas, and Groundwater Program at the US Geological Survey
in its ongoing effort to monitor groundwater resources in and
around California oilfields. My teammates and I combine petrophysical
models and Gaussian processes to jointly model related quantities
such as rock conductivity, rock porosity, temperature, and
groundwater composition.
Gaussian process
Archie's law
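For reference, Archie's law is the standard petrophysical relation tying together several of the quantities above. The block below gives its usual textbook form for a clean (clay-free) rock; it is a reminder of the general relation, not necessarily the exact parameterization used in the program's models. Here $\sigma_w$ is the pore-water conductivity (which rises with salinity and temperature), $\phi$ is porosity, $S_w$ is water saturation ($S_w = 1$ in a fully saturated aquifer), $m$ is the cementation exponent, $n$ is the saturation exponent, and $a$ is the tortuosity factor.

```latex
\sigma_{\text{rock}} = \frac{1}{a}\,\sigma_w\,\phi^{m}\,S_w^{n}
\qquad\Longleftrightarrow\qquad
\sigma_w = \frac{a\,\sigma_{\text{rock}}}{\phi^{m}\,S_w^{n}}
```

The inverted form on the right shows why the relation matters for groundwater monitoring: given rock conductivity from a resistivity log and an estimate of porosity, one can estimate the conductivity, and hence the salinity, of the pore water.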
Papers
- Groundwater salinity mapping using geophysical log analysis
  within the Fruitvale and Rosedale Ranch oil fields, Kern
  County, California, USA. Michael J. Stephens, David H.
  Shimabukuro, Janice M. Gillespie, and Will Chang.
  Hydrogeology Journal. 2018.
- Stratigraphic and structural controls on groundwater salinity
  variations in the Poso Creek Oil Field, Kern County, California,
  USA. Michael J. Stephens, David H. Shimabukuro, Will Chang,
  Janice M. Gillespie, and Zack Levinson.
  Hydrogeology Journal. 2021.
- Mapping aquifer salinity gradients and effects of oil field
  produced water disposal using geophysical logs: Elk Hills,
  Buena Vista and Coles Levee Oil Fields, San Joaquin Valley,
  California. Janice M. Gillespie, Michael J. Stephens, Will
  Chang, and John G. Warden. PLOS ONE. 2022.
- Groundwater elevation data and models in and around select
  California oil fields. Michael J. Stephens, Will Chang,
  Janice M. Gillespie, Peter B. McMahon, Tracy A. Davis, and John
  G. Warden. U.S. Geological Survey data release. 2023.
Linguistic phylogenetics (Graduate Linguistics, 2007–2015)
It was linguistics that turned me into a statistician. As a
first-year grad student I was astonished by a statistical
analysis that inferred the shape and chronology of the family
tree of Indo-European languages. How could such matters of human
judgment be quantified, and how could any amount of math capture
the relevant phenomena? Yet, as much as I admired
the paper, I resisted its conclusion: that the Indo-European language
family is 9,000 years old. Almost all linguists consider 6,000
years to be more accurate. So this paper simultaneously gave me
something to strive for and against, and shaped the
rest of my career. Seven years and countless stats classes later,
I coauthored a response. Now I use math
to model human judgment every day.
Education
M.A. Linguistics, U.C. Berkeley. 2009.
M.S. Computer Science, U.C. Berkeley. 1998.
B.S. Electrical Engineering / Computer Science, U.C. Berkeley. 1994.
Employment
Sr Research Scientist, Semantic Machines, 2014–2016.
Sr Software Engineer, Cadence Design Systems, 1998–2005.