Will Chang
Machine learning freelancer
will@hypergradient.ai

I consult on NLP and geospatial problems. Please see below for a list of my projects. If my interests align with a problem you have, please inquire!

Text classification and information extraction (Apixio, 2017–present)

I work with the fantastic data science team at Apixio (now Datavant) to build models that scan unstructured medical text for items of clinical or administrative salience. Projects include:

  • Keyword relevance (Vynca, 2025)

    For Vynca I created LLM prompts to decide if keywords in medical text actually describe the present condition of a patient.

    Tags: LLM, Prompt engineering, HIPAA

  • Client report generation (Stealth startup, 2024)

    This startup does research on behalf of clients seeking reputable medical practitioners who meet specific criteria. I created LLM prompts to summarize research results into fluent, accurate, and tone-appropriate client letters spanning several pages.

    Tags: LLM, Prompt engineering, Few-shotting, HIPAA

  • Customer feedback modeling (Solvvy, 2020)

    For Solvvy I applied unsupervised admixture-clustering models to customer-generated content as a way to understand customer feedback.

    Tags: Topic model, Latent Dirichlet allocation

    Kind words from the CTO:

    Will has been one of the most thorough, diligent, honest, and intelligent professionals I have ever worked with. Not only does he have a fantastic command of advanced ML techniques and algorithms, but he wields that knowledge with all the prudence and practicality required by industrial research applications. Will was an absolute pleasure to work with and I look forward to collaborating many more times in the future! —Justin Betteridge, CTO at Solvvy

Oilfield groundwater monitoring (USGS, 2016–2024)

I assist the California Oil, Gas, and Groundwater Program at the US Geological Survey in its ongoing effort to monitor groundwater resources in and around California oilfields. My teammates and I combine petrophysical models and Gaussian processes to jointly model related quantities such as rock conductivity, rock porosity, temperature, and groundwater composition.

Tags: Gaussian process, Archie's law

    Papers

Linguistic phylogenetics (Graduate Linguistics, 2007–2015)

It was linguistics that turned me into a statistician. As a first-year grad student I was astonished by a statistical analysis that inferred the shape and chronology of the family tree of Indo-European languages. How can these matters of human judgment be quantified, and how can any amount of math capture the relevant phenomena? However, as much as I admired the paper, I resisted its conclusion that the Indo-European language family is 9,000 years old. Almost all linguists believe 6,000 years to be more accurate. So this paper simultaneously gave me something to strive for and against, and shaped the rest of my career. Seven years and countless stats classes later, I coauthored a response. Now I use math to model human judgment every day.

    Papers

    Websites

    Talks

Education

M.A. Linguistics, U.C. Berkeley. 2009.
M.S. Computer Science, U.C. Berkeley. 1998.
B.S. Electrical Engineering / Computer Science, U.C. Berkeley. 1994.

Employment

Sr Research Scientist, Semantic Machines, 2014–2016.
Sr Software Engineer, Cadence Design Systems, 1998–2005.