Madison, WI · open to opportunities

Kalynn Willis

Data Engineer & Data Scientist

I build AI products and the data pipelines behind AprilAire's connected air-quality devices — with a background in biology and genetics research.

Kalynn Willis at UW–Madison graduation

UW–Madison, 2026


About

I'm Kalynn Willis, a Statistics & Data Science graduate from UW–Madison. I started in molecular biology and genetics research before moving into data science and engineering.

At UW's Biochemistry department, I built RAG-based research chatbots and tools for querying large genetic datasets. At Arrowhead Pharmaceuticals, I worked on machine learning workflows for drug discovery. Now at AprilAire, I'm building Merv — an AI assistant for personalized healthy-air recommendations — along with MCP servers, ETL pipelines, and IoT infrastructure that connect device data, cloud systems, and conversational AI.

I like building systems that sit between raw data and actual decision-making: retrieval systems, internal tools, APIs, data pipelines, and AI interfaces that people can interact with directly. Most of my work ends up somewhere between software engineering, data engineering, and applied machine learning.

3.92 GPA · Dean's List
3+ Industry roles
B.S. · Stats & Data Science

Experience

Data Engineer Co-op

AprilAire

Jan 2026 – Present
  • Conceived and built Merv, an AI healthy-air assistant that recommends personalized HVAC and air-quality products through conversation; it is in development, with rollout planned across AprilAire's web and mobile platforms.
  • Built MCP servers that connect LLM agents to thermostats, indoor air systems, and cloud APIs, enabling conversational device control and predictive automation based on weather forecasts and live sensor data.
  • Developed AWS ETL pipelines and monitoring infrastructure for ingesting IoT telemetry, surfacing pipeline failures quickly, and supporting downstream analytics and ML workflows.
Python · AWS · XGBoost · MCP · FastAPI · IoT

Data Science Undergraduate Researcher

UW–Madison Department of Biochemistry

Oct 2024 – Dec 2025
  • Proposed and built a RAG-based research chatbot that lets biologists query QTL and phenotype datasets in natural language instead of writing SQL or custom scripts.
  • Integrated live genomic resources including Ensembl, GTEx, IMPC, and JAX while building interactive dashboards for QTL exploration and gene annotation analysis.
  • Engineered distributed genomics workflows using DuckDB, Docker, HTCondor, and vector retrieval pipelines to support large-scale biological data analysis.
Co-authored bioRxiv preprint on gene/isoform QTL in Diversity Outbred mice
Python · FastAPI · RAG · DuckDB · Docker · HTCondor · RShiny

Data Science Intern

Arrowhead Pharmaceuticals

May 2025 – Aug 2025
  • Worked with the AI/ML team to build gradient-boosting models that predicted molecular and genetic properties for drug-discovery prioritization.
  • Automated ETL and CI/CD workflows for large experimental datasets, improving reproducibility and reducing manual preprocessing across research pipelines.
  • Built tooling for managing and validating biological data used in downstream modeling and analysis.
Python · scikit-learn · Pandas · ETL · CI/CD · Bioinformatics

Selected Work


Publications

Preprint · under review · bioRxiv · 2026

Distinct genetic architecture of gene and isoform level QTL in the Diversity Outbred (DO) mouse population

Mapping gene- and isoform-level expression QTL across a genetically diverse mouse population, showing that genetic variation drives allele-specific isoform usage missed by gene-level analysis alone.

Charles I Opara, Kelly A Mitok, Christopher H Emfinger, Katheryn L Schueler, Donnie S Stapleton, Nancy A Benkusky, Udaya Gardiparthi, Kalynn H Willis, Victor Ruotti, Brian S Yandell, Mark P Keller, Alan D Attie


Skills

Languages

Python · R · TypeScript · SQL · Bash

AI & Machine Learning

LLMs · RAG · Vector Embeddings · MCP · Bedrock AgentCore · PyTorch · XGBoost · scikit-learn · OpenAI · Ollama · FastAPI

AWS & Cloud

AWS · Lambda · Step Functions · Glue · Athena · S3 · AWS CLI · REST APIs · CI/CD · Docker

Data & Databases

PostgreSQL · DuckDB · SQLite · ETL Pipelines · Excel · HTCondor

Tools & Web

Grafana · RShiny · Next.js · TailwindCSS

Beyond the Code

Hiking

Hiking in snowy mountains

Reading

Reading in a hammock outside

Traveling

Zion National Park

Cooking

A candlelit dinner table

Get in Touch

I'm open to new opportunities, collaborations, and conversations about data and AI. Whether you have a question or just want to say hi, my inbox is always open.

kalynnhopewillis@gmail.com