Portfolio

Updated: 5/31/2025

Education

M.S. Information, University of Michigan (May 2022)
B.A. Liberal Arts, University of Michigan (May 2020)

Certifications

Google Professional Data Engineer (exp. June 2025)
Google Associate Cloud Engineer (exp. March 2025)
SnowPro Core Certification (exp. July 2025)
Machine Learning Engineering for Production (MLOps) from DeepLearning.AI

Publications

Ensemble Retrieval Strategies for an Improved NAICS Search Engine in the U.S. Census Bureau (Joint Statistical Meetings, 2024)

Work Experience

Sr. AI Engineer @ Reveal Global Consulting (October 2023 - Now)

Lead developer for IRIS, an AWS-native application hosting search engines across NAICS, NAPCS, OCC, and Schedule B products, built with Docker, ECS, Fargate, and Sagemaker servicing 1000+ users internally and deployed inexpensively
Built an LLM-driven pipeline for migrating Federal codebases in SAS to Python, indexing large codebases into graph databases and reconstructing them in Python, accepted to speak at the 2025 Joint Statistical Meetings (JSM)
Overseeing a team of 5 data scientists on code migration for critical surveys across the U.S. Census Bureau (AHS, CPS, SOMA, NSSRN, and Frames), supporting cloud migration efforts and reducing taxpayer burden
Designed an internal-use search engine for North American Industry Classification System (NAICS) analysts, Dockerized and wrapped in a Streamlit frontend to reduce response burden and facilitate human-in-the-loop feedback from experts
De facto ML Engineer for the AI/ML solutions and Statistical Modernization groups through guidance on deployment best practices with CI/CD pipelines, cloud engineering, and monitoring and observability (including tools specialized for LLMs)

Data Scientist @ KPMG, Digital Lighthouse (July 2022 - August 2023)

Designed a Google Cloud pipeline for summarizing earnings call transcripts using LangChain’s framework for chaining a VertexAI model with a VectorStore to provide contextual querying based on predefined prompts for a large banking client
Performed attrition modeling for a leading global life sciences company, analyzing attrition for vulnerable populations with survival curves, time series analyses, and statistical inference, delivering a set of Power BI dashboards for monitoring risk
Repaired an outdated ML pipeline predict box office results for a large streaming service using social media sentiment with 70%+ accuracy by updating deprecated API features, streamlining comment scraping, and improving regularization methods
Presented on bootstrapping generative AI approaches for over 230 colleagues, providing tutorials on creating Streamlit.io demos with OpenAI, summarization tasks, and prompt engineering strategies for KPMG’s internal ChatGPT API with Jupyter

Natural Language Processing (NLP) Projects

Twitter Plug-In for Reducing Harmful Content

Built a Chrome extension capable of filtering a Twitter feed based on negative content (e.g. depressiveness) using an XgBoost model and packing into a RESTful API deployed on Heroku (Github, Medium)

Comparing Pre-trained and Fine-tuned Transformers Models on Patent Data

Compared performance of LLM models (BART, Pegasus, T5) before and after fine-tuning on patent data to measure generalizability of popular transformer models on scientific language and performance trade offs for size differences

Verizon Support and Sales Chatbot Service

Python implementation of a support agent capable of discerning a customer/user’s motivation and responding with the appropriate personality (tech support vs. sales) to serve their needs.

The project uses LangChain’s framework for chaining an augmented retrieval function to a chat LLM to provide document Q&A capabilities. The Agent uses the same LLM and a set of tools with predefined use-cases and retrieval mechanisms to interpret the user input and respond with the appropriate tool. The data used in this repo is a collection of Verizon FAQ materials scraped from their blog.

This is a rudimentary approach approximating a large-scale chatbot service with billing that scales with customer demand and intelligent Q&A capabilities on a personalized dataset.

Excel Document Analyzer with LangChain and VertexAI

LangChain and LLM-backed application for providing Q&A capabilities on an Excel document

SI630-NLP

My complete implementation of assignments in SI-630: Natural Language Processing at the University of Michigan (Fall, 2021)

Topics include:

Information Retrieval Projects

SI650-IR

My complete implementation of assignments in SI-650: Information Retrieval at the University of Michigan (Spring, 2022)