Data Scientist with experience building data-intensive applications and delivering analytical solutions that improve decision-making.
Currently at Reveal Global Consulting building language models for the U.S. Census Bureau. Previously a Data Scientist on KPMG's Data, Analytics, and AI team working on projects in the Life Sciences, Banking, and Entertainment sectors.
Updated: 5/31/2025
Sr. AI Engineer @ Reveal Global Consulting (October 2023 - Now)
Data Scientist @ KPMG, Digital Lighthouse (July 2022 - August 2023)
Built a Chrome extension that filters a Twitter feed for negative content (e.g. depressiveness) using an XGBoost model packaged as a RESTful API deployed on Heroku (GitHub, Medium)
Compared the performance of transformer language models (BART, Pegasus, T5) before and after fine-tuning on patent data to measure how well popular models generalize to scientific language and what performance trade-offs come with differences in model size
Python implementation of a support agent capable of discerning a customer/user’s motivation and responding with the appropriate personality (tech support vs. sales) to serve their needs.
The project uses LangChain’s framework for chaining an augmented retrieval function to a chat LLM to provide document Q&A capabilities. The Agent uses the same LLM and a set of tools with predefined use-cases and retrieval mechanisms to interpret the user input and respond with the appropriate tool. The data used in this repo is a collection of Verizon FAQ materials scraped from their blog.
This is a rudimentary approach approximating a large-scale chatbot service with billing that scales with customer demand and intelligent Q&A capabilities on a personalized dataset.
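A minimal sketch of the routing idea described above, not the repository's actual code. It assumes a keyword-overlap scorer as a stand-in for the LLM that interprets user input, and a naive word-overlap lookup as a stand-in for the retrieval step. The FAQ snippets, tool names, and keyword sets are all hypothetical and exist only for illustration; no LangChain calls are shown.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Hypothetical FAQ snippets standing in for the scraped support material.
FAQ_DOCS = [
    "To reset your router, hold the reset button for ten seconds.",
    "Unlimited plans include a mobile hotspot and start with a monthly fee.",
    "If your phone has no signal, toggle airplane mode and restart the device.",
]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(query: str, docs: list[str]) -> str:
    """Return the document sharing the most words with the query.

    Assumption: a real system would use embeddings and a vector store.
    """
    query_words = tokenize(query)
    return max(docs, key=lambda doc: len(query_words & tokenize(doc)))


@dataclass
class Tool:
    name: str
    keywords: set[str]              # hypothetical trigger words for this tool
    respond: Callable[[str], str]   # builds the reply in this tool's persona


def tech_support(query: str) -> str:
    return "[tech support] " + retrieve(query, FAQ_DOCS)


def sales(query: str) -> str:
    return "[sales] " + retrieve(query, FAQ_DOCS)


TOOLS = [
    Tool("tech_support", {"reset", "broken", "signal", "error", "router", "restart"}, tech_support),
    Tool("sales", {"plan", "plans", "price", "upgrade", "buy", "fee"}, sales),
]


def route(query: str) -> Tool:
    """Pick the tool whose keywords overlap most with the query.

    Assumption: in the described project an LLM makes this decision.
    """
    query_words = tokenize(query)
    return max(TOOLS, key=lambda tool: len(query_words & tool.keywords))


def answer(query: str) -> str:
    """Route the query to a tool and return that tool's reply."""
    return route(query).respond(query)


if __name__ == "__main__":
    print(answer("How do I reset my router?"))
    print(answer("Which plans include a hotspot?"))
```

The point of the sketch is the separation of concerns: one step decides which persona should handle the request, and the chosen tool then runs its own retrieval and phrasing. Swapping the keyword scorer for an LLM call would leave the rest of the structure unchanged.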
LangChain and LLM-backed application for providing Q&A capabilities on an Excel document
My complete implementation of assignments in SI-630: Natural Language Processing at the University of Michigan (Fall 2021)
Topics include:
My complete implementation of assignments in SI-650: Information Retrieval at the University of Michigan (Spring 2022)
Topics include:
Collection of ML use-cases, implementations, and cleaned datasets for reference. Scripts are sourced from personal projects and coursework.
|--PyTorch
| |-- Classifying the Political Framing of Campaign Emails (Logistic Regression)
| |-- Train a Word2Vec model on Wikipedia Biographies with debiasing (Tensorboard)
|
|--Keras
| |-- Computer Vision with CNN
|
|--HuggingFace (Transformers)
| |-- Predicting Helpful Stack Overflow Answers and Data Annotation/Measuring Annotation Quality
| |-- Pattern-Based Learning (Exploitation Training) for Toxic Language
|
|--Scikit-Learn
|
|--Coursework Examples
| |--SI630 - Natural Language Processing
| | |-- Classifying the Political Framing of Campaign Emails (Logistic Regression)
| | |-- Train a Word2Vec model on Wikipedia Biographies with debiasing (Tensorboard)
| | |-- Predicting Helpful Stack Overflow Answers and Data Annotation/Measuring Annotation Quality (HuggingFace)
| | |-- Pattern-Based Learning (Exploitation Training) for Toxic Language
| |
| |--SI670 - Applied Machine Learning
| | |--TBD
| |
| |--SI671 - Data Mining
| | |-- Mining and Evaluating Frequent Itemsets on Twitter Emojis
| | |-- Time Series analysis of COVID-19 trends for G7 Nations
| | |-- Social Network Analysis for Amazon Product Reviews
My complete implementation of assignments in the Machine Learning Engineering for Production (MLOps) Specialization taught by Andrew Ng on Coursera. This repo collects the scripts and projects for future reference.