Cameron Milne


LinkedIn | GitHub

Data Scientist with experience building data-intensive applications while delivering analytical solutions to improve decision-making.

Currently working for Reveal Global Consulting where I focus on NLP applications for the U.S. Census Bureau. Previously a Data Scientist on KPMG's Data, Analytics, and AI team working on projects in the Life Sciences, Banking, and Entertainment sectors.

View My GitHub Profile

Portfolio

Updated: 10/21/2023


Education


Certifications


Work Experience

Data Scientist @ KPMG, Digital Lighthouse (July 2022 - August 2023)


Natural Language Processing (NLP) Projects

Twitter Plug-In for Reducing Harmful Content

View on GitHub View on Medium

Built a Chrome extension that filters a Twitter feed for negative content (e.g. depressiveness) using an XGBoost model packaged as a RESTful API and deployed on Heroku (GitHub, Medium)


Comparing Pre-trained and Fine-tuned Transformers Models on Patent Data

View on GitHub

Compared the performance of LLMs (BART, Pegasus, T5) before and after fine-tuning on patent data to measure how well popular transformer models generalize to scientific language and how performance trades off against model size


Verizon Support and Sales Chatbot Service

View on GitHub

Python implementation of a support agent capable of discerning a customer/user’s motivation and responding with the appropriate personality (tech support vs. sales) to serve their needs.

The project uses LangChain's framework to chain an augmented retrieval function to a chat LLM, providing document Q&A capabilities. The agent uses the same LLM and a set of tools, each with a predefined use case and retrieval mechanism, to interpret the user input and respond with the appropriate tool. The data used in this repo is a collection of Verizon FAQ materials scraped from their blog.

This is a rudimentary approximation of a large-scale chatbot service, with billing that scales with customer demand and intelligent Q&A capabilities on a personalized dataset.
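The routing idea described above (interpret the user's motivation, then answer with the matching tool and personality) can be illustrated with a minimal, dependency-free sketch. This is not the repository's code and does not use LangChain: a keyword count stands in for the LLM's intent decision, word overlap stands in for document retrieval, and every name and FAQ snippet here (`FAQ_DOCS`, `INTENT_KEYWORDS`, `Tool`, `route`, `respond`) is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

# Hypothetical FAQ snippets standing in for the scraped Verizon documents.
FAQ_DOCS: Dict[str, List[str]] = {
    "support": [
        "Restart your router by unplugging it for 30 seconds.",
        "Reset your voicemail password from the account settings page.",
    ],
    "sales": [
        "Unlimited plans include a mobile hotspot allowance.",
        "Trade in an eligible phone to get credit toward an upgrade.",
    ],
}

# Hypothetical keyword lists. In the project an LLM makes this decision;
# keyword matching is only a stand-in so the sketch runs without a model.
INTENT_KEYWORDS: Dict[str, Set[str]] = {
    "support": {"broken", "reset", "restart", "error", "router", "password"},
    "sales": {"buy", "plan", "plans", "upgrade", "price", "trade"},
}


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and strip basic punctuation from each word."""
    return {word.strip(".,?!").lower() for word in text.split()}


def retrieve(query: str, docs: List[str]) -> str:
    """Return the document sharing the most words with the query."""
    query_tokens = tokenize(query)
    return max(docs, key=lambda doc: len(query_tokens & tokenize(doc)))


@dataclass
class Tool:
    name: str
    persona: str
    run: Callable[[str], str]


def make_tool(name: str, persona: str) -> Tool:
    """Build a tool that answers from its own document set, tagged with its persona."""
    def run(query: str) -> str:
        return f"[{persona}] {retrieve(query, FAQ_DOCS[name])}"
    return Tool(name=name, persona=persona, run=run)


TOOLS: Dict[str, Tool] = {
    "support": make_tool("support", "Tech Support"),
    "sales": make_tool("sales", "Sales"),
}


def route(query: str) -> Tool:
    """Pick the tool whose keywords best match the query (ties fall back to support)."""
    query_tokens = tokenize(query)
    scores = {name: len(query_tokens & kws) for name, kws in INTENT_KEYWORDS.items()}
    return TOOLS[max(scores, key=scores.get)]


def respond(query: str) -> str:
    """Route the query to a tool and return that tool's answer."""
    return route(query).run(query)
```

For example, `respond("How do I reset my voicemail password?")` is routed to the support tool, while `respond("I want to upgrade my plan")` is routed to the sales tool. Swapping the keyword scorer for an LLM call and the overlap search for a vector store would bring the sketch closer to the setup the project describes.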


Excel Document Analyzer with LangChain and VertexAI

View on GitHub

A LangChain- and LLM-backed application that provides Q&A capabilities on an Excel document


SI630-NLP

View on GitHub

My complete implementation of assignments in SI-630: Natural Language Processing at the University of Michigan (Fall, 2021)

Topics include:


Information Retrieval Projects


SI650-IR

View on GitHub

My complete implementation of assignments in SI-650: Information Retrieval at the University of Michigan (Spring, 2022)

Topics include:


Data Science Projects


Machine Learning References

View on GitHub

Collection of ML use-cases, implementations, and cleaned datasets for reference. Scripts are sourced from personal projects and coursework.

|--PyTorch
|   |-- Classifying the Political Framing of Campaign Emails (Logistic Regression)
|   |-- Train a Word2Vec model on Wikipedia Biographies with debiasing (TensorBoard)
|
|--Keras
|   |-- Computer Vision with CNN
|
|--HuggingFace (Transformers)
|   |-- Predicting Helpful Stack Overflow Answers and Data Annotation/Measuring Annotation Quality
|   |-- Pattern-Based Learning (Exploitation Training) for Toxic Language
|
|--Scikit-Learn
|
|--Coursework Examples
|   |--SI630 - Natural Language Processing
|   |   |-- Classifying the Political Framing of Campaign Emails (Logistic Regression)
|   |   |-- Train a Word2Vec model on Wikipedia Biographies with debiasing (TensorBoard)
|   |   |-- Predicting Helpful Stack Overflow Answers and Data Annotation/Measuring Annotation Quality (HuggingFace)
|   |   |-- Pattern-Based Learning (Exploitation Training) for Toxic Language
|   |
|   |--SI670 - Applied Machine Learning
|   |   |--TBD
|   |
|   |--SI671 - Data Mining
|   |   |-- Mining and Evaluating Frequent Itemsets on Twitter Emojis
|   |   |-- Time Series Analysis of COVID-19 trends for G7 Nations
|   |   |-- Social Network Analysis for Amazon Product Reviews

MLOps References

View on GitHub

My complete implementation of assignments in the Machine Learning Engineering for Production (MLOps) Specialization taught by Andrew Ng on Coursera. This repo is a collection of the scripts and projects for future reference.


Tech Stack