About Me

Hello There!

I am a Data Scientist with 3 years of experience. My background is in mathematics and statistics, and I am experienced with Python, SQL, PyTorch, and Scikit-Learn. I was awarded Extra Miler of the Year at ALLDATA for my work on a novel string similarity metric and on ETLs for a couple of products. I am fascinated by all things machine learning and AI, and I would like to use innovative techniques to solve novel problems.

Outside of work, I love to hike, mountain bike, and go to the beach on the weekends. During the week, I can be found playing tennis or soccer, running, or indoors cooking, baking, watching sports and shows, or gaming.

Work Experience

  • Data Scientist

    ALLDATA
    Nov 2020 - Current
    • Developed and implemented a Python package for a novel string similarity score that replaced a deep learning model used to map vehicle information between standards and extract vehicle components from free text. This change increased efficiency by approximately 80% while maintaining accuracy.
    • Built an automated cloud-based data pipeline in GCP that processes millions of rows of data from all US-based AutoZone stores monthly and prepares it for ingestion into a highly prioritized new product.
    • Currently designing and training a deep learning TensorFlow model as part of an NLP pipeline to estimate labor hours given vehicle information and a work description.
    • Awarded “Extra Miler of the Year” for 2023 for my work on multiple data pipelines that added millions of records into two new products.
  • Data Science Intern

    New York Mets
    Jan 2020 - Aug 2020
    • Built an outfield defensive alignment model to minimize runs scored against the Mets, combining results from a hit classification model and a catch probability model using Scikit-Learn and XGBoost.
    • Created visualizations for manual validation of the outfield defensive model's performance using Matplotlib.
    • Developed a novel dataset of MLB venue outfield wall distance measurements using data interpolation and spline fitting for use as a feature in the hit classification model and for future use by the Mets' analyst team.
    • Constructed a multi-dimensional distribution for estimating the likelihood of a hit outcome using SciPy, integrated as a component of the defensive alignment model.

Education

  • MS, Data Science

    University of San Francisco
    2020 - 2021
    • ReadTheSign Project: Acted as CTO for a seven-person team developing a web application that translates videos of American Sign Language gestures to text using PyTorch. Deployed the web application to AWS Elastic Beanstalk using Docker.
    • Coursework: Machine Learning, Deep Learning, Case Studies (Network Analysis, Topic Modeling), SQL, Distributed Computing (Spark), Time Series Analysis, Design of Experiments (A/B testing), Data Structures and Algorithms, Entrepreneurship (Web App Development), Data Ethics.
    • 3.81 GPA
  • BS, Mathematical Science

    University of California, Santa Barbara
    2016 - 2020
  • Minor, Statistical Science

    University of California, Santa Barbara
    2016 - 2020

Skills

Python
SQL
R
Go
Data Analysis
Machine Learning
Deep Learning
Cloud Computing

Pandas

Scikit-Learn

PyTorch

Flask

Docker

GCP

AWS

Git