Abhay Anand

Data Science & Machine Learning Student

About Me

I am an aspiring data scientist who enjoys connecting the dots, whether that means ideas from different disciplines, people from different teams, or applications from different industries. I have strong technical skills and an academic background in computer science, statistics, and machine learning. Currently pursuing my Master's in Data Science at UC San Diego, I combine academic excellence with hands-on experience in developing innovative solutions using cutting-edge technologies.

Education

Master's in Data Science

University of California, San Diego
August 2025 - June 2027
Incoming

B.S. in Cognitive Science with Machine Learning Specialization & Computer Science

University of California, San Diego
August 2022 - August 2025
Relevant Coursework:
Data Structures and OO Design, Supervised Machine Learning, AI Algorithms, Neural Networks and Deep Learning, Signal Processing

Experience

Intern, Machine Learning Engineer

Tesla • Palo Alto, CA • Sept. 2025 - Present

  • Designing an ML pipeline using Isolation Forest and TensorFlow for fleet-wide anomaly detection on Megapacks, flagging thermal issues before they occur
  • Building Airflow DAGs that run Spark ETL from S3 with AWS Glue to ingest telemetry data, with feature engineering (seasonality coefficients, ordinal encodings) for real-time anomaly detection

Computer Vision Researcher

Machine Learning Perception and Computation Lab • San Diego, CA • May 2025 - Present

  • Developing a vision-language model for counting-guided image generation using a YOLOCount and Stable Diffusion backbone, trained on the LVIS and COCO (164K+ images) datasets
  • Designing a multimodal encoder for layout-image-text input with an LLM backbone, focusing on using layout to support precise, position-based image generation relative to current SOTA models

Software/Data Engineering Intern

Ivanti • San Jose • June 2024 - August 2024

  • Developed a GenAI-driven automation tester for Neurons AI (C#, JSON, Excel, SQL)
  • Built an NLP-based Dynamic Expression Generator using Azure GPT-4, RAG, and ReAct

Instructional Assistant

Cognitive and Data Science Dept, UCSD • San Diego • March 2024 - Present

  • Tutored 900+ students in Python and data science under Prof. Voytek and Prof. Ellis
  • Led discussions, handled grading, and held office hours

VP of Finance

CSForEach Club, UCSD • San Diego • September 2022 - June 2024

  • Managed club budget and sponsor relationships
  • Supported K–12 outreach in CS, data science, and AI

Software Engineering Intern

HCLTech • Remote • June 2023 - September 2023

  • Worked with the product engineering team to analyze security and anomaly datasets
  • Performed EDA and data visualization to extract insights

Projects

AutoBots: RL-Based Autonomous Driving

Reinforcement Learning, CARLA, DonkeySim

  • Trained self-driving agents using Reinforcement Learning (RL) on two platforms: CARLA (urban navigation) and DonkeySim (track-based circuits)
  • Used LiDAR as the primary sensor for robust, light-invariant spatial perception; processed point cloud into 18D state vectors combining sector-wise distances and waypoint metadata
  • Implemented Proximal Policy Optimization (PPO) and Actor-Critic methods for continuous action control (steering, throttle, brake), using custom neural networks for actor-critic separation
  • Designed and iteratively refined reward functions to optimize navigation performance, emphasizing collision avoidance, lane following, waypoint tracking, and speed regulation
  • Evaluated results on metrics such as cumulative reward, episode length, distance traveled, and collision rate
  • Explored implications for industrial logistics, real-world deployment, and ethical considerations such as safety, bias, transparency, and sim-to-real transfer
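The clipped objective at the core of PPO can be sketched in a few lines. This is a minimal, framework-free illustration of the standard clipped surrogate loss, not the project's training code; the function name `ppo_clip_loss` and the default `clip_eps=0.2` are assumptions chosen for the example.

```python
import math


def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Return the negative clipped surrogate objective averaged over a batch.

    new_log_probs / old_log_probs: log-probabilities of the taken actions
    under the current and the data-collecting policy.
    advantages: advantage estimates for the same actions.
    clip_eps: clipping range (0.2 is a common default, assumed here).
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the new and old policy.
        ratio = math.exp(new_lp - old_lp)
        # Keep the ratio inside [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    # Negate so that minimizing the loss maximizes the objective.
    return -total / len(advantages)
```

Clipping removes the incentive to move the policy far from the one that collected the data, which is the main reason PPO tends to be stable for continuous control such as steering and throttle.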

YTsum (Multimodal YouTube Summarizer)

Python, Whisper, OpenCV, CLIP-style embeddings, Streamlit

  • Multimodal summarizer combining Whisper (ASR), keyframe extraction, and vision-text embeddings to generate structured video summaries
  • Extracts captions + salient frames; ranks visuals, then synthesizes concise sections
  • CLI + (optional) Streamlit demo for URL input and side-by-side outputs
  • Modular pipeline for swapping models and evaluation prompts

Energy Telemetry Pipeline

Spark, Airflow, Python, Docker, Parquet

  • Built a Spark + Airflow telemetry lakehouse (Bronze/Silver/Gold) with data-quality metrics and idempotent partitioned writes; produced Gold KPIs and model-ready feature tables
  • Batch + Structured Streaming ingestion with windowed aggregations for device events
  • Airflow DAGs for reliable backfills, retries, and lineage; containerized for reproducibility
  • Exports curated features for downstream ML and BI dashboards

Comparative Analysis of Modern Object Detection Algorithms

Computer Vision, Deep Learning

  • Compared R-CNN, YOLO-World, and Grounding DINO for object detection on a curated dataset
  • Evaluated bounding box accuracy using Intersection over Union (IoU), precision, recall, and F1 score
  • Constructed a Robustness Evaluation Dataset to assess model generalization with complex scenes, small/occluded objects, symbolic text, and abstract prompts
  • YOLO-World showed strong recall but was highly sensitive to confidence thresholds and prompt phrasing, while Grounding DINO achieved the highest IoU and F1 across conditions, excelling in open-vocabulary tasks
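For reference, IoU between two axis-aligned boxes can be computed as below. This is a generic sketch, not the evaluation code used in the project; the `(x_min, y_min, x_max, y_max)` box format and the function name `iou` are assumptions.

```python
def iou(box_a, box_b):
    """Intersection over Union for two boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle, if any.
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Width and height are clamped to zero when the boxes do not overlap.
    inter_w = max(0.0, ix_max - ix_min)
    inter_h = max(0.0, iy_max - iy_min)
    intersection = inter_w * inter_h

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0
```

A prediction is typically counted as a true positive when its IoU with a ground-truth box exceeds a chosen threshold, which is how IoU feeds into precision, recall, and F1.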

sEMG Gesture-Controlled Toy Car

Signal Processing, Machine Learning

  • Developed a wearable system using surface electromyography (sEMG) to control a toy car with hand gestures—no remotes or screen interfaces required
  • Recorded muscle activity via a Mindrove sEMG sensor and processed signals in real time through a five-stage pipeline: acquisition, filtering, feature extraction, classification, and command mapping
  • Used a 4-channel signal sampled at 500 Hz, applying a 4th-order Butterworth bandpass filter (10–200 Hz) and a 60 Hz notch filter to remove noise while preserving critical signal features
  • Extracted time- and frequency-domain features (e.g., MAV, FFT) and classified gestures using K-Nearest Neighbors (KNN), achieving ~98% accuracy on three gestures (rest, clench, Spiderman)
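The feature-extraction and classification stages can be illustrated with a small, dependency-free sketch. It assumes the signal has already been filtered, uses only the MAV feature, and relies on hypothetical helper names (`mean_absolute_value`, `extract_features`, `knn_predict`) and `k=3`; it is not the project's actual pipeline.

```python
import math
from collections import Counter


def mean_absolute_value(window):
    """MAV: the average of the absolute sample values in one window."""
    return sum(abs(sample) for sample in window) / len(window)


def extract_features(channels):
    """Build a feature vector with one MAV value per sEMG channel."""
    return [mean_absolute_value(channel) for channel in channels]


def knn_predict(train_features, train_labels, query, k=3):
    """Label a query vector by majority vote among its k nearest neighbors."""
    # Euclidean distance from the query to every training vector.
    distances = sorted(
        (math.dist(features, query), label)
        for features, label in zip(train_features, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]
```

In use, each incoming window of the 4-channel signal would be reduced to a feature vector with `extract_features`, passed to `knn_predict`, and the predicted gesture mapped to a car command.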

Comparative ML Model Analysis

Machine Learning

  • Compared ANN, XGBoost, SVM, and Random Forest on multiple datasets
  • Used SMOTE, encoding, and hyperparameter tuning for evaluation

NLP Sentiment Analysis: BERT vs. SVM

Natural Language Processing

  • Implemented sentiment classifiers in TensorFlow
  • Authored a 5,000-word paper comparing model performance

Loan Default Risk Prediction using Machine Learning

Supervised Classification | Financial Modeling | Fairness in AI

  • Analyzed over 2.2 million LendingClub loan applications (2007–2018) to predict loan default risk using financial indicators like FICO score, debt-to-income (DTI) ratio, and loan grade
  • Performed extensive EDA and feature engineering to extract impactful predictors; found FICO score, DTI, and loan grade to be the most significant indicators of default risk
  • Implemented and compared SVM, K-Nearest Neighbors (KNN), and Random Forest models:
    • Random Forest yielded the highest F1-score and recall for default detection
    • KNN achieved the best precision
    • SVM had the highest accuracy (80%) but failed to capture positive (default) cases
  • Addressed dataset imbalance and ethical concerns by carefully filtering non-representative labels and excluding sensitive demographic attributes
  • Applied fairness-aware methods and advocated for transparent, inclusive financial risk models with privacy-conscious practices
  • Tools: pandas, scikit-learn, matplotlib, seaborn, GridSearchCV, StandardScaler, FunctionTransformer
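The gap between accuracy and recall noted for the SVM is easy to reproduce on imbalanced labels. The sketch below is illustrative only: the function name `classification_metrics` is hypothetical, and the 80/20 class split in the usage lines is an assumption made for the example, not the actual LendingClub class ratio.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Assumed toy labels: 20 defaults (1) and 80 repaid loans (0).
y_true = [1] * 20 + [0] * 80
# A model that never predicts default still looks accurate.
always_negative = [0] * 100
print(classification_metrics(y_true, always_negative))
```

On these toy labels the always-negative model reaches 0.8 accuracy with zero recall, which is why F1 and recall are the more informative metrics for default detection.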

Technical Skills

Programming & Languages

Python
Java
C#
SQL
JavaScript
HTML/CSS

Libraries & Frameworks

NumPy
Pandas
scikit-learn
TensorFlow
Keras
Matplotlib
Seaborn
OpenCV
XGBoost
PyTorch
Spark
Airflow
Whisper
CLIP

Tools & Platforms

Git
VS Code
Azure
CARLA
DonkeySim
Tableau
Jupyter
MS Office
JSON
Docker
Streamlit

Core Competencies

Machine Learning
Signal & Image Processing
Data Engineering
Prompt Engineering
Object Detection
ETL / ELT Pipelines
Multimodal AI

Contact

You can reach me at a7anand-at-ucsd-dot-edu