Abhay Anand | Data Science & Machine Learning Portfolio

About Me

I am an aspiring data scientist who enjoys connecting the dots, whether between ideas from different disciplines, people from different teams, or applications from different industries. I have strong technical skills and an academic background in computer science, statistics, and machine learning. Currently pursuing my Master's in Data Science at UC San Diego, I combine academic training with hands-on experience building solutions in computer vision, reinforcement learning, and natural language processing.

📚 Education

Master's in Data Science

University of California, San Diego

August 2025 - June 2027

Incoming

B.S. in Cognitive Science

University of California, San Diego

August 2022 - August 2025

Machine Learning Specialization with Minor in Computer Science

Relevant Coursework:
Data Structures and OO Design, Supervised Machine Learning, AI Algorithms, Neural Networks and Deep Learning, Signal Processing

Experience

Computer Vision Researcher

mlPC Lab UC San Diego • San Diego • May 2025 - Present

Researching diffusion models, specifically exploring their applications in generative modeling, under the supervision of Professor Zhuowen Tu

Software/Data Engineering Intern

Ivanti • San Jose • June 2024 - August 2024

Developed a GenAI-driven automation tester for Neurons AI (C#, JSON, Excel, SQL)
Built an NLP-based Dynamic Expression Generator using Azure GPT-4, RAG, and ReAct

Instructional Assistant

Cognitive and Data Science Dept, UCSD • San Diego • March 2024 - Present

Tutored 900+ students in Python and data science under Prof. Voytek and Prof. Ellis
Led discussion sections, graded assignments, and held office hours

VP of Finance

CSForEach Club, UCSD • San Diego • September 2022 - June 2024

Managed club budget and sponsor relationships
Supported K–12 outreach in CS, data science, and AI

Software Engineering Intern

HCLTech • Remote • June 2023 - September 2023

Worked with the product engineering team to analyze security and anomaly datasets
Performed EDA and data visualization to extract insights

Projects

AutoBots: RL-Based Autonomous Driving

Reinforcement Learning, CARLA, DonkeySim • January 2025 - Present

Trained self-driving agents using Reinforcement Learning (RL) on two platforms: CARLA (urban navigation) and DonkeySim (track-based circuits)
Used LiDAR as the primary sensor for robust, light-invariant spatial perception; processed point cloud into 18D state vectors combining sector-wise distances and waypoint metadata
Implemented Proximal Policy Optimization (PPO) and Actor-Critic methods for continuous action control (steering, throttle, brake), using custom neural networks for actor-critic separation
Designed and iteratively refined reward functions to optimize navigation performance, emphasizing collision avoidance, lane following, waypoint tracking, and speed regulation
Evaluated results using metrics such as cumulative reward, episode length, distance traveled, and collision rate
Explored implications for industrial logistics, real-world deployment, and ethical considerations such as safety, bias, transparency, and sim-to-real transfer
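
As a rough illustration of the LiDAR state encoding described above, the sketch below collapses a point cloud into sector-wise minimum distances and appends two waypoint features. The split into 16 sectors plus 2 waypoint values (distance and bearing), the 50 m range cap, and the name lidar_to_state are assumptions chosen so the result has 18 dimensions; this is not the project's actual code.

```python
import numpy as np

def lidar_to_state(points, waypoint_xy, n_sectors=16, max_range=50.0):
    """Build an (n_sectors + 2)-D state from LiDAR points in the vehicle frame.

    points: array-like of (x, y, ...) hits; only x and y are used.
    waypoint_xy: (x, y) of the next waypoint in the vehicle frame.
    All defaults are illustrative assumptions.
    """
    points = np.asarray(points, dtype=float)
    # A sector with no hits reports the maximum range (nothing seen).
    sectors = np.full(n_sectors, max_range)
    if points.size:
        dist = np.hypot(points[:, 0], points[:, 1])
        angle = np.arctan2(points[:, 1], points[:, 0])  # in [-pi, pi]
        idx = ((angle + np.pi) / (2 * np.pi) * n_sectors).astype(int) % n_sectors
        # Keep the closest hit per sector, capped at max_range.
        np.minimum.at(sectors, idx, np.minimum(dist, max_range))
    wx, wy = waypoint_xy
    wp_dist = min(np.hypot(wx, wy), max_range)
    wp_bearing = np.arctan2(wy, wx)
    # Scale distances to [0, 1] and the bearing to [-1, 1].
    return np.concatenate([sectors / max_range,
                           [wp_dist / max_range, wp_bearing / np.pi]])

state = lidar_to_state([[10.0, 1.0, 0.0], [-1.0, 5.0, 0.0]], (20.0, 0.0))
print(state.shape)
```

Taking the minimum per sector keeps the nearest obstacle in each direction, which is the quantity a collision-avoidance reward would care about most.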

Comparative Analysis of Modern Object Detection Algorithms

Computer Vision, Deep Learning • March 2025

Compared R-CNN, YOLO-World, and GroundingDINO for object detection on a curated dataset
Evaluated bounding box accuracy using Intersection over Union (IoU), precision, recall, and F1 score
Constructed a Robustness Evaluation Dataset to assess model generalization with complex scenes, small/occluded objects, symbolic text, and abstract prompts
YOLO-World showed strong recall but was highly sensitive to confidence thresholds and prompt phrasing, while GroundingDINO achieved the highest IoU and F1 across conditions, excelling in open-vocabulary tasks
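
The metrics named above can be sketched in a few lines. The example assumes boxes in (x1, y1, x2, y2) form, an IoU threshold of 0.5, and a simple greedy one-to-one matching between predictions and ground truth; the function names are hypothetical and the actual evaluation protocol may have differed.

```python
def iou(box_a, box_b):
    """Intersection over Union for two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def detection_scores(preds, truths, iou_threshold=0.5):
    """Precision, recall, and F1 using greedy one-to-one box matching."""
    matched = set()
    tp = 0
    for p in preds:
        best_j, best_iou = None, iou_threshold
        for j, t in enumerate(truths):
            if j in matched:
                continue  # each ground-truth box can be matched once
            value = iou(p, t)
            if value >= best_iou:
                best_j, best_iou = j, value
        if best_j is not None:
            matched.add(best_j)
            tp += 1
    fp = len(preds) - tp   # predictions with no matching ground truth
    fn = len(truths) - tp  # ground-truth boxes that were missed
    precision = tp / (tp + fp) if preds else 0.0
    recall = tp / (tp + fn) if truths else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

print(detection_scores([(0, 0, 2, 2), (10, 10, 12, 12)],
                       [(0, 0, 2, 2), (5, 5, 6, 6)]))
```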

sEMG Gesture-Controlled Toy Car

Signal Processing, Machine Learning • March 2024 - Present

Developed a wearable system using surface electromyography (sEMG) to control a toy car with hand gestures, with no remotes or screen interfaces required
Recorded muscle activity via a MindRove sEMG sensor and processed signals in real time through a five-stage pipeline: acquisition, filtering, feature extraction, classification, and command mapping
Sampled a 4-channel signal at 500 Hz, then applied a 4th-order Butterworth bandpass filter (10–200 Hz) and a 60 Hz notch filter to remove noise while preserving critical signal features
Extracted time/frequency domain features (e.g., MAV, FFT) and classified gestures using K-Nearest Neighbors (KNN), achieving ~98% accuracy on three gestures (rest, clench, Spiderman)
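
The filtering and MAV steps above might look roughly like the following with SciPy. The notch quality factor (Q = 30), the causal filter calls, the function names, and the synthetic test tone are all assumptions for illustration; they are not taken from the project code.

```python
import numpy as np
from scipy.signal import butter, iirnotch, lfilter, sosfilt

FS = 500  # sampling rate in Hz, as stated above

def clean_emg(raw, fs=FS):
    """Band-pass (10-200 Hz) then notch (60 Hz) a (n_samples, n_channels) array."""
    # Order argument 4; second-order sections keep the filter numerically stable.
    sos = butter(4, [10, 200], btype="bandpass", fs=fs, output="sos")
    band = sosfilt(sos, raw, axis=0)  # causal filtering along the time axis
    # Q=30 is an assumed quality factor for the 60 Hz notch.
    b, a = iirnotch(60, 30, fs=fs)
    return lfilter(b, a, band, axis=0)

def mean_absolute_value(window):
    """MAV per channel for one analysis window."""
    return np.mean(np.abs(window), axis=0)

# Synthetic stand-in for a recording: a 50 Hz tone with a DC offset on 4 channels.
t = np.arange(2 * FS) / FS
raw = np.tile((np.sin(2 * np.pi * 50 * t) + 2.0)[:, None], (1, 4))
filtered = clean_emg(raw)
# Skip the first second so the filter start-up transient does not skew the feature.
features = mean_absolute_value(filtered[FS:])
print(features)
```

Per-window feature vectors like this one would then be passed to the classifier (KNN in this project).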

Comparative ML Model Analysis

Machine Learning • October 2024 - January 2025

Compared ANN, XGBoost, SVM, and Random Forest on multiple datasets
Applied SMOTE for class balancing, categorical encoding, and hyperparameter tuning before evaluating each model
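
For readers unfamiliar with SMOTE, the core idea is to create synthetic minority-class samples by interpolating between a minority point and one of its nearest minority neighbors. The NumPy sketch below shows only that idea; the function name smote_like and its parameters are hypothetical, and a real project would more likely rely on an existing library implementation.

```python
import numpy as np

def smote_like(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic samples from minority-class rows.

    Requires k < number of minority samples. Illustrative only.
    """
    rng = np.random.default_rng(seed)
    minority = np.asarray(minority, dtype=float)
    # Pairwise Euclidean distances among minority samples.
    diff = minority[:, None, :] - minority[None, :, :]
    dist = np.linalg.norm(diff, axis=2)
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]
    # Pick a base sample and one of its k neighbors for each new point.
    base = rng.integers(0, len(minority), size=n_new)
    picked = neighbors[base, rng.integers(0, k, size=n_new)]
    # Place the new point at a random position on the connecting segment.
    gap = rng.random((n_new, 1))
    return minority[base] + gap * (minority[picked] - minority[base])

synthetic = smote_like([[0, 0], [1, 0], [0, 1], [1, 1]], n_new=10)
print(synthetic.shape)
```

Oversampling of this kind should be applied to the training split only, so that synthetic points never leak into the evaluation data.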

NLP Sentiment Analysis: BERT vs. SVM

Natural Language Processing • August 2022 - January 2023

Implemented sentiment classifiers in TensorFlow
Authored a 5,000-word paper comparing model performance

Loan Default Risk Prediction using Machine Learning

Supervised Classification, Financial Modeling, Fairness in AI • December 2024

Analyzed over 2.2 million LendingClub loan applications (2007–2018) to predict loan default risk using financial indicators like FICO score, debt-to-income (DTI) ratio, and loan grade
Performed extensive EDA and feature engineering to extract impactful predictors; found FICO score, DTI, and loan grade to be the most significant indicators of default risk
Implemented and compared SVM, K-Nearest Neighbors (KNN), and Random Forest models:
- Random Forest yielded the highest F1-score and recall for default detection
- KNN achieved the best precision
- SVM had the highest accuracy (80%) but failed to capture positive (default) cases
Addressed dataset imbalance and ethical concerns by carefully filtering non-representative labels and excluding sensitive demographic attributes
Applied fairness-aware methods and advocated for transparent, inclusive financial risk models with privacy-conscious practices
Tools: pandas, scikit-learn, matplotlib, seaborn, GridSearchCV, StandardScaler, FunctionTransformer
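
The gap between accuracy and recall noted above is easy to reproduce on imbalanced labels. The sketch below uses made-up labels with a 20% default rate (not the LendingClub data) and a hypothetical helper, binary_metrics, to show how a model that never predicts a default can still score high accuracy.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Synthetic labels for illustration: 20 defaults (1) and 80 repaid loans (0).
y_true = [1] * 20 + [0] * 80
always_repaid = [0] * 100  # a model that never flags a default
print(binary_metrics(y_true, always_repaid))
```

This is why recall and F1 on the default class, rather than accuracy alone, are the more informative numbers for a task like this.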

🛠 Technical Skills

Programming & Languages

Python, Java, SQL, JavaScript, HTML/CSS

Libraries & Frameworks

NumPy, Pandas, scikit-learn, TensorFlow, Keras, Matplotlib, Seaborn, OpenCV, XGBoost, PyTorch

Tools & Platforms

Git, VS Code, Azure, CARLA, DonkeySim, Tableau, Jupyter, MS Office, JSON

Core Competencies

Machine Learning, Signal & Image Processing, Data Engineering, Prompt Engineering, Object Detection

Contact

You can reach me at a7anand-at-ucsd-dot-edu