Abhay Anand

Data Science & Machine Learning Student

About Me

I am an aspiring data scientist who enjoys connecting the dots, whether between ideas from different disciplines, people on different teams, or applications in different industries. I have strong technical skills and an academic background in computer science, statistics, and machine learning. As a Master's student in Data Science at UC San Diego, I pair my coursework with hands-on experience building machine learning solutions.

📚 Education

Master's in Data Science

University of California, San Diego
August 2025 - June 2027
Incoming

B.S. in Cognitive Science

University of California, San Diego
August 2022 - August 2025
Machine Learning Specialization with Minor in Computer Science
Relevant Coursework:
Data Structures and OO Design, Supervised Machine Learning, AI Algorithms, Neural Networks and Deep Learning, Signal Processing

Experience

Computer Vision Researcher

mlPC Lab UC San Diego • San Diego • May 2025 - Present

  • Researching diffusion models and their applications in generative modeling under the supervision of Professor Zhuowen Tu

Software/Data Engineering Intern

Ivanti • San Jose • June 2024 - August 2024

  • Developed a GenAI-driven automation tester for Neurons AI (C#, JSON, Excel, SQL)
  • Built an NLP-based Dynamic Expression Generator using Azure GPT-4, RAG, and ReAct

Instructional Assistant

Cognitive and Data Science Dept, UCSD • San Diego • March 2024 - Present

  • Tutored 900+ students in Python and data science under Prof. Voytek and Prof. Ellis
  • Led discussions, graded coursework, and held office hours

VP of Finance

CSForEach Club, UCSD • San Diego • September 2022 - June 2024

  • Managed club budget and sponsor relationships
  • Supported K–12 outreach in CS, data science, and AI

Software Engineering Intern

HCLTech • Remote • June 2023 - September 2023

  • Worked with the product engineering team to analyze security and anomaly datasets
  • Performed EDA and data visualization to extract insights

Projects

AutoBots: RL-Based Autonomous Driving

Reinforcement Learning, CARLA, DonkeySim • January 2025 - Present

  • Trained self-driving agents using Reinforcement Learning (RL) on two platforms: CARLA (urban navigation) and DonkeySim (track-based circuits)
  • Used LiDAR as the primary sensor for robust, light-invariant spatial perception; processed point clouds into 18-dimensional state vectors combining sector-wise distances and waypoint metadata
  • Implemented Proximal Policy Optimization (PPO) and Actor-Critic methods for continuous action control (steering, throttle, brake), using custom neural networks for actor-critic separation
  • Designed and iteratively refined reward functions to optimize navigation performance, emphasizing collision avoidance, lane following, waypoint tracking, and speed regulation
  • Evaluated results using cumulative reward, episode length, distance traveled, and collision rate
  • Explored implications for industrial logistics, real-world deployment, and ethical considerations such as safety, bias, transparency, and sim-to-real transfer

Comparative Analysis of Modern Object Detection Algorithms

Computer Vision, Deep Learning • March 2025

  • Compared R-CNN, YOLO-World, and GroundingDINO for object detection on a curated dataset
  • Evaluated bounding box accuracy using Intersection over Union (IoU), precision, recall, and F1 score
  • Constructed a Robustness Evaluation Dataset to assess model generalization with complex scenes, small/occluded objects, symbolic text, and abstract prompts
  • YOLO-World showed strong recall but was highly sensitive to confidence thresholds and prompt phrasing, while GroundingDINO achieved the highest IoU and F1 across conditions, excelling in open-vocabulary tasks
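The IoU metric used in this comparison can be sketched in a few lines. This is a minimal illustration, not the project's evaluation code: the function name `iou` and the `(x_min, y_min, x_max, y_max)` box format are assumptions made for the example.

```python
def iou(box_a, box_b):
    """Intersection over Union for two axis-aligned boxes.

    Boxes are assumed to be (x_min, y_min, x_max, y_max) tuples;
    this format is an assumption for the sketch.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlapping region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    # Union = sum of both areas minus the region counted twice.
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0


# Made-up boxes: overlap area 1, union area 7.
predicted = (0, 0, 2, 2)
ground_truth = (1, 1, 3, 3)
score = iou(predicted, ground_truth)
```

A prediction is typically counted as a true positive when its IoU with a ground-truth box exceeds a chosen threshold, which is how IoU feeds into precision, recall, and F1.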

sEMG Gesture-Controlled Toy Car

Signal Processing, Machine Learning • March 2024 - Present

  • Developed a wearable system using surface electromyography (sEMG) to control a toy car with hand gestures—no remotes or screen interfaces required
  • Recorded muscle activity via a Mindrove sEMG sensor and processed signals in real time through a five-stage pipeline: acquisition, filtering, feature extraction, classification, and command mapping
  • Sampled a 4-channel signal at 500 Hz, then applied a 4th-order Butterworth bandpass filter (10–200 Hz) and a 60 Hz notch filter to remove noise while preserving critical signal features
  • Extracted time- and frequency-domain features (e.g., MAV, FFT) and classified gestures using K-Nearest Neighbors (KNN), achieving ~98% accuracy on three gestures (rest, clench, Spiderman)
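The feature extraction and classification stages can be sketched as follows. This is a simplified illustration using only the Python standard library, not the project's code: it covers MAV and KNN only, skips the filtering stage, and uses made-up sample values. The function names and the choice of `k=3` are assumptions.

```python
import math
from collections import Counter


def mean_absolute_value(window):
    """Return one MAV feature per channel.

    `window` is a list of channels, each a list of samples.
    """
    return [sum(abs(s) for s in channel) / len(channel) for channel in window]


def knn_predict(train_features, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest neighbors."""
    distances = sorted(
        (math.dist(features, query), label)
        for features, label in zip(train_features, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Made-up 4-channel windows (4 samples each), not recorded sEMG data.
training_windows = [
    ([[0.1, -0.1, 0.1, -0.1]] * 4, "rest"),
    ([[0.2, -0.2, 0.2, -0.2]] * 4, "rest"),
    ([[1.0, -1.0, 1.0, -1.0]] * 4, "clench"),
    ([[1.2, -1.2, 1.2, -1.2]] * 4, "clench"),
]
train_features = [mean_absolute_value(w) for w, _ in training_windows]
train_labels = [label for _, label in training_windows]

new_window = [[0.9, -1.1, 1.0, -0.9]] * 4
gesture = knn_predict(train_features, train_labels, mean_absolute_value(new_window))
```

In a real-time setting, the predicted gesture would then be passed to the command-mapping stage to drive the car.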

Comparative ML Model Analysis

Machine Learning • October 2024 - January 2025

  • Compared ANN, XGBoost, SVM, and Random Forest on multiple datasets
  • Applied SMOTE, feature encoding, and hyperparameter tuning to prepare and evaluate the models

NLP Sentiment Analysis: BERT vs. SVM

Natural Language Processing • August 2022 - January 2023

  • Implemented sentiment classifiers in TensorFlow
  • Authored a 5,000-word paper comparing model performance

Loan Default Risk Prediction using Machine Learning

Supervised Classification, Financial Modeling, Fairness in AI • December 2024

  • Analyzed over 2.2 million LendingClub loan applications (2007–2018) to predict loan default risk using financial indicators like FICO score, debt-to-income (DTI) ratio, and loan grade
  • Performed extensive EDA and feature engineering to extract impactful predictors; found FICO score, DTI, and loan grade to be the most significant indicators of default risk
  • Implemented and compared SVM, K-Nearest Neighbors (KNN), and Random Forest models:
    • Random Forest yielded the highest F1-score and recall for default detection
    • KNN achieved the best precision
    • SVM had the highest accuracy (80%) but failed to capture positive (default) cases
  • Addressed dataset imbalance and ethical concerns by carefully filtering non-representative labels and excluding sensitive demographic attributes
  • Applied fairness-aware methods and advocated for transparent, inclusive financial risk models with privacy-conscious practices
  • Tools: pandas, scikit-learn, matplotlib, seaborn, GridSearchCV, StandardScaler, FunctionTransformer
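The gap between accuracy and recall noted for the SVM is a common effect of class imbalance. The sketch below illustrates it with hypothetical labels (8 repaid loans, 2 defaults), not the LendingClub data. The function name `classification_metrics` and the label encoding (1 = default) are assumptions for the example.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: 0 = repaid, 1 = default.
y_true = [0] * 8 + [1] * 2
always_repaid = [0] * 10  # a classifier that never predicts default
metrics = classification_metrics(y_true, always_repaid)
```

Here the classifier reaches 0.8 accuracy while its recall on defaults is 0.0, which is why F1 and recall are more informative than accuracy for default detection.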

🛠 Technical Skills

Programming & Languages

Python
Java
C#
SQL
JavaScript
HTML/CSS

Libraries & Frameworks

NumPy
Pandas
scikit-learn
TensorFlow
Keras
Matplotlib
Seaborn
OpenCV
XGBoost
PyTorch

Tools & Platforms

Git
VS Code
Azure
CARLA
DonkeySim
Tableau
Jupyter
MS Office
JSON

Core Competencies

Machine Learning
Signal & Image Processing
Data Engineering
Prompt Engineering
Object Detection

Contact