About Me

I am an aspiring data scientist who enjoys connecting the dots, whether between ideas from different disciplines, people on different teams, or applications across industries. I have strong technical skills and an academic background in computer science, statistics, and machine learning. I am currently pursuing a Master's in Data Science at UC San Diego, where I pair my coursework with hands-on experience in machine learning research and industry internships.
Education

Master's in Data Science, UC San Diego

B.S. Cognitive Science (Machine Learning Specialization) & Computer Science
Relevant coursework: Data Structures and Object-Oriented Design, Supervised Machine Learning, AI Algorithms, Neural Networks and Deep Learning, Signal Processing
Experience

Machine Learning Engineer Intern
Tesla • Palo Alto, CA • September 2025 - Present
- Designing an ML pipeline for anomaly detection on energy data using Isolation Forest

Computer Vision Researcher
Machine Learning Perception and Computation Lab • San Diego, CA • May 2025 - Present
- Developing a Vision–Language model for count-guided image generation built on YOLOCount and Stable Diffusion backbones, trained on the LVIS and COCO datasets (164K+ images)
- Designing a multimodal encoder for combined layout, image, and text inputs with an LLM backbone, using layout to guide position-aware image generation and aiming to improve positional precision over current SOTA models

Software/Data Engineering Intern
Ivanti • San Jose, CA • June 2024 - August 2024
- Developed a GenAI-driven automated testing tool for Neurons AI (C#, JSON, Excel, SQL)
- Built an NLP-based Dynamic Expression Generator using Azure OpenAI GPT-4, retrieval-augmented generation (RAG), and ReAct prompting

Instructional Assistant
Cognitive and Data Science Dept., UCSD • San Diego, CA • March 2024 - Present
- Tutored 900+ students in Python and data science under Prof. Voytek and Prof. Ellis
- Led discussion sections, graded assignments, and held office hours

VP of Finance
CSForEach Club, UCSD • San Diego, CA • September 2022 - June 2024
- Managed club budget and sponsor relationships
- Supported K–12 outreach in CS, data science, and AI

Software Engineering Intern
HCLTech • Remote • June 2023 - September 2023
- Worked with the product engineering team to analyze security and anomaly datasets
- Performed exploratory data analysis (EDA) and built data visualizations to extract insights
Projects
AutoBots: RL-Based Autonomous Driving
Reinforcement Learning, CARLA, DonkeySim
- Trained self-driving agents using Reinforcement Learning (RL) on two platforms: CARLA (urban navigation) and DonkeySim (track-based circuits)
- Used LiDAR as the primary sensor for robust, light-invariant spatial perception; processed the point cloud into 18-dimensional state vectors combining sector-wise distances and waypoint metadata
- Implemented Proximal Policy Optimization (PPO) and other actor-critic methods for continuous control of steering, throttle, and brake, using separate custom neural networks for the actor and the critic
- Designed and iteratively refined reward functions to optimize navigation performance, emphasizing collision avoidance, lane following, waypoint tracking, and speed regulation
- Evaluated results using cumulative reward, episode length, distance traveled, and collision rate
- Explored implications for industrial logistics, real-world deployment, and ethical considerations such as safety, bias, transparency, and sim-to-real transfer
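The core of PPO is its clipped surrogate objective, which limits how far one update can move the policy. The sketch below is a minimal plain-Python illustration of the standard clipping variant, not the project's training code; it assumes per-sample log-probabilities and advantage estimates are already available as lists, uses an illustrative epsilon of 0.2, and the function names are hypothetical.

```python
import math


def probability_ratios(new_log_probs, old_log_probs):
    """r_t = pi_new(a_t | s_t) / pi_old(a_t | s_t), computed from log-probabilities."""
    return [math.exp(new - old) for new, old in zip(new_log_probs, old_log_probs)]


def ppo_clip_objective(ratios, advantages, epsilon=0.2):
    """Mean clipped surrogate objective, to be maximized by the actor.

    For each sample, take the smaller of the unclipped term r * A and the
    term with r clipped to [1 - epsilon, 1 + epsilon], which removes the
    incentive to push the new policy far from the old one in one update.
    """
    total = 0.0
    for r, a in zip(ratios, advantages):
        clipped_r = min(max(r, 1.0 - epsilon), 1.0 + epsilon)
        total += min(r * a, clipped_r * a)
    return total / len(ratios)


# Usage: a large ratio with a positive advantage and a small ratio with a
# negative advantage; both samples end up contributing the clipped term.
ratios = probability_ratios([math.log(1.5), math.log(0.5)], [0.0, 0.0])
print(ppo_clip_objective(ratios, [2.0, -1.0]))
```

In a typical actor-critic setup the actor minimizes the negative of this value, while the critic is trained separately on a value-regression loss.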
YTsum (Multimodal YouTube Summarizer)
Python, Whisper, OpenCV, CLIP-style embeddings, Streamlit
- Multimodal summarizer combining Whisper (ASR), keyframe extraction, and vision-text embeddings to generate structured video summaries
- Extracts captions and salient frames, ranks the visuals, then synthesizes concise summary sections
- CLI plus an optional Streamlit demo that takes a video URL and shows outputs side by side
- Modular pipeline for swapping models and evaluation prompts
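The ranking criterion for keyframes is not specified above; one common approach with CLIP-style embeddings is to score each frame by cosine similarity to a text embedding (for example, of a transcript section). The sketch below assumes exactly that, takes precomputed embeddings as plain lists, and uses hypothetical function names; it is not YTsum's actual code.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_frames(frame_embeddings, text_embedding, top_k=3):
    """Indices of the top_k keyframes whose embeddings best match the text embedding."""
    scores = [(cosine_similarity(e, text_embedding), i) for i, e in enumerate(frame_embeddings)]
    scores.sort(reverse=True)  # highest similarity first
    return [i for _, i in scores[:top_k]]


# Usage with toy 2-D vectors standing in for real image/text embeddings.
frames = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
print(rank_frames(frames, [1.0, 0.0], top_k=2))
```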
Energy Telemetry Pipeline
Spark, Airflow, Python, Docker, Parquet
- Built a Spark and Airflow telemetry lakehouse with Bronze/Silver/Gold layers, data-quality metrics, and idempotent partitioned writes; produced Gold-layer KPIs and model-ready feature tables
- Batch and Structured Streaming ingestion with windowed aggregations over device events
- Airflow DAGs for reliable backfills, retries, and lineage; containerized for reproducibility
- Exports curated features for downstream ML and BI dashboards
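In Spark, windowed aggregations are usually expressed with `pyspark.sql.functions.window` inside a `groupBy`. The plain-Python sketch below only illustrates the semantics of a tumbling (fixed, non-overlapping) window; it assumes events arrive as hypothetical `(device_id, epoch_seconds, value)` tuples and ignores late data and watermarking, which Structured Streaming handles via `withWatermark`.

```python
from collections import defaultdict


def tumbling_window_aggregates(events, window_seconds=60):
    """Aggregate (device_id, epoch_seconds, value) events into fixed, non-overlapping windows.

    Returns {(device_id, window_start): (event_count, value_sum)}.
    """
    counts = defaultdict(int)
    sums = defaultdict(float)
    for device_id, ts, value in events:
        window_start = ts - (ts % window_seconds)  # align to the start of the window
        key = (device_id, window_start)
        counts[key] += 1
        sums[key] += value
    return {key: (counts[key], sums[key]) for key in counts}


# Usage: timestamps 0 and 59 share a 60 s window; 60 starts the next one.
events = [("a", 0, 1.0), ("a", 59, 2.0), ("a", 60, 4.0), ("b", 5, 3.0)]
print(tumbling_window_aggregates(events))
```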
Comparative Analysis of Modern Object Detection Algorithms
Computer Vision, Deep Learning
- Compared R-CNN, YOLO-World, and Grounding DINO for object detection on a curated dataset
- Evaluated bounding box accuracy using Intersection over Union (IoU), precision, recall, and F1 score
- Constructed a Robustness Evaluation Dataset to assess generalization to complex scenes, small or occluded objects, symbolic text, and abstract prompts
- YOLO-World showed strong recall but was highly sensitive to confidence thresholds and prompt phrasing, while Grounding DINO achieved the highest IoU and F1 across conditions and excelled at open-vocabulary detection
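IoU and the detection-level precision, recall, and F1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's evaluation code: boxes are `(x1, y1, x2, y2)` tuples, predictions are assumed to be pre-sorted by descending confidence, matching is greedy and one-to-one, and the IoU threshold of 0.5 is a common default rather than a documented project setting.

```python
def iou(box_a, box_b):
    """Intersection over Union for axis-aligned boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def detection_prf1(predictions, ground_truth, iou_threshold=0.5):
    """Precision, recall, and F1 with greedy one-to-one matching.

    Each prediction is matched to the unmatched ground-truth box with the
    highest IoU and counts as a true positive only if that IoU reaches
    iou_threshold.
    """
    matched = set()
    tp = 0
    for pred in predictions:
        best_iou, best_j = 0.0, None
        for j, gt in enumerate(ground_truth):
            if j not in matched:
                score = iou(pred, gt)
                if score > best_iou:
                    best_iou, best_j = score, j
        if best_j is not None and best_iou >= iou_threshold:
            matched.add(best_j)
            tp += 1
    precision = tp / len(predictions) if predictions else 0.0
    recall = tp / len(ground_truth) if ground_truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Usage: one correct detection, one false positive, one missed object.
preds = [(0, 0, 2, 2), (10, 10, 12, 12)]
gts = [(0, 0, 2, 2), (5, 5, 6, 6)]
print(detection_prf1(preds, gts))
```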
sEMG Gesture-Controlled Toy Car
Signal Processing, Machine Learning
- Developed a wearable system that uses surface electromyography (sEMG) to control a toy car with hand gestures, with no remote or screen interface required
- Recorded muscle activity via a Mindrove sEMG sensor and processed signals in real time through a five-stage pipeline: acquisition, filtering, feature extraction, classification, and command mapping
- Sampled 4 channels at 500 Hz and applied a 4th-order Butterworth band-pass filter (10–200 Hz) plus a 60 Hz notch filter to remove noise and power-line interference while preserving critical signal features
- Extracted time- and frequency-domain features such as mean absolute value (MAV) and FFT spectra, then classified gestures with K-Nearest Neighbors (KNN), achieving ~98% accuracy on three gestures (rest, clench, Spiderman)
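The feature-extraction and classification stages can be sketched in plain Python with MAV features and a majority-vote KNN. This is a simplified illustration, not the project's code: it omits the filtering stage (typically built with `scipy.signal.butter` and `scipy.signal.iirnotch`) and the FFT features, and the window layout, `k=3`, and toy feature values are assumptions.

```python
import math
from collections import Counter


def mean_absolute_value(samples):
    """MAV of one channel's samples within an analysis window."""
    return sum(abs(x) for x in samples) / len(samples)


def mav_features(window):
    """One MAV value per channel; window is a list of per-channel sample lists."""
    return [mean_absolute_value(channel) for channel in window]


def knn_predict(train_features, train_labels, query, k=3):
    """Majority vote among the k training feature vectors closest to query (Euclidean)."""
    nearest = sorted(
        zip(train_features, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Usage with toy 2-channel MAV features (not recorded data).
train = [[0.1, 0.1], [0.2, 0.1], [1.0, 0.9], [0.9, 1.1]]
labels = ["rest", "rest", "clench", "clench"]
print(knn_predict(train, labels, [0.95, 1.0], k=3))
```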
Comparative ML Model Analysis
Machine Learning
- Compared ANN, XGBoost, SVM, and Random Forest on multiple datasets
- Applied SMOTE oversampling, categorical encoding, and hyperparameter tuning before evaluating the models
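SMOTE creates synthetic minority-class samples by interpolating between a minority point and one of its nearest minority neighbors. In practice this is usually done with `imblearn.over_sampling.SMOTE` from imbalanced-learn (default `k_neighbors=5`); the plain-Python function below is a hypothetical sketch of the interpolation step only, not the project's code.

```python
import math
import random


def smote_sample(minority, k=5, rng=None):
    """Create one synthetic minority-class point by SMOTE-style interpolation.

    Picks a random minority point, finds its k nearest minority neighbors,
    and returns a random point on the segment between the two.
    """
    rng = rng or random.Random()
    i = rng.randrange(len(minority))
    base = minority[i]
    others = [p for j, p in enumerate(minority) if j != i]
    neighbors = sorted(others, key=lambda p: math.dist(p, base))[:k]
    neighbor = rng.choice(neighbors)
    gap = rng.random()  # uniform in [0, 1)
    return [b + gap * (n - b) for b, n in zip(base, neighbor)]


# Usage: synthetic points stay on segments between existing minority points.
minority = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
print(smote_sample(minority, k=2, rng=random.Random(0)))
```

Oversampling is normally applied only to the training split so that synthetic points do not leak into the evaluation data.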
NLP Sentiment Analysis: BERT vs. SVM
Natural Language Processing
- Implemented sentiment classifiers in TensorFlow
- Authored a 5,000-word paper comparing model performance
Loan Default Risk Prediction using Machine Learning
Supervised Classification, Financial Modeling, Fairness in AI
- Analyzed over 2.2 million LendingClub loan applications (2007–2018) to predict loan default risk using financial indicators like FICO score, debt-to-income (DTI) ratio, and loan grade
- Performed extensive EDA and feature engineering to identify predictive features; found FICO score, DTI, and loan grade to be the strongest indicators of default risk
- Implemented and compared SVM, K-Nearest Neighbors (KNN), and Random Forest models:
  - Random Forest yielded the highest F1-score and recall for default detection
  - KNN achieved the best precision
  - SVM had the highest accuracy (80%) but failed to capture positive (default) cases
- Addressed dataset imbalance and ethical concerns by carefully filtering non-representative labels and excluding sensitive demographic attributes
- Applied fairness-aware methods and advocated for transparent, inclusive financial risk models with privacy-conscious practices
- Tools: pandas, scikit-learn, matplotlib, seaborn, GridSearchCV, StandardScaler, FunctionTransformer
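On imbalanced data, high accuracy can coexist with missing every positive case. The sketch below illustrates that general failure mode with made-up labels and a hypothetical helper (scikit-learn's `classification_report` reports the same metrics); it does not reproduce the LendingClub data or the project's models.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels (not project data): 8 non-defaults (0) and 2 defaults (1).
y_true = [0] * 8 + [1] * 2
always_no_default = [0] * 10
print(binary_metrics(y_true, always_no_default))
# {'accuracy': 0.8, 'precision': 0.0, 'recall': 0.0, 'f1': 0.0}
```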
Technical Skills
Programming & Languages
Libraries & Frameworks
Tools & Platforms
Core Competencies
Contact
You can reach me at a7anand-at-ucsd-dot-edu