Tejas Garg
Building ML Systems That Work Beyond Demos
Pre-final year CSE (AI/ML) • Available immediately
Third-year CS undergrad focused on building ML systems that work beyond demos. I care about bridging the gap between novel ML research and production-ready systems.
Recent work: reproduced a diffusion classifier from scratch and built custom XAI for it, created a real-time PPE monitoring pipeline with temporal filtering, and experimented with GRPO to teach LLMs explicit reasoning. I like projects where the engineering is as hard as the ML.
B.Tech in Computer Science & Engineering
Specialization in AI & Machine Learning
Indian Institute of Information Technology, Nagpur
Expected 2027
Reproduced a diffusion-based diabetic retinopathy classifier from scratch with dual-granularity conditional guidance: EfficientNet-B0 + EfficientSAM for global features, ResNet-18 with gated attention on 6 ROI patches for local features. DDIM scheduler with 1000 training timesteps and 10 test-time timesteps. Achieved 84.1% on APTOS 2019. Built a dedicated XAI layer with six explainers: attention saliency, diffusion trajectory, spatio-temporal shifts, conditional attribution, faithfulness validator, and counterfactuals.
RL environment for emergency medical dispatch — a 20-node city POMDP where an agent manages 6 units across cardiac, trauma, and fire emergencies. Features radio delays, hidden severity, ghost calls, city events (bridge collapse, heatwave), and an adversarial curriculum that auto-escalates difficulty. GRPO-trained on Qwen3-4B. Meta OpenEnv Hackathon project with team TorchBearers.
Multi-agent research assistant built with LangGraph subgraphs for discovery, analysis, and survey generation. Four human-in-the-loop checkpoints let users steer paper curation and request targeted section revisions. Orchestrates Semantic Scholar, arXiv, and Firecrawl APIs. Exports Markdown surveys.
Real-time sales call intelligence pipeline: faster-whisper transcription → pyannote speaker diarization → LLM role identification → analysis (objection detection, action items, call scoring, sentiment timeline) → FAISS RAG Q&A with timestamp citations. WebSocket streaming for live transcripts. FastAPI + Next.js.
LLMOps platform for prompt versioning and rigorous evaluation. Git-like versioning with diffs, dev/staging/production tags, and rollback. LLM-as-judge metrics (faithfulness, answer relevance, context precision/recall) plus latency and cost tracking. A/B testing with paired t-test statistical significance. Async Celery pipelines.
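A minimal sketch of the paired t-test behind an A/B comparison like the one above, assuming each prompt version is scored on the same set of evaluation examples. The function name and the score lists are hypothetical and not taken from the platform; the sketch returns only the t statistic and degrees of freedom, using the standard library.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for paired samples A and B.

    Both lists must score the same examples in the same order.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have equal length")
    # Per-example differences: positive means version B scored higher.
    diffs = [b - a for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical judge scores for two prompt versions on the same four examples.
prompt_a = [1, 2, 3, 4]
prompt_b = [2, 3, 3, 6]
t_stat, dof = paired_t_statistic(prompt_a, prompt_b)
print(f"t = {t_stat:.3f}, dof = {dof}")
```

To turn the statistic into a p-value, `scipy.stats.ttest_rel` performs the same test and reports the p-value directly.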
Full-stack RAG app for CS interview prep with hybrid retrieval (BM25 + dense, fused with RRF), optional cross-encoder reranking, and intent-aware scoring. LLM grading (0–5) feeds into SM-2 spaced repetition and prerequisite-aware learning paths combining topological ordering with SWOT/mastery signals. Offline generation pipeline reduced malformed output from ~12% to <1%.
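A minimal sketch of reciprocal rank fusion (RRF) for combining a BM25 ranking with a dense ranking. The function name, document IDs, and the constant k = 60 (a commonly used default) are assumptions for illustration, not details of the project.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document receives 1 / (k + rank) from every list it appears in,
    with rank starting at 1; documents are sorted by total score.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical retriever outputs, best match first.
bm25_ranking = ["d1", "d2", "d3"]
dense_ranking = ["d3", "d1", "d4"]
fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print(fused)
```

Because RRF uses only ranks, the BM25 and dense scores never need to be normalized onto a common scale.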
Event-driven real-time PPE compliance monitoring. Hybrid detection: YOLOv8 person tracking + SAM3 + YOLOv11 (12-class PPE detector). Temporal stability via EMA confidence fusion and hysteresis thresholds. Decoupled display/process FPS with frame dropping for predictable latency under load. FastAPI + Next.js.
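A minimal sketch of temporal stabilization with an EMA over per-frame confidences plus hysteresis thresholds. The class name, smoothing factor, and thresholds are hypothetical values chosen for illustration, not the pipeline's actual settings.

```python
class StableFlag:
    """Smooth noisy per-frame confidences and debounce the on/off decision."""

    def __init__(self, alpha=0.3, on_threshold=0.6, off_threshold=0.4):
        self.alpha = alpha                  # weight of the newest frame
        self.on_threshold = on_threshold    # EMA must rise to this to switch on
        self.off_threshold = off_threshold  # EMA must fall to this to switch off
        self.ema = None
        self.active = False

    def update(self, confidence):
        # Exponential moving average, seeded with the first observation.
        if self.ema is None:
            self.ema = confidence
        else:
            self.ema = self.alpha * confidence + (1 - self.alpha) * self.ema
        # Hysteresis: the gap between thresholds prevents rapid flicker.
        if not self.active and self.ema >= self.on_threshold:
            self.active = True
        elif self.active and self.ema <= self.off_threshold:
            self.active = False
        return self.active


# Hypothetical per-frame confidences for one tracked person's helmet.
helmet = StableFlag()
frames = [0.9, 0.9, 0.1, 0.9, 0.1, 0.1, 0.1, 0.1]
states = [helmet.update(c) for c in frames]
print(states)
```

In this sketch a single low-confidence frame does not flip the state; only a sustained drop pulls the EMA below the off threshold.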
Fine-tuned Mistral-7B-Instruct-v0.3 on GSM8K math reasoning. SFT warmup (4k samples) → GRPO with XML-structured reasoning traces. Zero-shot 41.2% → SFT 44.5% → GRPO 52.5% (+8.0 points over SFT). Key finding: evaluation consistency matters as much as the training algorithm itself.
Open to internship opportunities