Ehtesham Siddiqui.

I build ML systems that turn raw data into decisions, from LLM pipelines at Charles Schwab and CNN models at Zensar to IEEE research with 22 citations and a filed patent application. M.S. in Computer Engineering at NYU, graduating May 2026.

01. About

About Me

My path into Data Science started with a simple question: why do models fail in production? What began as scripting at the University of Mumbai evolved into a deep focus on end-to-end ML pipelines, LLM systems, and analytics, now sharpened at NYU.

I don't just fit models. I build ML systems that stay reliable at scale. From improving KPI prediction accuracy at Charles Schwab to training CNN models for image segmentation at Zensar, I solve problems where data quality and model integrity are non-negotiable.

When I'm not tuning hyperparameters, I'm thinking about data governance. Working across enterprise and academic environments taught me that good ML is only as valuable as the pipelines and systems that deliver it. End-to-end ownership is everything.

Languages
Python · R · SQL · Jupyter
ML / Stats
Regression · Random Forest · SVM · XGBoost · Hypothesis Testing
Deep Learning
CNN / LSTM · BERT / GPT-4 · LangChain · HuggingFace · LLMs
NLP
NER · Sentiment Analysis · Topic Modeling · Summarization
Cloud / AWS
S3 / Lambda · Glue / Athena · Kinesis · Redshift
Visualization
Tableau · Power BI · Looker · Excel
Frameworks
NumPy / Pandas · Scikit-learn · TensorFlow · PyTorch · OpenCV
Databases
PostgreSQL · SQL Server · MongoDB

Education

2024 – 2026
New York University
M.S. Computer Engineering  ·  GPA 3.55
2020 – 2024
University of Mumbai
B.E. Computer Engineering  ·  GPA 3.31

Leadership & Interests

Technical Secretary

Led the Student Council at University of Mumbai (2022–2023). Managed a team of 50+ students organizing technical festivals and hackathons with 1,000+ attendees.

Placement Coordinator

Liaised between corporate recruiters and college administration (2022–2024), streamlining the hiring process for 200+ graduating engineers.

Chess Enthusiast @GroovySquare

Silver Medal (2nd Prize) · "Catch me blundering queens in 3-minute blitz games. It's my favorite way to reset the brain, panic under time pressure, and occasionally find a brilliant mate."

02. Experience

Where I've Worked

Charles Schwab
Data Scientist
Oct 2025 – Present
  • Optimized linear and logistic regression models for predicting key business KPIs through feature selection and model evaluation, improving prediction accuracy by 18% to support sales and marketing strategy.
  • Designed LangChain-based chatbots for internal stakeholders with advanced prompt engineering and context management, automating repetitive business queries.
  • Integrated LLM solutions into production for automated data annotation and insight generation, embedding AI-driven workflows to streamline manual processes.
  • Architected scalable ETL pipelines using AWS Glue and Lambda, integrating data from S3, RDS, and Redshift to support near real-time analytics and enterprise reporting.
  • Applied XGBoost pipelines with feature engineering, preprocessing, and model performance tracking for reliable production-ready deployments.
NYU Courant Institute
Graduate Teaching Assistant
Jan 2026 – May 2026
  • Supported instruction for Agile Software Development & DevOps (CSCI-UA-430), assisting students with course material, assignments, and project feedback.
  • Assisted in Mathematical Techniques for CS Applications (CSCI-GA.1180), reinforcing foundational concepts in linear algebra and probability core to ML and data science.
Zensar Technologies
Jr. Data Scientist
Jan 2023 – Jul 2024
  • Enhanced Random Forest models for classification and regression by refining feature selection, hyperparameter tuning, and validation strategies across large-scale structured datasets.
  • Built CNN models for image segmentation and facial recognition, improving classification accuracy by 48% over traditional ML baselines.
  • Optimized AWS Kinesis shard configurations and batching strategies to improve streaming performance, data integrity, and real-time processing scalability.
  • Established interactive Power BI dashboards providing real-time visibility into key business metrics for cross-functional teams.
  • Wrote complex PostgreSQL queries (joins, subqueries, aggregations) and used indexing to prepare data for ML models and analytical reporting.
03. Projects

Selected Work

IntelliCast

Agentic political-events AI chatbot with multi-step reasoning and hallucination prevention. LangChain evaluation pipelines improved answer reliability by 90%, and vectorized inference delivered a 40% throughput gain.

Python · LangChain · Flask · LLM
NoteForge

Google Docs-style collaborative editor with CRDT-based conflict resolution achieving sub-50ms sync latency. Features Git-style version history, role-based access control (RBAC), and session management.

Next.js · PostgreSQL · WebSockets · CRDT
Speech-Database-Query

Voice-based multilingual database query tool used by 25+ businesses. Built with Django and IndicNLTK NLP pipelines, it achieved 92% accuracy across regional languages.

Django · IndicNLTK · NLP · Python
04. Research & Patent

Research & Patent

IEEE Profile
Patent Application #202341053005  ·  Filed Aug 2023
Edge IoT Blockchain Framework

Filed and published a patent application for an edge-blockchain IoT framework with hybrid Proof-of-Authority consensus, achieving sub-second data integrity verification and a 40% throughput improvement over baseline systems.

IEEE Publication  ·  Nov 2023
Streamlining Clinical Practice Management

Developed an Android portal for virtual consultations and e-prescriptions, reducing administrative overhead by 30% and improving healthcare access across tier-2/3 cities.

What's Next?

Get In Touch

I'm currently looking for full-time Data Scientist roles in NYC starting 2026. Whether you have a question or just want to say hi, I'll try my best to get back to you!

Say Hello