Senior Software Engineer · AI Systems Architect · Inventor · Builder
Hi, I'm Sahil Malik
I build ML inference platforms, agentic AI systems, and creative tools at Amazon. Co-inventor of a granted US patent on ML inference architecture. Creator of vaani, a Hindi programming language.
Featured Work
ML Inference Platform
Production ML Serving at Scale
Founding member of an ML inference platform team at Alexa AI. Built a production Tier-1 service hosting 30+ deep learning models serving all Alexa voice traffic, with zero production defects during a major infrastructure migration.
Ambiguity Detection and Resolution in Voice AI
Ambiguity Detection for Voice AI
Designed and launched an ambiguity detection and resolution system for Alexa, enabling millions of customers monthly to resolve ambiguous voice requests with significantly reduced error rates.
US Patent 12,494,194 B1
Incremental Asynchronous ML Inference
Granted patent for a novel architecture using neural network subgraphs for improved responsiveness in speech and NLU systems. 20 claims, active until 2044.
Velo
Stateful Stream Processing Library
Python library with a Rust core (tokio, crossbeam, PyO3) for stateful stream processing without infrastructure overhead. ~350μs stream startup, scale-to-zero workers, async generator API. Based on the stream functions concept from arXiv:2603.03089.
Recent Posts
AI Agents as Universal Task Solvers: When Verification Changes Everything
A new theoretical framework shows how AI agents can become universal problem solvers — but only in domains where solutions can be quickly verified. Here's what that means.
Multi-Agent LLM Systems: What the Research Actually Shows
Synthesizing findings from four recent papers on multi-agent LLM systems. The takeaway: more agents rarely means better results. Distributed systems theory, failure taxonomies, and Amdahl's Law explain why — and point to when multi-agent architectures actually make sense.
Stateful Stream Processing Without the Infrastructure Overhead
Velo is a Python library with a Rust core for processing short-lived stateful streams. It fills the gap between stateless serverless functions and always-on stream engines like Flink — based on the stream functions concept from arXiv:2603.03089.
Let's connect
I'm always interested in discussing ML systems, agent architectures, and creative engineering challenges.