Projects

A selection of work spanning ML inference platforms, agent AI systems, creative tools, and open-source projects.

Amazon

(6)

ML Inference Platform

Production ML Serving at Scale

Founding member of an ML inference platform team at Alexa AI. Built a production Tier-1 service hosting 30+ deep learning models serving all Alexa voice traffic, with zero production defects during a major infrastructure migration.

30+ models · Tier-1 production · Zero-defect migration

ML Inference · Production Systems · Python · AWS

Ambiguity Detection and Resolution in Voice AI

Ambiguity Detection for Voice AI

Designed and launched an ambiguity detection and resolution system for Alexa, enabling millions of customers monthly to resolve ambiguous voice requests with significantly reduced error rates.

Millions of customers/month · Major defect rate reduction

NLU · ML · Voice AI · Python

US Patent 12,494,194 B1

Incremental Asynchronous ML Inference

Granted patent for a novel architecture using neural network subgraphs for improved responsiveness in speech and NLU systems. 20 claims, active until 2044.

20 claims · 9 citations · Active until 2044

Patent · ML Architecture · Speech Recognition

LLM Service Architecture

Microservices Decomposition

Authored a technical proposal to decompose a large monolithic LLM service into modular microservices. Reviewed by senior leadership and drove alignment across multiple organizations.

System Design · Microservices · LLM · Architecture

Agent AI Latency Optimization

LLM Chat Performance Engineering

Achieved significant latency reduction in an LLM-based chat product through preprocessing parallelization, speculative retrieval, and prompt caching techniques.

Agent AI · Latency · LLM · Performance
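Two of the techniques named above, preprocessing parallelization and speculative retrieval, can be illustrated with a minimal asyncio sketch. This is not the production implementation: the function names (`normalize`, `classify_intent`, `retrieve`, `prepare`), the sleep-based stand-ins for real work, and the rule that only questions need retrieval are all assumptions made for illustration.

```python
import asyncio


async def normalize(text: str) -> str:
    # Hypothetical preprocessing step; the sleep stands in for real work.
    await asyncio.sleep(0.05)
    return text.strip().lower()


async def classify_intent(text: str) -> str:
    # Hypothetical intent classifier, independent of normalize().
    await asyncio.sleep(0.05)
    return "question" if text.strip().endswith("?") else "statement"


async def retrieve(text: str) -> list[str]:
    # Hypothetical retrieval call (for example, a vector store lookup).
    await asyncio.sleep(0.05)
    return [f"doc about: {text.strip().lower()}"]


async def prepare(text: str) -> dict:
    # Speculative retrieval: start fetching before we know whether the
    # result will be needed, so it overlaps with preprocessing.
    retrieval = asyncio.create_task(retrieve(text))

    # Parallel preprocessing: independent steps run concurrently
    # instead of one after another.
    normalized, intent = await asyncio.gather(
        normalize(text), classify_intent(text)
    )

    if intent == "question":
        docs = await retrieval      # speculation paid off
    else:
        retrieval.cancel()          # discard the speculative work
        docs = []
    return {"text": normalized, "intent": intent, "docs": docs}
```

Calling `asyncio.run(prepare("What is the weather?"))` returns the normalized text, the intent, and the retrieved documents. In this sketch the three steps overlap, so the wall-clock cost is close to that of the slowest step and not the sum of all three; the trade-off is wasted retrieval work whenever the speculation turns out to be unnecessary.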

Alexa+ Launch Support

Technical Lead for Public Launch

Led a technical support team for the public Alexa+ announcement event, debugging blocking issues in real time to ensure a successful launch.

Leadership · Launch · Debugging

Personal

(7)

Velo

Stateful Stream Processing Library

Python library with a Rust core (tokio, crossbeam, PyO3) for stateful stream processing without infrastructure overhead. ~350μs stream startup, scale-to-zero workers, async generator API. Based on the stream functions concept from arXiv:2603.03089.

~350μs startup · Rust core · Scale-to-zero

Rust · Python · Streaming · Systems · Open Source

Sinkhole

Attention Sink Diagnostics for LLMs

Diagnostic tool to measure attention sinks and activation spikes in transformer LLMs. Validated across 400 prompts on Qwen2.5-7B. Includes sinkhole-vllm — a companion KV cache eviction policy for vLLM that pins sink tokens. Based on Sun, Canziani & LeCun (ICML 2026).

400-prompt eval · vLLM integration · ICML 2026 paper

Transformers · LLMs · ML Research · Python · Open Source
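As a rough illustration of what measuring an attention sink can mean, the sketch below computes the average share of attention that query positions place on a set of sink positions (by default, the first token). This is one simple way to quantify the effect and is not claimed to be Sinkhole's actual metric or API; the function name `sink_mass` and its parameters are hypothetical.

```python
def sink_mass(attn, sink_positions=(0,), skip_queries=1):
    """Mean attention mass that queries place on the sink positions.

    attn is a square list of rows, one per query position, where each
    row holds softmax-normalized attention weights over key positions.
    The first skip_queries rows are ignored, because under a causal
    mask the first query can only attend to itself.
    """
    rows = attn[skip_queries:]
    if not rows:
        raise ValueError("need at least one query row after skipping")
    per_query = [sum(row[k] for k in sink_positions) for row in rows]
    return sum(per_query) / len(per_query)


# Toy causal attention map for a single head over four tokens.
toy_attention = [
    [1.0, 0.0, 0.0, 0.0],
    [0.9, 0.1, 0.0, 0.0],
    [0.8, 0.1, 0.1, 0.0],
    [0.7, 0.1, 0.1, 0.1],
]

score = sink_mass(toy_attention)  # about 0.8 for this toy map
```

A score near 1 means almost all attention lands on the sink positions regardless of content. With a real model, the rows would come from the attention weights of a given layer and head.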

CHLU

Causal Hamiltonian Learning Unit

PyTorch implementation of the Causal Hamiltonian Learning Unit from ICLR 2026 — a physics-inspired neural network layer using relativistic Hamiltonian dynamics with symplectic integration. Drop-in replacement for LSTM/Neural ODE in temporal tasks.

3 experiments · ICLR 2026 paper · LSTM/ODE comparison

Physics ML · PyTorch · Neural Networks · Open Source

vaani

Hindi Programming Language

A Hindi-based programming language equivalent to Python. Original language design that bridges linguistic barriers in computing, with a full parser and interpreter.

Language Design · Python · Hindi · Compiler

aishell

LLM-Augmented Shell in Rust

An AI-powered shell that uses LLMs to augment command-line workflows. Systems-level AI integration built in Rust.

Rust · LLM · CLI · Systems

ProofOfImpact

Blockchain + AI Social Impact

Decentralized platform using blockchain and AI to power trust, verification, and incentives in philanthropy, education, and digital media.

Blockchain · AI · TypeScript · Social Impact

SecureConnect

Encrypted Communication Bridge

Modular, open-source system that creates a secure, local-first bridge between iPhone and Mac, with encrypted, authenticated, policy-controlled communication.

Security · Python · Encryption · iOS · macOS

DGX Research Lab

(5)

DGX Research Lab

10+ AI Services on NVIDIA DGX Spark

Production-grade multi-blueprint AI platform with 15+ ML models, ~200GB aggregate GPU VRAM, full observability. Spans LLM orchestration, multi-modal generation, voice synthesis, video analytics, and autonomous agents.

15+ models · ~200GB VRAM · 10+ services

NVIDIA · DGX · Multi-Agent · GPU · Production AI

AutonomousMe

Digital Twin Agent System

Fully autonomous personal AI agent with 3-layer perception-cognition-action architecture, memory systems, and graduated trust autonomy (levels 0–4).

Agent AI · Digital Twin · LangGraph · VLM

PDF-to-Podcast Pipeline

Transform Documents into Audio

Converts PDFs into engaging podcast audio via LLM-powered script generation. Supports dialogue and monologue modes with multiple TTS backends.

LLM · TTS · Pipeline · FastAPI

Voice Cloner

GPT-SoVITS Multi-Lingual TTS

Zero-shot and few-shot voice cloning with 5-second voice capture, supporting Chinese, English, Japanese, Korean, and Cantonese.

Voice Synthesis · GPT-SoVITS · Multi-lingual

SongAgent

AI Music Generation

Conversational agent creating personalized songs with lyrics, beats (MusicGen), and vocals (Suno Bark). Fully autonomous from description to final mix.

Music AI · MusicGen · Agent · LLM