Portfolio
A selection of production-focused AI and data engineering projects, including LLM evaluation frameworks, RAG systems, Snowflake/dbt pipelines, and multi-cloud AI architecture. Full source on GitHub.
Snowflake AI Evaluation
AI demos always look good, but how do you know whether the agent's answers are actually correct? Without systematic evaluation, quality is invisible and regressions go undetected.
Built a reusable AI evaluation framework that compares multiple agents against a golden test suite, stores evaluation results in Snowflake, and exposes quality metrics through a Streamlit dashboard. In the sample run, GPT-4o scored 9/10 and Gemini 2.5 Flash scored 10/10, with failures traceable to specific test cases.
AI demos can look good, but without systematic evaluation, teams cannot measure quality, detect regressions, assess hallucination risk, or track the effect of model changes over time. This framework makes agent quality measurable and reproducible.
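The golden-test comparison described above can be sketched roughly as follows. This is a minimal illustration, not the framework's actual code: the `TestCase` fields, the substring-match scoring rule, and the two stub agents are all assumptions made for the example, and the Snowflake write and Streamlit dashboard are left out.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TestCase:
    # Hypothetical golden-test record: an id, a question, and the expected answer.
    case_id: str
    question: str
    expected: str


def evaluate(agent_name: str, agent: Callable[[str], str], suite: list[TestCase]) -> list[dict]:
    """Run one agent over the golden suite and return one result row per test case."""
    rows = []
    for case in suite:
        answer = agent(case.question)
        # Assumed scoring rule: pass if the expected text appears in the answer.
        passed = case.expected.lower() in answer.lower()
        rows.append({"agent": agent_name, "case_id": case.case_id,
                     "passed": passed, "answer": answer})
    return rows


def summarize(rows: list[dict]) -> dict:
    """Collapse result rows into a score plus the ids of the failing cases."""
    passed = sum(1 for r in rows if r["passed"])
    failed = [r["case_id"] for r in rows if not r["passed"]]
    return {"score": f"{passed}/{len(rows)}", "failed": failed}


# Toy golden suite and stub agents standing in for real model calls.
suite = [
    TestCase("t1", "What is 2+2?", "4"),
    TestCase("t2", "Capital of France?", "Paris"),
]


def agent_a(question: str) -> str:
    return {"What is 2+2?": "The answer is 4.",
            "Capital of France?": "It is Lyon."}.get(question, "")


def agent_b(question: str) -> str:
    return {"What is 2+2?": "4",
            "Capital of France?": "Paris is the capital."}.get(question, "")


summary_a = summarize(evaluate("agent_a", agent_a, suite))
summary_b = summarize(evaluate("agent_b", agent_b, suite))
print(summary_a)
print(summary_b)
```

Keeping one row per agent and test case is what makes a failure traceable: the rows are already shaped like records that could be loaded into a results table, and the summary only aggregates them.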
Multi-Cloud Serverless RAG
The local RAG pipeline was tied to a single machine and one AI provider, with no way to compare AWS, Azure, and GCP AI stacks on the same workload.
One RAG system deployable on AWS, Azure, or GCP with a single terraform apply, live on Hugging Face Spaces with one page per cloud backend.
Enterprise AI teams rarely operate on a single cloud: they inherit existing infrastructure, face vendor lock-in decisions, or need to compare AI stack costs across providers. Seeing the same pipeline built three ways makes those trade-offs concrete.
RAG Pipeline
A collection of 500 arXiv research papers could not be searched effectively by keyword: a paper on 'reducing compute for LLMs' never surfaces when you search 'efficient LLM training', even though it's exactly what you need.
A chat interface that answers natural-language questions grounded in the paper collection, with cited sources, deployed locally via Docker.
LLMs answer from training data that can be outdated or simply wrong for your domain, and they can hallucinate details that appear in no source. RAG grounds every answer in your own documents, making responses verifiable and traceable to the source.
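The grounding step can be sketched in a few lines. This is a toy illustration under stated assumptions, not the project's pipeline: the three-dimensional vectors are hand-written stand-ins for real embeddings, the source ids (`paper_a` and so on) are hypothetical, and the prompt wording is invented for the example. The final call to an LLM is omitted.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec: list[float], index: list[dict], k: int = 2) -> list[dict]:
    """Return the k indexed chunks most similar to the query vector."""
    ranked = sorted(index, key=lambda d: cosine(query_vec, d["vec"]), reverse=True)
    return ranked[:k]


def build_prompt(question: str, chunks: list[dict]) -> str:
    """Assemble a prompt that restricts the model to numbered, citable sources."""
    context = "\n".join(
        f"[{i}] {c['text']} (source: {c['source']})"
        for i, c in enumerate(chunks, start=1)
    )
    return (
        "Answer using only the numbered sources below and cite them like [1].\n\n"
        f"{context}\n\nQuestion: {question}"
    )


# Toy index: hand-written vectors stand in for embeddings of paper chunks.
index = [
    {"source": "paper_a", "text": "Reducing compute for LLMs.", "vec": [0.8, 0.2, 0.0]},
    {"source": "paper_b", "text": "Protein structure prediction.", "vec": [0.0, 0.1, 0.9]},
    {"source": "paper_c", "text": "Scheduling GPU clusters.", "vec": [0.5, 0.5, 0.1]},
]

question = "efficient LLM training"
query_vec = [0.9, 0.1, 0.0]  # assumed embedding of the question

hits = top_k(query_vec, index, k=2)
prompt = build_prompt(question, hits)
print(prompt)
```

Because retrieval ranks by vector similarity rather than shared words, the 'reducing compute' paper can rank first for a query it shares almost no keywords with, and the numbered sources in the prompt are what let the answer cite where each claim came from.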