Hien Phan


Blog

Building a RAG Pipeline from Scratch: How I Made Hundreds of Research Papers Searchable

April 21, 2026  •  #python, #ai, #rag, #postgresql, #pgvector, #embeddings, #streamlit, #arxiv, #claude, #openai

How I built a system that understands the meaning of your questions and finds relevant answers across hundreds of arXiv research papers — using embeddings, pgvector, and Claude.

About This Site

Personal site of Hien Phan — Lead Data Engineer, AI Engineer and Data Architect with a PhD in Computer Science. Writing on data platforms, AI evaluation, RAG systems, and cloud architecture. Occasional posts in Vietnamese.

© Hien Phan. All rights reserved.