PDFpedia

Open-source chat with documents using Retrieval Augmented Generation (RAG) and powerful open-source LLMs

Timeline: 2024
Role: Full Stack
Team: Solo
Status: Completed

Technology Stack

TypeScript
React
Next.js
Voy
Ollama
LangChain.js
Transformers.js

Key Challenges

  • RAG Implementation
  • Local LLM Integration
  • Browser-based Embeddings

Key Learnings

  • Retrieval Augmented Generation
  • Vector Stores
  • Ollama Integration

PDFpedia: Chat with Documents

Overview

PDFpedia is an open-source application for chatting with PDF documents, built on Retrieval-Augmented Generation (RAG). It combines powerful open-source large language models with a modern TypeScript and Next.js stack to provide a user-friendly way to interact with PDF documents.

What is PDFpedia?

PDFpedia simplifies the way you interact with PDFs. It uses open-source large language models and can run entirely on your local machine, which makes it a privacy-focused and efficient solution for working with PDF documents.

Key Features

  • Open-Source LLMs: PDFpedia uses open-source large language models, ensuring accessibility and transparency
  • Local Execution: Run PDFpedia on your own machine, putting you in control of your data and privacy
  • Simplified PDF Interaction: Chat with your documents, extract information, and get answers as if you were having a conversation
  • Offline Support: Go offline once the site has loaded; all processing happens client-side

Tech Stack

  • Voy: A WebAssembly (WASM) vector store that stores and searches embedding vectors directly in the browser
  • Ollama: Runs language models locally and exposes them to the web app
  • LangChain.js: Orchestrates the pipeline, calling the models and performing retrieval
  • Transformers.js: Generates embeddings of the PDF text in the browser
  • Next.js: Application framework
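Voy's role in this stack is similarity search over embedding vectors. The sketch below illustrates that idea with a brute-force, in-memory store that ranks stored chunks by cosine similarity to a query vector. It is a hypothetical stand-in written for illustration. It is not Voy's API and not PDFpedia's code. `InMemoryVectorStore`, `StoredChunk`, `SearchHit`, and `search` are assumed names.

```typescript
// Hypothetical stand-in for a vector store such as Voy (not Voy's real API).

interface StoredChunk {
  id: string;
  text: string;
  embedding: number[];
}

interface SearchHit {
  id: string;
  text: string;
  score: number; // cosine similarity in [-1, 1]
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Treat a zero vector as unrelated to everything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

class InMemoryVectorStore {
  private chunks: StoredChunk[] = [];

  add(chunk: StoredChunk): void {
    this.chunks.push(chunk);
  }

  // Brute-force nearest-neighbour search: score every stored chunk
  // against the query and return the k highest-scoring ones.
  search(query: number[], k: number): SearchHit[] {
    return this.chunks
      .map((c) => ({ id: c.id, text: c.text, score: cosineSimilarity(query, c.embedding) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, k);
  }
}
```

Scoring every chunk costs time proportional to the number of chunks times the vector dimension per query. That is acceptable for a handful of PDFs. A dedicated library like Voy exists to handle indexing and search more efficiently inside the browser via WASM.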

How It Works

  1. Upload PDF documents to the application
  2. Content is extracted and processed client-side
  3. Embeddings are generated using Transformers.js in the browser
  4. Voy stores vectors for efficient retrieval
  5. Ollama provides the local LLM for generating responses
  6. LangChain.js orchestrates the RAG pipeline
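Steps 2, 5, and 6 above can be sketched in plain TypeScript without any libraries. This is a hypothetical illustration, not PDFpedia's code or LangChain.js's API:

- `splitIntoChunks`, `buildRagPrompt`, and `buildOllamaRequest` are assumed names.
- Fixed-size character chunking with overlap is an assumed splitting strategy.
- Retrieval (steps 3 and 4) is left out; the retrieved chunks are simply passed in.

The one real external interface is Ollama's `/api/generate` HTTP endpoint on its default local port, 11434. The model name is up to the user.

```typescript
// Split extracted text into fixed-size character chunks that overlap by
// `overlap` characters, so text cut at one boundary also appears in the
// neighbouring chunk. (Assumed strategy; real splitters often prefer
// paragraph or sentence boundaries.)
function splitIntoChunks(text: string, chunkSize: number, overlap: number): string[] {
  if (chunkSize <= 0) throw new Error("chunkSize must be positive");
  if (overlap < 0 || overlap >= chunkSize) throw new Error("overlap must be in [0, chunkSize)");
  const chunks: string[] = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once this chunk reaches the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

// Put the retrieved chunks into one prompt as numbered context.
function buildRagPrompt(question: string, contextChunks: string[]): string {
  const context = contextChunks.map((c, i) => `[${i + 1}] ${c}`).join("\n\n");
  return [
    "Answer the question using only the context below.",
    "If the answer is not in the context, say you don't know.",
    "",
    "Context:",
    context,
    "",
    `Question: ${question}`,
    "Answer:",
  ].join("\n");
}

interface OllamaRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Build (but do not send) a non-streaming request for Ollama's
// /api/generate endpoint; send it with fetch(req.url, req.init).
function buildOllamaRequest(
  model: string,
  prompt: string,
  baseUrl = "http://localhost:11434",
): OllamaRequest {
  return {
    url: `${baseUrl}/api/generate`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ model, prompt, stream: false }),
    },
  };
}
```

In a full pipeline, the chunks passed to `buildRagPrompt` would be the top-k results of the vector search. With `stream: false`, Ollama replies with a single JSON object whose `response` field holds the generated answer. LangChain.js offers higher-level abstractions over this same retrieve-then-generate loop.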

Why I Built This

I built PDFpedia to create a privacy-focused alternative to cloud-based document chat tools. By running entirely locally, users can interact with sensitive documents without sending data to external servers. The combination of modern web technologies and open-source LLMs makes powerful document AI accessible to everyone.

Credits