
PDFpedia
Open-source chat with documents using Retrieval Augmented Generation (RAG) and powerful open-source LLMs
Timeline: 2024
Role: Full Stack
Team: Solo
Status: Completed

Technology Stack
Key Challenges
- RAG Implementation
- Local LLM Integration
- Browser-based Embeddings
Key Learnings
- Retrieval Augmented Generation
- Vector Stores
- Ollama Integration
PDFpedia: Chat with Documents
Overview
PDFpedia is an open-source chat-with-documents project built on Retrieval Augmented Generation (RAG). It combines powerful open-source Large Language Models with a modern web stack to provide a user-friendly application for interacting with PDF documents.
What is PDFpedia?
PDFpedia simplifies how you interact with PDFs. It uses open-source Large Language Models and can run entirely on your local machine, making it a privacy-focused and efficient solution for PDF document management.
Key Features
- Open Source LLMs: PDFpedia leverages open-source Large Language Models, ensuring accessibility and transparency
- Local Execution: Run PDFpedia on your own machine, putting you in control of your data and privacy
- Simplified PDF Interaction: Chat with your documents, extract information, and generate answers as if you're having a conversation
- Offline Support: Go offline after the site loads—all processing happens client-side
Tech Stack
- Voy: Vector store for storing and managing vectors, fully WebAssembly (WASM) compatible for browser execution
- Ollama: Runs Language Models locally and exposes them to the web app
- LangChain.js: Orchestrates all pieces—calling models, performing retrieval, ensuring smooth operation
- Transformers.js: Runs embeddings in the browser for accurate PDF processing
- Next.js: Application framework
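To make the Ollama piece concrete, here is a minimal sketch of how a web app can talk to a locally running Ollama instance via its `/api/generate` endpoint (Ollama listens on `localhost:11434` by default). The function names and the `llama3` model name are illustrative, not PDFpedia's actual code:

```typescript
// Shape of a request to Ollama's /api/generate endpoint.
interface OllamaRequest {
  model: string;
  prompt: string;
  stream: boolean;
}

// Assemble a RAG-style prompt: retrieved PDF chunks are stuffed into the
// context so the model answers from the document rather than from memory.
function buildOllamaRequest(model: string, question: string, context: string): OllamaRequest {
  const prompt =
    `Answer using only the context below.\n\n` +
    `Context:\n${context}\n\nQuestion: ${question}`;
  return { model, prompt, stream: false };
}

// Send the request to the local Ollama server and return the completion.
async function askOllama(req: OllamaRequest): Promise<string> {
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(req),
  });
  const data = await res.json();
  return data.response; // Ollama returns the generated text in `response`
}
```

With `stream: true` instead, Ollama returns newline-delimited JSON chunks, which is what makes token-by-token rendering in the chat UI possible.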
How It Works
- Upload PDF documents to the application
- Content is extracted and processed client-side
- Embeddings are generated using Transformers.js in the browser
- Voy stores vectors for efficient retrieval
- Ollama provides the local LLM for generating responses
- LangChain.js orchestrates the RAG pipeline
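The chunk-and-retrieve steps above can be sketched as follows. In the real app the vectors come from Transformers.js and live in Voy; here plain number arrays and a simple cosine-similarity ranking stand in for both, and all function names are hypothetical:

```typescript
// Split extracted PDF text into overlapping chunks so that sentences cut at a
// boundary still appear whole in at least one chunk.
function chunkText(text: string, size: number, overlap: number): string[] {
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
  }
  return chunks;
}

// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank stored chunks against the query embedding and keep the top k;
// these chunks become the context handed to the LLM.
function retrieveTopK(
  query: number[],
  store: { chunk: string; vector: number[] }[],
  k: number
): string[] {
  return store
    .map((e) => ({ chunk: e.chunk, score: cosineSimilarity(query, e.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((e) => e.chunk);
}
```

A real vector store like Voy does the same ranking with an index instead of a linear scan, but the data flow is identical: embed the question, find the nearest chunks, and pass them to the model as context.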
Why I Built This
I built PDFpedia to create a privacy-focused alternative to cloud-based document chat tools. By running entirely locally, users can interact with sensitive documents without sending data to external servers. The combination of modern web technologies and open-source LLMs makes powerful document AI accessible to everyone.
Credits
- @dawchihliou for Voy
- @jmorgan and @mchiang0610 for Ollama
- @xenovacom for Transformers.js