A Streamlit-based application that allows you to chat with your PDF documents using AI. The application uses local language models and embeddings to provide intelligent responses based on the content of your uploaded PDFs.
![Main application interface showing chat with PDF documents](images/app-screenshot.png)
- 📄 PDF Upload: Upload multiple PDF documents
- 🤖 AI Chat: Ask questions about your PDF content
- 🧠 Local AI: Uses Ollama with local language models (no API costs)
- 🔍 Semantic Search: Advanced document retrieval using sentence transformers
- 💬 Chat Interface: Beautiful chat UI with user and bot avatars
- 📝 Memory: Remembers conversation context
Before running this application, make sure you have:
- Python 3.8+ installed
- Ollama installed and running locally
- Required Python packages (see installation section)
- Download Ollama from https://ollama.com/download
- Install and start Ollama
- Pull the required model:
  ```
  ollama pull deepseek-r1:1.5b
  ```
1. Clone or download this project to your local machine
2. Navigate to the project directory:
   ```
   cd PDF_Chat
   ```
3. Create a virtual environment (recommended):
   ```
   python -m venv venv
   ```
4. Activate the virtual environment:
   - Windows:
     ```
     venv\Scripts\activate
     ```
   - macOS/Linux:
     ```
     source venv/bin/activate
     ```
5. API key: obtain your Hugging Face API key and add it to a `.env` file in the project directory:
   ```
   HUGGINGFACEHUB_API_KEY=your_api_key
   ```
6. Install the required packages:
   ```
   pip install -r requirements.txt
   ```
1. Start the application:
   ```
   streamlit run app.py
   ```
2. Open your browser and go to the URL shown in the terminal (usually http://localhost:8501)
3. Upload PDF documents:
   - Use the sidebar to upload one or more PDF files
   - Click the "Process" button to index the documents
4. Start chatting:
   - Type your questions in the text input
   - The AI will answer based on the content of your uploaded PDFs
![High-level overview of how the application processes and responds to queries](images/workflow-diagram.png)
- Document Processing: PDFs are converted to text and split into chunks
- Embedding Generation: Text chunks are converted to vector embeddings using sentence transformers
- Vector Storage: Embeddings are stored in a FAISS vector database for fast retrieval
- Question Answering: When you ask a question:
  - The question is converted to an embedding
  - Similar document chunks are retrieved
  - The local language model generates an answer based on the retrieved context
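The retrieval step above can be illustrated with a small, dependency-free sketch. The `embed` function here is a toy bag-of-words stand-in for the app's sentence-transformer model, and the brute-force `sorted` call stands in for a FAISS index lookup; the names and data are illustrative, not taken from `app.py`:

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy bag-of-words vector, standing in for a sentence-transformer embedding."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine_similarity(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question (FAISS does this at scale)."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine_similarity(q, embed(c)), reverse=True)[:k]

chunks = [
    "The invoice total for March is 1,200 euros.",
    "Our refund policy allows returns within 30 days.",
    "The warranty covers manufacturing defects for two years.",
]
print(retrieve("How long is the warranty?", chunks, k=1))
```

In the real pipeline the retrieved chunks are then passed to the Ollama model as context for answer generation.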
- Frontend: Streamlit
- PDF Processing: PyPDF2
- Text Splitting: LangChain CharacterTextSplitter
- Embeddings: Sentence Transformers (all-MiniLM-L6-v2)
- Vector Database: FAISS
- Language Model: Ollama with deepseek-r1:1.5b
- Conversation Management: LangChain ConversationalRetrievalChain
```
Chat_with_pdfs/
├── .env
├── app.py               # Main Streamlit application
├── htmlTemplates.py     # CSS styles and HTML templates
├── README.md            # This file
├── images/              # Screenshots and diagrams
│   ├── app-screenshot.png
│   └── workflow-diagram.png
└── venv/                # Virtual environment (created during setup)
```
To use a different Ollama model, modify the `get_conversation_chain` function in `app.py`:

```python
def get_conversation_chain(vectorstore):
    llm = Ollama(model="your-preferred-model")  # Change this line
    # ... rest of the function
```

Edit `htmlTemplates.py` to customize:
- Chat message styling
- Avatar images
- Colors and layout
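As a rough sketch of what such a template module can look like (the `{{MSG}}` placeholder name, class names, and colors here are illustrative assumptions, not the actual contents of `htmlTemplates.py`):

```python
# Sketch of an htmlTemplates.py-style module: a CSS block plus per-role
# message templates with a placeholder filled in at render time.
css = """
<style>
.chat-message { padding: 1rem; border-radius: 0.5rem; margin-bottom: 1rem; display: flex; }
.chat-message.user { background-color: #2b313e; }
.chat-message.bot  { background-color: #475063; }
</style>
"""

bot_template = '<div class="chat-message bot"><div class="message">{{MSG}}</div></div>'
user_template = '<div class="chat-message user"><div class="message">{{MSG}}</div></div>'

def render(template, message):
    # Substitute the user/model text into the placeholder; the app would then
    # pass the resulting HTML to Streamlit for display.
    return template.replace("{{MSG}}", message)

print(render(user_template, "What does the warranty cover?"))
```

Changing the CSS block or the template markup is enough to restyle messages without touching the rest of the app.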
1. "Ollama model not found" error:
   - Make sure Ollama is running
   - Pull the required model:
     ```
     ollama pull deepseek-r1:1.5b
     ```
2. Import errors:
   - Ensure all packages are installed in your virtual environment
   - Check that you're using the correct Python version
3. PDF processing issues:
   - Ensure PDFs are not password-protected
   - Check that PDFs contain extractable text
   - For large PDFs, processing may take some time
- The first question after processing might be slower as the model loads
- Consider using smaller chunk sizes for faster processing
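The chunk-size trade-off can be seen in a small stand-alone splitter, written here as a simplified approximation of LangChain's `CharacterTextSplitter` (the function name and defaults are illustrative): smaller `chunk_size` values produce more, faster-to-embed chunks, while `chunk_overlap` preserves context across chunk boundaries.

```python
def split_text(text, chunk_size=200, chunk_overlap=50):
    """Split text into fixed-size character chunks with overlap,
    roughly mimicking LangChain's CharacterTextSplitter."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks

document = "x" * 500
chunks = split_text(document, chunk_size=200, chunk_overlap=50)
print(len(chunks), [len(c) for c in chunks])
```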
Feel free to submit issues, feature requests, or pull requests to improve this application.
This project is open source and available under the MIT License.
- Built with Streamlit
- Powered by Ollama
- Uses LangChain for AI workflows
- Embeddings provided by Sentence Transformers