A Streamlit-based application that allows you to chat with your PDF documents using AI. The application uses local language models and embeddings to provide intelligent responses based on the content of your uploaded PDFs.
![Main application interface showing chat with PDF documents](images/app-screenshot.png)
- 📄 PDF Upload: Upload multiple PDF documents
- 🤖 AI Chat: Ask questions about your PDF content
- 🧠 Local AI: Uses Ollama with local language models (no API costs)
- 🔍 Semantic Search: Advanced document retrieval using sentence transformers
- 💬 Chat Interface: Beautiful chat UI with user and bot avatars
- 📝 Memory: Remembers conversation context
Before running this application, make sure you have:
- Python 3.8+ installed
- Ollama installed and running locally
- Required Python packages (see installation section)
- Download Ollama from https://ollama.com/download
- Install and start Ollama
- Pull the required model:
  ```
  ollama pull deepseek-r1:1.5b
  ```
1. Clone or download this project to your local machine
2. Navigate to the project directory:
   ```
   cd PDF_Chat
   ```
3. Create a virtual environment (recommended):
   ```
   python -m venv venv
   ```
4. Activate the virtual environment:
   - Windows:
     ```
     venv\Scripts\activate
     ```
   - macOS/Linux:
     ```
     source venv/bin/activate
     ```
5. API key: obtain your Hugging Face API key and add it to a `.env` file in the project directory:
   ```
   HUGGINGFACEHUB_API_KEY=your_api_key
   ```
6. Install the required packages:
   ```
   pip install -r requirements.txt
   ```
1. Start the application:
   ```
   streamlit run app.py
   ```
2. Open your browser and go to the URL shown in the terminal (usually http://localhost:8501)
3. Upload PDF documents:
   - Use the sidebar to upload one or more PDF files
   - Click the "Process" button to index the documents
4. Start chatting:
   - Type your questions in the text input
   - The AI will answer based on the content of your uploaded PDFs
![High-level overview of how the application processes and responds to queries](images/workflow-diagram.png)
- Document Processing: PDFs are converted to text and split into chunks
- Embedding Generation: Text chunks are converted to vector embeddings using sentence transformers
- Vector Storage: Embeddings are stored in a FAISS vector database for fast retrieval
- Question Answering: When you ask a question:
  - The question is converted to an embedding
  - Similar document chunks are retrieved
  - The local language model generates an answer based on the retrieved context
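The retrieval step above can be illustrated with a small, dependency-free sketch. The `embed` function here is a toy bag-of-words stand-in for the app's sentence-transformer model, and the brute-force `sorted` call stands in for a FAISS index lookup; the names and data are illustrative, not taken from `app.py`:

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy bag-of-words vector, standing in for a sentence-transformer embedding."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine_similarity(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question (FAISS does this at scale)."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine_similarity(q, embed(c)), reverse=True)[:k]

chunks = [
    "The invoice total for March is 1,200 euros.",
    "Our refund policy allows returns within 30 days.",
    "The warranty covers manufacturing defects for two years.",
]
print(retrieve("How long is the warranty?", chunks, k=1))
```

In the real pipeline the retrieved chunks are then passed to the Ollama model as context for answer generation.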
- Frontend: Streamlit
- PDF Processing: PyPDF2
- Text Splitting: LangChain CharacterTextSplitter
- Embeddings: Sentence Transformers (all-MiniLM-L6-v2)
- Vector Database: FAISS
- Language Model: Ollama with deepseek-r1:1.5b
- Conversation Management: LangChain ConversationalRetrievalChain
```
Chat_with_pdfs/
├── .env
├── app.py               # Main Streamlit application
├── htmlTemplates.py     # CSS styles and HTML templates
├── README.md            # This file
├── images/              # Screenshots and diagrams
│   ├── app-screenshot.png
│   └── workflow-diagram.png
└── venv/                # Virtual environment (created during setup)
```
To use a different Ollama model, modify the `get_conversation_chain` function in `app.py`:

```python
def get_conversation_chain(vectorstore):
    llm = Ollama(model="your-preferred-model")  # Change this line
    # ... rest of the function
```

Edit `htmlTemplates.py` to customize:
- Chat message styling
- Avatar images
- Colors and layout
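As a rough sketch of what such a template module can look like (the `{{MSG}}` placeholder name, class names, and colors here are illustrative assumptions, not the actual contents of `htmlTemplates.py`):

```python
# Sketch of an htmlTemplates.py-style module: a CSS block plus per-role
# message templates with a placeholder filled in at render time.
css = """
<style>
.chat-message { padding: 1rem; border-radius: 0.5rem; margin-bottom: 1rem; display: flex; }
.chat-message.user { background-color: #2b313e; }
.chat-message.bot  { background-color: #475063; }
</style>
"""

bot_template = '<div class="chat-message bot"><div class="message">{{MSG}}</div></div>'
user_template = '<div class="chat-message user"><div class="message">{{MSG}}</div></div>'

def render(template, message):
    # Substitute the user/model text into the placeholder; the app would then
    # pass the resulting HTML to Streamlit for display.
    return template.replace("{{MSG}}", message)

print(render(user_template, "What does the warranty cover?"))
```

Changing the CSS block or the template markup is enough to restyle messages without touching the rest of the app.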
1. "Ollama model not found" error:
   - Make sure Ollama is running
   - Pull the required model:
     ```
     ollama pull deepseek-r1:1.5b
     ```
2. Import errors:
   - Ensure all packages are installed in your virtual environment
   - Check that you're using the correct Python version
3. PDF processing issues:
   - Ensure PDFs are not password-protected
   - Check that PDFs contain extractable text
   - For large PDFs, processing may take some time
- The first question after processing might be slower as the model loads
- Consider using smaller chunk sizes for faster processing
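The chunk-size trade-off can be seen in a small stand-alone splitter, written here as a simplified approximation of LangChain's `CharacterTextSplitter` (the function name and defaults are illustrative): smaller `chunk_size` values produce more, faster-to-embed chunks, while `chunk_overlap` preserves context across chunk boundaries.

```python
def split_text(text, chunk_size=200, chunk_overlap=50):
    """Split text into fixed-size character chunks with overlap,
    roughly mimicking LangChain's CharacterTextSplitter."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks

document = "x" * 500
chunks = split_text(document, chunk_size=200, chunk_overlap=50)
print(len(chunks), [len(c) for c in chunks])
```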
Feel free to submit issues, feature requests, or pull requests to improve this application.
This project is open source and available under the MIT License.
- Built with Streamlit
- Powered by Ollama
- Uses LangChain for AI workflows
- Embeddings provided by Sentence Transformers