PDF Chat Application

A Streamlit-based application that allows you to chat with your PDF documents using AI. The application uses local language models and embeddings to provide intelligent responses based on the content of your uploaded PDFs.

📸 Screenshots

PDF Chat Application Interface
Main application interface showing chat with PDF documents

Features

  • 📄 PDF Upload: Upload multiple PDF documents
  • 🤖 AI Chat: Ask questions about your PDF content
  • 🧠 Local AI: Uses Ollama with local language models (no API costs)
  • 🔍 Semantic Search: Advanced document retrieval using sentence transformers
  • 💬 Chat Interface: Beautiful chat UI with user and bot avatars
  • 📝 Memory: Remembers conversation context

Prerequisites

Before running this application, make sure you have:

  1. Python 3.8+ installed
  2. Ollama installed and running locally
  3. Required Python packages (see installation section)

Installing Ollama

  1. Download Ollama from https://ollama.com/download
  2. Install and start Ollama
  3. Pull the required model:
    ollama pull deepseek-r1:1.5b

Installation

  1. Clone or download this project to your local machine

  2. Navigate to the project directory:

    cd PDF_Chat
  3. Create a virtual environment (recommended):

    python -m venv venv
  4. Activate the virtual environment:

    • Windows:
      venv\Scripts\activate
    • macOS/Linux:
      source venv/bin/activate
  5. API keys: Obtain your Hugging Face API key and add it to the .env file in the project directory:

    HUGGINGFACEHUB_API_KEY=your_api_key
  6. Install required packages:

    pip install -r requirements.txt
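
The contents of requirements.txt are not shown in this README; based on the Technical Stack section below, a plausible set would be something like the following (exact package names and versions are assumptions — check the file in the repository):

```text
streamlit
PyPDF2
langchain
sentence-transformers
faiss-cpu
python-dotenv
```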

Usage

  1. Start the application:

    streamlit run app.py
  2. Open your browser and go to the URL shown in the terminal (usually http://localhost:8501)

  3. Upload PDF documents:

    • Use the sidebar to upload one or more PDF files
    • Click the "Process" button to index the documents
  4. Start chatting:

    • Type your questions in the text input
    • The AI will answer based on the content of your uploaded PDFs

How It Works

Application Workflow
High-level overview of how the application processes and responds to queries

  1. Document Processing: PDFs are converted to text and split into chunks
  2. Embedding Generation: Text chunks are converted to vector embeddings using sentence transformers
  3. Vector Storage: Embeddings are stored in a FAISS vector database for fast retrieval
  4. Question Answering: When you ask a question:
    • The question is converted to an embedding
    • Similar document chunks are retrieved
    • The local language model generates an answer based on the retrieved context
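
The retrieval step above can be sketched in plain Python. The real application gets its embeddings from sentence transformers and searches them with FAISS, but toy vectors show the idea; the function names and sample data here are illustrative only:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def retrieve(question_vec, chunk_vecs, chunks, k=2):
    """Return the k chunks whose embeddings are most similar to the question."""
    scored = sorted(zip(chunks, chunk_vecs),
                    key=lambda cv: cosine_similarity(question_vec, cv[1]),
                    reverse=True)
    return [chunk for chunk, _ in scored[:k]]

# Toy 3-dimensional "embeddings" for three document chunks
chunks = ["refunds are issued in 5 days",
          "the warranty lasts 2 years",
          "shipping is free"]
vecs = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2], [0.1, 0.0, 1.0]]

question = [0.9, 0.2, 0.1]  # closest to the first chunk
print(retrieve(question, vecs, chunks, k=1))  # ['refunds are issued in 5 days']
```

The retrieved chunks are then passed to the language model as context, which is what keeps answers grounded in the uploaded PDFs.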

Technical Stack

Python Streamlit LangChain Ollama

  • Frontend: Streamlit
  • PDF Processing: PyPDF2
  • Text Splitting: LangChain CharacterTextSplitter
  • Embeddings: Sentence Transformers (all-MiniLM-L6-v2)
  • Vector Database: FAISS
  • Language Model: Ollama with deepseek-r1:1.5b
  • Conversation Management: LangChain ConversationalRetrievalChain

File Structure

PDF_Chat/
├── .env                # Hugging Face API key (see Installation)
├── app.py              # Main Streamlit application
├── htmlTemplates.py    # CSS styles and HTML templates
├── README.md          # This file
├── images/            # Screenshots and diagrams
│   ├── app-screenshot.png
│   └── workflow-diagram.png
└── venv/              # Virtual environment (created during setup)

Customization

Changing the Language Model

To use a different Ollama model, modify the get_conversation_chain function in app.py:

def get_conversation_chain(vectorstore):
    llm = Ollama(model="your-preferred-model")  # Change this line
    # ... rest of the function

Modifying the Chat Interface

Edit htmlTemplates.py to customize:

  • Chat message styling
  • Avatar images
  • Colors and layout

Troubleshooting

Common Issues

  1. "Ollama model not found" error:

    • Make sure Ollama is running
    • Pull the required model: ollama pull deepseek-r1:1.5b
  2. Import errors:

    • Ensure all packages are installed in your virtual environment
    • Check that you're using the correct Python version
  3. PDF processing issues:

    • Ensure PDFs are not password-protected
    • Check that PDFs contain extractable text
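
A quick way to check the "extractable text" condition is to look at what PDF extraction actually returns per page. The helper below is an illustrative sketch that operates on already-extracted page strings (so it needs no PDF library); scanned, image-only PDFs typically extract as empty strings:

```python
def has_extractable_text(page_texts, min_chars=20):
    """True if at least one page yields a non-trivial amount of text.

    Image-only (scanned) PDFs usually extract as empty strings, which
    is the most common cause of the app having nothing to answer from.
    """
    return any(len((t or "").strip()) >= min_chars for t in page_texts)

print(has_extractable_text(["", "", ""]))                        # False: scanned PDF
print(has_extractable_text(["Chapter 1: Introduction to ..."]))  # True
```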

Performance Tips

  • For large PDFs, processing may take some time
  • The first question after processing might be slower as the model loads
  • Consider using smaller chunk sizes for faster processing
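
The chunk-size trade-off above can be illustrated with a simplified splitter. The app itself uses LangChain's CharacterTextSplitter, but this stand-in shows how chunk size and overlap interact — smaller chunks mean more pieces to embed, larger chunks mean fewer but coarser retrieval units:

```python
def split_text(text, chunk_size=200, chunk_overlap=50):
    """Split text into fixed-size chunks with overlap (simplified splitter)."""
    if chunk_overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk size")
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

doc = "x" * 1000
small = split_text(doc, chunk_size=100, chunk_overlap=20)   # more, smaller chunks
large = split_text(doc, chunk_size=500, chunk_overlap=100)  # fewer, larger chunks
print(len(small), len(large))  # 13 3
```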

Contributing

Feel free to submit issues, feature requests, or pull requests to improve this application.

License

This project is open source and available under the MIT License.

Acknowledgments
