paperless-gpt/docs/ARCHITECTURE.md

5 KiB

paperless-gpt Architecture

This document provides a comprehensive overview of the paperless-gpt architecture, explaining how different components interact to provide AI-powered document processing capabilities.

System Overview

paperless-gpt is designed as a companion service to paperless-ngx, adding AI capabilities for document processing. The system consists of several key components:

graph TB
    UI[Web UI] --> API[Backend API]
    API --> LLM[LLM Service]
    API --> OCR[OCR Service]
    API --> DB[Local DB]
    API --> PaperlessNGX[paperless-ngx API]
    LLM --> OpenAI[OpenAI]
    LLM --> Ollama[Ollama]
    OCR --> VisionLLM[Vision LLM]

Core Components

1. Backend API (Go)

  • Handles all business logic
  • Manages document processing workflow
  • Coordinates between services
  • Provides REST API endpoints
  • Manages state and caching

2. Web UI (React + TypeScript)

  • User interface for document management
  • Real-time processing status
  • Document preview and editing
  • Configuration interface
  • Responsive design

3. LLM Service

  • Manages LLM provider connections
  • Handles prompt engineering
  • Processes document content
  • Generates metadata suggestions
  • Supports multiple providers:
    • OpenAI (gpt-4, gpt-3.5-turbo)
    • Ollama (llama2, etc.)

4. OCR Service

  • Vision LLM integration
  • Image preprocessing
  • Text extraction
  • Layout analysis
  • Quality enhancement

5. Local Database

  • Caches processing results
  • Stores configuration
  • Manages queues
  • Tracks document state

Data Flow

Document Processing Flow

sequenceDiagram
    participant U as User
    participant UI as Web UI
    participant API as Backend API
    participant LLM as LLM Service
    participant OCR as OCR Service
    participant PNX as paperless-ngx

    U->>UI: Upload Document
    UI->>API: Process Request
    API->>OCR: Extract Text
    OCR-->>API: Text Content
    API->>LLM: Generate Metadata
    LLM-->>API: Suggestions
    API->>UI: Preview Results
    U->>UI: Approve Changes
    UI->>API: Apply Changes
    API->>PNX: Update Document
    PNX-->>API: Confirmation
    API-->>UI: Success

Key Design Decisions

1. Modular Architecture

  • Separation of concerns
  • Pluggable components
  • Easy to extend
  • Maintainable code

2. Stateless Design

  • Scalable architecture
  • No shared state
  • Resilient operation
  • Easy deployment

3. Security First

  • API authentication
  • Data encryption
  • Input validation
  • Error handling

4. Performance Optimization

  • Local caching
  • Batch processing
  • Async operations
  • Resource management

Directory Structure

paperless-gpt/
├── main.go                 # Application entry point
├── app_llm.go             # LLM service implementation
├── app_http_handlers.go    # HTTP handlers
├── paperless.go           # paperless-ngx integration
├── ocr.go                 # OCR service
├── types.go               # Type definitions
├── web-app/               # Frontend application
│   ├── src/
│   │   ├── components/    # React components
│   │   ├── App.tsx       # Main application
│   │   └── ...
│   └── ...
└── ...

Configuration Management

The system uses environment variables for configuration, allowing easy deployment and configuration changes:

PAPERLESS_BASE_URL        # paperless-ngx connection
LLM_PROVIDER             # AI backend selection
VISION_LLM_PROVIDER      # OCR provider selection
...

Error Handling

The system implements comprehensive error handling:

  1. User Errors

    • Input validation
    • Clear error messages
    • Guided resolution
  2. System Errors

    • Graceful degradation
    • Automatic retry
    • Error logging
    • Monitoring alerts
  3. External Service Errors

    • Fallback options
    • Circuit breaking
    • Rate limiting
    • Error reporting

Scaling Considerations

The architecture supports scaling through:

  1. Horizontal Scaling

    • Stateless design
    • Load balancing
    • Distributed processing
  2. Resource Management

    • Connection pooling
    • Cache management
    • Queue processing
    • Rate limiting
  3. Performance Optimization

    • Batch processing
    • Async operations
    • Efficient algorithms
    • Resource caching

Future Considerations

The architecture is designed to support future enhancements:

  1. Plugin System

    • Custom processors
    • Integration points
    • Event hooks
  2. Advanced Features

    • Multi-language support
    • Custom ML models
    • Advanced analytics
  3. Integration Options

    • API extensions
    • Service hooks
    • Custom providers

Development Guidelines

When making changes to the architecture:

  1. Documentation

    • Update this document
    • Add inline comments
    • Update API docs
  2. Testing

    • Unit tests
    • Integration tests
    • Performance tests
  3. Review Process

    • Architecture review
    • Security review
    • Performance review

This architecture documentation is maintained by the core team and updated as the system evolves.