# paperless-gpt Architecture
This document provides a comprehensive overview of the paperless-gpt architecture, explaining how different components interact to provide AI-powered document processing capabilities.
## System Overview
paperless-gpt is designed as a companion service to paperless-ngx, adding AI capabilities for document processing. The system consists of several key components:
```mermaid
graph TB
UI[Web UI] --> API[Backend API]
API --> LLM[LLM Service]
API --> OCR[OCR Service]
API --> DB[Local DB]
API --> PaperlessNGX[paperless-ngx API]
LLM --> OpenAI[OpenAI]
LLM --> Ollama[Ollama]
OCR --> VisionLLM[Vision LLM]
```
## Core Components
### 1. Backend API (Go)
- Handles all business logic
- Manages document processing workflow
- Coordinates between services
- Provides REST API endpoints
- Manages state and caching
### 2. Web UI (React + TypeScript)
- User interface for document management
- Real-time processing status
- Document preview and editing
- Configuration interface
- Responsive design
### 3. LLM Service
- Manages LLM provider connections
- Handles prompt engineering
- Processes document content
- Generates metadata suggestions
- Supports multiple providers:
  - OpenAI (gpt-4, gpt-3.5-turbo)
  - Ollama (llama2, etc.)
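
One way to support several providers behind a single service is a small interface plus a factory keyed by provider name. The following is a minimal sketch under stated assumptions: `Provider`, `stubProvider`, and `NewProvider` are hypothetical names chosen for illustration, not the project's actual types, and the stub echoes the prompt instead of calling OpenAI or Ollama.

```go
package main

import (
	"context"
	"fmt"
	"strings"
)

// Provider is a hypothetical abstraction over an LLM backend.
type Provider interface {
	Name() string
	Complete(ctx context.Context, prompt string) (string, error)
}

// stubProvider stands in for a real client. It echoes the prompt
// instead of calling a network API (assumption for illustration).
type stubProvider struct{ name string }

func (p stubProvider) Name() string { return p.name }

func (p stubProvider) Complete(_ context.Context, prompt string) (string, error) {
	return fmt.Sprintf("[%s] %s", p.name, prompt), nil
}

// NewProvider selects a backend by name, for example the value of LLM_PROVIDER.
// Matching is case-insensitive; unknown names produce an error.
func NewProvider(name string) (Provider, error) {
	switch n := strings.ToLower(name); n {
	case "openai", "ollama":
		return stubProvider{name: n}, nil
	default:
		return nil, fmt.Errorf("unsupported LLM provider %q", name)
	}
}

func main() {
	p, err := NewProvider("ollama")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	out, _ := p.Complete(context.Background(), "Suggest a title")
	fmt.Println(out)
}
```

With this shape, adding a backend would mean implementing `Provider` and adding one case to the factory, while callers stay unchanged.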
### 4. OCR Service
- Vision LLM integration
- Image preprocessing
- Text extraction
- Layout analysis
- Quality enhancement
### 5. Local Database
- Caches processing results
- Stores configuration
- Manages queues
- Tracks document state
## Data Flow
### Document Processing Flow
```mermaid
sequenceDiagram
participant U as User
participant UI as Web UI
participant API as Backend API
participant LLM as LLM Service
participant OCR as OCR Service
participant PNX as paperless-ngx
U->>UI: Upload Document
UI->>API: Process Request
API->>OCR: Extract Text
OCR-->>API: Text Content
API->>LLM: Generate Metadata
LLM-->>API: Suggestions
API->>UI: Preview Results
U->>UI: Approve Changes
UI->>API: Apply Changes
API->>PNX: Update Document
PNX-->>API: Confirmation
API-->>UI: Success
```
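
The sequence above splits into a preview phase (OCR, then metadata generation) and an apply phase that runs only after the user approves. A minimal sketch of that split follows. `Pipeline`, `Suggestion`, and `newStubPipeline` are hypothetical names, and the three collaborators are in-memory stand-ins rather than real OCR, LLM, or paperless-ngx clients.

```go
package main

import (
	"errors"
	"fmt"
)

// Suggestion holds metadata proposed for a document (hypothetical shape).
type Suggestion struct {
	Title string
	Tags  []string
}

// Pipeline wires the collaborators from the diagram as plain functions
// so that stubs can be swapped in for tests.
type Pipeline struct {
	ExtractText func(doc []byte) (string, error)
	Suggest     func(text string) (Suggestion, error)
	Update      func(docID int, s Suggestion) error
}

// Preview runs text extraction and metadata generation without
// changing anything in paperless-ngx.
func (p Pipeline) Preview(doc []byte) (Suggestion, error) {
	text, err := p.ExtractText(doc)
	if err != nil {
		return Suggestion{}, fmt.Errorf("extract text: %w", err)
	}
	s, err := p.Suggest(text)
	if err != nil {
		return Suggestion{}, fmt.Errorf("generate metadata: %w", err)
	}
	return s, nil
}

// Apply writes a suggestion back only when the user has approved it.
func (p Pipeline) Apply(docID int, s Suggestion, approved bool) error {
	if !approved {
		return errors.New("changes not approved")
	}
	return p.Update(docID, s)
}

// newStubPipeline returns a pipeline whose services are in-memory stand-ins.
func newStubPipeline(store map[int]Suggestion) Pipeline {
	return Pipeline{
		ExtractText: func(doc []byte) (string, error) { return string(doc), nil },
		Suggest: func(text string) (Suggestion, error) {
			return Suggestion{Title: "Doc: " + text, Tags: []string{"inbox"}}, nil
		},
		Update: func(docID int, s Suggestion) error {
			store[docID] = s
			return nil
		},
	}
}

func main() {
	store := map[int]Suggestion{}
	p := newStubPipeline(store)
	s, err := p.Preview([]byte("invoice"))
	if err != nil {
		fmt.Println("preview failed:", err)
		return
	}
	if err := p.Apply(42, s, true); err != nil {
		fmt.Println("apply failed:", err)
		return
	}
	fmt.Println("stored title:", store[42].Title)
}
```

Keeping `Preview` free of side effects means a rejected suggestion never reaches paperless-ngx.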
## Key Design Decisions
### 1. Modular Architecture
- Separation of concerns
- Pluggable components
- Easy to extend
- Maintainable code
### 2. Stateless Design
- Scalable architecture
- No shared state
- Resilient operation
- Easy deployment
### 3. Security First
- API authentication
- Data encryption
- Input validation
- Error handling
### 4. Performance Optimization
- Local caching
- Batch processing
- Async operations
- Resource management
## Directory Structure
```
paperless-gpt/
├── main.go # Application entry point
├── app_llm.go # LLM service implementation
├── app_http_handlers.go # HTTP handlers
├── paperless.go # paperless-ngx integration
├── ocr.go # OCR service
├── types.go # Type definitions
├── web-app/ # Frontend application
│ ├── src/
│ │ ├── components/ # React components
│ │ ├── App.tsx # Main application
│ │ └── ...
│ └── ...
└── ...
```
## Configuration Management
The system uses environment variables for configuration, allowing easy deployment and configuration changes:
```
PAPERLESS_BASE_URL # paperless-ngx connection
LLM_PROVIDER # AI backend selection
VISION_LLM_PROVIDER # OCR provider selection
...
```
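
Reading these variables at startup could look like the sketch below. The `Config` struct and `loadConfig` function are hypothetical, and which variables are treated as required is an assumption made for illustration: here `PAPERLESS_BASE_URL` and `LLM_PROVIDER` must be set, while `VISION_LLM_PROVIDER` may be empty.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// Config mirrors the three variables listed above (hypothetical struct).
type Config struct {
	PaperlessBaseURL  string
	LLMProvider       string
	VisionLLMProvider string
}

// loadConfig reads settings through a lookup function so it can be
// tested without touching the process environment.
func loadConfig(getenv func(string) string) (Config, error) {
	cfg := Config{
		PaperlessBaseURL:  getenv("PAPERLESS_BASE_URL"),
		LLMProvider:       getenv("LLM_PROVIDER"),
		VisionLLMProvider: getenv("VISION_LLM_PROVIDER"),
	}
	// Assumption: these two are mandatory; the vision provider is optional.
	if cfg.PaperlessBaseURL == "" {
		return Config{}, errors.New("PAPERLESS_BASE_URL is required")
	}
	if cfg.LLMProvider == "" {
		return Config{}, errors.New("LLM_PROVIDER is required")
	}
	return cfg, nil
}

func main() {
	cfg, err := loadConfig(os.Getenv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```

Failing fast on a missing value surfaces misconfiguration at startup rather than on the first request.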
## Error Handling
The system implements comprehensive error handling:
1. **User Errors**
   - Input validation
   - Clear error messages
   - Guided resolution
2. **System Errors**
   - Graceful degradation
   - Automatic retry
   - Error logging
   - Monitoring alerts
3. **External Service Errors**
   - Fallback options
   - Circuit breaking
   - Rate limiting
   - Error reporting
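
The "automatic retry" item can be illustrated with a small helper that retries a failing call with exponential backoff. This is a sketch, not the project's implementation: `retry` and `errTransient` are hypothetical names, and a production version would likely also accept a `context.Context` for cancellation.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTransient is a placeholder for a temporary failure such as a timeout.
var errTransient = errors.New("transient failure")

// retry calls fn up to attempts times, sleeping between failures and
// doubling the delay each time. It returns nil on the first success,
// or the last error wrapped with the attempt count.
func retry(attempts int, delay time.Duration, fn func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	for i := 0; i < attempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
		// Do not sleep after the final attempt.
		if i < attempts-1 {
			time.Sleep(delay)
			delay *= 2
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

func main() {
	calls := 0
	err := retry(4, 10*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errTransient // fail twice, then succeed
		}
		return nil
	})
	fmt.Println("calls:", calls, "err:", err)
}
```

Because the final error wraps the underlying one with `%w`, callers can still inspect it with `errors.Is`.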
## Scaling Considerations
The architecture supports scaling through:
1. **Horizontal Scaling**
   - Stateless design
   - Load balancing
   - Distributed processing
2. **Resource Management**
   - Connection pooling
   - Cache management
   - Queue processing
   - Rate limiting
3. **Performance Optimization**
   - Batch processing
   - Async operations
   - Efficient algorithms
   - Resource caching
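
Queue processing with bounded concurrency is commonly written in Go as a fixed pool of workers reading from a channel. The sketch below shows that pattern only; `processAll` is a hypothetical helper, and the `work` callback stands in for whatever per-document processing the service performs.

```go
package main

import (
	"fmt"
	"sync"
)

// processAll runs work for every document ID using at most `workers`
// goroutines and returns the results keyed by ID.
func processAll(ids []int, workers int, work func(id int) string) map[int]string {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan int)
	results := make(map[int]string, len(ids))
	var mu sync.Mutex // guards results
	var wg sync.WaitGroup

	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for id := range jobs {
				out := work(id)
				mu.Lock()
				results[id] = out
				mu.Unlock()
			}
		}()
	}

	// Feed the queue, then close it so the workers exit their loops.
	for _, id := range ids {
		jobs <- id
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	got := processAll([]int{1, 2, 3, 4}, 2, func(id int) string {
		return fmt.Sprintf("processed document %d", id)
	})
	fmt.Println(len(got), "documents processed")
}
```

The worker count caps how many documents are in flight at once, which also limits concurrent calls to external services.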
## Future Considerations
The architecture is designed to support future enhancements:
1. **Plugin System**
   - Custom processors
   - Integration points
   - Event hooks
2. **Advanced Features**
   - Multi-language support
   - Custom ML models
   - Advanced analytics
3. **Integration Options**
   - API extensions
   - Service hooks
   - Custom providers
## Development Guidelines
When making changes to the architecture:
1. **Documentation**
   - Update this document
   - Add inline comments
   - Update API docs
2. **Testing**
   - Unit tests
   - Integration tests
   - Performance tests
3. **Review Process**
   - Architecture review
   - Security review
   - Performance review

This architecture documentation is maintained by the core team and updated as the system evolves.