# System Patterns
## Architecture Overview
### 1. Microservices Architecture
- **paperless-gpt**: AI processing service (Go)
- **paperless-ngx**: Document management system (external)
- Communication via REST API
- Docker-based deployment
### 2. Backend Architecture (Go)
#### Core Components
- **API Server**: HTTP handlers for document processing
- **LLM Integration**: Abstraction for multiple AI providers
- **Template Engine**: Dynamic prompt generation
- **Document Processor**: Handles OCR and metadata generation
#### Key Patterns
- **Template-Based Prompts**: Customizable templates for different AI tasks
- **Content Truncation**: Smart content limiting based on token counts
- **Concurrent Processing**: Goroutines for parallel document processing
- **Mutex-Protected Resources**: Thread-safe template access
- **Error Propagation**: Structured error handling across layers
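
Three of the patterns above (template-based prompts, content truncation, and mutex-protected template access) can be sketched together. This is a minimal illustration, not the project's implementation: `promptStore`, `truncateWords`, and the template text are hypothetical names, and counting whitespace-separated words is only a rough stand-in for real token counting.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
	"sync"
	"text/template"
)

// promptStore (hypothetical) guards a prompt template so it can be
// replaced at runtime while other goroutines render it.
type promptStore struct {
	mu   sync.RWMutex
	tmpl *template.Template
}

// Set parses the text first, then swaps the template under the write lock.
func (s *promptStore) Set(text string) error {
	t, err := template.New("prompt").Parse(text)
	if err != nil {
		return fmt.Errorf("parse prompt template: %w", err)
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	s.tmpl = t
	return nil
}

// Render executes the current template under the read lock.
func (s *promptStore) Render(data any) (string, error) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	if s.tmpl == nil {
		return "", fmt.Errorf("no prompt template set")
	}
	var buf bytes.Buffer
	if err := s.tmpl.Execute(&buf, data); err != nil {
		return "", fmt.Errorf("render prompt: %w", err)
	}
	return buf.String(), nil
}

// truncateWords keeps at most limit whitespace-separated words.
// Assumption: word count approximates the token count.
func truncateWords(content string, limit int) string {
	if limit < 0 {
		limit = 0
	}
	words := strings.Fields(content)
	if len(words) <= limit {
		return content
	}
	return strings.Join(words[:limit], " ")
}

func main() {
	store := &promptStore{}
	if err := store.Set("Suggest a title for:\n{{.Content}}"); err != nil {
		panic(err)
	}
	out, err := store.Render(map[string]string{
		"Content": truncateWords("invoice from ACME for March 2024", 3),
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Parsing before taking the write lock keeps the critical section short, and errors are wrapped with `%w` so callers in higher layers can still inspect the cause.
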
### 3. Frontend Architecture (React/TypeScript)
#### Components
- Document Processor
- Suggestion Review
- Document Cards
- Sidebar Navigation
- Success Modal
#### State Management
- Local component state
- Props for component communication
- API integration for data fetching
### 4. Integration Patterns
#### API Communication
- RESTful endpoints
- JSON payload structure
- Token-based authentication
- Error response handling
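
As a rough sketch of the API communication pattern, the following client sends a token-authenticated GET request, checks the status code, and decodes a JSON payload. The `document` struct, the URL layout, and the `Token` authorization scheme are assumptions made for illustration; an in-process fake server stands in for the document management system.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// document is a hypothetical subset of a document payload.
type document struct {
	ID    int    `json:"id"`
	Title string `json:"title"`
}

// fetchDocument GETs one document and decodes the JSON body.
// The path layout and the "Token" auth scheme are assumptions of this sketch.
func fetchDocument(client *http.Client, baseURL, token string, id int) (*document, error) {
	url := fmt.Sprintf("%s/api/documents/%d/", baseURL, id)
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, fmt.Errorf("build request: %w", err)
	}
	req.Header.Set("Authorization", "Token "+token)
	req.Header.Set("Accept", "application/json")

	resp, err := client.Do(req)
	if err != nil {
		return nil, fmt.Errorf("send request: %w", err)
	}
	defer resp.Body.Close()

	// Treat any non-200 response as an error instead of decoding it.
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	var doc document
	if err := json.NewDecoder(resp.Body).Decode(&doc); err != nil {
		return nil, fmt.Errorf("decode response: %w", err)
	}
	return &doc, nil
}

// newFakeServer stands in for the remote API in this demo.
func newFakeServer(wantToken string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("Authorization") != "Token "+wantToken {
			http.Error(w, `{"detail":"invalid token"}`, http.StatusUnauthorized)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(document{ID: 7, Title: "Invoice"})
	}))
}

func main() {
	srv := newFakeServer("secret")
	defer srv.Close()

	doc, err := fetchDocument(srv.Client(), srv.URL, "secret", 7)
	fmt.Println(doc, err)

	_, err = fetchDocument(srv.Client(), srv.URL, "wrong", 7)
	fmt.Println(err)
}
```
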
#### LLM Provider Integration
- Provider abstraction layer
- Support for multiple providers (OpenAI, Ollama)
- Configurable models and parameters
- Vision model support for OCR
### 5. Data Flow
#### Document Processing Flow (Manual)
1. Document tagged in paperless-ngx
2. paperless-gpt detects tagged documents
3. AI processing (title/tags/correspondent generation)
4. Manual review or auto-apply
5. Results written back to paperless-ngx
#### Document Processing Flow (Auto)
1. Document tagged in paperless-ngx with the auto-processing tag (set via the `AUTO_TAG` environment variable)
2. paperless-gpt automatically processes documents
3. AI processing (title/tags/correspondent generation)
4. Auto-apply results back to paperless-ngx
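
The difference between the manual and auto flows comes down to which tag a document carries. A minimal sketch of that routing decision follows; the `document` type, the tag values `gpt` and `gpt-auto`, and the rule that the auto tag takes precedence are all assumptions for illustration.

```go
package main

import "fmt"

// document is a hypothetical, minimal view of a tagged document.
type document struct {
	ID   int
	Tags []string
}

// flow names the path a document takes through processing.
type flow string

const (
	flowSkip   flow = "skip"
	flowManual flow = "manual-review"
	flowAuto   flow = "auto-apply"
)

// hasTag reports whether the document carries the given tag.
func hasTag(d document, tag string) bool {
	for _, t := range d.Tags {
		if t == tag {
			return true
		}
	}
	return false
}

// chooseFlow gives the auto tag precedence over the manual tag.
// That precedence is an assumption of this sketch.
func chooseFlow(d document, manualTag, autoTag string) flow {
	switch {
	case hasTag(d, autoTag):
		return flowAuto
	case hasTag(d, manualTag):
		return flowManual
	default:
		return flowSkip
	}
}

func main() {
	docs := []document{
		{ID: 1, Tags: []string{"gpt"}},
		{ID: 2, Tags: []string{"gpt-auto"}},
		{ID: 3, Tags: []string{"inbox"}},
	}
	for _, d := range docs {
		fmt.Printf("document %d -> %s\n", d.ID, chooseFlow(d, "gpt", "gpt-auto"))
	}
}
```
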
#### OCR Processing Flow
1. Image/PDF input
2. Vision model processing
3. Text extraction and cleanup
4. Integration with document processing
### 6. Security Patterns
- API token authentication
- Environment-based configuration
- Docker container isolation
- Rate limiting and token management
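
Environment-based configuration might look roughly like the sketch below. `AUTO_TAG` is the variable named in this document; `EXAMPLE_API_TOKEN`, the fallback tag value, and the `loadConfig` helper are hypothetical. Passing the lookup function in as a parameter lets tests supply a fake environment.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// config holds the settings this sketch reads from the environment.
type config struct {
	APIToken string
	AutoTag  string
}

// loadConfig reads settings through lookup, which has the same shape as
// os.LookupEnv. EXAMPLE_API_TOKEN is a hypothetical variable name.
func loadConfig(lookup func(string) (string, bool)) (config, error) {
	token, ok := lookup("EXAMPLE_API_TOKEN")
	if !ok || token == "" {
		return config{}, errors.New("EXAMPLE_API_TOKEN must be set")
	}
	autoTag, ok := lookup("AUTO_TAG")
	if !ok || autoTag == "" {
		autoTag = "auto" // hypothetical fallback value
	}
	return config{APIToken: token, AutoTag: autoTag}, nil
}

func main() {
	cfg, err := loadConfig(os.LookupEnv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	// The token itself is deliberately never printed.
	fmt.Println("auto tag:", cfg.AutoTag)
}
```
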
### 7. Development Patterns
- Clear separation of concerns
- Dependency injection
- Interface-based design
- Concurrent processing with safety
- Comprehensive error handling
- Template-based customization
### 8. Testing Patterns
- Unit tests for core logic
- Integration tests for API
- E2E tests for web interface
- Test fixtures and mocks
- Playwright for frontend testing
### 9. Build and Release Patterns
#### Binary Distribution
- **GoReleaser Configuration**: Manages build and release process
  - CGO-enabled builds with musl tag
  - Linux/amd64 platform support
  - Static linking for SQLite and MuPDF
  - Automated versioning and changelog
  - Docker image publishing
#### Release Process
- **Automated Workflows**: GitHub Actions for releases
- **Local Testing**: Docker-based testing environment
- **Quality Gates**:
  - Pre-release checklist verification
  - Binary verification on clean system
  - Dependency validation
  - Environment configuration testing
#### Build Dependencies
- **System Libraries**:
  - gcc and musl-dev for compilation
  - mupdf and mupdf-dev for OCR
- **Version Management**:
  - Semantic versioning
  - Automated version injection
  - Build metadata inclusion
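
Automated version injection in Go builds is commonly done with the linker's `-X` flag, which overwrites package-level string variables at build time. The sketch below assumes variables named `version` and `commit` in package `main`; the names actually used by this project are not shown in this document.

```go
package main

import "fmt"

// These defaults apply to plain `go build`. A release build can override
// them, for example:
//
//	go build -ldflags "-X main.version=1.2.3 -X main.commit=abc1234"
//
// The variable names are assumptions of this sketch.
var (
	version = "dev"
	commit  = "none"
)

// versionString formats the injected build metadata for logs or a CLI flag.
func versionString() string {
	return fmt.Sprintf("%s (commit %s)", version, commit)
}

func main() {
	fmt.Println(versionString())
}
```
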