mirror of
https://github.com/icereed/paperless-gpt.git
synced 2025-03-12 12:58:02 -05:00
System Patterns
Architecture Overview
1. Microservices Architecture
- paperless-gpt: AI processing service (Go)
- paperless-ngx: Document management system (external)
- Communication via REST API
- Docker-based deployment
2. Backend Architecture (Go)
Core Components
- API Server: HTTP handlers for document processing
- LLM Integration: Abstraction for multiple AI providers
- Template Engine: Dynamic prompt generation
- Document Processor: Handles OCR and metadata generation
Key Patterns
- Template-Based Prompts: Customizable templates for different AI tasks
- Content Truncation: Limits document content to a token budget before it is sent to the model
- Concurrent Processing: Goroutines for parallel document processing
- Mutex-Protected Resources: Thread-safe template access
- Error Propagation: Structured error handling across layers
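As an illustration of the template-based prompts and mutex-protected template access listed above, here is a minimal sketch. The `PromptStore` type and its method names are hypothetical and are not taken from the paperless-gpt source; the sketch only shows how a `sync.RWMutex` can guard a shared set of `text/template` templates.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
	"text/template"
)

// PromptStore is a hypothetical holder for named prompt templates.
// The RWMutex lets many goroutines read concurrently while an update
// takes exclusive access.
type PromptStore struct {
	mu        sync.RWMutex
	templates map[string]*template.Template
}

func NewPromptStore() *PromptStore {
	return &PromptStore{templates: make(map[string]*template.Template)}
}

// Set parses src and stores it under name, replacing any previous template.
func (s *PromptStore) Set(name, src string) error {
	t, err := template.New(name).Parse(src)
	if err != nil {
		return fmt.Errorf("parse template %q: %w", name, err)
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	s.templates[name] = t
	return nil
}

// Render looks up the named template under a read lock and executes it.
func (s *PromptStore) Render(name string, data any) (string, error) {
	s.mu.RLock()
	t, ok := s.templates[name]
	s.mu.RUnlock()
	if !ok {
		return "", fmt.Errorf("template %q not found", name)
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, data); err != nil {
		return "", fmt.Errorf("render template %q: %w", name, err)
	}
	return buf.String(), nil
}

func main() {
	store := NewPromptStore()
	if err := store.Set("title", "Suggest a title for: {{.Content}}"); err != nil {
		panic(err)
	}
	// Render from several goroutines at once; output order is not deterministic.
	var wg sync.WaitGroup
	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			out, err := store.Render("title", map[string]string{"Content": fmt.Sprintf("document %d", n)})
			if err != nil {
				fmt.Println("error:", err)
				return
			}
			fmt.Println(out)
		}(i)
	}
	wg.Wait()
}
```

The lock is released before the template is executed, so slow renders do not block updates. This is safe because a parsed `text/template` template can be executed concurrently.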
3. Frontend Architecture (React/TypeScript)
Components
- Document Processor
- Suggestion Review
- Document Cards
- Sidebar Navigation
- Success Modal
State Management
- Local component state
- Props for component communication
- API integration for data fetching
4. Integration Patterns
API Communication
- RESTful endpoints
- JSON payload structure
- Token-based authentication
- Error response handling
LLM Provider Integration
- Provider abstraction layer
- Support for multiple providers (OpenAI, Ollama)
- Configurable models and parameters
- Vision model support for OCR
5. Data Flow
Document Processing Flow (Manual)
- Document tagged in paperless-ngx
- paperless-gpt detects tagged documents
- AI processing (title/tags/correspondent generation)
- Manual review or auto-apply
- Update back to paperless-ngx
Document Processing Flow (Auto)
- Document tagged in paperless-ngx with the auto-processing tag (configured via the AUTO_TAG environment variable)
- paperless-gpt automatically processes documents
- AI processing (title/tags/correspondent generation)
- Auto-apply results back to paperless-ngx
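The tag-driven selection step in both flows can be sketched roughly as follows. The `Document` type and `selectTagged` helper are hypothetical simplifications, and the fallback tag value is an assumption made for this example; only the `AUTO_TAG` variable name comes from the text above.

```go
package main

import (
	"fmt"
	"os"
)

// Document is a simplified, hypothetical view of a paperless-ngx document.
type Document struct {
	ID   int
	Tags []string
}

// selectTagged returns the documents that carry the given tag.
func selectTagged(docs []Document, tag string) []Document {
	var out []Document
	for _, d := range docs {
		for _, t := range d.Tags {
			if t == tag {
				out = append(out, d)
				break
			}
		}
	}
	return out
}

func main() {
	autoTag := os.Getenv("AUTO_TAG")
	if autoTag == "" {
		autoTag = "paperless-gpt-auto" // assumed fallback for this sketch
	}
	docs := []Document{
		{ID: 1, Tags: []string{"invoice", autoTag}},
		{ID: 2, Tags: []string{"invoice"}},
	}
	for _, d := range selectTagged(docs, autoTag) {
		fmt.Printf("processing document %d\n", d.ID)
	}
}
```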
OCR Processing Flow
- Image/PDF input
- Vision model processing
- Text extraction and cleanup
- Integration with document processing
6. Security Patterns
- API token authentication
- Environment-based configuration
- Docker container isolation
- Rate limiting and token management
7. Development Patterns
- Clear separation of concerns
- Dependency injection
- Interface-based design
- Concurrent processing with safety
- Comprehensive error handling
- Template-based customization
8. Testing Patterns
- Unit tests for core logic
- Integration tests for API
- E2E tests for web interface
- Test fixtures and mocks
- Playwright for frontend testing
OCR System Patterns
OCR Provider Architecture
1. Provider Interface
- Common interface for all OCR implementations
- Methods for image processing
- Configuration through standardized Config struct
- Resource management patterns
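A provider interface of this kind might look like the sketch below. The method set, the `Config` fields, and the provider names `llm` and `google_docai` are assumptions chosen for illustration; the project's actual definitions may differ. `stubProvider` stands in for a real backend.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Config is a hypothetical, trimmed-down provider configuration.
type Config struct {
	Provider string // assumed values: "llm" or "google_docai"
	Model    string // vision model name, used by the LLM provider
}

// Provider is the common contract every OCR backend satisfies in this sketch.
type Provider interface {
	ProcessImage(ctx context.Context, image []byte) (string, error)
	Close() error // releases clients or connections held by the provider
}

// stubProvider is a placeholder backend that performs no real OCR.
type stubProvider struct{ name string }

func (p *stubProvider) ProcessImage(_ context.Context, image []byte) (string, error) {
	if len(image) == 0 {
		return "", errors.New("empty image")
	}
	return fmt.Sprintf("%s: extracted text from %d bytes", p.name, len(image)), nil
}

func (p *stubProvider) Close() error { return nil }

// NewProvider validates the configuration and returns a matching provider.
func NewProvider(cfg Config) (Provider, error) {
	switch cfg.Provider {
	case "llm":
		if cfg.Model == "" {
			return nil, errors.New("llm provider requires a model")
		}
		return &stubProvider{name: "llm/" + cfg.Model}, nil
	case "google_docai":
		return &stubProvider{name: "google_docai"}, nil
	default:
		return nil, fmt.Errorf("unknown OCR provider %q", cfg.Provider)
	}
}

func main() {
	p, err := NewProvider(Config{Provider: "llm", Model: "minicpm-v"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	defer p.Close() // resource management: always release the provider

	text, err := p.ProcessImage(context.Background(), []byte("fake image bytes"))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(text)
}
```

Callers depend only on `Provider`, so adding another backend means adding one case to the constructor rather than touching the processing code.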
2. LLM Provider Implementation
- Supports OpenAI and Ollama vision models
- Base64 encoding for OpenAI requests
- Binary format for Ollama requests
- Template-based OCR prompts
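The difference between the two request formats can be sketched as below. `ImagePart` and `buildImagePart` are hypothetical names, and the data-URL form is one common way to carry base64 image data; this is not the project's actual request-building code.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// ImagePart is a hypothetical container for the image portion of a vision request.
type ImagePart struct {
	DataURL string // set when the image is sent as a base64 data URL
	Binary  []byte // set when the image is sent as raw bytes
}

// buildImagePart encodes the image as base64 for OpenAI and passes the
// raw bytes through for any other provider.
func buildImagePart(provider, mimeType string, image []byte) ImagePart {
	if provider == "openai" {
		encoded := base64.StdEncoding.EncodeToString(image)
		return ImagePart{DataURL: "data:" + mimeType + ";base64," + encoded}
	}
	return ImagePart{Binary: image}
}

func main() {
	img := []byte("fake image bytes")
	fmt.Println(buildImagePart("openai", "image/jpeg", img).DataURL)
	fmt.Println(len(buildImagePart("ollama", "image/jpeg", img).Binary), "raw bytes")
}
```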
3. Google Document AI Provider
- Enterprise-grade OCR processing
- MIME type validation
- Processor configuration via environment
- Regional endpoint support
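MIME type validation can be done by sniffing the file content before it is sent to the processor. The sketch below uses the standard library's `http.DetectContentType`; the allowlist in `supported` is an assumption for illustration, not the set of types the project or Document AI actually accepts.

```go
package main

import (
	"fmt"
	"net/http"
)

// supported is an assumed allowlist for this sketch.
var supported = map[string]bool{
	"application/pdf": true,
	"image/jpeg":      true,
	"image/png":       true,
}

// validateMIME sniffs the content type from the leading bytes and
// rejects anything outside the allowlist.
func validateMIME(data []byte) (string, error) {
	mimeType := http.DetectContentType(data)
	if !supported[mimeType] {
		return "", fmt.Errorf("unsupported file type: %s", mimeType)
	}
	return mimeType, nil
}

func main() {
	samples := [][]byte{
		[]byte("%PDF-1.7\n"),
		[]byte("\x89PNG\x0D\x0A\x1A\x0A"),
		[]byte("just some text"),
	}
	for _, s := range samples {
		mimeType, err := validateMIME(s)
		if err != nil {
			fmt.Println("[ERROR]", err)
			continue
		}
		fmt.Println("accepted:", mimeType)
	}
}
```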
Logging Patterns
1. Provider Initialization
[INFO] Initializing OCR provider: llm
[INFO] Using LLM OCR provider (provider=ollama, model=minicpm-v)
2. Processing Logs
[DEBUG] Starting OCR processing
[DEBUG] Image dimensions (width=800, height=1200)
[DEBUG] Using binary image format for non-OpenAI provider
[DEBUG] Sending request to vision model
[INFO] Successfully processed image (content_length=1536)
3. Error Logging
[ERROR] Failed to decode image: invalid format
[ERROR] Unsupported file type: image/webp
[ERROR] Failed to get response from vision model
Error Handling Patterns
1. Configuration Validation
- Required parameter checks
- Environment variable validation
- Provider-specific configuration
- Connection testing
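Required-parameter checks of this kind can be sketched as a small helper that reports every missing setting at once. The helper name and the variable names passed to it in `main` are placeholders for this example. The lookup function is injected so the check can be exercised without touching the real environment.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// validateConfig returns an error naming every required key whose value
// is empty or whitespace-only according to getenv.
func validateConfig(getenv func(string) string, required []string) error {
	var missing []string
	for _, key := range required {
		if strings.TrimSpace(getenv(key)) == "" {
			missing = append(missing, key)
		}
	}
	if len(missing) > 0 {
		return fmt.Errorf("missing required configuration: %s", strings.Join(missing, ", "))
	}
	return nil
}

func main() {
	// Placeholder variable names; substitute the ones your provider needs.
	required := []string{"EXAMPLE_PROVIDER", "EXAMPLE_PROCESSOR_ID"}
	if err := validateConfig(os.Getenv, required); err != nil {
		fmt.Println("[ERROR]", err)
		return
	}
	fmt.Println("[INFO] configuration OK")
}
```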
2. Processing Errors
- Image format validation
- MIME type checking
- Content processing errors
- Provider-specific error handling
3. Error Propagation
- Detailed error contexts
- Original error wrapping
- Logging with error context
- Recovery mechanisms
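Error wrapping with context is idiomatic in Go through `fmt.Errorf` and the `%w` verb. The sketch below uses hypothetical function names and a hypothetical sentinel error to show how each layer adds context while the original error stays detectable with `errors.Is`.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrUnsupportedType is a hypothetical sentinel error for this sketch.
var ErrUnsupportedType = errors.New("unsupported file type")

// decode rejects MIME types outside a small assumed allowlist.
func decode(mimeType string) error {
	if mimeType != "image/jpeg" && mimeType != "image/png" {
		return fmt.Errorf("%w: %s", ErrUnsupportedType, mimeType)
	}
	return nil
}

// processPage adds document and page context to any lower-level error.
func processPage(docID, page int, mimeType string) error {
	if err := decode(mimeType); err != nil {
		return fmt.Errorf("document %d page %d: %w", docID, page, err)
	}
	return nil
}

func main() {
	err := processPage(42, 1, "image/webp")
	if errors.Is(err, ErrUnsupportedType) {
		// The wrapped chain still matches the sentinel.
		fmt.Println("[ERROR]", err)
	}
}
```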
Processing Flow
1. Document Processing
Document Tagged → OCR Provider Selected → Image Processing → Text Extraction → Content Update
2. Provider Selection
Config Check → Provider Initialization → Resource Setup → Provider Ready
3. Error Recovery
Error Detection → Logging → Cleanup → Error Propagation
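The error recovery sequence (detection, logging, cleanup, propagation) maps naturally onto `defer` plus error wrapping. The following is a sketch under assumed names (`resource`, `runOCR`, `errTimeout`); the extraction step is passed in as a function so the example stays self-contained.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// errTimeout is a hypothetical failure used to exercise the error path.
var errTimeout = errors.New("vision model timed out")

// resource stands in for a client, temp file, or connection.
type resource struct{ closed bool }

func (r *resource) Close() { r.closed = true }

// runOCR runs extract and guarantees cleanup on both success and failure.
func runOCR(extract func() (string, error), res *resource) (string, error) {
	defer res.Close() // cleanup

	text, err := extract()
	if err != nil { // error detection
		log.Printf("[ERROR] Failed to get response from vision model: %v", err) // logging
		return "", fmt.Errorf("ocr processing: %w", err)                        // propagation
	}
	return text, nil
}

func main() {
	res := &resource{}
	_, err := runOCR(func() (string, error) { return "", errTimeout }, res)
	fmt.Println("error:", err)
	fmt.Println("resource closed:", res.closed)
}
```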
These patterns ensure consistent behavior across OCR providers while maintaining proper logging and error handling throughout the system.