mirror of https://github.com/icereed/paperless-gpt.git synced 2025-03-12 12:58:02 -05:00

feat(ocr): enhance OCR processing with structured results and hOCR su… (#212 )

* feat(ocr): enhance OCR processing with structured results and hOCR support

* Update ocr/google_docai_provider.go

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

* Update ocr/google_docai_provider_test.go

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

* refactor(tests): remove unused context import from google_docai_provider_test.go

* refactor: Add defensive checks for language code in Google DocAI provider (#226)

* Update ocr/google_docai_provider.go

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

* Update ocr/google_docai_provider.go

Co-authored-by: gardar <gardar@users.noreply.github.com>

---------

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: mkrinke <mad.krinke@googlemail.com>
Co-authored-by: gardar <gardar@users.noreply.github.com>

2025-03-10 08:43:50 +00:00

4.7 KiB


System Patterns

Architecture Overview

1. Microservices Architecture

paperless-gpt: AI processing service (Go)
paperless-ngx: Document management system (external)
Communication via REST API
Docker-based deployment

2. Backend Architecture (Go)

Core Components

API Server: HTTP handlers for document processing
LLM Integration: Abstraction for multiple AI providers
Template Engine: Dynamic prompt generation
Document Processor: Handles OCR and metadata generation

Key Patterns

Template-Based Prompts: Customizable templates for different AI tasks
Content Truncation: Document content is trimmed to a token limit before it is sent to the model
Concurrent Processing: Goroutines for parallel document processing
Mutex-Protected Resources: Thread-safe template access
Error Propagation: Structured error handling across layers
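
The template and mutex patterns above can be sketched together as follows. This is a minimal illustration, not the project's actual code: the `PromptStore` type, its `Set` and `Render` methods, and the template text are hypothetical names chosen for the example. It only assumes Go's standard `text/template` and `sync` packages.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
	"text/template"
)

// PromptStore is a hypothetical holder for one prompt template.
// The RWMutex lets many goroutines render concurrently while a
// template update takes the exclusive write lock.
type PromptStore struct {
	mu   sync.RWMutex
	tmpl *template.Template
}

// Set parses src and swaps it in under the write lock.
func (s *PromptStore) Set(src string) error {
	t, err := template.New("prompt").Parse(src)
	if err != nil {
		return fmt.Errorf("parse prompt template: %w", err)
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	s.tmpl = t
	return nil
}

// Render executes the stored template with data under the read lock.
func (s *PromptStore) Render(data any) (string, error) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	if s.tmpl == nil {
		return "", fmt.Errorf("no template set")
	}
	var buf bytes.Buffer
	if err := s.tmpl.Execute(&buf, data); err != nil {
		return "", fmt.Errorf("render prompt template: %w", err)
	}
	return buf.String(), nil
}

func main() {
	var store PromptStore
	if err := store.Set("Suggest a title for: {{.Content}}"); err != nil {
		panic(err)
	}
	out, err := store.Render(map[string]string{"Content": "Invoice 42"})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

A read/write mutex is used here because rendering is expected to be far more frequent than template edits; a plain `sync.Mutex` would also be correct.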

3. Frontend Architecture (React/TypeScript)

Components

Document Processor
Suggestion Review
Document Cards
Sidebar Navigation
Success Modal

State Management

Local component state
Props for component communication
API integration for data fetching

4. Integration Patterns

API Communication

RESTful endpoints
JSON payload structure
Token-based authentication
Error response handling

LLM Provider Integration

Provider abstraction layer
Support for multiple providers (OpenAI, Ollama)
Configurable models and parameters
Vision model support for OCR

5. Data Flow

Document Processing Flow (Manual)

Document tagged in paperless-ngx
paperless-gpt detects tagged documents
AI processing (title/tags/correspondent generation)
Manual review or auto-apply
Update back to paperless-ngx

Document Processing Flow (Auto)

Document tagged in paperless-ngx with the auto-processing tag (configured via the AUTO_TAG environment variable)
paperless-gpt automatically processes documents
AI processing (title/tags/correspondent generation)
Auto-apply results back to paperless-ngx

OCR Processing Flow

Image/PDF input
Vision model processing
Text extraction and cleanup
Integration with document processing

6. Security Patterns

API token authentication
Environment-based configuration
Docker container isolation
Rate limiting and token management

7. Development Patterns

Clear separation of concerns
Dependency injection
Interface-based design
Concurrency-safe processing
Comprehensive error handling
Template-based customization

8. Testing Patterns

Unit tests for core logic
Integration tests for API
E2E tests for web interface
Test fixtures and mocks
Playwright for frontend testing

OCR System Patterns

OCR Provider Architecture

1. Provider Interface

Common interface for all OCR implementations
Methods for image processing
Configuration through standardized Config struct
Resource management patterns

2. LLM Provider Implementation

Supports OpenAI and Ollama vision models
Base64 encoding for OpenAI requests
Binary format for Ollama requests
Template-based OCR prompts
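
The difference between the two request formats can be sketched as below. The `ImagePart` type and `encodeImage` function are hypothetical, and wrapping the Base64 text in a `data:` URL is an assumption about how the image is attached for OpenAI; the document itself only states that OpenAI requests use Base64 and Ollama requests use binary data.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// ImagePart is a hypothetical provider-neutral image payload.
// Exactly one of the two fields is populated.
type ImagePart struct {
	DataURL string // Base64 data URL, used for OpenAI in this sketch
	Binary  []byte // raw image bytes, used for every other provider
}

// encodeImage picks the payload format based on the provider name.
func encodeImage(provider, mimeType string, img []byte) ImagePart {
	if provider == "openai" {
		encoded := base64.StdEncoding.EncodeToString(img)
		return ImagePart{DataURL: "data:" + mimeType + ";base64," + encoded}
	}
	return ImagePart{Binary: img}
}

func main() {
	img := []byte("hi") // placeholder bytes, not a real image
	fmt.Println(encodeImage("openai", "image/png", img).DataURL)
	fmt.Println(len(encodeImage("ollama", "image/png", img).Binary))
}
```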

3. Google Document AI Provider

OCR through Google Cloud Document AI
MIME type validation
Processor configuration via environment
Regional endpoint support
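
MIME type validation could be done along these lines. This is a sketch using `http.DetectContentType` from the Go standard library; the `validateMIME` function and the set of accepted types are assumptions for the example, not the provider's actual allow-list.

```go
package main

import (
	"fmt"
	"net/http"
)

// supportedTypes is a hypothetical allow-list for this sketch.
var supportedTypes = map[string]bool{
	"image/jpeg":      true,
	"image/png":       true,
	"application/pdf": true,
}

// validateMIME sniffs the content type from the leading bytes and
// rejects anything that is not on the allow-list.
func validateMIME(data []byte) (string, error) {
	mimeType := http.DetectContentType(data)
	if !supportedTypes[mimeType] {
		return mimeType, fmt.Errorf("unsupported file type: %s", mimeType)
	}
	return mimeType, nil
}

func main() {
	png := []byte("\x89PNG\r\n\x1a\n")           // PNG signature only
	webp := []byte("RIFF\x00\x00\x00\x00WEBPVP8 ") // WebP container header
	fmt.Println(validateMIME(png))
	fmt.Println(validateMIME(webp))
}
```

Sniffing the bytes rather than trusting a file extension means a mislabeled upload is rejected before any request is sent to the OCR service.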

Logging Patterns

1. Provider Initialization

[INFO] Initializing OCR provider: llm
[INFO] Using LLM OCR provider (provider=ollama, model=minicpm-v)

2. Processing Logs

[DEBUG] Starting OCR processing
[DEBUG] Image dimensions (width=800, height=1200)
[DEBUG] Using binary image format for non-OpenAI provider
[DEBUG] Sending request to vision model
[INFO] Successfully processed image (content_length=1536)

3. Error Logging

[ERROR] Failed to decode image: invalid format
[ERROR] Unsupported file type: image/webp
[ERROR] Failed to get response from vision model

Error Handling Patterns

1. Configuration Validation

Required parameter checks
Environment variable validation
Provider-specific configuration
Connection testing
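
A minimal sketch of required-parameter checks on environment variables follows. The `requireEnv` helper is hypothetical, and `OCR_PROVIDER` is used only as a placeholder variable name; `AUTO_TAG` is the one variable named in this document. The lookup function is passed in so the check can be exercised without touching the real environment.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// requireEnv reports every required key that is unset or blank, so a
// misconfigured deployment fails once with a complete list.
func requireEnv(lookup func(string) (string, bool), keys ...string) error {
	var missing []string
	for _, key := range keys {
		value, ok := lookup(key)
		if !ok || strings.TrimSpace(value) == "" {
			missing = append(missing, key)
		}
	}
	if len(missing) > 0 {
		return fmt.Errorf("missing required environment variables: %s",
			strings.Join(missing, ", "))
	}
	return nil
}

func main() {
	// Variable names here are placeholders for illustration.
	if err := requireEnv(os.LookupEnv, "OCR_PROVIDER", "AUTO_TAG"); err != nil {
		fmt.Println("configuration error:", err)
		return
	}
	fmt.Println("configuration ok")
}
```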

2. Processing Errors

Image format validation
MIME type checking
Content processing errors
Provider-specific error handling

3. Error Propagation

Detailed error contexts
Original error wrapping
Logging with error context
Recovery mechanisms

Processing Flow

1. Document Processing

Document Tagged → OCR Provider Selected → Image Processing → Text Extraction → Content Update

2. Provider Selection

Config Check → Provider Initialization → Resource Setup → Provider Ready

3. Error Recovery

Error Detection → Logging → Cleanup → Error Propagation

These patterns ensure consistent behavior across OCR providers while maintaining proper logging and error handling throughout the system.
