# System Patterns

## Architecture Overview

### 1. Microservices Architecture
- **paperless-gpt**: AI processing service (Go)
- **paperless-ngx**: Document management system (external)
- Communication via REST API
- Docker-based deployment

### 2. Backend Architecture (Go)

#### Core Components
- **API Server**: HTTP handlers for document processing
- **LLM Integration**: Abstraction for multiple AI providers
- **Template Engine**: Dynamic prompt generation
- **Document Processor**: Handles OCR and metadata generation

#### Key Patterns
- **Template-Based Prompts**: Customizable templates for different AI tasks
- **Content Truncation**: Document content is trimmed to fit a token limit
- **Concurrent Processing**: Goroutines for parallel document processing
- **Mutex-Protected Resources**: Thread-safe template access
- **Error Propagation**: Structured error handling across layers
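
As a rough illustration of the mutex-protected template access and goroutine-based processing listed above, the following sketch guards a prompt template with a `sync.RWMutex` and renders one prompt per document in parallel. The names `templateStore`, `Set`, `Render`, and `renderAll` are hypothetical and are not taken from the paperless-gpt codebase.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
	"text/template"
)

// templateStore guards a prompt template so it can be replaced at runtime
// while other goroutines render it. All names here are hypothetical.
type templateStore struct {
	mu   sync.RWMutex
	tmpl *template.Template
}

// Set parses the text first, then installs the template under the write lock.
func (s *templateStore) Set(text string) error {
	t, err := template.New("prompt").Parse(text)
	if err != nil {
		return fmt.Errorf("parse template: %w", err)
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	s.tmpl = t
	return nil
}

// Render executes the current template under the read lock.
func (s *templateStore) Render(data any) (string, error) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	if s.tmpl == nil {
		return "", fmt.Errorf("no template set")
	}
	var buf bytes.Buffer
	if err := s.tmpl.Execute(&buf, data); err != nil {
		return "", fmt.Errorf("execute template: %w", err)
	}
	return buf.String(), nil
}

// renderAll renders one prompt per document in parallel and keeps input order.
// Each goroutine writes only to its own slice index, so no extra lock is needed.
func renderAll(s *templateStore, contents []string) []string {
	out := make([]string, len(contents))
	var wg sync.WaitGroup
	for i, c := range contents {
		wg.Add(1)
		go func(i int, c string) {
			defer wg.Done()
			p, err := s.Render(map[string]string{"Content": c})
			if err != nil {
				p = "error: " + err.Error()
			}
			out[i] = p
		}(i, c)
	}
	wg.Wait()
	return out
}

func main() {
	s := &templateStore{}
	if err := s.Set("Suggest a title for: {{.Content}}"); err != nil {
		panic(err)
	}
	for _, p := range renderAll(s, []string{"invoice text", "contract text"}) {
		fmt.Println(p)
	}
}
```

A read/write lock is used here on the assumption that templates are rendered far more often than they are edited.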

### 3. Frontend Architecture (React/TypeScript)

#### Components
- Document Processor
- Suggestion Review
- Document Cards
- Sidebar Navigation
- Success Modal

#### State Management
- Local component state
- Props for component communication
- API integration for data fetching

### 4. Integration Patterns

#### API Communication
- RESTful endpoints
- JSON payload structure
- Token-based authentication
- Error response handling

#### LLM Provider Integration
- Provider abstraction layer
- Support for multiple providers (OpenAI, Ollama)
- Configurable models and parameters
- Vision model support for OCR

### 5. Data Flow

#### Document Processing Flow (Manual)
1. Document tagged in paperless-ngx
2. paperless-gpt detects tagged documents
3. AI processing (title/tags/correspondent generation)
4. Manual review or auto-apply
5. Update back to paperless-ngx

#### Document Processing Flow (Auto)
1. Document tagged in paperless-ngx with the auto-processing tag (set via the `AUTO_TAG` environment variable)
2. paperless-gpt automatically processes documents
3. AI processing (title/tags/correspondent generation)
4. Auto-apply results back to paperless-ngx
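
The tag-based selection in the auto flow could look roughly like the sketch below. The `Document` type and `selectByTag` helper are hypothetical, and the fallback tag name `auto` is a placeholder assumption rather than the project's actual default; only the `AUTO_TAG` variable name comes from the flow above.

```go
package main

import (
	"fmt"
	"os"
)

// Document is a hypothetical, trimmed-down view of a paperless-ngx document.
type Document struct {
	ID   int
	Tags []string
}

// selectByTag returns the documents that carry the given tag, in input order.
func selectByTag(docs []Document, tag string) []Document {
	var out []Document
	for _, d := range docs {
		for _, t := range d.Tags {
			if t == tag {
				out = append(out, d)
				break
			}
		}
	}
	return out
}

func main() {
	autoTag := os.Getenv("AUTO_TAG")
	if autoTag == "" {
		autoTag = "auto" // placeholder fallback, not the project's real default
	}
	docs := []Document{
		{ID: 1, Tags: []string{"inbox"}},
		{ID: 2, Tags: []string{"inbox", autoTag}},
	}
	for _, d := range selectByTag(docs, autoTag) {
		fmt.Println("would auto-process document", d.ID)
	}
}
```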

#### OCR Processing Flow
1. Image/PDF input
2. Vision model processing
3. Text extraction and cleanup
4. Integration with document processing

### 6. Security Patterns
- API token authentication
- Environment-based configuration
- Docker container isolation
- Rate limiting and token management
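
A minimal sketch of API token authentication combined with environment-based configuration follows. It assumes the paperless-ngx convention of an `Authorization: Token <token>` header; the environment variable names `PAPERLESS_BASE_URL` and `PAPERLESS_API_TOKEN` and the helper `newAuthedRequest` are assumptions made for illustration. The request is built but never sent.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"os"
)

// newAuthedRequest builds a GET request carrying the API token.
// It refuses to build a request when the token is missing.
func newAuthedRequest(baseURL, token, path string) (*http.Request, error) {
	if token == "" {
		return nil, errors.New("API token is empty")
	}
	req, err := http.NewRequest(http.MethodGet, baseURL+path, nil)
	if err != nil {
		return nil, fmt.Errorf("build request: %w", err)
	}
	req.Header.Set("Authorization", "Token "+token)
	req.Header.Set("Accept", "application/json")
	return req, nil
}

func main() {
	// Variable names are assumptions; secrets come from the environment, not the code.
	req, err := newAuthedRequest(
		os.Getenv("PAPERLESS_BASE_URL"),
		os.Getenv("PAPERLESS_API_TOKEN"),
		"/api/documents/",
	)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	// Print only non-secret parts of the request.
	fmt.Println(req.Method, req.URL.String())
}
```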

### 7. Development Patterns
- Clear separation of concerns
- Dependency injection
- Interface-based design
- Concurrent processing with safety
- Comprehensive error handling
- Template-based customization

### 8. Testing Patterns
- Unit tests for core logic
- Integration tests for API
- E2E tests for web interface
- Test fixtures and mocks
- Playwright for frontend testing
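
For the API integration tests, one common Go approach is to stand up a fake server with `net/http/httptest` and point the client at it. The sketch below assumes a hypothetical `fetchTitle` client function and a fake endpoint that returns a document title; in a real project the check in `main` would live in a `_test.go` file.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// document is a hypothetical subset of a paperless-ngx document payload.
type document struct {
	Title string `json:"title"`
}

// fetchTitle is the code under test: it GETs a URL and decodes the title.
func fetchTitle(client *http.Client, url string) (string, error) {
	resp, err := client.Get(url)
	if err != nil {
		return "", fmt.Errorf("get document: %w", err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	var d document
	if err := json.NewDecoder(resp.Body).Decode(&d); err != nil {
		return "", fmt.Errorf("decode document: %w", err)
	}
	return d.Title, nil
}

// newFakeServer returns a local test server that always serves the given title.
func newFakeServer(title string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(document{Title: title})
	}))
}

func main() {
	srv := newFakeServer("Invoice 2024")
	defer srv.Close()
	title, err := fetchTitle(srv.Client(), srv.URL)
	fmt.Println(title, err)
}
```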

## OCR System Patterns

### OCR Provider Architecture

#### 1. Provider Interface
- Common interface for all OCR implementations
- Methods for image processing
- Configuration through standardized Config struct
- Resource management patterns

#### 2. LLM Provider Implementation
- Supports OpenAI and Ollama vision models
- Base64 encoding for OpenAI requests
- Binary format for Ollama requests
- Template-based OCR prompts
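
The difference between the two request formats can be sketched as below. The `imagePayload` type and `buildImagePayload` function are hypothetical; only the rule itself (Base64 for OpenAI, raw bytes otherwise) comes from the list above.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// imagePayload holds exactly one representation of the image.
type imagePayload struct {
	Base64 string // set for OpenAI-style requests
	Raw    []byte // set for providers that accept binary data, such as Ollama
}

// buildImagePayload picks the encoding based on the provider name.
func buildImagePayload(provider string, img []byte) imagePayload {
	if provider == "openai" {
		return imagePayload{Base64: base64.StdEncoding.EncodeToString(img)}
	}
	return imagePayload{Raw: img}
}

func main() {
	img := []byte("hi") // placeholder bytes, not a real image
	fmt.Printf("%+v\n", buildImagePayload("openai", img))
	fmt.Printf("%+v\n", buildImagePayload("ollama", img))
}
```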

#### 3. Google Document AI Provider
- Enterprise-grade OCR processing
- MIME type validation
- Processor configuration via environment
- Regional endpoint support
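
MIME type validation and regional endpoint selection might be sketched as follows. The set of accepted types is an assumption chosen for illustration, and the endpoint format `<location>-documentai.googleapis.com:443` reflects the usual Document AI regional host naming, stated here as an assumption rather than the provider's confirmed configuration. Content sniffing uses the standard library's `http.DetectContentType`.

```go
package main

import (
	"fmt"
	"net/http"
)

// supportedMIME is an illustrative allow-list, not the provider's real one.
var supportedMIME = map[string]bool{
	"image/jpeg":      true,
	"image/png":       true,
	"application/pdf": true,
}

// validateMIME sniffs the content type from the leading bytes and
// rejects anything outside the allow-list.
func validateMIME(data []byte) (string, error) {
	mime := http.DetectContentType(data)
	if !supportedMIME[mime] {
		return "", fmt.Errorf("unsupported file type: %s", mime)
	}
	return mime, nil
}

// regionalEndpoint builds a location-specific API host (format assumed).
func regionalEndpoint(location string) string {
	return fmt.Sprintf("%s-documentai.googleapis.com:443", location)
}

func main() {
	png := []byte("\x89PNG\r\n\x1a\n")
	fmt.Println(validateMIME(png))
	fmt.Println(validateMIME([]byte("plain text, not an image")))
	fmt.Println(regionalEndpoint("eu"))
}
```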

### Logging Patterns

#### 1. Provider Initialization
```
[INFO] Initializing OCR provider: llm
[INFO] Using LLM OCR provider (provider=ollama, model=minicpm-v)
```

#### 2. Processing Logs
```
[DEBUG] Starting OCR processing
[DEBUG] Image dimensions (width=800, height=1200)
[DEBUG] Using binary image format for non-OpenAI provider
[DEBUG] Sending request to vision model
[INFO] Successfully processed image (content_length=1536)
```

#### 3. Error Logging
```
[ERROR] Failed to decode image: invalid format
[ERROR] Unsupported file type: image/webp
[ERROR] Failed to get response from vision model
```

### Error Handling Patterns

#### 1. Configuration Validation
- Required parameter checks
- Environment variable validation
- Provider-specific configuration
- Connection testing
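
The required-parameter and provider-specific checks above can be sketched as a single validation pass that reports every problem at once. The variable names `OCR_PROVIDER` and `VISION_LLM_MODEL`, the provider name `google_docai`, and the `ocrConfig` type are assumptions; the lookup function is injected so the check can be exercised without touching the real environment. Connection testing is left out of this sketch.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// ocrConfig is a hypothetical subset of the OCR settings.
type ocrConfig struct {
	Provider string
	Model    string
}

// loadConfig reads settings through getenv and validates them together,
// so the user sees all problems in one error message.
func loadConfig(getenv func(string) string) (ocrConfig, error) {
	cfg := ocrConfig{
		Provider: getenv("OCR_PROVIDER"),
		Model:    getenv("VISION_LLM_MODEL"),
	}
	var problems []string
	switch cfg.Provider {
	case "":
		problems = append(problems, "OCR_PROVIDER is required")
	case "llm":
		if cfg.Model == "" {
			problems = append(problems, "VISION_LLM_MODEL is required when OCR_PROVIDER=llm")
		}
	case "google_docai":
		// Provider-specific settings would be checked here.
	default:
		problems = append(problems, fmt.Sprintf("unknown OCR_PROVIDER %q", cfg.Provider))
	}
	if len(problems) > 0 {
		return ocrConfig{}, fmt.Errorf("invalid configuration: %s", strings.Join(problems, "; "))
	}
	return cfg, nil
}

func main() {
	cfg, err := loadConfig(os.Getenv)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("using OCR provider %s (model=%s)\n", cfg.Provider, cfg.Model)
}
```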

#### 2. Processing Errors
- Image format validation
- MIME type checking
- Content processing errors
- Provider-specific error handling

#### 3. Error Propagation
- Detailed error contexts
- Original error wrapping
- Logging with error context
- Recovery mechanisms

### Processing Flow

#### 1. Document Processing
```
Document Tagged → OCR Provider Selected → Image Processing → Text Extraction → Content Update
```

#### 2. Provider Selection
```
Config Check → Provider Initialization → Resource Setup → Provider Ready
```

#### 3. Error Recovery
```
Error Detection → Logging → Cleanup → Error Propagation
```

These patterns ensure consistent behavior across OCR providers while maintaining proper logging and error handling throughout the system.