Update README: Revise features and env vars for DeepSeek integration (#197)

This commit is contained in:
Icereed 2025-02-07 08:29:42 +01:00 committed by GitHub
parent 87b0b4fc07
commit 712ed53c1c


@@ -20,21 +20,24 @@ https://github.com/user-attachments/assets/bd5d38b9-9309-40b9-93ca-918dfa4f3fd4
2. **Automatic Title & Tag Generation**
   No more guesswork. Let the AI do the naming and categorizing. You can easily review suggestions and refine them if needed.
3. **Supports DeepSeek reasoning models in Ollama**
   Greatly enhance accuracy by using a reasoning model like `deepseek-r1:8b`. The perfect tradeoff between privacy and performance! Of course, if you have enough GPUs or NPUs, a bigger model will enhance the experience.
4. **Automatic Correspondent Generation**
   Automatically identify and generate correspondents from your documents, making it easier to track and organize your communications.
5. **Extensive Customization**
   - **Prompt Templates**: Tweak your AI prompts to reflect your domain, style, or preference.
   - **Tagging**: Decide how documents get tagged—manually, automatically, or via OCR-based flows.
6. **Simple Docker Deployment**
   A few environment variables, and you're off! Compose it alongside paperless-ngx with minimal fuss.
7. **Unified Web UI**
   - **Manual Review**: Approve or tweak the AI's suggestions.
   - **Auto Processing**: Focus only on edge cases while the rest is sorted for you.
8. **Opt-In LLM-based OCR**
   If you opt in, your images get read by a Vision LLM, pushing boundaries beyond standard OCR tools.
---
@@ -69,7 +72,7 @@ https://github.com/user-attachments/assets/bd5d38b9-9309-40b9-93ca-918dfa4f3fd4
- A running instance of [paperless-ngx][paperless-ngx].
- Access to an LLM provider:
  - **OpenAI**: An API key with models like `gpt-4o` or `gpt-3.5-turbo`.
  - **Ollama**: A running Ollama server with models like `deepseek-r1:8b`.
### Installation
@@ -93,7 +96,9 @@ services:
MANUAL_TAG: 'paperless-gpt' # Optional, default: paperless-gpt
AUTO_TAG: 'paperless-gpt-auto' # Optional, default: paperless-gpt-auto
LLM_PROVIDER: 'openai' # or 'ollama'
LLM_MODEL: 'gpt-4o' # or 'deepseek-r1:8b'
# Optional, but recommended for Ollama
TOKEN_LIMIT: 1000
OPENAI_API_KEY: 'your_openai_api_key'
# Optional - OPENAI_BASE_URL: 'https://litellm.yourinstallationof.it.com/v1'
LLM_LANGUAGE: 'English' # Optional, default: English
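For a fully local setup with Ollama and a DeepSeek reasoning model, the Ollama-specific settings might look like the sketch below (a minimal, hypothetical compose fragment; the `ollama` hostname, port, and the `OLLAMA_HOST` variable reflect a typical containerized Ollama setup and should be checked against your own deployment):

```yaml
services:
  paperless-gpt:
    environment:
      LLM_PROVIDER: 'ollama'
      LLM_MODEL: 'deepseek-r1:8b'
      OLLAMA_HOST: 'http://ollama:11434' # assumed address of your Ollama server
      TOKEN_LIMIT: 1000 # recommended starting point for smaller local models
```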
@@ -160,7 +165,7 @@ services:
| `MANUAL_TAG` | Tag for manual processing. Default: `paperless-gpt`. | No |
| `AUTO_TAG` | Tag for auto processing. Default: `paperless-gpt-auto`. | No |
| `LLM_PROVIDER` | AI backend (`openai` or `ollama`). | Yes |
| `LLM_MODEL` | AI model name, e.g. `gpt-4o`, `gpt-3.5-turbo`, `deepseek-r1:8b`. | Yes |
| `OPENAI_API_KEY` | OpenAI API key (required if using OpenAI). | Cond. |
| `OPENAI_BASE_URL` | OpenAI base URL (optional, if using a custom OpenAI-compatible service like LiteLLM). | No |
| `LLM_LANGUAGE` | Likely language for documents (e.g. `English`). Default: `English`. | No |
@@ -455,7 +460,7 @@ When using local LLMs (like those through Ollama), you might need to adjust cert
#### Token Management
- Use the `TOKEN_LIMIT` environment variable to control the maximum number of tokens sent to the LLM
- Smaller models might truncate content unexpectedly if given too much text
- Start with a conservative limit (e.g., 1000 tokens) and adjust based on your model's capabilities
- Set to `0` to disable the limit (use with caution)
Example configuration for smaller models:
@ -463,7 +468,7 @@ Example configuration for smaller models:
environment:
  TOKEN_LIMIT: '2000' # Adjust based on your model's context window
  LLM_PROVIDER: 'ollama'
  LLM_MODEL: 'deepseek-r1:8b' # Or other local model
```
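When picking a `TOKEN_LIMIT`, a common rule of thumb for English text is roughly four characters per token. The sketch below illustrates that heuristic only; the `estimate_tokens` helper is hypothetical and not part of paperless-gpt, and real tokenizers will give somewhat different counts:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters/token rule of thumb."""
    return max(1, len(text) // 4)

# A 6,000-character document is roughly 1,500 tokens, so with
# TOKEN_LIMIT set to 1000 its tail would likely be truncated.
doc = "x" * 6000
print(estimate_tokens(doc))  # 1500
```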
Common issues and solutions: