From 712ed53c1ca5c585f8f43cb432fc3a2d0ce38efc Mon Sep 17 00:00:00 2001
From: Icereed
Date: Fri, 7 Feb 2025 08:29:42 +0100
Subject: [PATCH] Update README: Revise features and env vars for DeepSeek
 integration (#197)

---
 README.md | 25 +++++++++++++++----------
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/README.md b/README.md
index b1a1692..1177875 100644
--- a/README.md
+++ b/README.md
@@ -20,21 +20,24 @@ https://github.com/user-attachments/assets/bd5d38b9-9309-40b9-93ca-918dfa4f3fd4
 2. **Automatic Title & Tag Generation**
    No more guesswork. Let the AI do the naming and categorizing. You can easily review suggestions and refine them if needed.
 
-3. **Automatic Correspondent Generation**
+3. **Supports DeepSeek reasoning models in Ollama**
+   Greatly enhance accuracy by using a reasoning model like `deepseek-r1:8b`. The perfect tradeoff between privacy and performance! Of course, if you have enough GPUs or NPUs, a bigger model will enhance the experience.
+
+4. **Automatic Correspondent Generation**
    Automatically identify and generate correspondents from your documents, making it easier to track and organize your communications.
 
-4. **Extensive Customization**
+5. **Extensive Customization**
    - **Prompt Templates**: Tweak your AI prompts to reflect your domain, style, or preference.
    - **Tagging**: Decide how documents get tagged—manually, automatically, or via OCR-based flows.
 
-5. **Simple Docker Deployment**
+6. **Simple Docker Deployment**
    A few environment variables, and you’re off! Compose it alongside paperless-ngx with minimal fuss.
 
-6. **Unified Web UI**
+7. **Unified Web UI**
    - **Manual Review**: Approve or tweak AI’s suggestions.
    - **Auto Processing**: Focus only on edge cases while the rest is sorted for you.
 
-7. **Opt-In LLM-based OCR**
+8. **Opt-In LLM-based OCR**
    If you opt in, your images get read by a Vision LLM, pushing boundaries beyond standard OCR tools.
 
 ---
@@ -69,7 +72,7 @@ https://github.com/user-attachments/assets/bd5d38b9-9309-40b9-93ca-918dfa4f3fd4
 - A running instance of [paperless-ngx][paperless-ngx].
 - Access to an LLM provider:
   - **OpenAI**: An API key with models like `gpt-4o` or `gpt-3.5-turbo`.
-  - **Ollama**: A running Ollama server with models like `llama2`.
+  - **Ollama**: A running Ollama server with models like `deepseek-r1:8b`.
 
 ### Installation
 
@@ -93,7 +96,9 @@ services:
       MANUAL_TAG: 'paperless-gpt' # Optional, default: paperless-gpt
       AUTO_TAG: 'paperless-gpt-auto' # Optional, default: paperless-gpt-auto
       LLM_PROVIDER: 'openai' # or 'ollama'
-      LLM_MODEL: 'gpt-4o' # or 'llama2'
+      LLM_MODEL: 'gpt-4o' # or 'deepseek-r1:8b'
+      # Optional, but recommended for Ollama
+      TOKEN_LIMIT: 1000
       OPENAI_API_KEY: 'your_openai_api_key'
       # Optional - OPENAI_BASE_URL: 'https://litellm.yourinstallationof.it.com/v1'
       LLM_LANGUAGE: 'English' # Optional, default: English
@@ -160,7 +165,7 @@
 | `MANUAL_TAG`      | Tag for manual processing. Default: `paperless-gpt`.                                  | No       |
 | `AUTO_TAG`        | Tag for auto processing. Default: `paperless-gpt-auto`.                               | No       |
 | `LLM_PROVIDER`    | AI backend (`openai` or `ollama`).                                                    | Yes      |
-| `LLM_MODEL`       | AI model name, e.g. `gpt-4o`, `gpt-3.5-turbo`, `llama2`.                              | Yes      |
+| `LLM_MODEL`       | AI model name, e.g. `gpt-4o`, `gpt-3.5-turbo`, `deepseek-r1:8b`.                      | Yes      |
 | `OPENAI_API_KEY`  | OpenAI API key (required if using OpenAI).                                            | Cond.    |
 | `OPENAI_BASE_URL` | OpenAI base URL (optional, if using a custom OpenAI compatible service like LiteLLM). | No       |
 | `LLM_LANGUAGE`    | Likely language for documents (e.g. `English`). Default: `English`.                   | No       |
@@ -455,7 +460,7 @@ When using local LLMs (like those through Ollama), you might need to adjust cert
 #### Token Management
 - Use `TOKEN_LIMIT` environment variable to control the maximum number of tokens sent to the LLM
 - Smaller models might truncate content unexpectedly if given too much text
-- Start with a conservative limit (e.g., 2000 tokens) and adjust based on your model's capabilities
+- Start with a conservative limit (e.g., 1000 tokens) and adjust based on your model's capabilities
 - Set to `0` to disable the limit (use with caution)
 
 Example configuration for smaller models:
@@ -463,7 +468,7 @@
 environment:
   TOKEN_LIMIT: '2000' # Adjust based on your model's context window
   LLM_PROVIDER: 'ollama'
-  LLM_MODEL: 'llama2' # Or other local model
+  LLM_MODEL: 'deepseek-r1:8b' # Or other local model
 ```
 
 Common issues and solutions:
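
Taken together, this patch recommends pairing `LLM_PROVIDER: 'ollama'` with `LLM_MODEL: 'deepseek-r1:8b'` and a conservative `TOKEN_LIMIT`. Below is a minimal compose sketch of that combination, using only the environment variables that appear in the diff; the image reference is an assumption, and the setting that points paperless-gpt at your Ollama server is omitted because it does not appear in this patch.

```yaml
# Minimal sketch of the Ollama + DeepSeek setup this patch recommends.
# Assumptions: the image name/tag, and that any variable not shown in the
# patch (e.g. the Ollama endpoint setting) is configured elsewhere.
services:
  paperless-gpt:
    image: icereed/paperless-gpt:latest # assumed image reference
    environment:
      LLM_PROVIDER: 'ollama'      # local backend instead of OpenAI
      LLM_MODEL: 'deepseek-r1:8b' # reasoning model suggested by the patch
      TOKEN_LIMIT: 1000           # conservative starting point for smaller models
      LLM_LANGUAGE: 'English'     # Optional, default: English
```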