**paperless-gpt** is a tool designed to generate accurate and meaningful document titles and tags for [paperless-ngx](https://github.com/paperless-ngx/paperless-ngx) using Large Language Models (LLMs). It supports multiple LLM providers, including **OpenAI** and **Ollama**. With paperless-gpt, you can streamline your document management by automatically suggesting appropriate titles and tags based on the content of your scanned documents.
- [Docker](https://www.docker.com/get-started) installed on your system.
- A running instance of [paperless-ngx](https://github.com/paperless-ngx/paperless-ngx).
- Access to an LLM provider:
  - **OpenAI**: An API key with access to models like `gpt-4o` or `gpt-3.5-turbo`.
  - **Ollama**: A running Ollama server with models like `llama2` installed.
### Installation
#### Docker Compose
The easiest way to get started is Docker Compose, which can run paperless-gpt alongside paperless-ngx. The paperless-gpt service is configured through the following environment variables.
| Variable | Description | Required |
|----------|-------------|----------|
| `PAPERLESS_BASE_URL` | The base URL of your paperless-ngx instance (e.g., `http://paperless-ngx:8000`). | Yes |
| `PAPERLESS_API_TOKEN` | API token for accessing paperless-ngx. You can generate one in the paperless-ngx admin interface. | Yes |
| `LLM_PROVIDER` | The LLM provider to use (`openai` or `ollama`). | Yes |
| `LLM_MODEL` | The model name to use (e.g., `gpt-4o`, `gpt-3.5-turbo`, `llama2`). | Yes |
| `OPENAI_API_KEY` | Your OpenAI API key. Required if using OpenAI as the LLM provider. | Cond. |
| `LLM_LANGUAGE` | The likely language of your documents (e.g., `English`, `German`). Default is `English`. | No |
| `OLLAMA_HOST` | The URL of the Ollama server (e.g., `http://host.docker.internal:11434`). Useful if using Ollama. Default is `http://127.0.0.1:11434`. | No |
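
The requirement rules in the table (always required, conditionally required, optional with a default) can be sketched in Go as follows. This is an illustrative sketch, not paperless-gpt's actual startup code: the `Config` type and the `loadConfig` function are hypothetical names, and rejecting unknown providers is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// Config is a hypothetical holder for the settings in the table above.
// It is not paperless-gpt's actual configuration type.
type Config struct {
	PaperlessBaseURL  string
	PaperlessAPIToken string
	LLMProvider       string
	LLMModel          string
	OpenAIAPIKey      string
	LLMLanguage       string
	OllamaHost        string
}

// loadConfig reads settings through lookup (for example os.Getenv),
// enforces the "Required" column, and applies the documented defaults.
func loadConfig(lookup func(string) string) (Config, error) {
	cfg := Config{
		PaperlessBaseURL:  lookup("PAPERLESS_BASE_URL"),
		PaperlessAPIToken: lookup("PAPERLESS_API_TOKEN"),
		LLMProvider:       lookup("LLM_PROVIDER"),
		LLMModel:          lookup("LLM_MODEL"),
		OpenAIAPIKey:      lookup("OPENAI_API_KEY"),
		LLMLanguage:       lookup("LLM_LANGUAGE"),
		OllamaHost:        lookup("OLLAMA_HOST"),
	}

	// Variables marked "Yes" must always be set.
	required := []struct{ name, value string }{
		{"PAPERLESS_BASE_URL", cfg.PaperlessBaseURL},
		{"PAPERLESS_API_TOKEN", cfg.PaperlessAPIToken},
		{"LLM_PROVIDER", cfg.LLMProvider},
		{"LLM_MODEL", cfg.LLMModel},
	}
	for _, r := range required {
		if r.value == "" {
			return Config{}, fmt.Errorf("%s is required", r.name)
		}
	}

	// OPENAI_API_KEY is only required for the openai provider.
	switch cfg.LLMProvider {
	case "openai":
		if cfg.OpenAIAPIKey == "" {
			return Config{}, errors.New("OPENAI_API_KEY is required when LLM_PROVIDER is openai")
		}
	case "ollama":
		// No additional required variables.
	default:
		// Assumption: anything else is treated as a configuration error.
		return Config{}, fmt.Errorf("unsupported LLM_PROVIDER %q", cfg.LLMProvider)
	}

	// Optional variables fall back to the defaults from the table.
	if cfg.LLMLanguage == "" {
		cfg.LLMLanguage = "English"
	}
	if cfg.OllamaHost == "" {
		cfg.OllamaHost = "http://127.0.0.1:11434"
	}
	return cfg, nil
}

func main() {
	// A map stands in for the process environment to keep the example deterministic.
	env := map[string]string{
		"PAPERLESS_BASE_URL":  "http://paperless-ngx:8000",
		"PAPERLESS_API_TOKEN": "example-token",
		"LLM_PROVIDER":        "ollama",
		"LLM_MODEL":           "llama2",
	}
	cfg, err := loadConfig(func(key string) string { return env[key] })
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Println(cfg.LLMLanguage, cfg.OllamaHost)
}
```

Passing `os.Getenv` as the `lookup` argument would read the same variables from the real environment.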
You can customize the prompt templates used by paperless-gpt to generate titles and tags. By default, the application uses built-in templates, but you can modify them by editing the template files.
#### Prompt Templates Directory
The prompt templates are stored in the `prompts` directory inside the application. The two main template files are:
- `title_prompt.tmpl`: Template used for generating document titles.
- `tag_prompt.tmpl`: Template used for generating document tags.
#### Mounting the Prompts Directory
To modify the prompt templates, you need to mount a local `prompts` directory into the container.
**Docker Compose Example:**
```yaml
services:
  paperless-gpt:
    image: icereed/paperless-gpt:latest
    # ... (other configurations)
    volumes:
      - ./prompts:/app/prompts # Mount the prompts directory
```
**Docker Run Command Example:**
```bash
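# Add your other options (environment variables, ports, ...) next to the -v flag.
# A comment placed between continued lines would cut the command short.
docker run -d \
  -v "$(pwd)/prompts:/app/prompts" \
  paperless-gpt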
```
#### Editing the Prompt Templates
1. **Start the Container:**
   When you first start the container with the `prompts` directory mounted, it automatically creates the default template files in your local `prompts` directory if they do not exist.
2. **Edit the Template Files:**
   - Open `prompts/title_prompt.tmpl` and `prompts/tag_prompt.tmpl` with your favorite text editor.
   - Modify the templates using Go's `text/template` syntax.
   - Save the changes.
3. **Restart the Container (if necessary):**
   The application loads the templates when it starts. If the container is already running, restart it to apply your changes.
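
The "create the defaults only if they do not exist" behavior from step 1 can be sketched as below. This is an assumption about how such logic might look, not the application's actual code: `ensureTemplate` and `demo` are hypothetical helpers, and the template texts are placeholders.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// ensureTemplate writes defaultText to dir/name only when that file is
// missing, and reports whether it created the file. Hypothetical helper.
func ensureTemplate(dir, name, defaultText string) (bool, error) {
	path := filepath.Join(dir, name)
	_, err := os.Stat(path)
	if err == nil {
		return false, nil // File exists: keep the user's version.
	}
	if !errors.Is(err, fs.ErrNotExist) {
		return false, err
	}
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return false, err
	}
	if err := os.WriteFile(path, []byte(defaultText), 0o644); err != nil {
		return false, err
	}
	return true, nil
}

// demo runs ensureTemplate twice in a temporary directory, simulating a
// user edit in between, and returns both results plus the final content.
func demo() (first, second bool, content string, err error) {
	dir, err := os.MkdirTemp("", "prompts")
	if err != nil {
		return
	}
	defer os.RemoveAll(dir)

	const name = "title_prompt.tmpl"
	if first, err = ensureTemplate(dir, name, "default prompt"); err != nil {
		return
	}
	// Simulate the user editing the template between two container starts.
	if err = os.WriteFile(filepath.Join(dir, name), []byte("edited prompt"), 0o644); err != nil {
		return
	}
	if second, err = ensureTemplate(dir, name, "default prompt"); err != nil {
		return
	}
	data, err := os.ReadFile(filepath.Join(dir, name))
	if err != nil {
		return
	}
	content = string(data)
	return
}

func main() {
	first, second, content, err := demo()
	if err != nil {
		fmt.Println("demo failed:", err)
		return
	}
	fmt.Println(first, second, content)
}
```

Under this sketch, the second start leaves the edited file untouched, which is why local edits in a mounted `prompts` directory would survive restarts.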
#### Template Syntax and Variables
The templates use Go's `text/template` syntax and have access to the following variables:
- **For `title_prompt.tmpl`:**
  - `{{.Language}}`: The language specified in `LLM_LANGUAGE` (default is `English`).
  - `{{.Content}}`: The content of the document.
- **For `tag_prompt.tmpl`:**
  - `{{.Language}}`: The language specified in `LLM_LANGUAGE`.
  - `{{.AvailableTags}}`: A list (array) of available tags from paperless-ngx.
  - `{{.Title}}`: The suggested title for the document.
  - `{{.Content}}`: The content of the document.
**Example `title_prompt.tmpl`:**
```text
I will provide you with the content of a document that has been partially read by OCR (so it may contain errors).
Your task is to find a suitable document title that I can use as the title in the paperless-ngx program.
Respond only with the title, without any additional information. The content is likely in {{.Language}}.
Be sure to add one fitting emoji at the beginning of the title to make it more visually appealing.
Content:
{{.Content}}
```
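
To preview how the variables in a title template are filled in, a template can be rendered with Go's standard `text/template` package. The following is a minimal sketch, not the application's own rendering code: the `titleData` struct and the `renderTitlePrompt` function are hypothetical names, and the shortened template text is only for illustration.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// titleData is a hypothetical struct mirroring the variables documented
// for title_prompt.tmpl.
type titleData struct {
	Language string
	Content  string
}

// renderTitlePrompt parses tmplText and substitutes the fields of data.
func renderTitlePrompt(tmplText string, data titleData) (string, error) {
	tmpl, err := template.New("title_prompt").Parse(tmplText)
	if err != nil {
		return "", err
	}
	var out strings.Builder
	if err := tmpl.Execute(&out, data); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	// Shortened template text; a real title_prompt.tmpl would be longer.
	const tmplText = "The content is likely in {{.Language}}.\nContent:\n{{.Content}}\n"

	prompt, err := renderTitlePrompt(tmplText, titleData{
		Language: "German",
		Content:  "Rechnung Nr. 1234",
	})
	if err != nil {
		fmt.Println("template error:", err)
		return
	}
	fmt.Print(prompt)
}
```

In this sketch, a misspelled variable such as `{{.Langauge}}` makes `Execute` return an error, because the struct has no such field. Rendering a template locally like this can catch typos before restarting the container.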
**Example `tag_prompt.tmpl`:**
```text
I will provide you with the content and the title of a document. Your task is to select appropriate tags for the document from the list of available tags I will provide. Only select tags from the provided list. Respond only with the selected tags as a comma-separated list, without any additional information. The content is likely in {{.Language}}.
Available Tags:
{{.AvailableTags | join ","}}
Title:
{{.Title}}
Content:
{{.Content}}
Please concisely select the {{.Language}} tags from the list above that best describe the document.
Be very selective and only choose the most relevant tags since too many tags will make the document less discoverable.
```
**Note:** Advanced users can utilize additional functions from the [Sprig](http://masterminds.github.io/sprig/) template library, as it is included in the application.
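
The `{{.AvailableTags | join ","}}` pipeline in the tag template passes the tag list as the last argument to `join`. The sketch below shows that mechanism using only Go's standard library: the `join` function registered here is a stand-in with the same argument order as Sprig's (separator first, list last), and `tagData` and `renderTagPrompt` are hypothetical names rather than the application's own.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// tagData is a hypothetical struct mirroring the variables documented
// for tag_prompt.tmpl.
type tagData struct {
	Language      string
	AvailableTags []string
	Title         string
	Content       string
}

// renderTagPrompt parses tmplText with a "join" helper and executes it.
func renderTagPrompt(tmplText string, data tagData) (string, error) {
	funcs := template.FuncMap{
		// Stand-in for Sprig's join so this example needs no third-party
		// module. The piped value arrives as the final argument.
		"join": func(sep string, items []string) string {
			return strings.Join(items, sep)
		},
	}
	// Funcs must be registered before Parse so "join" is known to the parser.
	tmpl, err := template.New("tag_prompt").Funcs(funcs).Parse(tmplText)
	if err != nil {
		return "", err
	}
	var out strings.Builder
	if err := tmpl.Execute(&out, data); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	const tmplText = "Available Tags:\n{{.AvailableTags | join \",\"}}\nTitle:\n{{.Title}}\n"

	prompt, err := renderTagPrompt(tmplText, tagData{
		Language:      "English",
		AvailableTags: []string{"invoice", "tax", "insurance"},
		Title:         "Electricity Invoice",
		Content:       "Example document text",
	})
	if err != nil {
		fmt.Println("template error:", err)
		return
	}
	fmt.Print(prompt)
}
```

Changing the separator in the template, for example to `join ", "`, changes how the tag list is presented to the LLM without touching any application code.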
1. **Tag Documents:**
   - Add the tag `paperless-gpt` to the documents you want to process. This tag is configurable via the `tagToFilter` variable in the code (default is `paperless-gpt`).
2. **Access the paperless-gpt Interface:**
   - Open your browser and navigate to `http://localhost:8080`.
3. **Process Documents:**
   - Click **"Generate Suggestions"** to let the LLM generate title suggestions based on the document content.