Icereed 2025-02-03 09:28:32 +01:00 committed by GitHub
commit 8dc489e035
GPG key ID: B5690EEEBB952194
13 changed files with 1724 additions and 0 deletions

.github/ISSUE_TEMPLATE/bug_report.yml
@ -0,0 +1,116 @@
name: Bug Report
description: Create a report to help us improve
title: "[BUG] "
labels: ["bug", "triage"]
body:
  - type: markdown
    attributes:
      value: |
        Thanks for taking the time to fill out this bug report!
        Before submitting, please check if a similar issue already exists.
  - type: input
    id: version
    attributes:
      label: Version
      description: What version of paperless-gpt are you running?
      placeholder: "e.g., 1.0.0"
    validations:
      required: true
  - type: dropdown
    id: deployment
    attributes:
      label: Deployment Method
      description: How are you running paperless-gpt?
      options:
        - Docker (official image)
        - Docker Compose
        - Manual Installation
        - Other
    validations:
      required: true
  - type: input
    id: llm-provider
    attributes:
      label: LLM Provider
      description: Which LLM provider are you using?
      placeholder: "e.g., OpenAI, Ollama"
    validations:
      required: true
  - type: input
    id: llm-model
    attributes:
      label: LLM Model
      description: Which model are you using?
      placeholder: "e.g., gpt-4, llama2"
    validations:
      required: true
  - type: dropdown
    id: os
    attributes:
      label: Operating System
      description: What operating system are you using?
      options:
        - Linux
        - macOS
        - Windows
        - Other
    validations:
      required: true
  - type: textarea
    id: what-happened
    attributes:
      label: What happened?
      description: A clear and concise description of the bug.
      placeholder: "Tell us what you see!"
    validations:
      required: true
  - type: textarea
    id: expected
    attributes:
      label: Expected behavior
      description: What did you expect to happen?
      placeholder: "Tell us what you expected"
    validations:
      required: true
  - type: textarea
    id: reproduction
    attributes:
      label: Steps to reproduce
      description: How can we reproduce this issue?
      placeholder: |
        1. Go to '...'
        2. Click on '...'
        3. Scroll down to '...'
        4. See error
    validations:
      required: true
  - type: textarea
    id: logs
    attributes:
      label: Relevant log output
      description: Please copy and paste any relevant log output. This will be automatically formatted into code.
      render: shell
  - type: textarea
    id: config
    attributes:
      label: Configuration
      description: |
        Please provide your configuration (with sensitive information redacted).
        This could be your docker-compose.yml or environment variables.
      render: yaml
  - type: textarea
    id: additional
    attributes:
      label: Additional context
      description: Add any other context about the problem here

@ -0,0 +1,118 @@
name: Feature Request
description: Suggest an idea for this project
title: "[FEATURE] "
labels: ["enhancement"]
body:
  - type: markdown
    attributes:
      value: |
        Thanks for taking the time to suggest a new feature!
        Please fill out this form as completely as possible.
  - type: textarea
    id: problem
    attributes:
      label: Problem Statement
      description: Is your feature request related to a problem? Please describe.
      placeholder: "I'm always frustrated when [...]"
    validations:
      required: true
  - type: textarea
    id: solution
    attributes:
      label: Proposed Solution
      description: Describe the solution you'd like to see
      placeholder: "It would be great if [...]"
    validations:
      required: true
  - type: textarea
    id: alternatives
    attributes:
      label: Alternatives Considered
      description: Describe any alternative solutions or features you've considered
      placeholder: "I've thought about [...]"
  - type: dropdown
    id: importance
    attributes:
      label: Importance Level
      description: How important is this feature to your use case?
      options:
        - Critical (Blocking my use of the project)
        - High (Would significantly improve my workflow)
        - Medium (Would be nice to have)
        - Low (Just an idea)
    validations:
      required: true
  - type: dropdown
    id: component
    attributes:
      label: Component
      description: Which part of paperless-gpt would this feature primarily affect?
      options:
        - OCR Processing
        - LLM Integration
        - Document Management
        - UI/UX
        - API
        - Configuration
        - Documentation
        - Performance
        - Security
        - Other
    validations:
      required: true
  - type: dropdown
    id: scope
    attributes:
      label: Implementation Scope
      description: How extensive would the changes be?
      options:
        - Minor (Simple change, few files)
        - Moderate (Multiple files, some complexity)
        - Major (Significant changes, new features)
        - Breaking (Requires breaking changes)
    validations:
      required: true
  - type: textarea
    id: context
    attributes:
      label: Additional Context
      description: Add any other context about the feature request here
      placeholder: "Include use cases, benefits, or screenshots"
  - type: textarea
    id: implementation
    attributes:
      label: Implementation Ideas
      description: If you have specific ideas about how to implement this feature, please share them
      placeholder: "We could implement this by..."
  - type: checkboxes
    id: terms
    attributes:
      label: Contribution
      description: Would you be interested in helping implement this feature?
      options:
        - label: I'm interested in contributing to this feature's implementation
          required: false
        - label: I have read the contribution guidelines
          required: true
  - type: textarea
    id: success_criteria
    attributes:
      label: Success Criteria
      description: What would make this feature implementation successful?
      placeholder: |
        Example criteria:
        - Feature works with both OpenAI and Ollama
        - Performance impact is minimal
        - No breaking changes to existing functionality
    validations:
      required: true

.github/config.yml
@ -0,0 +1,163 @@
# GitHub App Configuration
# Label Configuration
labels:
  # Type labels
  - name: bug
    color: d73a4a
    description: Something isn't working
  - name: enhancement
    color: a2eeef
    description: New feature or request
  - name: documentation
    color: 0075ca
    description: Documentation improvements
  - name: security
    color: ee0701
    description: Security-related issues
  # Priority labels
  - name: critical
    color: b60205
    description: Needs immediate attention
  - name: high
    color: d93f0b
    description: High priority
  - name: medium
    color: fbca04
    description: Medium priority
  - name: low
    color: 0e8a16
    description: Low priority
  # Status labels
  - name: triage
    color: d4c5f9
    description: Needs triage
  - name: in-progress
    color: 9ee12f
    description: Work in progress
  - name: blocked
    color: b60205
    description: Blocked or needs clarification
  # Component labels
  - name: frontend
    color: 1d76db
    description: Frontend related
  - name: backend
    color: 0052cc
    description: Backend related
  - name: ocr
    color: 5319e7
    description: OCR functionality
  - name: llm
    color: 006b75
    description: LLM integration
  # Size labels
  - name: size/xs
    color: d4c5f9
    description: Extra small change
  - name: size/s
    color: 84b6eb
    description: Small change
  - name: size/m
    color: fbca04
    description: Medium change
  - name: size/l
    color: d93f0b
    description: Large change
  - name: size/xl
    color: b60205
    description: Extra large change
# Stale issue configuration
stale:
  daysUntilStale: 60
  daysUntilClose: 7
  exemptLabels:
    - security
    - critical
    - pinned
  staleLabel: stale
  markComment: >
    This issue has been automatically marked as stale because it has not had
    recent activity. It will be closed if no further activity occurs. Thank you
    for your contributions.
  closeComment: >
    This issue has been automatically closed due to inactivity. Please feel free
    to reopen it if you still experience this problem.
# Welcome message for new contributors
newContributorWelcomeComment: >
  Thanks for making your first contribution to paperless-gpt! 🎉
  Please make sure you've read our [Contributing Guidelines](CONTRIBUTING.md)
  and [Code of Conduct](CODE_OF_CONDUCT.md).
  If you need any help, feel free to mention @icereed or ask in our Discord.
# PR size labeling
prSize:
  xs:
    lines: 10
  s:
    lines: 50
  m:
    lines: 250
  l:
    lines: 500
  xl:
    lines: 1000
# Code review settings
reviews:
  request_count: 1
  notify_on_changes: true
  auto_assign: true
  auto_merge: false
# Branch protection settings
branchProtection:
  main:
    required_status_checks:
      - "build"
      - "test"
      - "lint"
    enforce_admins: true
    required_pull_request_reviews:
      required_approving_review_count: 1
      dismiss_stale_reviews: true
      require_code_owner_reviews: true
    allow_force_pushes: false
    allow_deletions: false
# Issue template settings
issueTemplate:
  checkNew: true
  useConfigure: true
  configureMessage: >
    Please use our issue templates to report bugs or request features.
    This helps us track and resolve issues more effectively.
# Pull request template settings
pullRequestTemplate:
  checkNew: true
  useConfigure: true
  configureMessage: >
    Please make sure your PR follows our guidelines and includes all necessary information.
    Don't forget to link any related issues.
# Repository settings
repository:
  private: false
  has_issues: true
  has_projects: true
  has_wiki: true
  has_downloads: true
  default_branch: main
  allow_squash_merge: true
  allow_merge_commit: false
  allow_rebase_merge: true
  delete_branch_on_merge: true
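
The `prSize` block gives one line count per size label but does not say how a count maps to a label. A minimal Go sketch of one plausible reading, assuming each value is an inclusive upper bound on changed lines (additions plus deletions) and that anything above 1000 lines still receives `size/xl`. The `sizeLabel` helper is hypothetical and not part of any bot this file configures:

```go
package main

import "fmt"

// sizeThreshold pairs a size label with the maximum number of changed
// lines it covers. Bounds mirror prSize in .github/config.yml; treating
// them as inclusive upper bounds is an assumption of this sketch.
type sizeThreshold struct {
	label    string
	maxLines int
}

var thresholds = []sizeThreshold{
	{"size/xs", 10},
	{"size/s", 50},
	{"size/m", 250},
	{"size/l", 500},
	{"size/xl", 1000},
}

// sizeLabel returns the first label whose bound covers changedLines.
// Changes larger than the biggest bound fall back to size/xl (assumed).
func sizeLabel(changedLines int) string {
	for _, t := range thresholds {
		if changedLines <= t.maxLines {
			return t.label
		}
	}
	return thresholds[len(thresholds)-1].label
}

func main() {
	for _, n := range []int{3, 42, 300, 5000} {
		fmt.Printf("%d lines -> %s\n", n, sizeLabel(n))
	}
}
```

Under these assumptions a 42-line change is labeled `size/s` and a 300-line change `size/l`.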

.github/pull_request_template.md
@ -0,0 +1,79 @@
# Description
Please include a summary of the changes and the related issue. Please also include relevant motivation and context. List any dependencies that are required for this change.
Fixes # (issue)
## Type of change
Please delete options that are not relevant.
- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [ ] Documentation update
- [ ] This change requires a documentation update
## Checklist
Before submitting your PR, please review the following checklist:
### General
- [ ] I have performed a self-review of my code
- [ ] I have commented my code, particularly in hard-to-understand areas
- [ ] I have made corresponding changes to the documentation
- [ ] My changes generate no new warnings
- [ ] Any dependent changes have been merged and published
- [ ] I have checked my code and corrected any misspellings
### Testing
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] New and existing unit tests pass locally with my changes
- [ ] I have tested this code in a development environment
- [ ] I have tested edge cases and error conditions
### Security
- [ ] My code follows the project's security guidelines
- [ ] I have conducted a security impact assessment of my changes
- [ ] I have verified no sensitive information is exposed
### Performance
- [ ] I have verified my changes don't introduce performance regressions
- [ ] I have optimized any resource-intensive operations
- [ ] I have considered the impact on system resources
### Documentation
- [ ] I have updated the README.md (if applicable)
- [ ] I have updated the API documentation (if applicable)
- [ ] I have updated architecture docs (if applicable)
- [ ] I have added JSDoc/comments for all new code
### Dependencies
- [ ] I have updated the dependency list (if applicable)
- [ ] I have checked for and resolved any dependency conflicts
- [ ] I have verified compatibility with existing dependencies
### Compatibility
- [ ] My changes are backward compatible
- [ ] I have tested with different LLM providers
- [ ] I have tested with different configurations
- [ ] I have verified Docker compatibility
### Code Quality
- [ ] My code follows the project's style guidelines
- [ ] I have run linting tools and fixed any issues
- [ ] I have maintained or improved code coverage
- [ ] I have followed SOLID principles
## Screenshots/Videos
If applicable, add screenshots or videos to help explain your changes.
## Additional Notes
Add any other context about the PR here.
## Linked Issues
- Resolves #(issue number)
- Related to #(issue number)

.github/workflows/code-quality.yml
@ -0,0 +1,217 @@
name: Code Quality
on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]
permissions:
  contents: read
  pull-requests: write
jobs:
  lint:
    name: Lint
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Go
        uses: actions/setup-go@v5
        with:
          go-version: '1.22'
      - name: Install golangci-lint
        run: |
          curl -sSfL https://raw.githubusercontent.com/golangci/golangci-lint/master/install.sh | sh -s -- -b $(go env GOPATH)/bin v1.55.2
      - name: Go Lint
        uses: golangci/golangci-lint-action@v4
        with:
          version: latest
          args: --timeout=5m
      - name: Set up Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
          cache-dependency-path: './web-app/package-lock.json'
      - name: Install frontend dependencies
        run: npm ci
        working-directory: ./web-app
      - name: Frontend Lint
        run: npm run lint
        working-directory: ./web-app
  type-check:
    name: Type Check
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Go
        uses: actions/setup-go@v5
        with:
          go-version: '1.22'
      - name: Go Type Check
        run: go vet ./...
      - name: Set up Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
          cache-dependency-path: './web-app/package-lock.json'
      - name: Install frontend dependencies
        run: npm ci
        working-directory: ./web-app
      - name: TypeScript Check
        run: npm run type-check
        working-directory: ./web-app
  security:
    name: Security Scan
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run Gosec Security Scanner
        uses: securego/gosec@master
        with:
          args: './...'
      - name: Run npm audit
        run: npm audit
        working-directory: ./web-app
      - name: Run Snyk to check for vulnerabilities
        uses: snyk/actions/node@master
        env:
          SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}
        with:
          args: --severity-threshold=high --all-projects
  coverage:
    name: Code Coverage
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Go
        uses: actions/setup-go@v5
        with:
          go-version: '1.22'
      - name: Install mupdf
        run: sudo apt-get update && sudo apt-get install -y mupdf
      - name: Set library path
        run: echo "/usr/lib" | sudo tee -a /etc/ld.so.conf.d/mupdf.conf && sudo ldconfig
      - name: Run Go Coverage
        run: |
          go test -race -coverprofile=coverage.txt -covermode=atomic ./...
          go tool cover -func=coverage.txt
      - name: Upload Go coverage to Codecov
        uses: codecov/codecov-action@v4
        with:
          file: ./coverage.txt
          flags: backend
          fail_ci_if_error: true
      - name: Set up Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
          cache-dependency-path: './web-app/package-lock.json'
      - name: Install frontend dependencies
        run: npm ci
        working-directory: ./web-app
      - name: Run Frontend Coverage
        run: npm run test:coverage
        working-directory: ./web-app
      - name: Upload Frontend coverage to Codecov
        uses: codecov/codecov-action@v4
        with:
          file: ./web-app/coverage/coverage-final.json
          flags: frontend
          fail_ci_if_error: true
  format:
    name: Code Formatting
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Go
        uses: actions/setup-go@v5
        with:
          go-version: '1.22'
      - name: Check Go Formatting
        run: |
          if [ -n "$(gofmt -l .)" ]; then
            echo "Go files need formatting:"
            gofmt -d .
            exit 1
          fi
      - name: Set up Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
          cache-dependency-path: './web-app/package-lock.json'
      - name: Install frontend dependencies
        run: npm ci
        working-directory: ./web-app
      - name: Check Frontend Formatting
        run: npm run format:check
        working-directory: ./web-app
  complexity:
    name: Code Complexity
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Go
        uses: actions/setup-go@v5
        with:
          go-version: '1.22'
      - name: Install gocyclo
        run: go install github.com/fzipp/gocyclo/cmd/gocyclo@latest
      - name: Check Go Code Complexity
        run: |
          gocyclo -over 15 .
      - name: Set up Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
          cache-dependency-path: './web-app/package-lock.json'
      - name: Install frontend dependencies
        run: npm ci
        working-directory: ./web-app
      - name: Check Frontend Complexity
        run: npx ts-complexity ./src --max-complexity 15
        working-directory: ./web-app

.github/workflows/documentation.yml
@ -0,0 +1,193 @@
name: Documentation
on:
  push:
    branches: [ main ]
    paths:
      - '**/*.md'
      - 'docs/**'
      - '.github/workflows/documentation.yml'
  pull_request:
    branches: [ main ]
    paths:
      - '**/*.md'
      - 'docs/**'
      - '.github/workflows/documentation.yml'
permissions:
  contents: read
  pages: write
  id-token: write
jobs:
  markdown-lint:
    name: Markdown Lint
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
      - name: Install markdownlint
        run: npm install -g markdownlint-cli
      - name: Check Markdown files
        run: markdownlint '**/*.md' --ignore node_modules
      - name: Check for broken links
        uses: gaurav-nelson/github-action-markdown-link-check@v1
        with:
          use-quiet-mode: 'yes'
          use-verbose-mode: 'yes'
          config-file: '.github/workflows/mlc_config.json'
          folder-path: '.'
          max-depth: -1
  api-documentation:
    name: API Documentation
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Go
        uses: actions/setup-go@v5
        with:
          go-version: '1.22'
      - name: Install swag
        run: go install github.com/swaggo/swag/cmd/swag@latest
      - name: Generate Swagger Documentation
        run: swag init
      - name: Check if documentation changed
        run: |
          if [[ `git status --porcelain` ]]; then
            echo "API documentation is out of date. Please run 'swag init' locally and commit the changes."
            exit 1
          fi
  typescript-documentation:
    name: TypeScript Documentation
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
      - name: Install TypeDoc
        run: npm install -g typedoc
      - name: Generate TypeScript Documentation
        working-directory: ./web-app
        run: typedoc --out docs/typescript src/
      - name: Check documentation style
        working-directory: ./web-app
        run: |
          if find src -name "*.tsx" -o -name "*.ts" | xargs grep -l "@todo\|FIXME"; then
            echo "Found TODO or FIXME comments in the code. Please resolve them before merging."
            exit 1
          fi
  spelling:
    name: Documentation Spelling
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Check Spelling
        uses: streetsidesoftware/cspell-action@v5
        with:
          files: |
            **/*.md
            docs/**/*
  validate-examples:
    name: Validate Code Examples
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
      - name: Install dependencies
        run: npm install markdown-code-block-runner
      - name: Validate code examples in documentation
        run: npx markdown-code-block-runner "**/*.md"
  build-wiki:
    name: Build Wiki
    needs: [markdown-lint, spelling]
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup mdBook
        uses: peaceiris/actions-mdbook@v1
        with:
          mdbook-version: 'latest'
      - name: Build documentation
        run: |
          mdbook build docs/
      - name: Setup Pages
        uses: actions/configure-pages@v4
      - name: Upload artifact
        uses: actions/upload-pages-artifact@v3
        with:
          path: 'docs/book'
      - name: Deploy to GitHub Pages
        id: deployment
        uses: actions/deploy-pages@v4
  check-docs-coverage:
    name: Documentation Coverage
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Go
        uses: actions/setup-go@v5
        with:
          go-version: '1.22'
      - name: Install misspell
        run: go install github.com/client9/misspell/cmd/misspell@latest
      - name: Check public API documentation coverage
        run: |
          COVERAGE=$(go doc -all ./... | wc -l)
          if [ "$COVERAGE" -lt 100 ]; then
            echo "Documentation coverage is below threshold"
            exit 1
          fi
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
      - name: Check TypeScript documentation coverage
        working-directory: ./web-app
        run: |
          npm install -g typescript
          COVERAGE=$(find src -name "*.ts" -o -name "*.tsx" | xargs grep -l "@doc" | wc -l)
          if [ "$COVERAGE" -lt 50 ]; then
            echo "TypeScript documentation coverage is below threshold"
            exit 1
          fi

.github/workflows/mlc_config.json
@ -0,0 +1,29 @@
{
"replacementPatterns": [
{
"pattern": "^/",
"replacement": "{{BASEURL}}/"
}
],
"ignorePatterns": [
{
"pattern": "^http://localhost"
},
{
"pattern": "^#"
}
],
"timeout": "20s",
"retryOn429": true,
"retryCount": 5,
"fallbackRetryDelay": "30s",
"aliveStatusCodes": [200, 206],
"httpHeaders": [
{
"urls": ["https://github.com/"],
"headers": {
"Accept": "application/vnd.github.v3+json"
}
}
]
}

.markdownlint.json
@ -0,0 +1,125 @@
{
"default": true,
"MD001": true,
"MD002": {
"level": 1
},
"MD003": {
"style": "atx"
},
"MD004": {
"style": "dash"
},
"MD005": true,
"MD006": true,
"MD007": {
"indent": 2
},
"MD009": {
"br_spaces": 2,
"list_item_empty_lines": false,
"strict": false
},
"MD010": {
"code_blocks": false,
"spaces_per_tab": 2
},
"MD011": true,
"MD012": {
"maximum": 1
},
"MD013": {
"line_length": 120,
"code_blocks": false,
"tables": false,
"headings": false
},
"MD014": false,
"MD018": true,
"MD019": true,
"MD020": true,
"MD021": true,
"MD022": true,
"MD023": true,
"MD024": {
"allow_different_nesting": true
},
"MD025": {
"level": 1,
"front_matter_title": ""
},
"MD026": {
"punctuation": ".,;:!。,;:!"
},
"MD027": true,
"MD028": true,
"MD029": {
"style": "ordered"
},
"MD030": {
"ul_single": 1,
"ol_single": 1,
"ul_multi": 1,
"ol_multi": 1
},
"MD031": true,
"MD032": true,
"MD033": {
"allowed_elements": [
"br",
"details",
"summary",
"kbd",
"div",
"img",
"pre"
]
},
"MD034": true,
"MD035": {
"style": "---"
},
"MD036": false,
"MD037": true,
"MD038": true,
"MD039": true,
"MD040": true,
"MD041": {
"level": 1,
"front_matter_title": ""
},
"MD042": true,
"MD043": false,
"MD044": {
"names": [
"JavaScript",
"TypeScript",
"React",
"Docker",
"Node.js",
"npm",
"Go",
"OpenAI",
"Ollama",
"paperless-gpt"
],
"code_blocks": false
},
"MD045": true,
"MD046": {
"style": "fenced"
},
"MD047": true,
"MD048": {
"style": "backtick"
},
"MD049": {
"style": "underscore"
},
"MD050": {
"style": "asterisk"
},
"MD051": true,
"MD052": true,
"MD053": true
}

CHANGELOG.md
@ -0,0 +1,34 @@
# Changelog
All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [Unreleased]
### Added
- Enhanced project documentation and organization
- Project governance guidelines
- Security policy and guidelines
- Architecture documentation
## [1.0.0] - Initial Release
### Added
- LLM-Enhanced OCR capabilities
- Automatic title & tag generation
- Automatic correspondent generation
- Custom prompt templates
- Docker deployment support
- Web UI for document management
- Support for multiple LLM providers (OpenAI, Ollama)
- Configurable environment variables
- Integration with paperless-ngx
- Manual and automatic processing modes
- Basic documentation and setup guides
### Security
- API token authentication
- Environment-based configuration
- Docker container isolation
For earlier history, please see the git commit log.

GOVERNANCE.md
@ -0,0 +1,226 @@
# Project Governance
This document outlines the governance model for the paperless-gpt project. It describes how decisions are made and how community members can participate in project development.
## Project Roles
### Users
- People who use paperless-gpt
- Can submit bug reports and feature requests
- Can contribute to discussions
- Can help other users
### Contributors
- Users who contribute to the project
- Submit pull requests
- Improve documentation
- Help with testing
- Participate in issue discussions
### Maintainers
- Review and merge pull requests
- Manage issues and project boards
- Guide technical direction
- Ensure code quality
- Help onboard new contributors
- Responsibilities:
  - Respond to issues and PRs
  - Review code changes
  - Maintain documentation
  - Ensure tests pass
  - Release new versions
  - Uphold code of conduct
### Project Lead
- Final decision maker for project direction
- Sets technical standards
- Manages maintainer team
- Oversees releases
- Current lead: [@icereed](https://github.com/icereed)
## Decision Making
### Technical Decisions
1. **Discussion Phase**
   - Open an issue for discussion
   - Gather community feedback
   - Consider alternatives
   - Document trade-offs
2. **Implementation Phase**
   - Create detailed proposal
   - Submit pull request
   - Address review feedback
   - Update documentation
3. **Review Process**
   - At least one maintainer review
   - Automated tests must pass
   - Documentation must be updated
   - Breaking changes require extra scrutiny
### Project Direction
1. **Long-term Planning**
   - Quarterly roadmap updates
   - Community feedback periods
   - Clear communication of goals
   - Published milestones
2. **Feature Acceptance**
   - Must align with project goals
   - Consider maintenance burden
   - Evaluate user benefit
   - Check implementation feasibility
### Release Process
1. **Version Planning**
   - Follow semantic versioning
   - Document all changes
   - Update dependencies
   - Security review
2. **Release Preparation**
   - Create release branch
   - Run test suite
   - Update changelog
   - Draft release notes
3. **Release Publication**
   - Tag version in repository
   - Publish to registries
   - Announce to community
   - Monitor for issues
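
To make "Follow semantic versioning" concrete during version planning, here is a minimal Go sketch of choosing the next version from the kind of change being released. The `ChangeKind` values and `nextVersion` function are hypothetical; the bump rules follow Semantic Versioning 2.0.0 for `MAJOR.MINOR.PATCH` versions and ignore pre-release and build metadata:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// ChangeKind is a hypothetical classification of the changes in a release.
type ChangeKind int

const (
	Fix      ChangeKind = iota // backward-compatible bug fix -> PATCH
	Feature                    // backward-compatible functionality -> MINOR
	Breaking                   // incompatible change -> MAJOR
)

// nextVersion bumps a MAJOR.MINOR.PATCH string (optionally prefixed with "v")
// according to SemVer 2.0.0. Pre-release and build metadata are not supported.
func nextVersion(current string, kind ChangeKind) (string, error) {
	parts := strings.Split(strings.TrimPrefix(current, "v"), ".")
	if len(parts) != 3 {
		return "", fmt.Errorf("invalid version %q", current)
	}
	nums := make([]int, 3)
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return "", fmt.Errorf("invalid version %q", current)
		}
		nums[i] = n
	}
	switch kind {
	case Breaking:
		nums[0], nums[1], nums[2] = nums[0]+1, 0, 0
	case Feature:
		nums[1], nums[2] = nums[1]+1, 0
	default: // Fix
		nums[2]++
	}
	return fmt.Sprintf("%d.%d.%d", nums[0], nums[1], nums[2]), nil
}

func main() {
	v, err := nextVersion("1.0.0", Feature)
	if err != nil {
		panic(err)
	}
	fmt.Println(v) // 1.1.0
}
```

Resetting the lower components on a bump (e.g. `1.2.3` to `2.0.0` for a breaking change) is what SemVer requires; the resulting version is then what gets tagged during release publication.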
## Communication
### Channels
- GitHub Issues: Bug reports, feature requests
- GitHub Discussions: General discussion
- Pull Requests: Code changes
- Discord: Community chat
- Email: Security issues
### Guidelines
- Be respectful and professional
- Stay on topic
- English is the working language
- Document decisions and rationale
- Keep security issues private
## Contributing
### Process
1. **Getting Started**
   - Read contribution guidelines
   - Set up development environment
   - Understand code structure
   - Pick starter issues
2. **Making Changes**
   - Create feature branch
   - Follow code style
   - Write tests
   - Update docs
3. **Submitting Changes**
   - Create pull request
   - Fill out template
   - Respond to reviews
   - Keep changes focused
### Standards
- Follow code style guide
- Include tests
- Update documentation
- Sign commits
- One feature per PR
## Code Review
### Requirements
- At least one maintainer approval
- All tests passing
- Documentation updated
- Code style compliance
- No security issues
### Process
1. **Automated Checks**
   - Linting
   - Tests
   - Coverage
   - Dependencies
2. **Manual Review**
   - Code quality
   - Architecture
   - Security
   - Performance
3. **Final Checks**
   - Merge conflicts
   - Documentation
   - Breaking changes
   - Version updates
## Issue Management
### Categories
- Bug: Software defects
- Feature: New functionality
- Enhancement: Improvements
- Documentation: Doc changes
- Question: User queries
### Priority Levels
1. **Critical**
   - Security issues
   - Major bugs
   - Blocking issues
2. **High**
   - Important features
   - User experience issues
   - Performance problems
3. **Normal**
   - Regular enhancements
   - Minor bugs
   - Documentation updates
4. **Low**
   - Nice-to-have features
   - Style improvements
   - Non-critical fixes
## Project Changes
### Governance Changes
- Open for community discussion
- Two-week comment period
- Maintainer consensus required
- Project lead approval needed
### Role Changes
- Based on consistent contributions
- Maintainer nomination
- Community feedback
- Project lead approval
## Success Metrics
### Project Health
- Issue resolution time
- PR merge time
- Test coverage
- Documentation quality
- Community engagement
### Code Quality
- Automated metrics
- Review thoroughness
- Test coverage
- Documentation completeness
- Security standards
This governance model is a living document and may be updated as the project evolves. Changes will be proposed and discussed with the community before implementation.

SECURITY.md
@ -0,0 +1,125 @@
# Security Policy
## Reporting a Vulnerability
At paperless-gpt, we take security seriously. If you discover a security vulnerability, please follow these steps:
1. **DO NOT** disclose the vulnerability publicly.
2. Send a detailed report to security@icereed.net including:
   - A description of the vulnerability
   - Steps to reproduce the issue
   - Potential impact
   - Any suggested fixes (if available)
3. Allow up to 48 hours for an initial response.
4. Please do not disclose the issue publicly until we've had a chance to address it.
## Security Considerations
### API Keys and Tokens
- Never commit API keys, tokens, or sensitive credentials to the repository
- Use environment variables for all sensitive configuration
- Rotate API keys and tokens regularly
- Use the minimum required permissions for API tokens
### Data Privacy
- All document processing is done locally or via your configured LLM provider
- No document data is stored permanently outside your system
- Temporary files are cleaned up after processing
- Documents are transmitted securely using HTTPS
### Docker Security
- Containers run with minimal privileges
- Images are regularly updated with security patches
- Dependencies are scanned for vulnerabilities
- Official base images are used
### LLM Provider Security
- API calls to LLM providers use encrypted connections
- Rate limiting is implemented to prevent abuse
- Input validation is performed on all user inputs
- Error messages are sanitized to prevent information leakage
### Access Control
- Use strong passwords for all services
- Implement the principle of least privilege
- Regular security audits of access controls
- Monitor for unauthorized access attempts
## Version Support
We provide security updates for:
- The current major version
- The previous major version for 6 months after a new major release
## Best Practices for Deployment
1. **Network Security**
   - Use HTTPS for all connections
   - Implement proper firewall rules
   - Use secure DNS configurations
   - Regular security audits
2. **System Updates**
   - Keep all system packages updated
   - Subscribe to security advisories
   - Regular vulnerability scanning
   - Automated update notifications
3. **Monitoring**
   - Monitor system logs for suspicious activity
   - Track resource usage patterns
   - Alert on anomalous behavior
   - Regular security assessments
4. **Backup and Recovery**
   - Regular backups of critical data
   - Secure backup storage
   - Tested recovery procedures
   - Documented incident response plan
## Dependencies
We regularly monitor and update dependencies for security vulnerabilities:
- Automated dependency updates via Renovate
- Regular security audits of dependencies
- Minimal use of third-party packages
- Verification of package signatures
## Contributing Security Fixes
If you want to contribute security fixes:
1. Follow the standard pull request process
2. Mark security-related PRs as "security fix"
3. Provide detailed description of the security impact
4. Include tests that verify the fix
## Security Release Process
When a security issue is identified:
1. Issue is assessed and prioritized
2. Fix is developed and tested
3. Security advisory is prepared
4. Fix is deployed and announced
5. Users are notified through appropriate channels
## Incident Response
In case of a security incident:
1. Issue is immediately assessed
2. Affected systems are isolated
3. Root cause is identified
4. Fix is developed and tested
5. Systems are restored
6. Incident report is prepared
7. Preventive measures are implemented
## Contact
For security-related matters, contact:
- Email: security@icereed.net
- Response time: Within 48 hours
- Language: English
## Acknowledgments
We'd like to thank all security researchers who have helped improve the security of paperless-gpt. A list of acknowledged researchers can be found in our [Hall of Fame](CONTRIBUTORS.md#security-researchers).

@ -0,0 +1,78 @@
# Product Context
## Project Purpose
paperless-gpt is designed to enhance document management by integrating AI capabilities with paperless-ngx. Its primary purpose is to automate and improve the accuracy of document processing tasks that traditionally require manual intervention.
## Problems Solved
1. Manual Document Organization
   - Eliminates tedious manual tagging and titling
   - Reduces time spent on document categorization
   - Minimizes human error in classification
2. OCR Quality Issues
   - Improves text extraction from poor quality scans
   - Enhances accuracy through LLM-based OCR
   - Provides context-aware text interpretation
3. Document Processing Automation
   - Automates correspondent identification
   - Streamlines document categorization
   - Enables bulk processing capabilities
## Core Functionality
1. AI-Powered Document Processing
   - Title generation using LLMs
   - Intelligent tag suggestions
   - Automated correspondent detection
   - Enhanced OCR capabilities
2. Integration Features
   - Seamless paperless-ngx integration
   - Support for multiple LLM providers
   - Docker-based deployment
   - Customizable prompt templates
3. User Experience
   - Web-based interface
   - Manual review capabilities
   - Automatic processing options
   - Flexible configuration options
## Success Criteria
1. Accuracy Metrics
- High-quality OCR results
- Accurate document classification
- Relevant tag suggestions
- Correct correspondent identification
2. Performance Goals
- Fast processing times
- Reliable system operation
- Scalable document handling
- Efficient resource usage
3. User Satisfaction
- Intuitive interface
- Clear feedback mechanisms
- Minimal manual intervention
- Consistent results
## Future Vision
1. Enhanced Capabilities
- Support for more AI providers
- Statistics and analytics features
- Advanced document analysis
- Improved processing algorithms
- Extended automation options
2. Community Growth
- Active contributor base
- Regular feature additions
- Strong documentation
- Responsive maintenance
3. Technical Evolution
- Improved architecture
- Enhanced performance
- Extended integrations
- Robust testing

221
docs/ARCHITECTURE.md Normal file

@ -0,0 +1,221 @@
# paperless-gpt Architecture
This document provides a comprehensive overview of the paperless-gpt architecture, explaining how different components interact to provide AI-powered document processing capabilities.
## System Overview
paperless-gpt is designed as a companion service to paperless-ngx, adding AI capabilities for document processing. The system consists of several key components:
```mermaid
graph TB
UI[Web UI] --> API[Backend API]
API --> LLM[LLM Service]
API --> OCR[OCR Service]
API --> DB[Local DB]
API --> PaperlessNGX[paperless-ngx API]
LLM --> OpenAI[OpenAI]
LLM --> Ollama[Ollama]
OCR --> VisionLLM[Vision LLM]
```
## Core Components
### 1. Backend API (Go)
- Handles all business logic
- Manages document processing workflow
- Coordinates between services
- Provides REST API endpoints
- Manages state and caching
### 2. Web UI (React + TypeScript)
- User interface for document management
- Real-time processing status
- Document preview and editing
- Configuration interface
- Responsive design
### 3. LLM Service
- Manages LLM provider connections
- Handles prompt engineering
- Processes document content
- Generates metadata suggestions
- Supports multiple providers:
  - OpenAI (gpt-4, gpt-3.5-turbo)
  - Ollama (llama2, etc.)
### 4. OCR Service
- Vision LLM integration
- Image preprocessing
- Text extraction
- Layout analysis
- Quality enhancement
### 5. Local Database
- Caches processing results
- Stores configuration
- Manages queues
- Tracks document state
## Data Flow
### Document Processing Flow
```mermaid
sequenceDiagram
participant U as User
participant UI as Web UI
participant API as Backend API
participant LLM as LLM Service
participant OCR as OCR Service
participant PNX as paperless-ngx
U->>UI: Upload Document
UI->>API: Process Request
API->>OCR: Extract Text
OCR-->>API: Text Content
API->>LLM: Generate Metadata
LLM-->>API: Suggestions
API->>UI: Preview Results
U->>UI: Approve Changes
UI->>API: Apply Changes
API->>PNX: Update Document
PNX-->>API: Confirmation
API-->>UI: Success
```
## Key Design Decisions
### 1. Modular Architecture
- Separation of concerns
- Pluggable components
- Easy to extend
- Maintainable code
### 2. Stateless Design
- Scalable architecture
- No shared state
- Resilient operation
- Easy deployment
### 3. Security First
- API authentication
- Data encryption
- Input validation
- Error handling
### 4. Performance Optimization
- Local caching
- Batch processing
- Async operations
- Resource management
## Directory Structure
```
paperless-gpt/
├── main.go # Application entry point
├── app_llm.go # LLM service implementation
├── app_http_handlers.go # HTTP handlers
├── paperless.go # paperless-ngx integration
├── ocr.go # OCR service
├── types.go # Type definitions
├── web-app/ # Frontend application
│ ├── src/
│ │ ├── components/ # React components
│ │ ├── App.tsx # Main application
│ │ └── ...
│ └── ...
└── ...
```
## Configuration Management
Configuration is supplied through environment variables, so deployments can be adjusted without code changes:
```
PAPERLESS_BASE_URL # paperless-ngx connection
LLM_PROVIDER # AI backend selection
VISION_LLM_PROVIDER # OCR provider selection
...
```
## Error Handling
Error handling is organized into three categories:
1. **User Errors**
- Input validation
- Clear error messages
- Guided resolution
2. **System Errors**
- Graceful degradation
- Automatic retry
- Error logging
- Monitoring alerts
3. **External Service Errors**
- Fallback options
- Circuit breaking
- Rate limiting
- Error reporting
## Scaling Considerations
The architecture supports scaling through:
1. **Horizontal Scaling**
- Stateless design
- Load balancing
- Distributed processing
2. **Resource Management**
- Connection pooling
- Cache management
- Queue processing
- Rate limiting
3. **Performance Optimization**
- Batch processing
- Async operations
- Efficient algorithms
- Resource caching
## Future Considerations
The architecture is designed to support future enhancements:
1. **Plugin System**
- Custom processors
- Integration points
- Event hooks
2. **Advanced Features**
- Multi-language support
- Custom ML models
- Advanced analytics
3. **Integration Options**
- API extensions
- Service hooks
- Custom providers
## Development Guidelines
When making changes to the architecture:
1. **Documentation**
- Update this document
- Add inline comments
- Update API docs
2. **Testing**
- Unit tests
- Integration tests
- Performance tests
3. **Review Process**
- Architecture review
- Security review
- Performance review
This architecture documentation is maintained by the core team and updated as the system evolves.