diff --git a/.github/ISSUE_TEMPLATE/bug_report.yml b/.github/ISSUE_TEMPLATE/bug_report.yml new file mode 100644 index 0000000..ea3b2a1 --- /dev/null +++ b/.github/ISSUE_TEMPLATE/bug_report.yml @@ -0,0 +1,116 @@ +name: Bug Report +description: Create a report to help us improve +title: "[BUG] " +labels: ["bug", "triage"] +body: + - type: markdown + attributes: + value: | + Thanks for taking the time to fill out this bug report! + Before submitting, please check if a similar issue already exists. + + - type: input + id: version + attributes: + label: Version + description: What version of paperless-gpt are you running? + placeholder: "e.g., 1.0.0" + validations: + required: true + + - type: dropdown + id: deployment + attributes: + label: Deployment Method + description: How are you running paperless-gpt? + options: + - Docker (official image) + - Docker Compose + - Manual Installation + - Other + validations: + required: true + + - type: input + id: llm-provider + attributes: + label: LLM Provider + description: Which LLM provider are you using? + placeholder: "e.g., OpenAI, Ollama" + validations: + required: true + + - type: input + id: llm-model + attributes: + label: LLM Model + description: Which model are you using? + placeholder: "e.g., gpt-4, llama2" + validations: + required: true + + - type: dropdown + id: os + attributes: + label: Operating System + description: What operating system are you using? + options: + - Linux + - macOS + - Windows + - Other + validations: + required: true + + - type: textarea + id: what-happened + attributes: + label: What happened? + description: A clear and concise description of the bug. + placeholder: "Tell us what you see!" + validations: + required: true + + - type: textarea + id: expected + attributes: + label: Expected behavior + description: What did you expect to happen? 
+ placeholder: "Tell us what you expected" + validations: + required: true + + - type: textarea + id: reproduction + attributes: + label: Steps to reproduce + description: How can we reproduce this issue? + placeholder: | + 1. Go to '...' + 2. Click on '...' + 3. Scroll down to '...' + 4. See error + validations: + required: true + + - type: textarea + id: logs + attributes: + label: Relevant log output + description: Please copy and paste any relevant log output. This will be automatically formatted into code. + render: shell + + - type: textarea + id: config + attributes: + label: Configuration + description: | + Please provide your configuration (with sensitive information redacted). + This could be your docker-compose.yml or environment variables. + render: yaml + + - type: textarea + id: additional + attributes: + label: Additional context + description: Add any other context about the problem here diff --git a/.github/ISSUE_TEMPLATE/feature_request.yml b/.github/ISSUE_TEMPLATE/feature_request.yml new file mode 100644 index 0000000..988b6e9 --- /dev/null +++ b/.github/ISSUE_TEMPLATE/feature_request.yml @@ -0,0 +1,118 @@ +name: Feature Request +description: Suggest an idea for this project +title: "[FEATURE] " +labels: ["enhancement"] +body: + - type: markdown + attributes: + value: | + Thanks for taking the time to suggest a new feature! + Please fill out this form as completely as possible. + + - type: textarea + id: problem + attributes: + label: Problem Statement + description: Is your feature request related to a problem? Please describe. 
+ placeholder: "I'm always frustrated when [...]" + validations: + required: true + + - type: textarea + id: solution + attributes: + label: Proposed Solution + description: Describe the solution you'd like to see + placeholder: "It would be great if [...]" + validations: + required: true + + - type: textarea + id: alternatives + attributes: + label: Alternatives Considered + description: Describe any alternative solutions or features you've considered + placeholder: "I've thought about [...]" + + - type: dropdown + id: importance + attributes: + label: Importance Level + description: How important is this feature to your use case? + options: + - Critical (Blocking my use of the project) + - High (Would significantly improve my workflow) + - Medium (Would be nice to have) + - Low (Just an idea) + validations: + required: true + + - type: dropdown + id: component + attributes: + label: Component + description: Which part of paperless-gpt would this feature primarily affect? + options: + - OCR Processing + - LLM Integration + - Document Management + - UI/UX + - API + - Configuration + - Documentation + - Performance + - Security + - Other + validations: + required: true + + - type: dropdown + id: scope + attributes: + label: Implementation Scope + description: How extensive would the changes be? + options: + - Minor (Simple change, few files) + - Moderate (Multiple files, some complexity) + - Major (Significant changes, new features) + - Breaking (Requires breaking changes) + validations: + required: true + + - type: textarea + id: context + attributes: + label: Additional Context + description: Add any other context about the feature request here + placeholder: "Include use cases, benefits, or screenshots" + + - type: textarea + id: implementation + attributes: + label: Implementation Ideas + description: If you have specific ideas about how to implement this feature, please share them + placeholder: "We could implement this by..." 
+ + - type: checkboxes + id: terms + attributes: + label: Contribution + description: Would you be interested in helping implement this feature? + options: + - label: I'm interested in contributing to this feature's implementation + required: false + - label: I have read the contribution guidelines + required: true + + - type: textarea + id: success_criteria + attributes: + label: Success Criteria + description: What would make this feature implementation successful? + placeholder: | + Example criteria: + - Feature works with both OpenAI and Ollama + - Performance impact is minimal + - No breaking changes to existing functionality + validations: + required: true diff --git a/.github/config.yml b/.github/config.yml new file mode 100644 index 0000000..f9db328 --- /dev/null +++ b/.github/config.yml @@ -0,0 +1,163 @@ +# GitHub App Configuration + +# Label Configuration +labels: + # Type labels + - name: bug + color: d73a4a + description: Something isn't working + - name: enhancement + color: a2eeef + description: New feature or request + - name: documentation + color: 0075ca + description: Documentation improvements + - name: security + color: ee0701 + description: Security-related issues + + # Priority labels + - name: critical + color: b60205 + description: Needs immediate attention + - name: high + color: d93f0b + description: High priority + - name: medium + color: fbca04 + description: Medium priority + - name: low + color: 0e8a16 + description: Low priority + + # Status labels + - name: triage + color: d4c5f9 + description: Needs triage + - name: in-progress + color: 9ee12f + description: Work in progress + - name: blocked + color: b60205 + description: Blocked or needs clarification + + # Component labels + - name: frontend + color: 1d76db + description: Frontend related + - name: backend + color: 0052cc + description: Backend related + - name: ocr + color: 5319e7 + description: OCR functionality + - name: llm + color: 006b75 + description: LLM integration + + # 
Size labels + - name: size/xs + color: d4c5f9 + description: Extra small change + - name: size/s + color: 84b6eb + description: Small change + - name: size/m + color: fbca04 + description: Medium change + - name: size/l + color: d93f0b + description: Large change + - name: size/xl + color: b60205 + description: Extra large change + +# Stale issue configuration +stale: + daysUntilStale: 60 + daysUntilClose: 7 + exemptLabels: + - security + - critical + - pinned + staleLabel: stale + markComment: > + This issue has been automatically marked as stale because it has not had + recent activity. It will be closed if no further activity occurs. Thank you + for your contributions. + closeComment: > + This issue has been automatically closed due to inactivity. Please feel free + to reopen it if you still experience this problem. + +# Welcome message for new contributors +newContributorWelcomeComment: > + Thanks for making your first contribution to paperless-gpt! 🎉 + + Please make sure you've read our [Contributing Guidelines](CONTRIBUTING.md) + and [Code of Conduct](CODE_OF_CONDUCT.md). + + If you need any help, feel free to mention @icereed or ask in our Discord. + +# PR size labeling +prSize: + xs: + lines: 10 + s: + lines: 50 + m: + lines: 250 + l: + lines: 500 + xl: + lines: 1000 + +# Code review settings +reviews: + request_count: 1 + notify_on_changes: true + auto_assign: true + auto_merge: false + +# Branch protection settings +branchProtection: + main: + required_status_checks: + - "build" + - "test" + - "lint" + enforce_admins: true + required_pull_request_reviews: + required_approving_review_count: 1 + dismiss_stale_reviews: true + require_code_owner_reviews: true + allow_force_pushes: false + allow_deletions: false + +# Issue template settings +issueTemplate: + checkNew: true + useConfigure: true + configureMessage: > + Please use our issue templates to report bugs or request features. + This helps us track and resolve issues more effectively. 
+ +# Pull request template settings +pullRequestTemplate: + checkNew: true + useConfigure: true + configureMessage: > + Please make sure your PR follows our guidelines and includes all necessary information. + Don't forget to link any related issues. + +# Repository settings +repository: + private: false + has_issues: true + has_projects: true + has_wiki: true + has_downloads: true + default_branch: main + allow_squash_merge: true + allow_merge_commit: false + allow_rebase_merge: true + delete_branch_on_merge: true diff --git a/.github/pull_request_template.md b/.github/pull_request_template.md new file mode 100644 index 0000000..2577b74 --- /dev/null +++ b/.github/pull_request_template.md @@ -0,0 +1,79 @@ +# Description + +Please include a summary of the changes and the related issue. Please also include relevant motivation and context. List any dependencies that are required for this change. + +Fixes # (issue) + +## Type of change + +Please delete options that are not relevant. + +- [ ] Bug fix (non-breaking change which fixes an issue) +- [ ] New feature (non-breaking change which adds functionality) +- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected) +- [ ] Documentation update +- [ ] This change requires a documentation update + +## Checklist + +Before submitting your PR, please review the following checklist: + +### General +- [ ] I have performed a self-review of my code +- [ ] I have commented my code, particularly in hard-to-understand areas +- [ ] I have made corresponding changes to the documentation +- [ ] My changes generate no new warnings +- [ ] Any dependent changes have been merged and published +- [ ] I have checked my code and corrected any misspellings + +### Testing +- [ ] I have added tests that prove my fix is effective or that my feature works +- [ ] New and existing unit tests pass locally with my changes +- [ ] I have tested this code in a development environment +- [ ] I have tested edge
cases and error conditions + +### Security +- [ ] My code follows the project's security guidelines +- [ ] I have conducted a security impact assessment of my changes +- [ ] I have verified no sensitive information is exposed + +### Performance +- [ ] I have verified my changes don't introduce performance regressions +- [ ] I have optimized any resource-intensive operations +- [ ] I have considered the impact on system resources + +### Documentation +- [ ] I have updated the README.md (if applicable) +- [ ] I have updated the API documentation (if applicable) +- [ ] I have updated architecture docs (if applicable) +- [ ] I have added JSDoc/comments for all new code + +### Dependencies +- [ ] I have updated the dependency list (if applicable) +- [ ] I have checked for and resolved any dependency conflicts +- [ ] I have verified compatibility with existing dependencies + +### Compatibility +- [ ] My changes are backward compatible +- [ ] I have tested with different LLM providers +- [ ] I have tested with different configurations +- [ ] I have verified Docker compatibility + +### Code Quality +- [ ] My code follows the project's style guidelines +- [ ] I have run linting tools and fixed any issues +- [ ] I have maintained or improved code coverage +- [ ] I have followed SOLID principles + +## Screenshots/Videos + +If applicable, add screenshots or videos to help explain your changes. + +## Additional Notes + +Add any other context about the PR here. 
+ +## Linked Issues + +- Resolves #(issue number) +- Related to #(issue number) diff --git a/.github/workflows/code-quality.yml b/.github/workflows/code-quality.yml new file mode 100644 index 0000000..5fe93c6 --- /dev/null +++ b/.github/workflows/code-quality.yml @@ -0,0 +1,217 @@ +name: Code Quality + +on: + push: + branches: [ main ] + pull_request: + branches: [ main ] + +permissions: + contents: read + pull-requests: write + +jobs: + lint: + name: Lint + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + + - name: Set up Go + uses: actions/setup-go@v5 + with: + go-version: '1.22' + + - name: Install golangci-lint + run: | + curl -sSfL https://raw.githubusercontent.com/golangci/golangci-lint/master/install.sh | sh -s -- -b $(go env GOPATH)/bin v1.55.2 + + - name: Go Lint + uses: golangci/golangci-lint-action@v4 + with: + version: latest + args: --timeout=5m + + - name: Set up Node.js + uses: actions/setup-node@v4 + with: + node-version: '20' + cache: 'npm' + cache-dependency-path: './web-app/package-lock.json' + + - name: Install frontend dependencies + run: npm ci + working-directory: ./web-app + + - name: Frontend Lint + run: npm run lint + working-directory: ./web-app + + type-check: + name: Type Check + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + + - name: Set up Go + uses: actions/setup-go@v5 + with: + go-version: '1.22' + + - name: Go Type Check + run: go vet ./... + + - name: Set up Node.js + uses: actions/setup-node@v4 + with: + node-version: '20' + cache: 'npm' + cache-dependency-path: './web-app/package-lock.json' + + - name: Install frontend dependencies + run: npm ci + working-directory: ./web-app + + - name: TypeScript Check + run: npm run type-check + working-directory: ./web-app + + security: + name: Security Scan + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + + - name: Run Gosec Security Scanner + uses: securego/gosec@master + with: + args: './...' 
+ + - name: Run npm audit + run: npm audit + working-directory: ./web-app + + - name: Run Snyk to check for vulnerabilities + uses: snyk/actions/node@master + env: + SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }} + with: + args: --severity-threshold=high --all-projects + + coverage: + name: Code Coverage + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + + - name: Set up Go + uses: actions/setup-go@v5 + with: + go-version: '1.22' + + - name: Install mupdf + run: sudo apt-get install -y mupdf + + - name: Set library path + run: echo "/usr/lib" | sudo tee -a /etc/ld.so.conf.d/mupdf.conf && sudo ldconfig + + - name: Run Go Coverage + run: | + go test -race -coverprofile=coverage.txt -covermode=atomic ./... + go tool cover -func=coverage.txt + + - name: Upload Go coverage to Codecov + uses: codecov/codecov-action@v4 + with: + file: ./coverage.txt + flags: backend + fail_ci_if_error: true + + - name: Set up Node.js + uses: actions/setup-node@v4 + with: + node-version: '20' + cache: 'npm' + cache-dependency-path: './web-app/package-lock.json' + + - name: Install frontend dependencies + run: npm ci + working-directory: ./web-app + + - name: Run Frontend Coverage + run: npm run test:coverage + working-directory: ./web-app + + - name: Upload Frontend coverage to Codecov + uses: codecov/codecov-action@v4 + with: + file: ./web-app/coverage/coverage-final.json + flags: frontend + fail_ci_if_error: true + + format: + name: Code Formatting + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + + - name: Set up Go + uses: actions/setup-go@v5 + with: + go-version: '1.22' + + - name: Check Go Formatting + run: | + if [ -n "$(gofmt -l .)" ]; then + echo "Go files need formatting:" + gofmt -d . 
+ exit 1 + fi + + - name: Set up Node.js + uses: actions/setup-node@v4 + with: + node-version: '20' + cache: 'npm' + cache-dependency-path: './web-app/package-lock.json' + + - name: Install frontend dependencies + run: npm ci + working-directory: ./web-app + + - name: Check Frontend Formatting + run: npm run format:check + working-directory: ./web-app + + complexity: + name: Code Complexity + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + + - name: Set up Go + uses: actions/setup-go@v5 + with: + go-version: '1.22' + + - name: Install gocyclo + run: go install github.com/fzipp/gocyclo/cmd/gocyclo@latest + + - name: Check Go Code Complexity + run: | + gocyclo -over 15 . + + - name: Set up Node.js + uses: actions/setup-node@v4 + with: + node-version: '20' + cache: 'npm' + cache-dependency-path: './web-app/package-lock.json' + + - name: Install frontend dependencies + run: npm ci + working-directory: ./web-app + + - name: Check Frontend Complexity + run: npx ts-complexity ./src --max-complexity 15 + working-directory: ./web-app diff --git a/.github/workflows/documentation.yml b/.github/workflows/documentation.yml new file mode 100644 index 0000000..c9f3436 --- /dev/null +++ b/.github/workflows/documentation.yml @@ -0,0 +1,193 @@ +name: Documentation + +on: + push: + branches: [ main ] + paths: + - '**/*.md' + - 'docs/**' + - '.github/workflows/documentation.yml' + pull_request: + branches: [ main ] + paths: + - '**/*.md' + - 'docs/**' + - '.github/workflows/documentation.yml' + +permissions: + contents: read + pages: write + id-token: write + +jobs: + markdown-lint: + name: Markdown Lint + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + + - name: Setup Node.js + uses: actions/setup-node@v4 + with: + node-version: '20' + + - name: Install markdownlint + run: npm install -g markdownlint-cli + + - name: Check Markdown files + run: markdownlint '**/*.md' --ignore node_modules + + - name: Check for broken links + uses: 
gaurav-nelson/github-action-markdown-link-check@v1 + with: + use-quiet-mode: 'yes' + use-verbose-mode: 'yes' + config-file: '.github/workflows/mlc_config.json' + folder-path: '.' + max-depth: -1 + + api-documentation: + name: API Documentation + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + + - name: Set up Go + uses: actions/setup-go@v5 + with: + go-version: '1.22' + + - name: Install swag + run: go install github.com/swaggo/swag/cmd/swag@latest + + - name: Generate Swagger Documentation + run: swag init + + - name: Check if documentation changed + run: | + if [[ `git status --porcelain` ]]; then + echo "API documentation is out of date. Please run 'swag init' locally and commit the changes." + exit 1 + fi + + typescript-documentation: + name: TypeScript Documentation + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + + - name: Setup Node.js + uses: actions/setup-node@v4 + with: + node-version: '20' + + - name: Install TypeDoc + run: npm install -g typedoc + + - name: Generate TypeScript Documentation + working-directory: ./web-app + run: typedoc --out docs/typescript src/ + + - name: Check documentation style + working-directory: ./web-app + run: | + if find src -name "*.tsx" -o -name "*.ts" | xargs grep -l "@todo\|FIXME"; then + echo "Found TODO or FIXME comments in the code. Please resolve them before merging." 
+ exit 1 + fi + + spelling: + name: Documentation Spelling + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + + - name: Check Spelling + uses: streetsidesoftware/cspell-action@v5 + with: + files: | + **/*.md + docs/**/* + + validate-examples: + name: Validate Code Examples + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + + - name: Setup Node.js + uses: actions/setup-node@v4 + with: + node-version: '20' + + - name: Install dependencies + run: npm install markdown-code-block-runner + + - name: Validate code examples in documentation + run: npx markdown-code-block-runner "**/*.md" + + build-wiki: + name: Build Wiki + needs: [markdown-lint, spelling] + if: github.event_name == 'push' && github.ref == 'refs/heads/main' + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + + - name: Setup mdBook + uses: peaceiris/actions-mdbook@v1 + with: + mdbook-version: 'latest' + + - name: Build documentation + run: | + mdbook build docs/ + + - name: Setup Pages + uses: actions/configure-pages@v4 + + - name: Upload artifact + uses: actions/upload-pages-artifact@v3 + with: + path: 'docs/book' + + - name: Deploy to GitHub Pages + id: deployment + uses: actions/deploy-pages@v4 + + check-docs-coverage: + name: Documentation Coverage + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + + - name: Set up Go + uses: actions/setup-go@v5 + with: + go-version: '1.22' + + - name: Install doc coverage tool + run: go install github.com/client9/misspell/cmd/misspell@latest + + - name: Check public API documentation coverage + run: | + COVERAGE=$(go doc -all ./... 
| wc -l) + if [ "$COVERAGE" -lt 100 ]; then + echo "Documentation coverage is below threshold" + exit 1 + fi + + - name: Setup Node.js + uses: actions/setup-node@v4 + with: + node-version: '20' + + - name: Check TypeScript documentation coverage + working-directory: ./web-app + run: | + npm install -g typescript + COVERAGE=$(find src -name "*.ts" -o -name "*.tsx" | xargs grep -l "@doc" | wc -l) + if [ "$COVERAGE" -lt 50 ]; then + echo "TypeScript documentation coverage is below threshold" + exit 1 + fi diff --git a/.github/workflows/mlc_config.json b/.github/workflows/mlc_config.json new file mode 100644 index 0000000..8aa856f --- /dev/null +++ b/.github/workflows/mlc_config.json @@ -0,0 +1,29 @@ +{ + "replacementPatterns": [ + { + "pattern": "^/", + "replacement": "{{BASEURL}}/" + } + ], + "ignorePatterns": [ + { + "pattern": "^http://localhost" + }, + { + "pattern": "^#" + } + ], + "timeout": "20s", + "retryOn429": true, + "retryCount": 5, + "fallbackRetryDelay": "30s", + "aliveStatusCodes": [200, 206], + "httpHeaders": [ + { + "urls": ["https://github.com/"], + "headers": { + "Accept": "application/vnd.github.v3+json" + } + } + ] +} diff --git a/.markdownlint.json b/.markdownlint.json new file mode 100644 index 0000000..06dd391 --- /dev/null +++ b/.markdownlint.json @@ -0,0 +1,125 @@ +{ + "default": true, + "MD001": true, + "MD002": { + "level": 1 + }, + "MD003": { + "style": "atx" + }, + "MD004": { + "style": "dash" + }, + "MD005": true, + "MD006": true, + "MD007": { + "indent": 2 + }, + "MD009": { + "br_spaces": 2, + "list_item_empty_lines": false, + "strict": false + }, + "MD010": { + "code_blocks": false, + "spaces_per_tab": 2 + }, + "MD011": true, + "MD012": { + "maximum": 1 + }, + "MD013": { + "line_length": 120, + "code_blocks": false, + "tables": false, + "headings": false + }, + "MD014": false, + "MD018": true, + "MD019": true, + "MD020": true, + "MD021": true, + "MD022": true, + "MD023": true, + "MD024": { + "allow_different_nesting": true + }, + 
"MD025": { + "level": 1, + "front_matter_title": "" + }, + "MD026": { + "punctuation": ".,;:!。,;:!" + }, + "MD027": true, + "MD028": true, + "MD029": { + "style": "ordered" + }, + "MD030": { + "ul_single": 1, + "ol_single": 1, + "ul_multi": 1, + "ol_multi": 1 + }, + "MD031": true, + "MD032": true, + "MD033": { + "allowed_elements": [ + "br", + "details", + "summary", + "kbd", + "div", + "img", + "pre" + ] + }, + "MD034": true, + "MD035": { + "style": "---" + }, + "MD036": false, + "MD037": true, + "MD038": true, + "MD039": true, + "MD040": true, + "MD041": { + "level": 1, + "front_matter_title": "" + }, + "MD042": true, + "MD043": false, + "MD044": { + "names": [ + "JavaScript", + "TypeScript", + "React", + "Docker", + "Node.js", + "npm", + "Go", + "OpenAI", + "Ollama", + "paperless-gpt" + ], + "code_blocks": false + }, + "MD045": true, + "MD046": { + "style": "fenced" + }, + "MD047": true, + "MD048": { + "style": "backtick" + }, + "MD049": { + "style": "underscore" + }, + "MD050": { + "style": "asterisk" + }, + "MD051": true, + "MD052": true, + "MD053": true +} diff --git a/CHANGELOG.md b/CHANGELOG.md new file mode 100644 index 0000000..59450ab --- /dev/null +++ b/CHANGELOG.md @@ -0,0 +1,34 @@ +# Changelog + +All notable changes to this project will be documented in this file. + +The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), +and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). 
+ +## [Unreleased] +### Added +- Enhanced project documentation and organization +- Project governance guidelines +- Security policy and guidelines +- Architecture documentation + +## [1.0.0] - Initial Release +### Added +- LLM-Enhanced OCR capabilities +- Automatic title & tag generation +- Automatic correspondent generation +- Custom prompt templates +- Docker deployment support +- Web UI for document management +- Support for multiple LLM providers (OpenAI, Ollama) +- Configurable environment variables +- Integration with paperless-ngx +- Manual and automatic processing modes +- Basic documentation and setup guides + +### Security +- API token authentication +- Environment-based configuration +- Docker container isolation + +For earlier history, please see the git commit log. diff --git a/GOVERNANCE.md b/GOVERNANCE.md new file mode 100644 index 0000000..fc258b6 --- /dev/null +++ b/GOVERNANCE.md @@ -0,0 +1,226 @@ +# Project Governance + +This document outlines the governance model for the paperless-gpt project. It describes how decisions are made and how community members can participate in project development. 
+ +## Project Roles + +### Users +- People who use paperless-gpt +- Can submit bug reports and feature requests +- Can contribute to discussions +- Can help other users + +### Contributors +- Users who contribute to the project +- Submit pull requests +- Improve documentation +- Help with testing +- Participate in issue discussions + +### Maintainers +- Review and merge pull requests +- Manage issues and project boards +- Guide technical direction +- Ensure code quality +- Help onboard new contributors +- Responsibilities: + - Respond to issues and PRs + - Review code changes + - Maintain documentation + - Ensure tests pass + - Release new versions + - Uphold code of conduct + +### Project Lead +- Final decision maker for project direction +- Sets technical standards +- Manages maintainer team +- Oversees releases +- Current lead: [@icereed](https://github.com/icereed) + +## Decision Making + +### Technical Decisions +1. **Discussion Phase** + - Open an issue for discussion + - Gather community feedback + - Consider alternatives + - Document trade-offs + +2. **Implementation Phase** + - Create detailed proposal + - Submit pull request + - Address review feedback + - Update documentation + +3. **Review Process** + - At least one maintainer review + - Automated tests must pass + - Documentation must be updated + - Breaking changes require extra scrutiny + +### Project Direction +1. **Long-term Planning** + - Quarterly roadmap updates + - Community feedback periods + - Clear communication of goals + - Published milestones + +2. **Feature Acceptance** + - Must align with project goals + - Consider maintenance burden + - Evaluate user benefit + - Check implementation feasibility + +### Release Process +1. **Version Planning** + - Follow semantic versioning + - Document all changes + - Update dependencies + - Security review + +2. **Release Preparation** + - Create release branch + - Run test suite + - Update changelog + - Draft release notes + +3. 
**Release Publication** + - Tag version in repository + - Publish to registries + - Announce to community + - Monitor for issues + +## Communication + +### Channels +- GitHub Issues: Bug reports, feature requests +- GitHub Discussions: General discussion +- Pull Requests: Code changes +- Discord: Community chat +- Email: Security issues + +### Guidelines +- Be respectful and professional +- Stay on topic +- English is the working language +- Document decisions and rationale +- Keep security issues private + +## Contributing + +### Process +1. **Getting Started** + - Read contribution guidelines + - Set up development environment + - Understand code structure + - Pick starter issues + +2. **Making Changes** + - Create feature branch + - Follow code style + - Write tests + - Update docs + +3. **Submitting Changes** + - Create pull request + - Fill out template + - Respond to reviews + - Keep changes focused + +### Standards +- Follow code style guide +- Include tests +- Update documentation +- Sign commits +- One feature per PR + +## Code Review + +### Requirements +- At least one maintainer approval +- All tests passing +- Documentation updated +- Code style compliance +- No security issues + +### Process +1. **Automated Checks** + - Linting + - Tests + - Coverage + - Dependencies + +2. **Manual Review** + - Code quality + - Architecture + - Security + - Performance + +3. **Final Checks** + - Merge conflicts + - Documentation + - Breaking changes + - Version updates + +## Issue Management + +### Categories +- Bug: Software defects +- Feature: New functionality +- Enhancement: Improvements +- Documentation: Doc changes +- Question: User queries + +### Priority Levels +1. **Critical** + - Security issues + - Major bugs + - Blocking issues + +2. **High** + - Important features + - User experience issues + - Performance problems + +3. **Normal** + - Regular enhancements + - Minor bugs + - Documentation updates + +4. 
**Low** + - Nice-to-have features + - Style improvements + - Non-critical fixes + +## Project Changes + +### Governance Changes +- Open for community discussion +- Two-week comment period +- Maintainer consensus required +- Project lead approval needed + +### Role Changes +- Based on consistent contributions +- Maintainer nomination +- Community feedback +- Project lead approval + +## Success Metrics + +### Project Health +- Issue resolution time +- PR merge time +- Test coverage +- Documentation quality +- Community engagement + +### Code Quality +- Automated metrics +- Review thoroughness +- Test coverage +- Documentation completeness +- Security standards + +This governance model is a living document and may be updated as the project evolves. Changes will be proposed and discussed with the community before implementation. diff --git a/SECURITY.md b/SECURITY.md new file mode 100644 index 0000000..bcb5e31 --- /dev/null +++ b/SECURITY.md @@ -0,0 +1,125 @@ +# Security Policy + +## Reporting a Vulnerability + +At paperless-gpt, we take security seriously. If you discover a security vulnerability, please follow these steps: + +1. **DO NOT** disclose the vulnerability publicly. +2. Send a detailed report to security@icereed.net including: + - A description of the vulnerability + - Steps to reproduce the issue + - Potential impact + - Any suggested fixes (if available) +3. Allow up to 48 hours for an initial response. +4. Please do not disclose the issue publicly until we've had a chance to address it.
+
+## Security Considerations
+
+### API Keys and Tokens
+- Never commit API keys, tokens, or sensitive credentials to the repository
+- Use environment variables for all sensitive configuration
+- Rotate API keys and tokens regularly
+- Use the minimum required permissions for API tokens
+
+### Data Privacy
+- All document processing is done locally or via your configured LLM provider
+- No document data is stored permanently outside your system
+- Temporary files are cleaned up after processing
+- Documents are transmitted securely using HTTPS
+
+### Docker Security
+- Containers run with minimal privileges
+- Images are regularly updated with security patches
+- Dependencies are scanned for vulnerabilities
+- Official base images are used
+
+### LLM Provider Security
+- API calls to LLM providers use encrypted connections
+- Rate limiting is implemented to prevent abuse
+- Input validation is performed on all user inputs
+- Error messages are sanitized to prevent information leakage
+
+### Access Control
+- Use strong passwords for all services
+- Implement the principle of least privilege
+- Regular security audits of access controls
+- Monitor for unauthorized access attempts
+
+## Version Support
+
+We provide security updates for:
+- The current major version
+- The previous major version for 6 months after a new major release
+
+## Best Practices for Deployment
+
+1. **Network Security**
+   - Use HTTPS for all connections
+   - Implement proper firewall rules
+   - Use secure DNS configurations
+   - Regular security audits
+
+2. **System Updates**
+   - Keep all system packages updated
+   - Subscribe to security advisories
+   - Regular vulnerability scanning
+   - Automated update notifications
+
+3. **Monitoring**
+   - Monitor system logs for suspicious activity
+   - Track resource usage patterns
+   - Alert on anomalous behavior
+   - Regular security assessments
+
+4. **Backup and Recovery**
+   - Regular backups of critical data
+   - Secure backup storage
+   - Tested recovery procedures
+   - Documented incident response plan
+
+## Dependencies
+
+We regularly monitor and update dependencies for security vulnerabilities:
+- Automated dependency updates via Renovate
+- Regular security audits of dependencies
+- Minimal use of third-party packages
+- Verification of package signatures
+
+## Contributing Security Fixes
+
+If you want to contribute security fixes:
+1. Follow the standard pull request process
+2. Mark security-related PRs as "security fix"
+3. Provide detailed description of the security impact
+4. Include tests that verify the fix
+
+## Security Release Process
+
+When a security issue is identified:
+1. Issue is assessed and prioritized
+2. Fix is developed and tested
+3. Security advisory is prepared
+4. Fix is deployed and announced
+5. Users are notified through appropriate channels
+
+## Incident Response
+
+In case of a security incident:
+1. Issue is immediately assessed
+2. Affected systems are isolated
+3. Root cause is identified
+4. Fix is developed and tested
+5. Systems are restored
+6. Incident report is prepared
+7. Preventive measures are implemented
+
+## Contact
+
+For security-related matters, contact:
+- Email: security@icereed.net
+- Response time: Within 48 hours
+- Language: English
+
+## Acknowledgments
+
+We'd like to thank all security researchers who have helped improve the security of paperless-gpt. A list of acknowledged researchers can be found in our [Hall of Fame](CONTRIBUTORS.md#security-researchers).
diff --git a/cline_docs/productContext.md b/cline_docs/productContext.md
new file mode 100644
index 0000000..717fe8a
--- /dev/null
+++ b/cline_docs/productContext.md
@@ -0,0 +1,78 @@
+# Product Context
+
+## Project Purpose
+paperless-gpt is designed to enhance document management by integrating AI capabilities with paperless-ngx. Its primary purpose is to automate and improve the accuracy of document processing tasks that traditionally require manual intervention.
+
+## Problems Solved
+1. Manual Document Organization
+   - Eliminates tedious manual tagging and titling
+   - Reduces time spent on document categorization
+   - Minimizes human error in classification
+
+2. OCR Quality Issues
+   - Improves text extraction from poor quality scans
+   - Enhances accuracy through LLM-based OCR
+   - Provides context-aware text interpretation
+
+3. Document Processing Automation
+   - Automates correspondent identification
+   - Streamlines document categorization
+   - Enables bulk processing capabilities
+
+## Core Functionality
+1. AI-Powered Document Processing
+   - Title generation using LLMs
+   - Intelligent tag suggestions
+   - Automated correspondent detection
+   - Enhanced OCR capabilities
+
+2. Integration Features
+   - Seamless paperless-ngx integration
+   - Support for multiple LLM providers
+   - Docker-based deployment
+   - Customizable prompt templates
+
+3. User Experience
+   - Web-based interface
+   - Manual review capabilities
+   - Automatic processing options
+   - Flexible configuration options
+
+## Success Criteria
+1. Accuracy Metrics
+   - High-quality OCR results
+   - Accurate document classification
+   - Relevant tag suggestions
+   - Correct correspondent identification
+
+2. Performance Goals
+   - Fast processing times
+   - Reliable system operation
+   - Scalable document handling
+   - Efficient resource usage
+
+3. User Satisfaction
+   - Intuitive interface
+   - Clear feedback mechanisms
+   - Minimal manual intervention
+   - Consistent results
+
+## Future Vision
+1. Enhanced Capabilities
+   - Support for more AI providers
+   - Statistics and analytics features
+   - Advanced document analysis
+   - Improved processing algorithms
+   - Extended automation options
+
+2. Community Growth
+   - Active contributor base
+   - Regular feature additions
+   - Strong documentation
+   - Responsive maintenance
+
+3. Technical Evolution
+   - Improved architecture
+   - Enhanced performance
+   - Extended integrations
+   - Robust testing
diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md
new file mode 100644
index 0000000..91e9a12
--- /dev/null
+++ b/docs/ARCHITECTURE.md
@@ -0,0 +1,221 @@
+# paperless-gpt Architecture
+
+This document provides a comprehensive overview of the paperless-gpt architecture, explaining how different components interact to provide AI-powered document processing capabilities.
+
+## System Overview
+
+paperless-gpt is designed as a companion service to paperless-ngx, adding AI capabilities for document processing. The system consists of several key components:
+
+```mermaid
+graph TB
+    UI[Web UI] --> API[Backend API]
+    API --> LLM[LLM Service]
+    API --> OCR[OCR Service]
+    API --> DB[Local DB]
+    API --> PaperlessNGX[paperless-ngx API]
+    LLM --> OpenAI[OpenAI]
+    LLM --> Ollama[Ollama]
+    OCR --> VisionLLM[Vision LLM]
+```
+
+## Core Components
+
+### 1. Backend API (Go)
+- Handles all business logic
+- Manages document processing workflow
+- Coordinates between services
+- Provides REST API endpoints
+- Manages state and caching
+
+### 2. Web UI (React + TypeScript)
+- User interface for document management
+- Real-time processing status
+- Document preview and editing
+- Configuration interface
+- Responsive design
+
+### 3. LLM Service
+- Manages LLM provider connections
+- Handles prompt engineering
+- Processes document content
+- Generates metadata suggestions
+- Supports multiple providers:
+  - OpenAI (gpt-4, gpt-3.5-turbo)
+  - Ollama (llama2, etc.)
+
+### 4. OCR Service
+- Vision LLM integration
+- Image preprocessing
+- Text extraction
+- Layout analysis
+- Quality enhancement
+
+### 5. Local Database
+- Caches processing results
+- Stores configuration
+- Manages queues
+- Tracks document state
+
+## Data Flow
+
+### Document Processing Flow
+```mermaid
+sequenceDiagram
+    participant U as User
+    participant UI as Web UI
+    participant API as Backend API
+    participant LLM as LLM Service
+    participant OCR as OCR Service
+    participant PNX as paperless-ngx
+
+    U->>UI: Upload Document
+    UI->>API: Process Request
+    API->>OCR: Extract Text
+    OCR-->>API: Text Content
+    API->>LLM: Generate Metadata
+    LLM-->>API: Suggestions
+    API->>UI: Preview Results
+    U->>UI: Approve Changes
+    UI->>API: Apply Changes
+    API->>PNX: Update Document
+    PNX-->>API: Confirmation
+    API-->>UI: Success
+```
+
+## Key Design Decisions
+
+### 1. Modular Architecture
+- Separation of concerns
+- Pluggable components
+- Easy to extend
+- Maintainable code
+
+### 2. Stateless Design
+- Scalable architecture
+- No shared state
+- Resilient operation
+- Easy deployment
+
+### 3. Security First
+- API authentication
+- Data encryption
+- Input validation
+- Error handling
+
+### 4. Performance Optimization
+- Local caching
+- Batch processing
+- Async operations
+- Resource management
+
+## Directory Structure
+
+```
+paperless-gpt/
+├── main.go                 # Application entry point
+├── app_llm.go              # LLM service implementation
+├── app_http_handlers.go    # HTTP handlers
+├── paperless.go            # paperless-ngx integration
+├── ocr.go                  # OCR service
+├── types.go                # Type definitions
+├── web-app/                # Frontend application
+│   ├── src/
+│   │   ├── components/     # React components
+│   │   ├── App.tsx         # Main application
+│   │   └── ...
+│   └── ...
+└── ...
+```
+
+## Configuration Management
+
+The system uses environment variables for configuration, allowing easy deployment and configuration changes:
+
+```
+PAPERLESS_BASE_URL          # paperless-ngx connection
+LLM_PROVIDER                # AI backend selection
+VISION_LLM_PROVIDER         # OCR provider selection
+...
+```
+
+## Error Handling
+
+The system implements comprehensive error handling:
+
+1. **User Errors**
+   - Input validation
+   - Clear error messages
+   - Guided resolution
+
+2. **System Errors**
+   - Graceful degradation
+   - Automatic retry
+   - Error logging
+   - Monitoring alerts
+
+3. **External Service Errors**
+   - Fallback options
+   - Circuit breaking
+   - Rate limiting
+   - Error reporting
+
+## Scaling Considerations
+
+The architecture supports scaling through:
+
+1. **Horizontal Scaling**
+   - Stateless design
+   - Load balancing
+   - Distributed processing
+
+2. **Resource Management**
+   - Connection pooling
+   - Cache management
+   - Queue processing
+   - Rate limiting
+
+3. **Performance Optimization**
+   - Batch processing
+   - Async operations
+   - Efficient algorithms
+   - Resource caching
+
+## Future Considerations
+
+The architecture is designed to support future enhancements:
+
+1. **Plugin System**
+   - Custom processors
+   - Integration points
+   - Event hooks
+
+2. **Advanced Features**
+   - Multi-language support
+   - Custom ML models
+   - Advanced analytics
+
+3. **Integration Options**
+   - API extensions
+   - Service hooks
+   - Custom providers
+
+## Development Guidelines
+
+When making changes to the architecture:
+
+1. **Documentation**
+   - Update this document
+   - Add inline comments
+   - Update API docs
+
+2. **Testing**
+   - Unit tests
+   - Integration tests
+   - Performance tests
+
+3. **Review Process**
+   - Architecture review
+   - Security review
+   - Performance review
+
+This architecture documentation is maintained by the core team and updated as the system evolves.