```
├── .dockerignore
├── .github/
    ├── FUNDING.yml
    ├── ISSUE_TEMPLATE/
        ├── blank.yaml
        ├── config.yml
        ├── 功能请求_cn.yaml
        ├── 功能请求_en.yaml
        ├── 问题反馈_cn.yaml
        ├── 问题反馈_en.yaml
    ├── dependabot.yml
    ├── release-drafter.yml
    ├── workflows/
        ├── black.format.yml
        ├── exe-build.yml
        ├── fork-build.yml
        ├── fork-test.yml
        ├── python-publish.yml
        ├── python-test.yml
├── .gitignore
├── .pre-commit-config.yaml
├── Dockerfile
├── LICENSE
├── README.md
├── app.json
├── docs/
    ├── ADVANCED.md
    ├── APIS.md
    ├── CODE_OF_CONDUCT.md
    ├── README_GUI.md
    ├── README_ja-JP.md
    ├── README_ko-KR.md
    ├── README_zh-CN.md
    ├── README_zh-TW.md
    ├── images/
        ├── after.png
        ├── banner.png
        ├── before.png
        ├── cmd.explained.png
        ├── cmd.explained.zh.png
        ├── gui.gif
        ├── preview.gif
├── pdf2zh/
    ├── __init__.py
    ├── backend.py
    ├── cache.py
    ├── config.py
    ├── converter.py
    ├── doclayout.py
    ├── gui.py
```

## /.dockerignore

```dockerignore path="/.dockerignore" 
.github
docs
.git
.pre-commit-config.yaml
uv.lock
pdf2zh_files
gui/pdf2zh_files
gradio_files
tmp
gui/gradio_files
gui/tmp

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock

# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/latest/usage/project/#working-with-version-control
.pdm.toml
.pdm-python
.pdm-build/

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
.idea/
.vscode
.DS_Store
```

## /.github/FUNDING.yml

```yml path="/.github/FUNDING.yml" 
# These are supported funding model platforms

github: [Byaidu, reycn, Wybxc, hellofinch] # Replace with up to 4 GitHub Sponsors-enabled usernames e.g., [user1, user2]
patreon: # Replace with a single Patreon username
open_collective: # Replace with a single Open Collective username
ko_fi: # Replace with a single Ko-fi username
tidelift: # Replace with a single Tidelift platform-name/package-name e.g., npm/babel
community_bridge: # Replace with a single Community Bridge project-name e.g., cloud-foundry
liberapay: # Replace with a single Liberapay username
issuehunt: # Replace with a single IssueHunt username
lfx_crowdfunding: # Replace with a single LFX Crowdfunding project-name e.g., cloud-foundry
polar: # Replace with a single Polar username
buy_me_a_coffee: # Replace with a single Buy Me a Coffee username
thanks_dev: # Replace with a single thanks.dev username
custom: # Replace with up to 4 custom sponsorship URLs e.g., ['link1', 'link2']
```

## /.github/ISSUE_TEMPLATE/blank.yaml

```yaml path="/.github/ISSUE_TEMPLATE/blank.yaml" 
name: Blank Issue
description: Create a blank issue for discussion
body:
  - type: checkboxes
    id: checks
    attributes:
      label: before ...
      options:
        - label: This issue is not about a question or a bug.
          required: true
  - type: textarea
    id: describe
    attributes:
      label: Add a description
```

## /.github/ISSUE_TEMPLATE/config.yml

```yml path="/.github/ISSUE_TEMPLATE/config.yml" 
blank_issues_enabled: false
```

## /.github/ISSUE_TEMPLATE/功能请求_cn.yaml

```yaml path="/.github/ISSUE_TEMPLATE/功能请求_cn.yaml" 
name: Feature request
description: Submit a feature request in Chinese
labels: ['enhancement']
body:
  - type: textarea
    id: describe
    attributes:
      label: In what scenario do you need the requested feature?
      description: Briefly describe the relevant use case
    validations:
      required: false
  - type: textarea
    id: solution
    attributes:
      label: Solution
      description: Describe the solution you would like
    validations:
      required: false
  - type: textarea
    id: additional
    attributes:
      label: Additional content
      description: Any other items related to this feature request.
    validations:
      required: false
```

## /.github/ISSUE_TEMPLATE/功能请求_en.yaml

```yaml path="/.github/ISSUE_TEMPLATE/功能请求_en.yaml" 
name: Feature request
description: Suggest an idea for this project
labels: ['enhancement']
body:
  - type: textarea
    id: describe
    attributes:
      label: Is your feature request related to a problem?
      description: A clear and concise description of what the problem is
      placeholder: Ex. I'm always frustrated when ...
    validations:
      required: false
  - type: textarea
    id: solution
    attributes:
      label: Describe the solution you'd like
      description: A clear and concise description of what you want to happen
    validations:
      required: false
  - type: textarea
    id: additional
    attributes:
      label: Additional context
      description: Add any other context about the feature request here.
    validations:
      required: false
```

## /.github/ISSUE_TEMPLATE/问题反馈_cn.yaml

```yaml path="/.github/ISSUE_TEMPLATE/问题反馈_cn.yaml" 
name: Report a bug
description: Submit a bug report in Chinese
labels: ['bug']
body:
  - type: checkboxes
    id: checks
    attributes:
      label: Before asking...
      options:
        - label: I have searched the existing issues
          required: true
        - label: I spent at least 5 minutes thinking and preparing before asking
          required: true
        - label: I have read the wiki carefully and completely
          required: true
        - label: I have carefully checked that the problem is unrelated to the network environment (including but not limited to Google being unavailable or model download failures)
          required: true
  - type: markdown
    attributes:
      value: |
        Thank you for using this project and providing feedback!
        Please confirm once more that you have carefully done everything described in the checkboxes above!
  - type: textarea
    id: environment
    attributes:
      label: Environment used
      description: |
        examples:
          - **OS**: Ubuntu 24.10
          - **Python**: 3.12.0
          - **pdf2zh**: 1.9.0
      value: |
        - OS:
        - Python:
        - pdf2zh:
      render: markdown
    validations:
      required: false
  - type: dropdown
    id: install
    attributes:
      label: Select the installation method
      options:
        - pip
        - exe
        - source
        - docker
    validations:
      required: true
  - type: textarea
    id: describe
    attributes:
      label: Describe your problem
      description: Briefly describe your problem
    validations:
      required: true
  - type: textarea
    id: reproduce
    attributes:
      label: How to reproduce
      description: Steps to reproduce the behavior
      value: |
        1. Execute '...'
        2. Select '....'
        3. The problem occurs
    validations:
      required: false
  - type: textarea
    id: expected
    attributes:
      label: Expected behavior
      description: Briefly describe the result you expected
    validations:
      required: false
  - type: textarea
    id: logs
    attributes:
      label: Relevant logs
      description: Please copy and paste any relevant log output.
      render: Text
    validations:
      required: false
  - type: textarea
    id: PDFfile
    attributes:
      label: Original PDF file
      description: |
        If the issue involves layout errors, please be sure to provide the original PDF file so the error can be reproduced.
    validations:
      required: false
  - type: textarea
    id: others
    attributes:
      label: Anything else?
      description: |
        Related configs? Links? References?
        Anything that helps us understand more about the problem you encountered.
    validations:
      required: false
```

## /.github/ISSUE_TEMPLATE/问题反馈_en.yaml

```yaml path="/.github/ISSUE_TEMPLATE/问题反馈_en.yaml" 
name: Bug Report
description: Create a report to help us improve
labels: ['bug']
body:
  - type: checkboxes
    id: checks
    attributes:
      label: Before you ask
      options:
        - label: I have searched the existing issues
          required: true
        - label: I spent at least 5 minutes thinking and preparing
          required: true
        - label: I have thoroughly and completely read the wiki.
          required: true
        - label: I have carefully checked the issue, and it is unrelated to the network environment.
          required: true
  - type: markdown
    attributes:
      value: |
        Thank you for using this project and providing feedback!
  - type: textarea
    id: environment
    attributes:
      label: Environment
      description: |
        examples:
          - **OS**: Ubuntu 24.10
          - **Python**: 3.12.0
          - **pdf2zh**: 1.9.0
      value: |
        - OS:
        - Python:
        - pdf2zh:
      render: markdown
    validations:
      required: false
  - type: dropdown
    id: install
    attributes:
      label: How to install pdf2zh
      options:
        - pip
        - exe
        - source
        - docker
    validations:
      required: true
  - type: textarea
    id: describe
    attributes:
      label: Describe the bug
      description: A clear and concise description of what the bug is.
    validations:
      required: true
  - type: textarea
    id: reproduce
    attributes:
      label: To Reproduce
      description: Steps to reproduce the behavior
      value: |
        1. execute '...'
        2. select '....'
        3. see errors
    validations:
      required: false
  - type: textarea
    id: expected
    attributes:
      label: Expected behavior
      description: A clear and concise description of what you expected to happen.
    validations:
      required: false
  - type: textarea
    id: logs
    attributes:
      label: Relevant log output
      description: Please copy and paste any relevant log output. This will be automatically formatted into code, so no need for backticks.
      render: Text
    validations:
      required: false
  - type: textarea
    id: PDFfile
    attributes:
      label: Original PDF file
      description: |
        If the issue involves formatting errors, please provide the original PDF file to facilitate reproduction of the error.
    validations:
      required: false
  - type: textarea
    id: others
    attributes:
      label: Anything else?
      description: |
        Related configs? Links? References?
        Anything that will give us more context about the issue you are encountering!
    validations:
      required: false
```

## /.github/dependabot.yml

```yml path="/.github/dependabot.yml" 
version: 2
updates:
  - package-ecosystem: github-actions
    directory: "/"
    schedule:
      interval: weekly
  # - package-ecosystem: pip
  #   directory: "/.github/workflows"
  #   schedule:
  #     interval: weekly
  # - package-ecosystem: pip
  #   directory: "/docs"
  #   schedule:
  #     interval: weekly
  - package-ecosystem: pip
    directory: "/"
    schedule:
      interval: weekly
    versioning-strategy: lockfile-only
    allow:
      - dependency-type: "all"
```

## /.github/release-drafter.yml

```yml path="/.github/release-drafter.yml" 
name-template: 'v$RESOLVED_VERSION'
tag-template: 'v$RESOLVED_VERSION'
categories:
  - title: '🚀 Features'
    labels:
      - 'feature'
      - 'enhancement'
  - title: '🐛 Bug Fixes'
    labels:
      - 'fix'
      - 'bugfix'
      - 'bug'
  - title: '🧰 Maintenance'
    labels:
      - 'chore'
      - 'maintenance'
      - 'refactor'
  - title: '📝 Documentation'
    labels:
      - 'docs'
      - 'documentation'
change-template: '- $TITLE @$AUTHOR (#$NUMBER)'
change-title-escapes: '\<*_&' # You can add # and @ to disable mentions
version-resolver:
  major:
    labels:
      - 'major'
  minor:
    labels:
      - 'minor'
  patch:
    labels:
      - 'patch'
  default: patch
template: |
  ## Changes

  $CHANGES

  ## Contributors

  $CONTRIBUTORS

  ## Windows Specific

  If you cannot open it after downloading, please install https://aka.ms/vs/17/release/vc_redist.x64.exe and try again.

  ## Assets

  - pdf2zh-v$RESOLVED_VERSION-win64.zip: pdf2zh **without** assets(font, model, etc.)
  - pdf2zh-v$RESOLVED_VERSION-with-assets-win64.zip: (**Recommended**) pdf2zh **with** assets(font, model, etc.)

  > [!NOTE]
  >
  > The version without assets will also dynamically download resources when running, but the download may fail due to network issues.
```

## /.github/workflows/black.format.yml

```yml path="/.github/workflows/black.format.yml" 
name: Format Code with Black

on: [push, pull_request]

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: psf/black@stable
```

## /.github/workflows/exe-build.yml

```yml path="/.github/workflows/exe-build.yml" 
name: windows exe Release Workflow

on:
  workflow_dispatch:
    inputs:
      release_version:
        description: 'Release Version (e.g., v1.0.0)'
        required: true
        type: string
  # push: # debug purpose

env:
  WIN_EXE_PYTHON_VERSION: 3.12.9

jobs:
  build-win64-exe:
    runs-on: windows-latest
    steps:
      - name: checkout babeldoc metadata
        uses: actions/checkout@v4
        with:
          repository: funstory-ai/BabelDOC
          path: babeldoctemp1234567
          token: ${{ secrets.GITHUB_TOKEN }}
          sparse-checkout: babeldoc/assets/embedding_assets_metadata.py
      - name: Cached Assets
        id: cache-assets
        uses: actions/cache@v4.2.2
        with:
          path: ~/.cache/babeldoc
          key: test-1-babeldoc-assets-${{ hashFiles('babeldoctemp1234567/babeldoc/assets/embedding_assets_metadata.py') }}
      - name: Check out code
        uses: actions/checkout@v4
      - name: Setup uv with Python ${{ env.WIN_EXE_PYTHON_VERSION }}
        uses: astral-sh/setup-uv@f94ec6bedd8674c4426838e6b50417d36b6ab231 # v5.3.1
        with:
          python-version: ${{ env.WIN_EXE_PYTHON_VERSION }}
          enable-cache: true
          cache-dependency-glob: "pyproject.toml"
      - name: Run all tasks (create directories, download, extract, copy files, install dependencies)
        shell: pwsh
        run: |
          Write-Host "==== Delete the babeldoctemp1234567 folder ===="
          if (Test-Path "./babeldoctemp1234567") {
              Remove-Item -Path "./babeldoctemp1234567" -Recurse -Force
              Write-Host "babeldoctemp1234567 folder deleted successfully"
          } else {
              Write-Host "babeldoctemp1234567 folder does not exist; nothing to delete"
          }

          Write-Host "==== Create required directories ===="
          New-Item -Path "./build" -ItemType Directory -Force
          New-Item -Path "./build/runtime" -ItemType Directory -Force
          New-Item -Path "./dep_build" -ItemType Directory -Force

          Write-Host "==== Copy code to dep_build ===="
          Get-ChildItem -Path "./" -Exclude "dep_build", "build" | Copy-Item -Destination "./dep_build" -Recurse -Force

          Write-Host "==== Download and extract Python ${{ env.WIN_EXE_PYTHON_VERSION }} ===="
          Write-Host "pythonUrl: https://www.python.org/ftp/python/${{ env.WIN_EXE_PYTHON_VERSION }}/python-${{ env.WIN_EXE_PYTHON_VERSION }}-embed-amd64.zip"
          $pythonUrl = "https://www.python.org/ftp/python/${{ env.WIN_EXE_PYTHON_VERSION }}/python-${{ env.WIN_EXE_PYTHON_VERSION }}-embed-amd64.zip"
          $pythonZip = "./dep_build/python.zip"
          Invoke-WebRequest -Uri $pythonUrl -OutFile $pythonZip
          Expand-Archive -Path $pythonZip -DestinationPath "./build/runtime" -Force

          Write-Host "==== Download and extract PyStand ===="
          $pystandUrl = "https://github.com/skywind3000/PyStand/releases/download/1.1.4/PyStand-v1.1.4-exe.zip"
          $pystandZip = "./dep_build/PyStand.zip"
          Invoke-WebRequest -Uri $pystandUrl -OutFile $pystandZip
          Expand-Archive -Path $pystandZip -DestinationPath "./dep_build/PyStand" -Force

          Write-Host "==== Copy PyStand.exe to build and rename it ===="
          $pystandExe = "./dep_build/PyStand/PyStand-x64-CLI/PyStand.exe"
          $destExe = "./build/pdf2zh.exe"
          if (Test-Path $pystandExe) {
              Copy-Item -Path $pystandExe -Destination $destExe -Force
          } else {
              Write-Host "Error: PyStand.exe not found!"
              exit 1
          }

          Write-Host "==== Create Python venv in dep_build ===="
          uv venv ./dep_build/venv
          ./dep_build/venv/Scripts/activate

          Write-Host "==== Install project dependencies in the venv ===="
          uv pip install .

          Write-Host "==== Copy venv/Lib/site-packages to build/ ===="
          Copy-Item -Path "./dep_build/venv/Lib/site-packages" -Destination "./build/site-packages" -Recurse -Force

          Write-Host "==== Copy script/_pystand_static.int to build/ ===="
          $staticFile = "./script/_pystand_static.int"
          $destStatic = "./build/_pystand_static.int"
          if (Test-Path $staticFile) {
              Copy-Item -Path $staticFile -Destination $destStatic -Force
          } else {
              Write-Host "Error: script/_pystand_static.int not found!"
              exit 1
          }

          uv run --active babeldoc --generate-offline-assets ./build
      - name: Upload build artifact
        uses: actions/upload-artifact@v4
        with:
          name: win64-exe
          path: ./build
          if-no-files-found: error
          compression-level: 9
          include-hidden-files: true

  test-win64-exe:
    needs:
      - build-win64-exe
    runs-on: windows-latest
    steps:
      - name: Check out code
        uses: actions/checkout@v4
      - name: Download build artifact
        uses: actions/download-artifact@v4
        with:
          name: win64-exe
          path: ./build
      - name: Test show version
        run: |
          ./build/pdf2zh.exe --version
      - name: Test - Translate a PDF file with plain text only
        run: |
          ./build/pdf2zh.exe ./test/file/translate.cli.plain.text.pdf -o ./test/file
      - name: Test - Translate a PDF file figure
        run: |
          ./build/pdf2zh.exe ./test/file/translate.cli.text.with.figure.pdf -o ./test/file
      - name: Delete offline assets and cache
        shell: pwsh
        run: |
          Write-Host "==== Find and delete the offline assets package ===="
          $offlineAssetsPath = Get-ChildItem -Path "./build" -Filter "offline_assets_*.zip" -Recurse | Select-Object -First 1 -ExpandProperty FullName
          if ($offlineAssetsPath) {
              Write-Host "Found offline assets package: $offlineAssetsPath"
              Remove-Item -Path $offlineAssetsPath -Force
              Write-Host "Offline assets package deleted"
          } else {
              Write-Host "Offline assets package not found"
          }

          Write-Host "==== Delete the cache directory ===="
          $cachePath = "$env:USERPROFILE/.cache/babeldoc"
          if (Test-Path $cachePath) {
              Remove-Item -Path $cachePath -Recurse -Force
              Write-Host "Cache directory deleted: $cachePath"
          } else {
              Write-Host "Cache directory does not exist: $cachePath"
          }
      - name: Test - Translate without offline assets
        run: |
          ./build/pdf2zh.exe ./test/file/translate.cli.plain.text.pdf -o ./test/file
      - name: Upload test results
        uses: actions/upload-artifact@v4
        with:
          name: test-results
          path: ./test/file/
```

## /.github/workflows/fork-build.yml

```yml path="/.github/workflows/fork-build.yml" 
name: fork-build

on:
  workflow_dispatch:
  # debug purpose
  # push:

env:
  REGISTRY: ghcr.io
  REPO_LOWER: ${{ github.repository_owner }}/${{ github.event.repository.name }}
  GHCR_REPO: ghcr.io/${{ github.repository }}
  WIN_EXE_PYTHON_VERSION: 3.12.9

jobs:
  check-repository:
    name: Check if running in main repository
    runs-on: ubuntu-latest
    outputs:
      is_main_repo: ${{ github.repository == 'Byaidu/PDFMathTranslate' }}
    steps:
      - run: echo "Running repository check"

  test:
    uses: ./.github/workflows/python-test.yml
    needs: check-repository
    if: needs.check-repository.outputs.is_main_repo != 'true'

  build:
    strategy:
      fail-fast: false
      matrix:
        include:
          - platform: linux/amd64
            runner: ubuntu-latest
          - platform: linux/arm64
            runner: ubuntu-24.04-arm
    runs-on: ${{ matrix.runner }}
    needs:
      - check-repository
      - test
    if: needs.check-repository.outputs.is_main_repo != 'true'
    permissions:
      contents: read
      packages: write
    steps:
      - name: Convert to lowercase
        run: |
          echo "GHCR_REPO_LOWER=$(echo ${{ env.GHCR_REPO }} | tr '[:upper:]' '[:lower:]')" >> $GITHUB_ENV
      - name: Prepare
        run: |
          platform=${{ matrix.platform }}
          echo "PLATFORM_PAIR=${platform//\//-}" >> $GITHUB_ENV
      - name: Checkout repository
        uses: actions/checkout@v4
      - name: Docker meta
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: |
            ${{ env.GHCR_REPO_LOWER }}
      - name: Login to GHCR
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.repository_owner }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      - name: Build and push by digest
        id: build
        uses: docker/build-push-action@v6
        with:
          platforms: ${{ matrix.platform }}
          labels: ${{ steps.meta.outputs.labels }}
          outputs: type=image,name=${{ env.GHCR_REPO_LOWER }},push-by-digest=true,name-canonical=true,push=true
          cache-from: ${{ matrix.platform == 'linux/amd64' && 'type=gha' || '' }}
          cache-to: ${{ matrix.platform == 'linux/amd64' && 'type=gha,mode=max' || '' }}
      - name: Export digest
        run: |
          mkdir -p ${{ runner.temp }}/digests
          digest="${{ steps.build.outputs.digest }}"
          touch "${{ runner.temp }}/digests/${digest#sha256:}"
      - name: Upload digest
        uses: actions/upload-artifact@v4
        with:
          name: digests-${{ env.PLATFORM_PAIR }}
          path: ${{ runner.temp }}/digests/*
          if-no-files-found: error
          retention-days: 1

  merge:
    runs-on: ubuntu-latest
    needs:
      - check-repository
      - test
      - build
    if: needs.check-repository.outputs.is_main_repo != 'true'
    permissions:
      contents: read
      packages: write
    steps:
      - name: Convert to lowercase
        run: |
          echo "GHCR_REPO_LOWER=$(echo ${{ env.GHCR_REPO }} | tr '[:upper:]' '[:lower:]')" >> $GITHUB_ENV
      - name: Download digests
        uses: actions/download-artifact@v4
        with:
          path: ${{ runner.temp }}/digests
          pattern: digests-*
          merge-multiple: true
      - name: Login to GHCR
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.repository_owner }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      - name: Docker meta
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: |
            ${{ env.GHCR_REPO_LOWER }}
          tags: |
            type=raw,value=dev
            type=semver,pattern={{version}}
            type=semver,pattern={{major}}.{{minor}}
      - name: Create manifest list and push
        working-directory: ${{ runner.temp }}/digests
        run: |
          docker buildx imagetools create $(jq -cr '.tags | map("-t " + .) | join(" ")' <<< "$DOCKER_METADATA_OUTPUT_JSON") \
            $(printf '${{ env.GHCR_REPO_LOWER }}@sha256:%s ' *)
      - name: Inspect image
        run: |
          docker buildx imagetools inspect ${{ env.GHCR_REPO_LOWER }}:${{ steps.meta.outputs.version }}

  build-win64-exe:
    runs-on: windows-latest
    needs:
      - check-repository
    if: needs.check-repository.outputs.is_main_repo != 'true'
    steps:
      - name: Check out code
        uses: actions/checkout@v4
      - name: Setup uv with Python ${{ env.WIN_EXE_PYTHON_VERSION }}
        uses: astral-sh/setup-uv@f94ec6bedd8674c4426838e6b50417d36b6ab231 # v5.3.1
        with:
          python-version: ${{ env.WIN_EXE_PYTHON_VERSION }}
          enable-cache: true
          cache-dependency-glob: "pyproject.toml"
      - name: Run all tasks (create directories, download, extract, copy files, install dependencies)
        shell: pwsh
        run: |
          Write-Host "==== Create required directories ===="
          New-Item -Path "./build" -ItemType Directory -Force
          New-Item -Path "./build/runtime" -ItemType Directory -Force
          New-Item -Path "./dep_build" -ItemType Directory -Force

          Write-Host "==== Copy code to dep_build ===="
          Get-ChildItem -Path "./" -Exclude "dep_build", "build" | Copy-Item -Destination "./dep_build" -Recurse -Force

          Write-Host "==== Download and extract Python ${{ env.WIN_EXE_PYTHON_VERSION }} ===="
          Write-Host "pythonUrl: https://www.python.org/ftp/python/${{ env.WIN_EXE_PYTHON_VERSION }}/python-${{ env.WIN_EXE_PYTHON_VERSION }}-embed-amd64.zip"
          $pythonUrl = "https://www.python.org/ftp/python/${{ env.WIN_EXE_PYTHON_VERSION }}/python-${{ env.WIN_EXE_PYTHON_VERSION }}-embed-amd64.zip"
          $pythonZip = "./dep_build/python.zip"
          Invoke-WebRequest -Uri $pythonUrl -OutFile $pythonZip
          Expand-Archive -Path $pythonZip -DestinationPath "./build/runtime" -Force

          Write-Host "==== Download the Visual C++ Redistributable installer ===="
          $vcRedistUrl = "https://aka.ms/vs/17/release/vc_redist.x64.exe"
          $vcRedistPath = "./build/无法运行请安装vc_redist.x64.exe"
          Invoke-WebRequest -Uri $vcRedistUrl -OutFile $vcRedistPath
          Write-Host "Downloaded the Visual C++ Redistributable installer to: $vcRedistPath"

          Write-Host "==== Download and extract PyStand ===="
          $pystandUrl = "https://github.com/skywind3000/PyStand/releases/download/1.1.4/PyStand-v1.1.4-exe.zip"
          $pystandZip = "./dep_build/PyStand.zip"
          Invoke-WebRequest -Uri $pystandUrl -OutFile $pystandZip
          Expand-Archive -Path $pystandZip -DestinationPath "./dep_build/PyStand" -Force

          Write-Host "==== Copy PyStand.exe to build and rename it ===="
          $pystandExe = "./dep_build/PyStand/PyStand-x64-CLI/PyStand.exe"
          $destExe = "./build/pdf2zh.exe"
          if (Test-Path $pystandExe) {
              Copy-Item -Path $pystandExe -Destination $destExe -Force
          } else {
              Write-Host "Error: PyStand.exe not found!"
              exit 1
          }

          Write-Host "==== Create Python venv in dep_build ===="
          uv venv ./dep_build/venv
          ./dep_build/venv/Scripts/activate

          Write-Host "==== Install project dependencies in the venv ===="
          uv pip install .

          Write-Host "==== Copy venv/Lib/site-packages to build/ ===="
          Copy-Item -Path "./dep_build/venv/Lib/site-packages" -Destination "./build/site-packages" -Recurse -Force

          Write-Host "==== Copy script/_pystand_static.int to build/ ===="
          $staticFile = "./script/_pystand_static.int"
          $destStatic = "./build/_pystand_static.int"
          if (Test-Path $staticFile) {
              Copy-Item -Path $staticFile -Destination $destStatic -Force
          } else {
              Write-Host "Error: script/_pystand_static.int not found!"
              exit 1
          }
      - name: Upload build artifact
        uses: actions/upload-artifact@v4
        with:
          name: win64-exe
          path: ./build
          if-no-files-found: error
          compression-level: 1
          include-hidden-files: true

  test-win64-exe:
    needs:
      - build-win64-exe
      - check-repository
    if: needs.check-repository.outputs.is_main_repo != 'true'
    runs-on: windows-latest
    steps:
      - name: Check out code
        uses: actions/checkout@v4
      - name: Download build artifact
        uses: actions/download-artifact@v4
        with:
          name: win64-exe
          path: ./build
      - name: Test show version (online mode)
        run: |
          ./build/pdf2zh.exe --version
      - name: Test - Translate a PDF file with plain text only (online mode)
        run: |
          ./build/pdf2zh.exe ./test/file/translate.cli.plain.text.pdf -o ./test/file
      - name: Test - Translate a PDF file figure
        run: |
          ./build/pdf2zh.exe ./test/file/translate.cli.text.with.figure.pdf -o ./test/file
      - name: Test - Translate without offline assets (online mode)
        run: |
          ./build/pdf2zh.exe ./test/file/translate.cli.plain.text.pdf -o ./test/file
      - name: Upload test results
        uses: actions/upload-artifact@v4
        with:
          name: test-results
          path: ./test/file/
          if-no-files-found: error
      - name: Setup uv with Python ${{ env.WIN_EXE_PYTHON_VERSION }}
        uses: astral-sh/setup-uv@f94ec6bedd8674c4426838e6b50417d36b6ab231 # v5.3.1
        with:
          python-version: ${{ env.WIN_EXE_PYTHON_VERSION }}
          enable-cache: true
          cache-dependency-glob: "pyproject.toml"
      - name: Generate offline assets
        shell: pwsh
        run: |
          Write-Host "==== Generate the offline assets package ===="
          uv run --active babeldoc --generate-offline-assets ./build
      - name: Delete cache
        shell: pwsh
        run: |
          Write-Host "==== Delete the cache directory ===="
          $cachePath = "$env:USERPROFILE/.cache/babeldoc"
          if (Test-Path $cachePath) {
              Remove-Item -Path $cachePath -Recurse -Force
              Write-Host "Cache directory deleted: $cachePath"
          } else {
              Write-Host "Cache directory does not exist: $cachePath"
          }
      - name: Test - Translate with offline assets (offline mode)
        run: |
          Write-Host "==== Test the offline assets package ===="
          New-Item -Path "./test/file/offline_result" -ItemType Directory -Force
          ./build/pdf2zh.exe ./test/file/translate.cli.plain.text.pdf -o ./test/file/offline_result
      - name: Upload offline test results
        uses: actions/upload-artifact@v4
        with:
          name: offline-test-results
          path: ./test/file/offline_result/
          if-no-files-found: error
      - name: Upload build with offline assets artifact
        uses: actions/upload-artifact@v4
        with:
          name: win64-exe-with-assets
          path: ./build
          if-no-files-found: error
          compression-level: 1
          include-hidden-files: true
```

## /.github/workflows/fork-test.yml

```yml path="/.github/workflows/fork-test.yml" 
name: fork-test

on:
  push:
    branches: [ "main", "master" ]

env:
  REGISTRY: ghcr.io
  REPO_LOWER: ${{ github.repository_owner }}/${{ github.event.repository.name }}
  GHCR_REPO: ghcr.io/${{ github.repository }}
  WIN_EXE_PYTHON_VERSION: 3.12.9

jobs:
  check-repository:
    name: Check if running in main repository
    runs-on: ubuntu-latest
    outputs:
      is_main_repo: ${{ github.repository == 'Byaidu/PDFMathTranslate' }}
    steps:
      - run: echo "Running repository check"

  test:
    uses: ./.github/workflows/python-test.yml
    needs: check-repository
    if: needs.check-repository.outputs.is_main_repo != 'true'
```

## /.github/workflows/python-publish.yml

```yml path="/.github/workflows/python-publish.yml" 
name: Test and Release

on:
  push:
    branches:
      - main
      - master

permissions:
  id-token: write
  contents: write
  pull-requests: write

env:
  REGISTRY: ghcr.io
  REPO_LOWER: ${{ github.repository_owner }}/${{ github.event.repository.name }}
  GHCR_REPO: ghcr.io/${{ github.repository }}
  DOCKERHUB_REPO: byaidu/pdf2zh
  WIN_EXE_PYTHON_VERSION: "3.12.9"

jobs:
  check-repository:
    name: Check if running in main repository
    runs-on: ubuntu-latest
    outputs:
      # debug purpose
      is_main_repo: ${{ github.repository == 'Byaidu/PDFMathTranslate' }}
    steps:
      - run: echo "Running repository check"

  test:
    needs: check-repository
    uses: ./.github/workflows/python-test.yml
    if: needs.check-repository.outputs.is_main_repo == 'true'

  build:
    name: Build distribution 📦
    needs: [test, check-repository]
    if: needs.check-repository.outputs.is_main_repo == 'true'
    runs-on: ubuntu-latest
    outputs:
      is_release: ${{ steps.check-version.outputs.tag }}
      version: ${{ steps.check-version.outputs.tag && steps.get-release-version.outputs.version || steps.get-dev-version.outputs.version }}
    steps:
      - uses: actions/checkout@v4
        with:
          persist-credentials: true
          fetch-depth: 2
          token: ${{ secrets.GITHUB_TOKEN }}
      - name: Setup uv with Python 3.12
        uses: astral-sh/setup-uv@f94ec6bedd8674c4426838e6b50417d36b6ab231 # v5.3.1
        with:
          python-version: "3.12"
          enable-cache: true
          cache-dependency-glob: "pyproject.toml"
      - name: Check if there is a parent commit
        id: check-parent-commit
        run: |
          echo "sha=$(git rev-parse --verify --quiet HEAD^)" >> $GITHUB_OUTPUT
      - name: Detect and tag new version
        id: check-version
        if: steps.check-parent-commit.outputs.sha
        uses: salsify/action-detect-and-tag-new-version@b1778166f13188a9d478e2d1198f993011ba9864 # v2.0.3
        with:
          version-command: |
            cat pyproject.toml | grep "version = " | head -n 1 | awk -F'"' '{print $2}'
          tag-template: 'v{VERSION}'
      - name: Install Dependencies
        run: |
          uv sync
      - name: Bump version for developmental release
        if: "!steps.check-version.outputs.tag"
        id: get-dev-version
        run: |
          version=$(bumpver update --patch --tag=final --dry 2>&1 | grep "New Version" | awk '{print $NF}')
          echo "version=$version.dev$(date +%s)" >> $GITHUB_OUTPUT
          bumpver update --set-version $version.dev$(date +%s)
      - name: Get release version
        if: steps.check-version.outputs.tag
        id: get-release-version
        run: |
          version=$(cat pyproject.toml | grep "version = " | head -n 1 | awk -F'"' '{print $2}')
          echo "version=$version" >> $GITHUB_OUTPUT
      - name: Build package
        run: "uv build"
      - name: Store the distribution packages
        uses: actions/upload-artifact@v4.6.0
        with:
          name: python-package-distributions
          path: dist/

  publish-to-pypi:
    name: Publish Python 🐍 distribution 📦 to PyPI
    if: needs.build.outputs.is_release != ''
    needs:
      - check-repository
      - build
      - test-win64-exe
    runs-on: ubuntu-latest
    environment:
      name: pypi
      url: https://pypi.org/p/pdf2zh
    permissions:
      id-token: write
    steps:
      - name: Download all the dists
        uses: actions/download-artifact@fa0a91b85d4f404e444e00e005971372dc801d16 # v4.1.8
        with:
          name: python-package-distributions
          path: dist/
      - name: Publish distribution 📦 to PyPI
        uses: pypa/gh-action-pypi-publish@76f52bc884231f62b9a034ebfe128415bbaabdfc # v1.12.4

  publish-to-testpypi:
    name: Publish Python 🐍 distribution 📦 to TestPyPI
    if: needs.build.outputs.is_release == ''
    needs:
      - check-repository
      - build
      - test-win64-exe
    runs-on: ubuntu-latest
    environment:
      name: testpypi
      url: https://test.pypi.org/p/pdf2zh
    permissions:
      id-token: write
    steps:
      - name: Download all the dists
        uses: actions/download-artifact@fa0a91b85d4f404e444e00e005971372dc801d16 # v4.1.8
        with:
          name: python-package-distributions
          path: dist/
      - name: Publish distribution 📦 to TestPyPI
        uses: pypa/gh-action-pypi-publish@76f52bc884231f62b9a034ebfe128415bbaabdfc # v1.12.4
        with:
          repository-url: https://test.pypi.org/legacy/

  build-docker-image:
    strategy:
      fail-fast: false
      matrix:
        include:
          - platform: linux/amd64
            runner: ubuntu-latest
          - platform: linux/arm64
            runner: ubuntu-24.04-arm
    runs-on: ${{ matrix.runner }}
    needs:
      - build
      - check-repository
    if: needs.check-repository.outputs.is_main_repo == 'true'
    environment:
      name: ${{ needs.build.outputs.is_release != '' && 'pypi' || 'testpypi' }}
      url: ${{ needs.build.outputs.is_release != '' && 'https://hub.docker.com/r/byaidu/pdf2zh/tags?name=latest' || 'https://hub.docker.com/r/byaidu/pdf2zh/tags?name=dev' }}
    permissions:
      contents: read
      packages: write
    steps:
      - name: Convert to lowercase
        run: |
          echo "GHCR_REPO_LOWER=$(echo ${{ env.GHCR_REPO }} | tr '[:upper:]' '[:lower:]')" >> $GITHUB_ENV
      - name: Prepare
        run: |
          platform=${{ matrix.platform }}
          echo "PLATFORM_PAIR=${platform//\//-}" >> $GITHUB_ENV
      - name: Checkout repository
        uses: actions/checkout@v4
      - name: Setup uv with Python 3.12
        uses: astral-sh/setup-uv@f94ec6bedd8674c4426838e6b50417d36b6ab231 # v5.3.1
        with:
          python-version: "3.12"
          enable-cache: true
          cache-dependency-glob: "pyproject.toml"
      - name: Set version from build job
        if: needs.build.outputs.is_release == ''
        run: |
          uv tool install bumpver
          echo "Using version: ${{ needs.build.outputs.version }}"
          bumpver update --set-version ${{ needs.build.outputs.version }}
      - name: Docker meta
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: |
            ${{ env.DOCKERHUB_REPO }}
            ${{ env.GHCR_REPO_LOWER }}
          tags: |
            type=raw,value=dev
            type=raw,value=${{ needs.build.outputs.version }},enable=${{ needs.build.outputs.is_release != '' }}
            type=raw,value=latest,enable=${{ needs.build.outputs.is_release != '' }}
      - name: Login to Docker.io
        uses: docker/login-action@v3
        with:
          registry: docker.io
          username: ${{ secrets.DOCKER_USERNAME }}
          password: ${{ secrets.DOCKER_PASSWORD }}
      - name: Login to GHCR
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.repository_owner }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      - name: Build and push by digest
        id: build
        uses: docker/build-push-action@v6
        with:
          platforms: ${{ matrix.platform }}
          labels: ${{ steps.meta.outputs.labels }}
          outputs: type=image,"name=${{ env.DOCKERHUB_REPO }},${{ env.GHCR_REPO_LOWER }}",push-by-digest=true,name-canonical=true,push=true
          cache-from: ${{ matrix.platform == 'linux/amd64' && 'type=gha' || '' }}
          cache-to: ${{ matrix.platform == 'linux/amd64' && 'type=gha,mode=max' || '' }}
      - name: Export digest
        run: |
          mkdir -p ${{ runner.temp }}/digests
          digest="${{ steps.build.outputs.digest }}"
          touch "${{ runner.temp }}/digests/${digest#sha256:}"
      - name: Upload digest
        uses: actions/upload-artifact@v4
        with:
          name: digests-${{ env.PLATFORM_PAIR }}
          path: ${{ runner.temp }}/digests/*
          if-no-files-found: error
          retention-days: 1

  merge-docker-image:
    runs-on: ubuntu-latest
    permissions:
      packages: write
    needs:
      - build-docker-image
      - check-repository
      - test-win64-exe
      - build
    if: needs.check-repository.outputs.is_main_repo == 'true'
    steps:
      - name: Convert to lowercase
        run: |
          echo "GHCR_REPO_LOWER=$(echo ${{ env.GHCR_REPO }} | tr '[:upper:]' '[:lower:]')" >> $GITHUB_ENV
      - name: Download digests
        uses: actions/download-artifact@v4
        with:
          path: ${{ runner.temp }}/digests
          pattern: digests-*
          merge-multiple: true
      - name: Login to Docker.io
        uses: docker/login-action@v3
        with:
          registry: docker.io
          username: ${{ secrets.DOCKER_USERNAME }}
          password: ${{ secrets.DOCKER_PASSWORD }}
      - name: Login to GHCR
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.repository_owner }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      - name: Docker meta
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: |
            ${{ env.DOCKERHUB_REPO }}
            ${{ env.GHCR_REPO_LOWER }}
          tags: |
            type=raw,value=dev
            type=raw,value=${{ needs.build.outputs.version }},enable=${{ needs.build.outputs.is_release != '' && 'true' || 'false' }}
            type=raw,value=latest,enable=${{ needs.build.outputs.is_release != '' && 'true' || 'false' }}
      - name: Create manifest list and push
        working-directory: ${{ runner.temp }}/digests
        run: |
          docker buildx imagetools create $(jq -cr '.tags | map("-t " + .) | join(" ")' <<< "$DOCKER_METADATA_OUTPUT_JSON") \
            $(printf '${{ env.DOCKERHUB_REPO }}@sha256:%s ' *)
          docker buildx imagetools create $(jq -cr '.tags | map("-t " + .)
| join(" ")' <<< "$DOCKER_METADATA_OUTPUT_JSON") \ $(printf '${{ env.GHCR_REPO_LOWER }}@sha256:%s ' *) - name: Inspect image run: | docker buildx imagetools inspect ${{ env.DOCKERHUB_REPO }}:${{ steps.meta.outputs.version }} docker buildx imagetools inspect ${{ env.GHCR_REPO_LOWER }}:${{ steps.meta.outputs.version }} build-win64-exe: runs-on: windows-latest needs: - check-repository if: needs.check-repository.outputs.is_main_repo == 'true' steps: - name: checkout babeldoc metadata uses: actions/checkout@v4 with: repository: funstory-ai/BabelDOC path: babeldoctemp1234567 token: ${{ secrets.GITHUB_TOKEN }} sparse-checkout: babeldoc/assets/embedding_assets_metadata.py - name: Cached Assets id: cache-assets uses: actions/cache@v4.2.2 with: path: ~/.cache/babeldoc key: test-1-babeldoc-assets-${{ hashFiles('babeldoctemp1234567/babeldoc/assets/embedding_assets_metadata.py') }} - name: Check out code uses: actions/checkout@v4 - name: Setup uv with Python ${{ env.WIN_EXE_PYTHON_VERSION }} uses: astral-sh/setup-uv@f94ec6bedd8674c4426838e6b50417d36b6ab231 # v5.3.1 with: python-version: ${{ env.WIN_EXE_PYTHON_VERSION }} enable-cache: true cache-dependency-glob: "pyproject.toml" - name: Run all tasks (create directories, download, extract, copy files, install dependencies) shell: pwsh run: | Write-Host "==== Create required directories ====" New-Item -Path "./build" -ItemType Directory -Force New-Item -Path "./build/runtime" -ItemType Directory -Force New-Item -Path "./dep_build" -ItemType Directory -Force Write-Host "==== Copy code to dep_build ====" Get-ChildItem -Path "./" -Exclude "dep_build", "build" | Copy-Item -Destination "./dep_build" -Recurse -Force Write-Host "==== Download and extract Python ${{ env.WIN_EXE_PYTHON_VERSION }} ====" Write-Host "pythonUrl: https://www.python.org/ftp/python/${{ env.WIN_EXE_PYTHON_VERSION }}/python-${{ env.WIN_EXE_PYTHON_VERSION }}-embed-amd64.zip" $pythonUrl = "https://www.python.org/ftp/python/${{ env.WIN_EXE_PYTHON_VERSION }}/python-${{ env.WIN_EXE_PYTHON_VERSION }}-embed-amd64.zip" $pythonZip = "./dep_build/python.zip" Invoke-WebRequest -Uri 
$pythonUrl -OutFile $pythonZip Expand-Archive -Path $pythonZip -DestinationPath "./build/runtime" -Force Write-Host "==== Download the Visual C++ Redistributable installer ====" $vcRedistUrl = "https://aka.ms/vs/17/release/vc_redist.x64.exe" $vcRedistPath = "./build/无法运行请安装vc_redist.x64.exe" Invoke-WebRequest -Uri $vcRedistUrl -OutFile $vcRedistPath Write-Host "Downloaded the Visual C++ Redistributable installer to: $vcRedistPath" Write-Host "==== Download and extract PyStand ====" $pystandUrl = "https://github.com/skywind3000/PyStand/releases/download/1.1.4/PyStand-v1.1.4-exe.zip" $pystandZip = "./dep_build/PyStand.zip" Invoke-WebRequest -Uri $pystandUrl -OutFile $pystandZip Expand-Archive -Path $pystandZip -DestinationPath "./dep_build/PyStand" -Force Write-Host "==== Copy PyStand.exe to build and rename it ====" $pystandExe = "./dep_build/PyStand/PyStand-x64-CLI/PyStand.exe" $destExe = "./build/pdf2zh.exe" if (Test-Path $pystandExe) { Copy-Item -Path $pystandExe -Destination $destExe -Force } else { Write-Host "Error: PyStand.exe not found!" exit 1 } Write-Host "==== Create Python venv in dep_build ====" uv venv ./dep_build/venv ./dep_build/venv/Scripts/activate Write-Host "==== Install project dependencies in the venv ====" uv pip install . Write-Host "==== Copy venv/Lib/site-packages to build/ ====" Copy-Item -Path "./dep_build/venv/Lib/site-packages" -Destination "./build/site-packages" -Recurse -Force Write-Host "==== Copy script/_pystand_static.int to build/ ====" $staticFile = "./script/_pystand_static.int" $destStatic = "./build/_pystand_static.int" if (Test-Path $staticFile) { Copy-Item -Path $staticFile -Destination $destStatic -Force } else { Write-Host "Error: script/_pystand_static.int not found!" 
exit 1 } # - name: Upload build artifact # uses: actions/upload-artifact@v4 # with: # name: win64-exe # path: ./build # if-no-files-found: error # compression-level: 1 # include-hidden-files: true - name: Generate offline assets shell: pwsh run: | Write-Host "==== Generate offline assets package ====" uv run --active babeldoc --generate-offline-assets ./build - name: Upload build with offline assets artifact uses: actions/upload-artifact@v4 with: name: win64-exe-with-assets path: ./build if-no-files-found: error compression-level: 1 include-hidden-files: true test-win64-exe: needs: - build-win64-exe - check-repository if: needs.check-repository.outputs.is_main_repo == 'true' runs-on: windows-latest steps: - name: Check out code uses: actions/checkout@v4 - name: Download build artifact uses: actions/download-artifact@v4 with: name: win64-exe-with-assets path: ./build - name: Test show version run: | ./build/pdf2zh.exe --version - name: Test - Translate a PDF file with plain text only run: | ./build/pdf2zh.exe ./test/file/translate.cli.plain.text.pdf -o ./test/file - name: Test - Translate a PDF file figure run: | ./build/pdf2zh.exe ./test/file/translate.cli.text.with.figure.pdf -o ./test/file - name: Delete offline assets and cache shell: pwsh run: | Write-Host "==== Find and delete the offline assets package ====" $offlineAssetsPath = Get-ChildItem -Path "./build" -Filter "offline_assets_*.zip" -Recurse | Select-Object -First 1 -ExpandProperty FullName if ($offlineAssetsPath) { Write-Host "Found offline assets package: $offlineAssetsPath" Remove-Item -Path $offlineAssetsPath -Force Write-Host "Deleted offline assets package" } else { Write-Host "Offline assets package not found" } Write-Host "==== Delete cache directory ====" $cachePath = "$env:USERPROFILE/.cache/babeldoc" if (Test-Path $cachePath) { Remove-Item -Path $cachePath -Recurse -Force Write-Host "Deleted cache directory: $cachePath" } else { Write-Host "Cache directory does not exist: $cachePath" } - name: Test - Translate without offline assets run: | Write-Host "==== Test offline assets package ====" New-Item -Path "./test/file/offline_result" -ItemType Directory -Force ./build/pdf2zh.exe 
./test/file/translate.cli.plain.text.pdf -o ./test/file/offline_result - name: Upload test results uses: actions/upload-artifact@v4 with: name: test-results path: ./test/file/ release-draft: name: Release Draft Tasks needs: - check-repository - build - publish-to-pypi - publish-to-testpypi - merge-docker-image - test-win64-exe if: | always() && needs.check-repository.outputs.is_main_repo == 'true' && (needs.publish-to-pypi.result == 'success' || needs.publish-to-testpypi.result == 'success') && needs.merge-docker-image.result == 'success' && needs.test-win64-exe.result == 'success' runs-on: ubuntu-latest permissions: contents: write pull-requests: write outputs: tag_name: ${{ steps.release-drafter.outputs.tag_name }} steps: - uses: actions/checkout@v4 with: persist-credentials: true fetch-depth: 2 token: ${{ secrets.GITHUB_TOKEN }} - name: Publish the release notes id: release-drafter uses: release-drafter/release-drafter@b1476f6e6eb133afa41ed8589daba6dc69b4d3f5 # v6.1.0 with: publish: ${{ needs.build.outputs.is_release != '' }} tag: ${{ needs.build.outputs.is_release }} env: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} upload-release: needs: [release-draft, check-repository] runs-on: ubuntu-latest if: always() && needs.check-repository.outputs.is_main_repo == 'true' && needs.release-draft.result == 'success' steps: - name: Check out code uses: actions/checkout@v4 - name: Download build artifact uses: actions/download-artifact@v4 with: name: win64-exe-with-assets path: ./build - name: Create release zip run: | mv ./build ./pdf2zh zip -9qr "pdf2zh-${{ needs.release-draft.outputs.tag_name }}-with-assets-win64.zip" ./pdf2zh/* # Find and delete offline asset files find ./pdf2zh -name "offline_assets_*.zip" -type f -print -delete # Verify that the deletion succeeded echo "Remaining offline assets files (should be empty):" find ./pdf2zh -name "offline_assets_*.zip" -type f zip -9qr "pdf2zh-${{ needs.release-draft.outputs.tag_name }}-win64.zip" ./pdf2zh/* - name: Upload to latest release env: GH_TOKEN: ${{ secrets.GITHUB_TOKEN }} 
run: | # Get the latest release (including drafts and pre-releases) LATEST_RELEASE=${{ needs.release-draft.outputs.tag_name }} echo "Latest release tag: $LATEST_RELEASE" # Upload the zip file to the release gh release upload "$LATEST_RELEASE" "pdf2zh-${{ needs.release-draft.outputs.tag_name }}-win64.zip" --clobber gh release upload "$LATEST_RELEASE" "pdf2zh-${{ needs.release-draft.outputs.tag_name }}-with-assets-win64.zip" --clobber ``` ## /.github/workflows/python-test.yml ```yml path="/.github/workflows/python-test.yml" name: Test and Build Python Package on: push: branches: - '**' - '!main' - '!master' pull_request: workflow_call: jobs: build-and-test: runs-on: ${{ matrix.runner }} strategy: fail-fast: false matrix: python-version: ["3.10", "3.11", "3.12"] runner: - ubuntu-latest - ubuntu-24.04-arm steps: - name: checkout babeldoc metadata uses: actions/checkout@v4 with: repository: funstory-ai/BabelDOC path: babeldoctemp1234567 token: ${{ secrets.GITHUB_TOKEN }} sparse-checkout: babeldoc/assets/embedding_assets_metadata.py - name: Cached Assets id: cache-assets uses: actions/cache@v4.2.2 with: path: ~/.cache/babeldoc key: test-1-babeldoc-assets-${{ hashFiles('babeldoctemp1234567/babeldoc/assets/embedding_assets_metadata.py') }} - uses: actions/checkout@v4 - name: Setup uv with Python ${{ matrix.python-version }} uses: astral-sh/setup-uv@f94ec6bedd8674c4426838e6b50417d36b6ab231 # v5.3.1 with: python-version: ${{ matrix.python-version }} enable-cache: true cache-dependency-glob: "pyproject.toml" - name: Install dependencies run: | uv sync - name: Test - Unit Test run: | uv run pytest . 
- name: Test - Translate a PDF file with plain text only run: uv run pdf2zh ./test/file/translate.cli.plain.text.pdf -o ./test/file - name: Test - Translate a PDF file figure run: uv run pdf2zh ./test/file/translate.cli.text.with.figure.pdf -o ./test/file # - name: Test - Translate a PDF file with unknown font # run: # pdf2zh ./test/file/translate.cli.font.unknown.pdf - name: Test - Start GUI and exit run: timeout 10 uv run pdf2zh -i || code=$?; if [[ $code -ne 124 && $code -ne 0 ]]; then exit $code; fi - name: Build as a package run: uv build - name: Upload test results uses: actions/upload-artifact@v4 with: name: test-results-${{ matrix.python-version }}-${{ matrix.runner }} path: ./test/file/ ``` ## /.gitignore ```gitignore path="/.gitignore" pdf2zh_files gui/pdf2zh_files gradio_files tmp gui/gradio_files gui/tmp # Byte-compiled / optimized / DLL files __pycache__/ *.py[cod] *$py.class # C extensions *.so # Distribution / packaging .Python build/ develop-eggs/ dist/ downloads/ eggs/ .eggs/ lib/ lib64/ parts/ sdist/ var/ wheels/ share/python-wheels/ *.egg-info/ .installed.cfg *.egg MANIFEST # PyInstaller # Usually these files are written by a python script from a template # before PyInstaller builds the exe, so as to inject date/other infos into it. 
*.manifest *.spec # Installer logs pip-log.txt pip-delete-this-directory.txt # Unit test / coverage reports htmlcov/ .tox/ .nox/ .coverage .coverage.* .cache nosetests.xml coverage.xml *.cover *.py,cover .hypothesis/ .pytest_cache/ cover/ # Translations *.mo *.pot # Django stuff: *.log local_settings.py db.sqlite3 db.sqlite3-journal # Flask stuff: instance/ .webassets-cache # Scrapy stuff: .scrapy # Sphinx documentation docs/_build/ # PyBuilder .pybuilder/ target/ # Jupyter Notebook .ipynb_checkpoints # IPython profile_default/ ipython_config.py # pyenv # For a library or package, you might want to ignore these files since the code is # intended to run in multiple environments; otherwise, check them in: # .python-version # pipenv # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. # However, in case of collaboration, if having platform-specific dependencies or dependencies # having no cross-platform support, pipenv may install dependencies that don't work, or not # install all needed dependencies. #Pipfile.lock # poetry # Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control. # This is especially recommended for binary packages to ensure reproducibility, and is more # commonly ignored for libraries. # https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control #poetry.lock # pdm # Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control. #pdm.lock # pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it # in version control. # https://pdm.fming.dev/latest/usage/project/#working-with-version-control .pdm.toml .pdm-python .pdm-build/ # PEP 582; used by e.g. 
github.com/David-OConnor/pyflow and github.com/pdm-project/pdm __pypackages__/ # Celery stuff celerybeat-schedule celerybeat.pid # SageMath parsed files *.sage.py # Environments .env .venv env/ venv/ ENV/ env.bak/ venv.bak/ pdf2zh-dev/ # Spyder project settings .spyderproject .spyproject # Rope project settings .ropeproject # mkdocs documentation /site # mypy .mypy_cache/ .dmypy.json dmypy.json # Pyre type checker .pyre/ # pytype static type analyzer .pytype/ # Cython debug symbols cython_debug/ # PyCharm # JetBrains specific template is maintained in a separate JetBrains.gitignore that can # be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore # and can be added to the global gitignore or merged into this file. For a more nuclear # option (not recommended) you can uncomment the following to ignore the entire idea folder. .idea/ .vscode .DS_Store uv.lock *.pdf *.docx ``` ## /.pre-commit-config.yaml ```yaml path="/.pre-commit-config.yaml" # See https://pre-commit.com for more information # See https://pre-commit.com/hooks.html for more hooks files: '^.*\.py$' repos: - repo: local hooks: - id: black name: black entry: black --check --diff --color language: python - id: flake8 name: flake8 entry: flake8 --ignore E203,E261,E501,W503,E741 language: python ``` ## /Dockerfile ``` path="/Dockerfile" FROM ghcr.io/astral-sh/uv:python3.12-bookworm-slim WORKDIR /app EXPOSE 7860 ENV PYTHONUNBUFFERED=1 # # Download all required fonts # ADD "https://github.com/satbyy/go-noto-universal/releases/download/v7.0/GoNotoKurrent-Regular.ttf" /app/ # ADD "https://github.com/timelic/source-han-serif/releases/download/main/SourceHanSerifCN-Regular.ttf" /app/ # ADD "https://github.com/timelic/source-han-serif/releases/download/main/SourceHanSerifTW-Regular.ttf" /app/ # ADD "https://github.com/timelic/source-han-serif/releases/download/main/SourceHanSerifJP-Regular.ttf" /app/ # ADD 
"https://github.com/timelic/source-han-serif/releases/download/main/SourceHanSerifKR-Regular.ttf" /app/ RUN apt-get update && \ apt-get install --no-install-recommends -y libgl1 libglib2.0-0 libxext6 libsm6 libxrender1 && \ rm -rf /var/lib/apt/lists/* COPY pyproject.toml . RUN uv pip install --system --no-cache -r pyproject.toml && babeldoc --version && babeldoc --warmup COPY . . RUN uv pip install --system --no-cache . && uv pip install --system --no-cache -U babeldoc "pymupdf<1.25.3" && babeldoc --version && babeldoc --warmup CMD ["pdf2zh", "-i"] ``` ## /LICENSE ``` path="/LICENSE" GNU AFFERO GENERAL PUBLIC LICENSE Version 3, 19 November 2007 Copyright (C) 2007 Free Software Foundation, Inc. Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. Preamble The GNU Affero General Public License is a free, copyleft license for software and other kinds of works, specifically designed to ensure cooperation with the community in the case of network server software. The licenses for most software and other practical works are designed to take away your freedom to share and change the works. By contrast, our General Public Licenses are intended to guarantee your freedom to share and change all versions of a program--to make sure it remains free software for all its users. When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for them if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs, and that you know you can do these things. Developers that use our General Public Licenses protect your rights with two steps: (1) assert copyright on the software, and (2) offer you this License which gives you legal permission to copy, distribute and/or modify the software. 
A secondary benefit of defending all users' freedom is that improvements made in alternate versions of the program, if they receive widespread use, become available for other developers to incorporate. Many developers of free software are heartened and encouraged by the resulting cooperation. However, in the case of software used on network servers, this result may fail to come about. The GNU General Public License permits making a modified version and letting the public access it on a server without ever releasing its source code to the public. The GNU Affero General Public License is designed specifically to ensure that, in such cases, the modified source code becomes available to the community. It requires the operator of a network server to provide the source code of the modified version running there to the users of that server. Therefore, public use of a modified version, on a publicly accessible server, gives the public access to the source code of the modified version. An older license, called the Affero General Public License and published by Affero, was designed to accomplish similar goals. This is a different license, not a version of the Affero GPL, but Affero has released a new version of the Affero GPL which permits relicensing under this license. The precise terms and conditions for copying, distribution and modification follow. TERMS AND CONDITIONS 0. Definitions. "This License" refers to version 3 of the GNU Affero General Public License. "Copyright" also means copyright-like laws that apply to other kinds of works, such as semiconductor masks. "The Program" refers to any copyrightable work licensed under this License. Each licensee is addressed as "you". "Licensees" and "recipients" may be individuals or organizations. To "modify" a work means to copy from or adapt all or part of the work in a fashion requiring copyright permission, other than the making of an exact copy. 
The resulting work is called a "modified version" of the earlier work or a work "based on" the earlier work. A "covered work" means either the unmodified Program or a work based on the Program. To "propagate" a work means to do anything with it that, without permission, would make you directly or secondarily liable for infringement under applicable copyright law, except executing it on a computer or modifying a private copy. Propagation includes copying, distribution (with or without modification), making available to the public, and in some countries other activities as well. To "convey" a work means any kind of propagation that enables other parties to make or receive copies. Mere interaction with a user through a computer network, with no transfer of a copy, is not conveying. An interactive user interface displays "Appropriate Legal Notices" to the extent that it includes a convenient and prominently visible feature that (1) displays an appropriate copyright notice, and (2) tells the user that there is no warranty for the work (except to the extent that warranties are provided), that licensees may convey the work under this License, and how to view a copy of this License. If the interface presents a list of user commands or options, such as a menu, a prominent item in the list meets this criterion. 1. Source Code. The "source code" for a work means the preferred form of the work for making modifications to it. "Object code" means any non-source form of a work. A "Standard Interface" means an interface that either is an official standard defined by a recognized standards body, or, in the case of interfaces specified for a particular programming language, one that is widely used among developers working in that language. 
The "System Libraries" of an executable work include anything, other than the work as a whole, that (a) is included in the normal form of packaging a Major Component, but which is not part of that Major Component, and (b) serves only to enable use of the work with that Major Component, or to implement a Standard Interface for which an implementation is available to the public in source code form. A "Major Component", in this context, means a major essential component (kernel, window system, and so on) of the specific operating system (if any) on which the executable work runs, or a compiler used to produce the work, or an object code interpreter used to run it. The "Corresponding Source" for a work in object code form means all the source code needed to generate, install, and (for an executable work) run the object code and to modify the work, including scripts to control those activities. However, it does not include the work's System Libraries, or general-purpose tools or generally available free programs which are used unmodified in performing those activities but which are not part of the work. For example, Corresponding Source includes interface definition files associated with source files for the work, and the source code for shared libraries and dynamically linked subprograms that the work is specifically designed to require, such as by intimate data communication or control flow between those subprograms and other parts of the work. The Corresponding Source need not include anything that users can regenerate automatically from other parts of the Corresponding Source. The Corresponding Source for a work in source code form is that same work. 2. Basic Permissions. All rights granted under this License are granted for the term of copyright on the Program, and are irrevocable provided the stated conditions are met. This License explicitly affirms your unlimited permission to run the unmodified Program. 
The output from running a covered work is covered by this License only if the output, given its content, constitutes a covered work. This License acknowledges your rights of fair use or other equivalent, as provided by copyright law. You may make, run and propagate covered works that you do not convey, without conditions so long as your license otherwise remains in force. You may convey covered works to others for the sole purpose of having them make modifications exclusively for you, or provide you with facilities for running those works, provided that you comply with the terms of this License in conveying all material for which you do not control copyright. Those thus making or running the covered works for you must do so exclusively on your behalf, under your direction and control, on terms that prohibit them from making any copies of your copyrighted material outside their relationship with you. Conveying under any other circumstances is permitted solely under the conditions stated below. Sublicensing is not allowed; section 10 makes it unnecessary. 3. Protecting Users' Legal Rights From Anti-Circumvention Law. No covered work shall be deemed part of an effective technological measure under any applicable law fulfilling obligations under article 11 of the WIPO copyright treaty adopted on 20 December 1996, or similar laws prohibiting or restricting circumvention of such measures. When you convey a covered work, you waive any legal power to forbid circumvention of technological measures to the extent such circumvention is effected by exercising rights under this License with respect to the covered work, and you disclaim any intention to limit operation or modification of the work as a means of enforcing, against the work's users, your or third parties' legal rights to forbid circumvention of technological measures. 4. Conveying Verbatim Copies. 
You may convey verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice; keep intact all notices stating that this License and any non-permissive terms added in accord with section 7 apply to the code; keep intact all notices of the absence of any warranty; and give all recipients a copy of this License along with the Program. You may charge any price or no price for each copy that you convey, and you may offer support or warranty protection for a fee. 5. Conveying Modified Source Versions. You may convey a work based on the Program, or the modifications to produce it from the Program, in the form of source code under the terms of section 4, provided that you also meet all of these conditions: a) The work must carry prominent notices stating that you modified it, and giving a relevant date. b) The work must carry prominent notices stating that it is released under this License and any conditions added under section 7. This requirement modifies the requirement in section 4 to "keep intact all notices". c) You must license the entire work, as a whole, under this License to anyone who comes into possession of a copy. This License will therefore apply, along with any applicable section 7 additional terms, to the whole of the work, and all its parts, regardless of how they are packaged. This License gives no permission to license the work in any other way, but it does not invalidate such permission if you have separately received it. d) If the work has interactive user interfaces, each must display Appropriate Legal Notices; however, if the Program has interactive interfaces that do not display Appropriate Legal Notices, your work need not make them do so. 
A compilation of a covered work with other separate and independent works, which are not by their nature extensions of the covered work, and which are not combined with it such as to form a larger program, in or on a volume of a storage or distribution medium, is called an "aggregate" if the compilation and its resulting copyright are not used to limit the access or legal rights of the compilation's users beyond what the individual works permit. Inclusion of a covered work in an aggregate does not cause this License to apply to the other parts of the aggregate. 6. Conveying Non-Source Forms. You may convey a covered work in object code form under the terms of sections 4 and 5, provided that you also convey the machine-readable Corresponding Source under the terms of this License, in one of these ways: a) Convey the object code in, or embodied in, a physical product (including a physical distribution medium), accompanied by the Corresponding Source fixed on a durable physical medium customarily used for software interchange. b) Convey the object code in, or embodied in, a physical product (including a physical distribution medium), accompanied by a written offer, valid for at least three years and valid for as long as you offer spare parts or customer support for that product model, to give anyone who possesses the object code either (1) a copy of the Corresponding Source for all the software in the product that is covered by this License, on a durable physical medium customarily used for software interchange, for a price no more than your reasonable cost of physically performing this conveying of source, or (2) access to copy the Corresponding Source from a network server at no charge. c) Convey individual copies of the object code with a copy of the written offer to provide the Corresponding Source. This alternative is allowed only occasionally and noncommercially, and only if you received the object code with such an offer, in accord with subsection 6b. 
d) Convey the object code by offering access from a designated place (gratis or for a charge), and offer equivalent access to the Corresponding Source in the same way through the same place at no further charge. You need not require recipients to copy the Corresponding Source along with the object code. If the place to copy the object code is a network server, the Corresponding Source may be on a different server (operated by you or a third party) that supports equivalent copying facilities, provided you maintain clear directions next to the object code saying where to find the Corresponding Source. Regardless of what server hosts the Corresponding Source, you remain obligated to ensure that it is available for as long as needed to satisfy these requirements. e) Convey the object code using peer-to-peer transmission, provided you inform other peers where the object code and Corresponding Source of the work are being offered to the general public at no charge under subsection 6d. A separable portion of the object code, whose source code is excluded from the Corresponding Source as a System Library, need not be included in conveying the object code work. A "User Product" is either (1) a "consumer product", which means any tangible personal property which is normally used for personal, family, or household purposes, or (2) anything designed or sold for incorporation into a dwelling. In determining whether a product is a consumer product, doubtful cases shall be resolved in favor of coverage. For a particular product received by a particular user, "normally used" refers to a typical or common use of that class of product, regardless of the status of the particular user or of the way in which the particular user actually uses, or expects or is expected to use, the product. 
A product is a consumer product regardless of whether the product has substantial commercial, industrial or non-consumer uses, unless such uses represent the only significant mode of use of the product. "Installation Information" for a User Product means any methods, procedures, authorization keys, or other information required to install and execute modified versions of a covered work in that User Product from a modified version of its Corresponding Source. The information must suffice to ensure that the continued functioning of the modified object code is in no case prevented or interfered with solely because modification has been made. If you convey an object code work under this section in, or with, or specifically for use in, a User Product, and the conveying occurs as part of a transaction in which the right of possession and use of the User Product is transferred to the recipient in perpetuity or for a fixed term (regardless of how the transaction is characterized), the Corresponding Source conveyed under this section must be accompanied by the Installation Information. But this requirement does not apply if neither you nor any third party retains the ability to install modified object code on the User Product (for example, the work has been installed in ROM). The requirement to provide Installation Information does not include a requirement to continue to provide support service, warranty, or updates for a work that has been modified or installed by the recipient, or for the User Product in which it has been modified or installed. Access to a network may be denied when the modification itself materially and adversely affects the operation of the network or violates the rules and protocols for communication across the network. 
Corresponding Source conveyed, and Installation Information provided, in accord with this section must be in a format that is publicly documented (and with an implementation available to the public in source code form), and must require no special password or key for unpacking, reading or copying. 7. Additional Terms. "Additional permissions" are terms that supplement the terms of this License by making exceptions from one or more of its conditions. Additional permissions that are applicable to the entire Program shall be treated as though they were included in this License, to the extent that they are valid under applicable law. If additional permissions apply only to part of the Program, that part may be used separately under those permissions, but the entire Program remains governed by this License without regard to the additional permissions. When you convey a copy of a covered work, you may at your option remove any additional permissions from that copy, or from any part of it. (Additional permissions may be written to require their own removal in certain cases when you modify the work.) You may place additional permissions on material, added by you to a covered work, for which you have or can give appropriate copyright permission. 
Notwithstanding any other provision of this License, for material you add to a covered work, you may (if authorized by the copyright holders of that material) supplement the terms of this License with terms: a) Disclaiming warranty or limiting liability differently from the terms of sections 15 and 16 of this License; or b) Requiring preservation of specified reasonable legal notices or author attributions in that material or in the Appropriate Legal Notices displayed by works containing it; or c) Prohibiting misrepresentation of the origin of that material, or requiring that modified versions of such material be marked in reasonable ways as different from the original version; or d) Limiting the use for publicity purposes of names of licensors or authors of the material; or e) Declining to grant rights under trademark law for use of some trade names, trademarks, or service marks; or f) Requiring indemnification of licensors and authors of that material by anyone who conveys the material (or modified versions of it) with contractual assumptions of liability to the recipient, for any liability that these contractual assumptions directly impose on those licensors and authors. All other non-permissive additional terms are considered "further restrictions" within the meaning of section 10. If the Program as you received it, or any part of it, contains a notice stating that it is governed by this License along with a term that is a further restriction, you may remove that term. If a license document contains a further restriction but permits relicensing or conveying under this License, you may add to a covered work material governed by the terms of that license document, provided that the further restriction does not survive such relicensing or conveying. 
If you add terms to a covered work in accord with this section, you must place, in the relevant source files, a statement of the additional terms that apply to those files, or a notice indicating where to find the applicable terms. Additional terms, permissive or non-permissive, may be stated in the form of a separately written license, or stated as exceptions; the above requirements apply either way. 8. Termination. You may not propagate or modify a covered work except as expressly provided under this License. Any attempt otherwise to propagate or modify it is void, and will automatically terminate your rights under this License (including any patent licenses granted under the third paragraph of section 11). However, if you cease all violation of this License, then your license from a particular copyright holder is reinstated (a) provisionally, unless and until the copyright holder explicitly and finally terminates your license, and (b) permanently, if the copyright holder fails to notify you of the violation by some reasonable means prior to 60 days after the cessation. Moreover, your license from a particular copyright holder is reinstated permanently if the copyright holder notifies you of the violation by some reasonable means, this is the first time you have received notice of violation of this License (for any work) from that copyright holder, and you cure the violation prior to 30 days after your receipt of the notice. Termination of your rights under this section does not terminate the licenses of parties who have received copies or rights from you under this License. If your rights have been terminated and not permanently reinstated, you do not qualify to receive new licenses for the same material under section 10. 9. Acceptance Not Required for Having Copies. You are not required to accept this License in order to receive or run a copy of the Program. 
Ancillary propagation of a covered work occurring solely as a consequence of using peer-to-peer transmission to receive a copy likewise does not require acceptance. However, nothing other than this License grants you permission to propagate or modify any covered work. These actions infringe copyright if you do not accept this License. Therefore, by modifying or propagating a covered work, you indicate your acceptance of this License to do so. 10. Automatic Licensing of Downstream Recipients. Each time you convey a covered work, the recipient automatically receives a license from the original licensors, to run, modify and propagate that work, subject to this License. You are not responsible for enforcing compliance by third parties with this License. An "entity transaction" is a transaction transferring control of an organization, or substantially all assets of one, or subdividing an organization, or merging organizations. If propagation of a covered work results from an entity transaction, each party to that transaction who receives a copy of the work also receives whatever licenses to the work the party's predecessor in interest had or could give under the previous paragraph, plus a right to possession of the Corresponding Source of the work from the predecessor in interest, if the predecessor has it or can get it with reasonable efforts. You may not impose any further restrictions on the exercise of the rights granted or affirmed under this License. For example, you may not impose a license fee, royalty, or other charge for exercise of rights granted under this License, and you may not initiate litigation (including a cross-claim or counterclaim in a lawsuit) alleging that any patent claim is infringed by making, using, selling, offering for sale, or importing the Program or any portion of it. 11. Patents. A "contributor" is a copyright holder who authorizes use under this License of the Program or a work on which the Program is based. 
The work thus licensed is called the contributor's "contributor version". A contributor's "essential patent claims" are all patent claims owned or controlled by the contributor, whether already acquired or hereafter acquired, that would be infringed by some manner, permitted by this License, of making, using, or selling its contributor version, but do not include claims that would be infringed only as a consequence of further modification of the contributor version. For purposes of this definition, "control" includes the right to grant patent sublicenses in a manner consistent with the requirements of this License. Each contributor grants you a non-exclusive, worldwide, royalty-free patent license under the contributor's essential patent claims, to make, use, sell, offer for sale, import and otherwise run, modify and propagate the contents of its contributor version. In the following three paragraphs, a "patent license" is any express agreement or commitment, however denominated, not to enforce a patent (such as an express permission to practice a patent or covenant not to sue for patent infringement). To "grant" such a patent license to a party means to make such an agreement or commitment not to enforce a patent against the party. If you convey a covered work, knowingly relying on a patent license, and the Corresponding Source of the work is not available for anyone to copy, free of charge and under the terms of this License, through a publicly available network server or other readily accessible means, then you must either (1) cause the Corresponding Source to be so available, or (2) arrange to deprive yourself of the benefit of the patent license for this particular work, or (3) arrange, in a manner consistent with the requirements of this License, to extend the patent license to downstream recipients. 
"Knowingly relying" means you have actual knowledge that, but for the patent license, your conveying the covered work in a country, or your recipient's use of the covered work in a country, would infringe one or more identifiable patents in that country that you have reason to believe are valid. If, pursuant to or in connection with a single transaction or arrangement, you convey, or propagate by procuring conveyance of, a covered work, and grant a patent license to some of the parties receiving the covered work authorizing them to use, propagate, modify or convey a specific copy of the covered work, then the patent license you grant is automatically extended to all recipients of the covered work and works based on it. A patent license is "discriminatory" if it does not include within the scope of its coverage, prohibits the exercise of, or is conditioned on the non-exercise of one or more of the rights that are specifically granted under this License. You may not convey a covered work if you are a party to an arrangement with a third party that is in the business of distributing software, under which you make payment to the third party based on the extent of your activity of conveying the work, and under which the third party grants, to any of the parties who would receive the covered work from you, a discriminatory patent license (a) in connection with copies of the covered work conveyed by you (or copies made from those copies), or (b) primarily for and in connection with specific products or compilations that contain the covered work, unless you entered into that arrangement, or that patent license was granted, prior to 28 March 2007. Nothing in this License shall be construed as excluding or limiting any implied license or other defenses to infringement that may otherwise be available to you under applicable patent law. 12. No Surrender of Others' Freedom. 
If conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot convey a covered work so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not convey it at all. For example, if you agree to terms that obligate you to collect a royalty for further conveying from those to whom you convey the Program, the only way you could satisfy both those terms and this License would be to refrain entirely from conveying the Program. 13. Remote Network Interaction; Use with the GNU General Public License. Notwithstanding any other provision of this License, if you modify the Program, your modified version must prominently offer all users interacting with it remotely through a computer network (if your version supports such interaction) an opportunity to receive the Corresponding Source of your version by providing access to the Corresponding Source from a network server at no charge, through some standard or customary means of facilitating copying of software. This Corresponding Source shall include the Corresponding Source for any work covered by version 3 of the GNU General Public License that is incorporated pursuant to the following paragraph. Notwithstanding any other provision of this License, you have permission to link or combine any covered work with a work licensed under version 3 of the GNU General Public License into a single combined work, and to convey the resulting work. The terms of this License will continue to apply to the part which is the covered work, but the work with which it is combined will remain governed by version 3 of the GNU General Public License. 14. Revised Versions of this License. The Free Software Foundation may publish revised and/or new versions of the GNU Affero General Public License from time to time. 
Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Program specifies that a certain numbered version of the GNU Affero General Public License "or any later version" applies to it, you have the option of following the terms and conditions either of that numbered version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of the GNU Affero General Public License, you may choose any version ever published by the Free Software Foundation. If the Program specifies that a proxy can decide which future versions of the GNU Affero General Public License can be used, that proxy's public statement of acceptance of a version permanently authorizes you to choose that version for the Program. Later license versions may give you additional or different permissions. However, no additional obligations are imposed on any author or copyright holder as a result of your choosing to follow a later version. 15. Disclaimer of Warranty. THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 16. Limitation of Liability. 
IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. 17. Interpretation of Sections 15 and 16. If the disclaimer of warranty and limitation of liability provided above cannot be given local legal effect according to their terms, reviewing courts shall apply local law that most closely approximates an absolute waiver of all civil liability in connection with the Program, unless a warranty or assumption of liability accompanies a copy of the Program in return for a fee. END OF TERMS AND CONDITIONS How to Apply These Terms to Your New Programs If you develop a new program, and you want it to be of the greatest possible use to the public, the best way to achieve this is to make it free software which everyone can redistribute and change under these terms. To do so, attach the following notices to the program. It is safest to attach them to the start of each source file to most effectively state the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found. Copyright (C) This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. 
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details. You should have received a copy of the GNU Affero General Public License along with this program. If not, see . Also add information on how to contact you by electronic and paper mail. If your software can interact with users remotely through a computer network, you should also make sure that it provides a way for users to get its source. For example, if your program is a web application, its interface could display a "Source" link that leads users to an archive of the code. There are many ways you could offer source, and different solutions will be better for different programs; see section 13 for the specific requirements. You should also get your employer (if you work as a programmer) or school, if any, to sign a "copyright disclaimer" for the program, if necessary. For more information on this, and how to apply and follow the GNU AGPL, see . ``` ## /README.md
English | [简体中文](docs/README_zh-CN.md) | [繁體中文](docs/README_zh-TW.md) | [日本語](docs/README_ja-JP.md) | [한국어](docs/README_ko-KR.md) PDF2ZH

PDFMathTranslate

PDF scientific paper translation and bilingual comparison.

- 📊 Preserve formulas, charts, table of contents, and annotations _([preview](#preview))_.
- 🌐 Support [multiple languages](#language) and diverse [translation services](#services).
- 🤖 Provide a [command-line tool](#usage), an [interactive user interface](#gui), and [Docker](#docker) deployment.

Feel free to provide feedback in [GitHub Issues](https://github.com/Byaidu/PDFMathTranslate/issues) or [Telegram Group](https://t.me/+Z9_SgnxmsmA5NzBl).

For details on how to contribute, please consult the [Contribution Guide](https://github.com/Byaidu/PDFMathTranslate/wiki/Contribution-Guide---%E8%B4%A1%E7%8C%AE%E6%8C%87%E5%8D%97).

Updates

- [Mar. 3, 2025] Experimental support for the new backend [BabelDOC](https://github.com/funstory-ai/BabelDOC), available as an experimental option in the WebUI _(by [@awwaawwa](https://github.com/awwaawwa))_
- [Feb. 22, 2025] Better release CI and well-packaged windows-amd64 exe _(by [@awwaawwa](https://github.com/awwaawwa))_
- [Dec. 24, 2024] The translator now supports local models on [Xinference](https://github.com/xorbitsai/inference) _(by [@imClumsyPanda](https://github.com/imClumsyPanda))_
- [Dec. 19, 2024] Non-PDF/A documents are now supported using `-cp` _(by [@reycn](https://github.com/reycn))_
- [Dec. 13, 2024] Additional support for backend by _(by [@YadominJinta](https://github.com/YadominJinta))_
- [Dec. 10, 2024] The translator now supports OpenAI models on Azure _(by [@yidasanqian](https://github.com/yidasanqian))_

Preview

Online Service 🌟

You can try our application using any of the following demos:

- [Public free service](https://pdf2zh.com/) online without installation _(recommended)_.
- [Immersive Translate - BabelDOC](https://app.immersivetranslate.com/babel-doc/), 1000 free pages per month _(recommended)_.
- [Demo hosted on HuggingFace](https://huggingface.co/spaces/reycn/PDFMathTranslate-Docker)
- [Demo hosted on ModelScope](https://www.modelscope.cn/studios/AI-ModelScope/PDFMathTranslate) without installation.

Note that the computing resources of the demos are limited, so please avoid abusing them.

Installation and Usage

### Methods For different use cases, we provide distinct methods to use our program:
1. UV install

   1. Python installed (3.10 <= version <= 3.12)
   2. Install our package:

      ```bash
      pip install uv
      uv tool install --python 3.12 pdf2zh
      ```

   3. Execute translation, files generated in [current working directory](https://chatgpt.com/share/6745ed36-9acc-800e-8a90-59204bd13444):

      ```bash
      pdf2zh document.pdf
      ```

2. Windows exe

   1. Download pdf2zh-version-win64.zip from [release page](https://github.com/Byaidu/PDFMathTranslate/releases)
   2. Unzip and double-click `pdf2zh.exe` to run.

3. Graphic user interface

   1. Python installed (3.10 <= version <= 3.12)
   2. Install our package:

      ```bash
      pip install pdf2zh
      ```

   3. Start using in browser:

      ```bash
      pdf2zh -i
      ```

   4. If your browser does not open automatically, go to:

      ```bash
      http://localhost:7860/
      ```

   See [documentation for GUI](./docs/README_GUI.md) for more details.

4. Docker

   1. Pull and run:

      ```bash
      docker pull byaidu/pdf2zh
      docker run -d -p 7860:7860 byaidu/pdf2zh
      ```

   2. Open in browser:

      ```
      http://localhost:7860/
      ```

   For docker deployment on cloud service:

5. Zotero Plugin

   See [Zotero PDF2zh](https://github.com/guaguastandup/zotero-pdf2zh) for more details.

6. Commandline

   1. Python installed (3.10 <= version <= 3.12)
   2. Install our package:

      ```bash
      pip install pdf2zh
      ```

   3. Execute translation, files generated in [current working directory](https://chatgpt.com/share/6745ed36-9acc-800e-8a90-59204bd13444):

      ```bash
      pdf2zh document.pdf
      ```
> [!TIP]
>
> - If you're using Windows and cannot open the file after downloading, please install [vc_redist.x64.exe](https://aka.ms/vs/17/release/vc_redist.x64.exe) and try again.
>
> - If you cannot access Docker Hub, please try the image on [GitHub Container Registry](https://github.com/Byaidu/PDFMathTranslate/pkgs/container/pdfmathtranslate).
>
> ```bash
> docker pull ghcr.io/byaidu/pdfmathtranslate
> docker run -d -p 7860:7860 ghcr.io/byaidu/pdfmathtranslate
> ```

### Unable to install?

The program needs an AI model (`wybxc/DocLayout-YOLO-DocStructBench-onnx`) before it can run, and some users cannot download it because of network issues. If you have trouble downloading this model, set the following environment variable as a workaround:

```shell
set HF_ENDPOINT=https://hf-mirror.com
```

For PowerShell users:

```shell
$env:HF_ENDPOINT = "https://hf-mirror.com"
```

If this does not work for you, or you encounter other issues, please refer to the [frequently asked questions](https://github.com/Byaidu/PDFMathTranslate/wiki#-faq--%E5%B8%B8%E8%A7%81%E9%97%AE%E9%A2%98).
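When `pdf2zh` is launched from a script rather than an interactive shell, the same workaround can be applied per process instead of per terminal session. The following is a minimal sketch, assuming `pdf2zh` is on `PATH`; the helper names `mirror_env` and `run_pdf2zh` are hypothetical and not part of the project.

```python
import os
import subprocess

# Mirror URL taken from the workaround described above.
HF_MIRROR = "https://hf-mirror.com"


def mirror_env(base_env=None, endpoint=HF_MIRROR):
    """Return a copy of the environment with HF_ENDPOINT set (hypothetical helper)."""
    env = dict(os.environ if base_env is None else base_env)
    env["HF_ENDPOINT"] = endpoint
    return env


def run_pdf2zh(pdf_path):
    """Run `pdf2zh <file>` with the mirror configured for this process only."""
    return subprocess.run(["pdf2zh", pdf_path], env=mirror_env(), check=False)
```

Copying the environment instead of mutating `os.environ` keeps the setting local to the child process, so other tools in the same script are unaffected.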

Advanced Options

Execute the translation command in the command line to generate the translated document `example-mono.pdf` and the bilingual document `example-dual.pdf` in the current working directory. Google is the default translation service. More supported translation services are listed [HERE](https://github.com/Byaidu/PDFMathTranslate/blob/main/docs/ADVANCED.md#services).

In the following table, we list all advanced options for reference:

| Option | Function | Example |
| --- | --- | --- |
| files | Local files | `pdf2zh ~/local.pdf` |
| links | Online files | `pdf2zh http://arxiv.org/paper.pdf` |
| `-i` | [Enter GUI](#gui) | `pdf2zh -i` |
| `-p` | [Partial document translation](https://github.com/Byaidu/PDFMathTranslate/blob/main/docs/ADVANCED.md#partial) | `pdf2zh example.pdf -p 1` |
| `-li` | [Source language](https://github.com/Byaidu/PDFMathTranslate/blob/main/docs/ADVANCED.md#languages) | `pdf2zh example.pdf -li en` |
| `-lo` | [Target language](https://github.com/Byaidu/PDFMathTranslate/blob/main/docs/ADVANCED.md#languages) | `pdf2zh example.pdf -lo zh` |
| `-s` | [Translation service](https://github.com/Byaidu/PDFMathTranslate/blob/main/docs/ADVANCED.md#services) | `pdf2zh example.pdf -s deepl` |
| `-t` | [Multi-threads](https://github.com/Byaidu/PDFMathTranslate/blob/main/docs/ADVANCED.md#threads) | `pdf2zh example.pdf -t 1` |
| `-o` | Output dir | `pdf2zh example.pdf -o output` |
| `-f`, `-c` | [Exceptions](https://github.com/Byaidu/PDFMathTranslate/blob/main/docs/ADVANCED.md#exceptions) | `pdf2zh example.pdf -f "(MS.*)"` |
| `-cp` | Compatibility Mode | `pdf2zh example.pdf --compatible` |
| `--skip-subset-fonts` | [Skip font subset](https://github.com/Byaidu/PDFMathTranslate/blob/main/docs/ADVANCED.md#font-subset) | `pdf2zh example.pdf --skip-subset-fonts` |
| `--ignore-cache` | [Ignore translate cache](https://github.com/Byaidu/PDFMathTranslate/blob/main/docs/ADVANCED.md#cache) | `pdf2zh example.pdf --ignore-cache` |
| `--share` | Public link | `pdf2zh -i --share` |
| `--authorized` | [Authorization](https://github.com/Byaidu/PDFMathTranslate/blob/main/docs/ADVANCED.md#auth) | `pdf2zh -i --authorized users.txt [auth.html]` |
| `--prompt` | [Custom Prompt](https://github.com/Byaidu/PDFMathTranslate/blob/main/docs/ADVANCED.md#prompt) | `pdf2zh --prompt [prompt.txt]` |
| `--onnx` | Use custom DocLayout-YOLO ONNX model | `pdf2zh --onnx [onnx/model/path]` |
| `--serverport` | Use custom WebUI (Gradio) server port | `pdf2zh --serverport 7860` |
| `--dir` | Batch translate | `pdf2zh --dir /path/to/translate/` |
| `--config` | [Configuration file](https://github.com/Byaidu/PDFMathTranslate/blob/main/docs/ADVANCED.md#cofig) | `pdf2zh --config /path/to/config/config.json` |
| `--babeldoc` | Use experimental backend [BabelDOC](https://funstory-ai.github.io/BabelDOC/) to translate | `pdf2zh --babeldoc -s openai example.pdf` |
| `--mcp` | Enable MCP STDIO mode | `pdf2zh --mcp` |
| `--sse` | Enable MCP SSE mode | `pdf2zh --mcp --sse` |

For detailed explanations of each option, please refer to our document about [Advanced Usage](./docs/ADVANCED.md).
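The options above can be combined in a single invocation. As a rough illustration, the sketch below assembles a `pdf2zh` command line from keyword arguments using only flags listed in the table (`-p`, `-li`, `-lo`, `-s`, `-t`, `-o`). The function `build_pdf2zh_command` and its parameter names are hypothetical and not part of the project's API.

```python
import shlex


def build_pdf2zh_command(files, pages=None, lang_in=None, lang_out=None,
                         service=None, threads=None, output_dir=None):
    """Assemble an argument list for the pdf2zh CLI (illustrative helper)."""
    cmd = ["pdf2zh", *files]
    if pages is not None:
        cmd += ["-p", pages]          # partial translation, e.g. "1"
    if lang_in is not None:
        cmd += ["-li", lang_in]       # source language
    if lang_out is not None:
        cmd += ["-lo", lang_out]      # target language
    if service is not None:
        cmd += ["-s", service]        # translation service
    if threads is not None:
        cmd += ["-t", str(threads)]   # number of threads
    if output_dir is not None:
        cmd += ["-o", output_dir]     # output directory
    return cmd


cmd = build_pdf2zh_command(["example.pdf"], lang_in="en", lang_out="zh",
                           service="deepl", threads=4, output_dir="output")
print(shlex.join(cmd))
# pdf2zh example.pdf -li en -lo zh -s deepl -t 4 -o output
```

Returning a list rather than a string means the result can be passed directly to `subprocess.run` without shell quoting issues.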

Secondary Development (APIs)

For downstream applications, please refer to our document about [API Details](./docs/APIS.md) for further information about:

- [Python API](./docs/APIS.md#api-python), how to use the program in other Python programs
- [HTTP API](./docs/APIS.md#api-http), how to communicate with a server with the program installed

TODOs

- [ ] Parse layout with DocLayNet based models, [PaddleX](https://github.com/PaddlePaddle/PaddleX/blob/17cc27ac3842e7880ca4aad92358d3ef8555429a/paddlex/repo_apis/PaddleDetection_api/object_det/official_categories.py#L81), [PaperMage](https://github.com/allenai/papermage/blob/9cd4bb48cbedab45d0f7a455711438f1632abebe/README.md?plain=1#L102), [SAM2](https://github.com/facebookresearch/sam2) - [ ] Fix page rotation, table of contents, format of lists - [ ] Fix pixel formula in old papers - [ ] Async retry except KeyboardInterrupt - [ ] Knuth–Plass algorithm for western languages - [ ] Support non-PDF/A files - [ ] Plugins of [Zotero](https://github.com/zotero/zotero) and [Obsidian](https://github.com/obsidianmd/obsidian-releases)

Acknowledgements

- [Immersive Translation](https://immersivetranslate.com) sponsors monthly Pro membership redemption codes for active contributors to this project, see details at: [CONTRIBUTOR_REWARD.md](https://github.com/funstory-ai/BabelDOC/blob/main/docs/CONTRIBUTOR_REWARD.md) - New backend: [BabelDOC](https://github.com/funstory-ai/BabelDOC) - Document merging: [PyMuPDF](https://github.com/pymupdf/PyMuPDF) - Document parsing: [Pdfminer.six](https://github.com/pdfminer/pdfminer.six) - Document extraction: [MinerU](https://github.com/opendatalab/MinerU) - Document Preview: [Gradio PDF](https://github.com/freddyaboulton/gradio-pdf) - Multi-threaded translation: [MathTranslate](https://github.com/SUSYUSTC/MathTranslate) - Layout parsing: [DocLayout-YOLO](https://github.com/opendatalab/DocLayout-YOLO) - Document standard: [PDF Explained](https://zxyle.github.io/PDF-Explained/), [PDF Cheat Sheets](https://pdfa.org/resource/pdf-cheat-sheets/) - Multilingual Font: [Go Noto Universal](https://github.com/satbyy/go-noto-universal)

Contributors

![Alt](https://repobeats.axiom.co/api/embed/dfa7583da5332a11468d686fbd29b92320a6a869.svg "Repobeats analytics image")

Star History

Star History Chart ## /app.json ```json path="/app.json" { "name": "PDFMathTranslate", "description": "PDF scientific paper translation and bilingual comparison.", "repository": "https://github.com/Byaidu/PDFMathTranslate" } ``` ## /docs/ADVANCED.md [**Documentation**](https://github.com/Byaidu/PDFMathTranslate) > **Advanced Usage** _(current)_ ---

Table of Contents

- [Full / partial translation](#partial)
- [Specify source and target languages](#language)
- [Translate with different services](#services)
- [Translate with exceptions](#exceptions)
- [Multi-threads](#threads)
- [Custom prompt](#prompt)
- [Authorization](#auth)
- [Custom configuration file](#cofig)
- [Font subsetting](#fonts-subset)
- [Translation cache](#cache)

---

Full / partial translation

- Entire document ```bash pdf2zh example.pdf ``` - Part of the document ```bash pdf2zh example.pdf -p 1-3,5 ``` [⬆️ Back to top](#toc) ---
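To make the `-p 1-3,5` syntax concrete, here is a small sketch of how such a page specification could be expanded into individual page numbers. It assumes ranges are inclusive and pages are 1-based; this is an illustration of the notation, not the parser pdf2zh actually uses, and `parse_page_ranges` is a hypothetical name.

```python
def parse_page_ranges(spec):
    """Expand a spec such as "1-3,5" into a sorted list of page numbers.

    Assumes inclusive ranges and 1-based page numbers (illustrative only).
    """
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"invalid range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    return sorted(pages)


print(parse_page_ranges("1-3,5"))  # [1, 2, 3, 5]
```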

Specify source and target languages

See [Google Languages Codes](https://developers.google.com/admin-sdk/directory/v1/languages), [DeepL Languages Codes](https://developers.deepl.com/docs/resources/supported-languages) ```bash pdf2zh example.pdf -li en -lo ja ``` [⬆️ Back to top](#toc) ---

Translate with different services

We provide a detailed table of the [environment variables](https://chatgpt.com/share/6734a83d-9d48-800e-8a46-f57ca6e8bcb4) required for each translation service. Make sure to set them before using the respective service.

| **Translator** | **Service** | **Environment Variables** | **Default Values** | **Notes** |
|-|-|-|-|-|
| **Google (Default)** | `google` | None | N/A | None |
| **Bing** | `bing` | None | N/A | None |
| **DeepL** | `deepl` | `DEEPL_AUTH_KEY` | `[Your Key]` | See [DeepL](https://support.deepl.com/hc/en-us/articles/360020695820-API-Key-for-DeepL-s-API) |
| **DeepLX** | `deeplx` | `DEEPLX_ENDPOINT` | `https://api.deepl.com/translate` | See [DeepLX](https://github.com/OwO-Network/DeepLX) |
| **Ollama** | `ollama` | `OLLAMA_HOST`, `OLLAMA_MODEL` | `http://127.0.0.1:11434`, `gemma2` | See [Ollama](https://github.com/ollama/ollama) |
| **Xinference** | `xinference` | `XINFERENCE_HOST`, `XINFERENCE_MODEL` | `http://127.0.0.1:9997`, `gemma-2-it` | See [Xinference](https://github.com/xorbitsai/inference) |
| **OpenAI** | `openai` | `OPENAI_BASE_URL`, `OPENAI_API_KEY`, `OPENAI_MODEL` | `https://api.openai.com/v1`, `[Your Key]`, `gpt-4o-mini` | See [OpenAI](https://platform.openai.com/docs/overview) |
| **AzureOpenAI** | `azure-openai` | `AZURE_OPENAI_BASE_URL`, `AZURE_OPENAI_API_KEY`, `AZURE_OPENAI_MODEL` | `[Your Endpoint]`, `[Your Key]`, `gpt-4o-mini` | See [Azure OpenAI](https://learn.microsoft.com/zh-cn/azure/ai-services/openai/chatgpt-quickstart?tabs=command-line%2Cjavascript-keyless%2Ctypescript-keyless%2Cpython&pivots=programming-language-python) |
| **Zhipu** | `zhipu` | `ZHIPU_API_KEY`, `ZHIPU_MODEL` | `[Your Key]`, `glm-4-flash` | See [Zhipu](https://open.bigmodel.cn/dev/api/thirdparty-frame/openai-sdk) |
| **ModelScope** | `ModelScope` | `MODELSCOPE_API_KEY`, `MODELSCOPE_MODEL` | `[Your Key]`, `Qwen/Qwen2.5-Coder-32B-Instruct` | See [ModelScope](https://www.modelscope.cn/docs/model-service/API-Inference/intro) |
| **Silicon** | `silicon` | `SILICON_API_KEY`, `SILICON_MODEL` | `[Your Key]`, `Qwen/Qwen2.5-7B-Instruct` | See [SiliconCloud](https://docs.siliconflow.cn/quickstart) |
| **Gemini** | `gemini` | `GEMINI_API_KEY`, `GEMINI_MODEL` | `[Your Key]`, `gemini-1.5-flash` | See [Gemini](https://ai.google.dev/gemini-api/docs/openai) |
| **Azure** | `azure` | `AZURE_ENDPOINT`, `AZURE_API_KEY` | `https://api.translator.azure.cn`, `[Your Key]` | See [Azure](https://docs.azure.cn/en-us/ai-services/translator/text-translation-overview) |
| **Tencent** | `tencent` | `TENCENTCLOUD_SECRET_ID`, `TENCENTCLOUD_SECRET_KEY` | `[Your ID]`, `[Your Key]` | See [Tencent](https://www.tencentcloud.com/products/tmt?from_qcintl=122110104) |
| **Dify** | `dify` | `DIFY_API_URL`, `DIFY_API_KEY` | `[Your DIFY URL]`, `[Your Key]` | See [Dify](https://github.com/langgenius/dify). Three variables, `lang_out`, `lang_in`, and `text`, need to be defined in Dify's workflow input. |
| **AnythingLLM** | `anythingllm` | `AnythingLLM_URL`, `AnythingLLM_APIKEY` | `[Your AnythingLLM URL]`, `[Your Key]` | See [anything-llm](https://github.com/Mintplex-Labs/anything-llm) |
| **Argos Translate** | `argos` | | | See [argos-translate](https://github.com/argosopentech/argos-translate) |
| **Grok** | `grok` | `GORK_API_KEY`, `GORK_MODEL` | `[Your GORK_API_KEY]`, `grok-2-1212` | See [Grok](https://docs.x.ai/docs/overview) |
| **Groq** | `groq` | `GROQ_API_KEY`, `GROQ_MODEL` | `[Your GROQ_API_KEY]`, `llama-3-3-70b-versatile` | See [Groq](https://console.groq.com/docs/models) |
| **DeepSeek** | `deepseek` | `DEEPSEEK_API_KEY`, `DEEPSEEK_MODEL` | `[Your DEEPSEEK_API_KEY]`, `deepseek-chat` | See [DeepSeek](https://www.deepseek.com/) |
| **OpenAI-Liked** | `openailiked` | `OPENAILIKED_BASE_URL`, `OPENAILIKED_API_KEY`, `OPENAILIKED_MODEL` | `url`, `[Your Key]`, `model name` | None |
| **Ali Qwen Translation** | `qwen-mt` | `ALI_MODEL`, `ALI_API_KEY`, `ALI_DOMAINS` | `qwen-mt-turbo`, `[Your Key]`, `scientific paper` | Traditional Chinese is not yet supported; it will be translated into Simplified Chinese. For more, see [Qwen MT](https://bailian.console.aliyun.com/?spm=5176.28197581.0.0.72e329a4HRxe99#/model-market/detail/qwen-mt-turbo) |

For large language models that are compatible with the OpenAI API but not listed in the table above, you can set environment variables using the same method outlined for OpenAI in the table.

Use `-s service` or `-s service:model` to specify the service:

```bash
pdf2zh example.pdf -s openai:gpt-4o-mini
```

Or specify the model with environment variables:

```bash
set OPENAI_MODEL=gpt-4o-mini
pdf2zh example.pdf -s openai
```

For PowerShell users:

```shell
$env:OPENAI_MODEL = "gpt-4o-mini"
pdf2zh example.pdf -s openai
```

[⬆️ Back to top](#toc)

---

Translate with exceptions

Use regex to specify formula fonts and characters that need to be preserved: ```bash pdf2zh example.pdf -f "(CM[^RT].*|MS.*|.*Ital)" -c "(\(|\||\)|\+|=|\d|[\u0080-\ufaff])" ``` Preserve `Latex`, `Mono`, `Code`, `Italic`, `Symbol` and `Math` fonts by default: ```bash pdf2zh example.pdf -f "(CM[^R]|MS.M|XY|MT|BL|RM|EU|LA|RS|LINE|LCIRCLE|TeX-|rsfs|txsy|wasy|stmary|.*Mono|.*Code|.*Ital|.*Sym|.*Math)" ``` [⬆️ Back to top](#toc) ---
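As a rough illustration of how the exception patterns in the section above behave, the sketch below applies the first example's `-f` and `-c` regexes to a few font names and characters using Python's `re` module. It assumes the patterns are matched from the start of the string (as `re.match` does); the helper names and the sample font names are hypothetical and are not taken from pdf2zh's source.

```python
import re

# Patterns copied from the first command-line example above.
FONT_PATTERN = re.compile(r"(CM[^RT].*|MS.*|.*Ital)")
CHAR_PATTERN = re.compile(r"(\(|\||\)|\+|=|\d|[\u0080-\ufaff])")


def is_formula_font(font_name: str) -> bool:
    """Hypothetical helper: True if the font name matches the -f pattern."""
    return FONT_PATTERN.match(font_name) is not None


def is_formula_char(char: str) -> bool:
    """Hypothetical helper: True if the character matches the -c pattern."""
    return CHAR_PATTERN.match(char) is not None


# Sample font names are illustrative only.
for name in ["CMMI10", "CMR10", "MSBM10", "Times-Italic", "Helvetica"]:
    print(name, is_formula_font(name))

for ch in ["=", "7", "a", "α"]:
    print(repr(ch), is_formula_char(ch))
```

Testing a pattern this way before passing it to `-f` or `-c` can help catch fonts or characters that would be preserved or translated unexpectedly.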

Multi-threads

Use `-t` to specify how many threads to use in translation: ```bash pdf2zh example.pdf -t 1 ``` [⬆️ Back to top](#toc) ---

Custom prompt

Note: System prompts are currently not supported. See [this change](https://github.com/Byaidu/PDFMathTranslate/pull/637).

Use `--prompt` to specify the prompt file to use with the LLM:

```bash
pdf2zh example.pdf --prompt prompt.txt
```

For example:

```txt
You are a professional, authentic machine translation engine. Only Output the translated text, do not include any other text.

Translate the following markdown source text to ${lang_out}. Keep the formula notation {v*} unchanged. Output translation directly without any additional text.

Source Text: ${text}

Translated Text:
```

Three variables can be used in the custom prompt file:

|**variables**|**comment**|
|-|-|
|`lang_in`|input language|
|`lang_out`|output language|
|`text`|text to be translated|

[⬆️ Back to top](#toc)

---
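The `${lang_out}` and `${text}` placeholders in the custom prompt use the same syntax as Python's `string.Template`. The sketch below shows how such a prompt file could be filled in, assuming that kind of substitution; `render_prompt` is a hypothetical helper for illustration and is not part of pdf2zh's API.

```python
from string import Template

# A shortened version of the example prompt from the custom prompt section.
PROMPT_TEMPLATE = """Translate the following markdown source text to ${lang_out}. Keep the formula notation {v*} unchanged. Output translation directly without any additional text.
Source Text: ${text}
Translated Text:"""


def render_prompt(template_text: str, lang_in: str, lang_out: str, text: str) -> str:
    """Hypothetical helper: fill the three documented variables into a template."""
    # safe_substitute leaves unknown placeholders untouched instead of raising.
    return Template(template_text).safe_substitute(
        lang_in=lang_in, lang_out=lang_out, text=text
    )


prompt = render_prompt(PROMPT_TEMPLATE, "en", "zh", "Let {v1} be a prime number.")
print(prompt)
```

Note that `{v*}` has no leading `$`, so under this assumption it passes through unchanged, as do formula markers such as `{v1}` inside the source text.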

Authorization

Use `--authorized` to specify which users may access the Web UI and to customize the login page:

```bash
pdf2zh example.pdf --authorized users.txt auth.html
```

Example `users.txt`. Each line contains a username and a password, separated by a comma:

```
admin,123456
user1,password1
user2,abc123
guest,guest123
test,test123
```

Example `auth.html`:

```html
<!DOCTYPE html>
<html>
<head>
    <title>Simple HTML</title>
</head>
<body>
    <h1>Hello, World!</h1>
    <p>Welcome to my simple HTML page.</p>
</body>
</html>
```

[⬆️ Back to top](#toc)

---
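To make the `users.txt` format from the Authorization section concrete, here is a minimal sketch that parses such a file into `(username, password)` pairs, which is the shape Gradio's `auth` parameter accepts. `load_users` is a hypothetical helper; how pdf2zh itself reads the file is not shown in this document.

```python
def load_users(text: str) -> list[tuple[str, str]]:
    """Hypothetical helper: parse 'username,password' lines into tuples."""
    users = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first comma only, so passwords may contain commas.
        # A line without any comma raises ValueError here.
        username, password = line.split(",", 1)
        users.append((username.strip(), password.strip()))
    return users


sample = "admin,123456\nuser1,password1\n\nguest,guest123\n"
print(load_users(sample))
```

Storing plain-text passwords like this is only reasonable for small private deployments; treat the file as a secret.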

Custom configuration file

Use `--config` to specify the configuration file for PDFMathTranslate:

```bash
pdf2zh example.pdf --config config.json
```

```bash
pdf2zh -i --config config.json
```

Example `config.json`:

```json
{
    "USE_MODELSCOPE": "0",
    "PDF2ZH_LANG_FROM": "English",
    "PDF2ZH_LANG_TO": "Simplified Chinese",
    "NOTO_FONT_PATH": "/app/SourceHanSerifCN-Regular.ttf",
    "translators": [
        {
            "name": "deeplx",
            "envs": {
                "DEEPLX_ENDPOINT": "http://localhost:1188/translate/",
                "DEEPLX_ACCESS_TOKEN": null
            }
        },
        {
            "name": "ollama",
            "envs": {
                "OLLAMA_HOST": "http://127.0.0.1:11434",
                "OLLAMA_MODEL": "gemma2"
            }
        }
    ]
}
```

By default, the configuration file is saved at `~/.config/PDFMathTranslate/config.json`.

On startup, the program reads `config.json` first and then the environment variables. When an environment variable is set, its value takes precedence and the file is updated with it.

[⬆️ Back to top](#toc)

---
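The precedence rule described in the configuration section (file first, then environment variables, with the environment winning and being written back) can be sketched as follows. This is an illustration under stated assumptions, not pdf2zh's actual implementation: `resolve_setting` and its parameters are hypothetical, and only flat top-level keys are handled.

```python
import json
import os
import tempfile
from pathlib import Path


def resolve_setting(config_path: Path, key: str, default=None, environ=None):
    """Hypothetical helper: return the effective value for `key`."""
    environ = os.environ if environ is None else environ
    # Read the file first, if it exists.
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    if key in environ:
        # The environment variable takes precedence and is persisted to the file.
        config[key] = environ[key]
        config_path.parent.mkdir(parents=True, exist_ok=True)
        config_path.write_text(json.dumps(config, indent=2))
    return config.get(key, default)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "config.json"
    path.write_text(json.dumps({"PDF2ZH_LANG_TO": "Simplified Chinese"}))

    # No environment variable set: the file value is used.
    from_file = resolve_setting(path, "PDF2ZH_LANG_TO", environ={})
    # Environment variable set: it wins and the file is updated.
    from_env = resolve_setting(
        path, "PDF2ZH_LANG_TO", environ={"PDF2ZH_LANG_TO": "Japanese"}
    )
    persisted = json.loads(path.read_text())["PDF2ZH_LANG_TO"]
    print(from_file, from_env, persisted)
```

One consequence of this behavior is that a value set once through the environment keeps applying on later runs, because it has been written into the file.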

Fonts subsetting

By default, PDFMathTranslate uses font subsetting to reduce the size of output files. Use the `--skip-subset-fonts` option to disable font subsetting if you encounter compatibility issues.

```bash
pdf2zh example.pdf --skip-subset-fonts
```

[⬆️ Back to top](#toc)

---

Translation cache

PDFMathTranslate caches translated text to speed up translation and avoid unnecessary API calls for identical content. Use the `--ignore-cache` option to ignore the translation cache and force retranslation.

```bash
pdf2zh example.pdf --ignore-cache
```

[⬆️ Back to top](#toc)

---

Deployment as a public service

PDFMathTranslate supports **enabling only selected services** and **hiding backend information** through the configuration file. Enable these features by setting `ENABLED_SERVICES` and `HIDDEN_GRADIO_DETAILS`:

- `ENABLED_SERVICES` restricts the available services to the ones you list.
- `HIDDEN_GRADIO_DETAILS` hides the real API key in the web UI, preventing users from obtaining server-side keys.

A usable configuration is as follows:

```json
{
    "USE_MODELSCOPE": "0",
    "translators": [
        {
            "name": "grok",
            "envs": {
                "GORK_API_KEY": null,
                "GORK_MODEL": "grok-2-1212"
            }
        },
        {
            "name": "openai",
            "envs": {
                "OPENAI_BASE_URL": "https://api.openai.com/v1",
                "OPENAI_API_KEY": "sk-xxxx",
                "OPENAI_MODEL": "gpt-4o-mini"
            }
        }
    ],
    "ENABLED_SERVICES": [
        "OpenAI",
        "Grok"
    ],
    "HIDDEN_GRADIO_DETAILS": true,
    "PDF2ZH_LANG_FROM": "English",
    "PDF2ZH_LANG_TO": "Simplified Chinese",
    "NOTO_FONT_PATH": "/app/SourceHanSerifCN-Regular.ttf"
}
```

[⬆️ Back to top](#toc)

---
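To illustrate the intent of the two public-deployment settings, the sketch below filters a list of service names by `ENABLED_SERVICES` and masks key-like values when `HIDDEN_GRADIO_DETAILS` is true. Both helper functions, the masking string, and the rule "mask any variable whose name contains `API_KEY`" are assumptions made for this example; they do not describe how pdf2zh implements these options.

```python
from typing import Optional


def visible_services(config: dict, all_services: list[str]) -> list[str]:
    """Hypothetical helper: keep only the services listed in ENABLED_SERVICES."""
    enabled = config.get("ENABLED_SERVICES")
    if not enabled:
        # No restriction configured: every service stays available.
        return list(all_services)
    return [name for name in all_services if name in enabled]


def display_value(config: dict, env_name: str, value: Optional[str]) -> str:
    """Hypothetical helper: mask secrets when HIDDEN_GRADIO_DETAILS is set."""
    if value and config.get("HIDDEN_GRADIO_DETAILS") and "API_KEY" in env_name:
        return "***"
    return value or ""


config = {"ENABLED_SERVICES": ["OpenAI", "Grok"], "HIDDEN_GRADIO_DETAILS": True}

print(visible_services(config, ["Google", "OpenAI", "Grok", "DeepL"]))
print(display_value(config, "OPENAI_API_KEY", "sk-xxxx"))
print(display_value(config, "OPENAI_MODEL", "gpt-4o-mini"))
```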

MCP

PDFMathTranslate can run as an MCP server. To use it, run `uv pip install pdf2zh` and edit `claude_desktop_config.json`. An example configuration is as follows:

```json
{
    "mcpServers": {
        "filesystem": {
            "command": "npx",
            "args": [
                "-y",
                "@modelcontextprotocol/server-filesystem",
                "/path/to/Document"
            ]
        },
        "translate_pdf": {
            "command": "uv",
            "args": [
                "run",
                "pdf2zh",
                "--mcp"
            ]
        }
    }
}
```

[filesystem](https://github.com/modelcontextprotocol/servers/tree/main/src/filesystem) is a required MCP server used to locate the PDF file, and `translate_pdf` is our MCP server.

To test whether the MCP server works, open Claude Desktop and ask:

```
find the `test.pdf` in my Document folder and translate it to Chinese
```

## /docs/APIS.md

[**Documentation**](https://github.com/Byaidu/PDFMathTranslate) > **API Details** _(current)_

Table of Contents

This project supports two types of APIs. All methods require Redis:

- [Functional calls in Python](#api-python)
- [HTTP protocols](#api-http)

---

Python

Because `pdf2zh` is installed as a Python module, it exposes two methods that other programs can call from any Python script.

For example, to translate a document from English to Chinese using Google Translate, you can use the following code:

```python
from pdf2zh import translate, translate_stream

params = {
    'lang_in': 'en',
    'lang_out': 'zh',
    'service': 'google',
    'thread': 4,
}
```

Translate with files:

```python
(file_mono, file_dual) = translate(files=['example.pdf'], **params)[0]
```

Translate with stream:

```python
with open('example.pdf', 'rb') as f:
    (stream_mono, stream_dual) = translate_stream(stream=f.read(), **params)
```

[⬆️ Back to top](#toc)

---

HTTP

For more flexibility, you can communicate with the program over HTTP:

1. Install and run the backend:

   ```bash
   pip install pdf2zh[backend]
   pdf2zh --flask
   pdf2zh --celery worker
   ```

2. Send HTTP requests as follows:

   - Submit a translation task

     ```bash
     curl http://localhost:11008/v1/translate -F "file=@example.pdf" -F "data={\"lang_in\":\"en\",\"lang_out\":\"zh\",\"service\":\"google\",\"thread\":4}"
     {"id":"d9894125-2f4e-45ea-9d93-1a9068d2045a"}
     ```

   - Check progress

     ```bash
     curl http://localhost:11008/v1/translate/d9894125-2f4e-45ea-9d93-1a9068d2045a
     {"info":{"n":13,"total":506},"state":"PROGRESS"}
     ```

   - Check progress _(if finished)_

     ```bash
     curl http://localhost:11008/v1/translate/d9894125-2f4e-45ea-9d93-1a9068d2045a
     {"state":"SUCCESS"}
     ```

   - Save the monolingual file

     ```bash
     curl http://localhost:11008/v1/translate/d9894125-2f4e-45ea-9d93-1a9068d2045a/mono --output example-mono.pdf
     ```

   - Save the bilingual file

     ```bash
     curl http://localhost:11008/v1/translate/d9894125-2f4e-45ea-9d93-1a9068d2045a/dual --output example-dual.pdf
     ```

   - Interrupt the task if it is running and delete it

     ```bash
     curl http://localhost:11008/v1/translate/d9894125-2f4e-45ea-9d93-1a9068d2045a -X DELETE
     ```

[⬆️ Back to top](#toc)

---

## /docs/CODE_OF_CONDUCT.md

# Contributor Covenant Code of Conduct

## Our Pledge

We as members, contributors, and leaders pledge to make participation in our community a harassment-free experience for everyone, regardless of age, body size, visible or invisible disability, ethnicity, sex characteristics, gender identity and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, religion, or sexual identity and orientation.

We pledge to act and interact in ways that contribute to an open, welcoming, diverse, inclusive, and healthy community.
## Our Standards Examples of behavior that contributes to a positive environment for our community include: * Demonstrating empathy and kindness toward other people * Being respectful of differing opinions, viewpoints, and experiences * Giving and gracefully accepting constructive feedback * Accepting responsibility and apologizing to those affected by our mistakes, and learning from the experience * Focusing on what is best not just for us as individuals, but for the overall community Examples of unacceptable behavior include: * The use of sexualized language or imagery, and sexual attention or advances of any kind * Trolling, insulting or derogatory comments, and personal or political attacks * Public or private harassment * Publishing others' private information, such as a physical or email address, without their explicit permission * Other conduct which could reasonably be considered inappropriate in a professional setting ## Enforcement Responsibilities Community leaders are responsible for clarifying and enforcing our standards of acceptable behavior and will take appropriate and fair corrective action in response to any behavior that they deem inappropriate, threatening, offensive, or harmful. Community leaders have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, and will communicate reasons for moderation decisions when appropriate. ## Scope This Code of Conduct applies within all community spaces, and also applies when an individual is officially representing the community in public spaces. Examples of representing our community include using an official e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event. 
## Enforcement Instances of abusive, harassing, or otherwise unacceptable behavior may be reported to the community leaders responsible for enforcement at aw@funstory.ai . All complaints will be reviewed and investigated promptly and fairly. All community leaders are obligated to respect the privacy and security of the reporter of any incident. ## Enforcement Guidelines Community leaders will follow these Community Impact Guidelines in determining the consequences for any action they deem in violation of this Code of Conduct: ### 1. Correction **Community Impact**: Use of inappropriate language or other behavior deemed unprofessional or unwelcome in the community. **Consequence**: A private, written warning from community leaders, providing clarity around the nature of the violation and an explanation of why the behavior was inappropriate. A public apology may be requested. ### 2. Warning **Community Impact**: A violation through a single incident or series of actions. **Consequence**: A warning with consequences for continued behavior. No interaction with the people involved, including unsolicited interaction with those enforcing the Code of Conduct, for a specified period of time. This includes avoiding interactions in community spaces as well as external channels like social media. Violating these terms may lead to a temporary or permanent ban. ### 3. Temporary Ban **Community Impact**: A serious violation of community standards, including sustained inappropriate behavior. **Consequence**: A temporary ban from any sort of interaction or public communication with the community for a specified period of time. No public or private interaction with the people involved, including unsolicited interaction with those enforcing the Code of Conduct, is allowed during this period. Violating these terms may lead to a permanent ban. ### 4. 
Permanent Ban

**Community Impact**: Demonstrating a pattern of violation of community standards, including sustained inappropriate behavior, harassment of an individual, or aggression toward or disparagement of classes of individuals.

**Consequence**: A permanent ban from any sort of public interaction within the community.

## Attribution

This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 2.0, available at https://www.contributor-covenant.org/version/2/0/code_of_conduct.html.

Community Impact Guidelines were inspired by [Mozilla's code of conduct enforcement ladder](https://github.com/mozilla/diversity).

[homepage]: https://www.contributor-covenant.org

For answers to common questions about this code of conduct, see the FAQ at https://www.contributor-covenant.org/faq. Translations are available at https://www.contributor-covenant.org/translations.

## /docs/README_GUI.md

# Interact with GUI

This subfolder provides the GUI mode of `pdf2zh`.

## Usage

1. Run `pdf2zh -i`
2. Drop the PDF file into the window and click `Translate`.

### Environment Variables

You can set the source and target languages using environment variables:

- `PDF2ZH_LANG_FROM`: Sets the source language. Defaults to "English".
- `PDF2ZH_LANG_TO`: Sets the target language. Defaults to "Simplified Chinese".

### Supported Languages

The following languages are supported:

- English
- Simplified Chinese
- Traditional Chinese
- French
- German
- Japanese
- Korean
- Russian
- Spanish
- Italian

## Preview

## Maintenance

GUI maintained by [Rongxin](https://github.com/reycn)

## /docs/README_ja-JP.md
[English](../README.md) | [简体中文](README_zh-CN.md) | [繁體中文](README_zh-TW.md) | 日本語 PDF2ZH

PDFMathTranslate

科学 PDF 文書の翻訳およびバイリンガル比較ツール - 📊 数式、チャート、目次、注釈を保持 *([プレビュー](#preview))* - 🌐 [複数の言語](#language) と [多様な翻訳サービス](#services) をサポート - 🤖 [コマンドラインツール](#usage)、[インタラクティブユーザーインターフェース](#gui)、および [Docker](#docker) を提供 フィードバックは [GitHub Issues](https://github.com/Byaidu/PDFMathTranslate/issues)、[Telegram グループ](https://t.me/+Z9_SgnxmsmA5NzBl)

最近の更新

- [2024年11月26日] CLIがオンラインファイルをサポートするようになりました *(by [@reycn](https://github.com/reycn))* - [2024年11月24日] 依存関係のサイズを削減するために [ONNX](https://github.com/onnx/onnx) サポートを追加しました *(by [@Wybxc](https://github.com/Wybxc))* - [2024年11月23日] 🌟 [公共サービス](#demo) がオンラインになりました! *(by [@Byaidu](https://github.com/Byaidu))* - [2024年11月23日] ウェブボットを防ぐためのファイアウォールを追加しました *(by [@Byaidu](https://github.com/Byaidu))* - [2024年11月22日] GUIがイタリア語をサポートし、改善されました *(by [@Byaidu](https://github.com/Byaidu), [@reycn](https://github.com/reycn))* - [2024年11月22日] デプロイされたサービスを他の人と共有できるようになりました *(by [@Zxis233](https://github.com/Zxis233))* - [2024年11月22日] Tencent翻訳をサポートしました *(by [@hellofinch](https://github.com/hellofinch))* - [2024年11月21日] GUIがバイリンガルドキュメントのダウンロードをサポートするようになりました *(by [@reycn](https://github.com/reycn))* - [2024年11月20日] 🌟 [デモ](#demo) がオンラインになりました! *(by [@reycn](https://github.com/reycn))*

プレビュー

公共サービス 🌟

### 無料サービス () インストールなしで [公共サービス](https://pdf2zh.com/) をオンラインで試すことができます。 ### デモ インストールなしで [HuggingFace上のデモ](https://huggingface.co/spaces/reycn/PDFMathTranslate-Docker), [ModelScope上のデモ](https://www.modelscope.cn/studios/AI-ModelScope/PDFMathTranslate) を試すことができます。 デモの計算リソースは限られているため、乱用しないようにしてください。

インストールと使用方法

このプロジェクトを使用するための4つの方法を提供しています:[コマンドライン](#cmd)、[ポータブル](#portable)、[GUI](#gui)、および [Docker](#docker)。

pdf2zhの実行には追加モデル(`wybxc/DocLayout-YOLO-DocStructBench-onnx`)が必要です。このモデルはModelScopeでも見つけることができます。起動時にこのモデルのダウンロードに問題がある場合は、以下の環境変数を使用してください:

```shell
set HF_ENDPOINT=https://hf-mirror.com
```

For PowerShell users:

```shell
$env:HF_ENDPOINT = "https://hf-mirror.com"
```

方法1. コマンドライン

1. Pythonがインストールされていること (バージョン3.10 <= バージョン <= 3.12) 2. パッケージをインストールします: ```bash pip install pdf2zh ``` 3. 翻訳を実行し、[現在の作業ディレクトリ](https://chatgpt.com/share/6745ed36-9acc-800e-8a90-59204bd13444) にファイルを生成します: ```bash pdf2zh document.pdf ```

方法2. ポータブル

Python環境を事前にインストールする必要はありません [setup.bat](https://raw.githubusercontent.com/Byaidu/PDFMathTranslate/refs/heads/main/script/setup.bat) をダウンロードしてダブルクリックして実行します

方法3. GUI

1. Pythonがインストールされていること (バージョン3.10 <= バージョン <= 3.12) 2. パッケージをインストールします: ```bash pip install pdf2zh ``` 3. ブラウザで使用を開始します: ```bash pdf2zh -i ``` 4. ブラウザが自動的に起動しない場合は、次のURLを開きます: ```bash http://localhost:7860/ ``` 詳細については、[GUIのドキュメント](./README_GUI.md) を参照してください。

方法4. Docker

1. プルして実行します: ```bash docker pull byaidu/pdf2zh docker run -d -p 7860:7860 byaidu/pdf2zh ``` 2. ブラウザで開きます: ``` http://localhost:7860/ ``` クラウドサービスでのDockerデプロイメント用:

高度なオプション

コマンドラインで翻訳コマンドを実行し、現在の作業ディレクトリに翻訳されたドキュメント `example-mono.pdf` とバイリンガルドキュメント `example-dual.pdf` を生成します。デフォルトではGoogle翻訳サービスを使用します。More support translation services can find [HERE](https://github.com/Byaidu/PDFMathTranslate/blob/main/docs/ADVANCED.md#services). cmd 以下の表に、参考のためにすべての高度なオプションをリストしました: | オプション | 機能 | 例 | | -------- | ------- |------- | | files | ローカルファイル | `pdf2zh ~/local.pdf` | | links | オンラインファイル | `pdf2zh http://arxiv.org/paper.pdf` | | `-i` | [GUIに入る](#gui) | `pdf2zh -i` | | `-p` | [部分的なドキュメント翻訳](#partial) | `pdf2zh example.pdf -p 1` | | `-li` | [ソース言語](#languages) | `pdf2zh example.pdf -li en` | | `-lo` | [ターゲット言語](#languages) | `pdf2zh example.pdf -lo zh` | | `-s` | [翻訳サービス](#services) | `pdf2zh example.pdf -s deepl` | | `-t` | [マルチスレッド](#threads) | `pdf2zh example.pdf -t 1` | | `-o` | 出力ディレクトリ | `pdf2zh example.pdf -o output` | | `-f`, `-c` | [例外](#exceptions) | `pdf2zh example.pdf -f "(MS.*)"` | | `--share` | [gradio公開リンクを取得] | `pdf2zh -i --share` | | `--authorized` | [[ウェブ認証とカスタム認証ページの追加](https://github.com/Byaidu/PDFMathTranslate/blob/main/docs/ADVANCED.)] | `pdf2zh -i --authorized users.txt [auth.html]` | | `--prompt` | [カスタムビッグモデルのプロンプトを使用する] | `pdf2zh --prompt [prompt.txt]` | | `--onnx` | [カスタムDocLayout-YOLO ONNXモデルの使用] | `pdf2zh --onnx [onnx/model/path]` | | `--serverport` | [カスタムWebUIポートを使用する] | `pdf2zh --serverport 7860` | | `--dir` | [batch translate] | `pdf2zh --dir /path/to/translate/` | | `--config` | [configuration file](https://github.com/Byaidu/PDFMathTranslate/blob/main/docs/ADVANCED.md#cofig) | `pdf2zh --config /path/to/config/config.json` | | `--serverport` | [custom gradio server port] | `pdf2zh --serverport 7860` |

全文または部分的なドキュメント翻訳

- **全文翻訳** ```bash pdf2zh example.pdf ``` - **部分翻訳** ```bash pdf2zh example.pdf -p 1-3,5 ```

ソース言語とターゲット言語を指定

[Google Languages Codes](https://developers.google.com/admin-sdk/directory/v1/languages)、[DeepL Languages Codes](https://developers.deepl.com/docs/resources/supported-languages) を参照してください ```bash pdf2zh example.pdf -li en -lo ja ```

異なるサービスで翻訳

以下の表は、各翻訳サービスに必要な [環境変数](https://chatgpt.com/share/6734a83d-9d48-800e-8a46-f57ca6e8bcb4) を示しています。各サービスを使用する前に、これらの変数を設定してください。 |**Translator**|**Service**|**Environment Variables**|**Default Values**|**Notes**| |-|-|-|-|-| |**Google (Default)**|`google`|None|N/A|None| |**Bing**|`bing`|None|N/A|None| |**DeepL**|`deepl`|`DEEPL_AUTH_KEY`|`[Your Key]`|See [DeepL](https://support.deepl.com/hc/en-us/articles/360020695820-API-Key-for-DeepL-s-API)| |**DeepLX**|`deeplx`|`DEEPLX_ENDPOINT`|`https://api.deepl.com/translate`|See [DeepLX](https://github.com/OwO-Network/DeepLX)| |**Ollama**|`ollama`|`OLLAMA_HOST`, `OLLAMA_MODEL`|`http://127.0.0.1:11434`, `gemma2`|See [Ollama](https://github.com/ollama/ollama)| |**OpenAI**|`openai`|`OPENAI_BASE_URL`, `OPENAI_API_KEY`, `OPENAI_MODEL`|`https://api.openai.com/v1`, `[Your Key]`, `gpt-4o-mini`|See [OpenAI](https://platform.openai.com/docs/overview)| |**AzureOpenAI**|`azure-openai`|`AZURE_OPENAI_BASE_URL`, `AZURE_OPENAI_API_KEY`, `AZURE_OPENAI_MODEL`|`[Your Endpoint]`, `[Your Key]`, `gpt-4o-mini`|See [Azure OpenAI](https://learn.microsoft.com/zh-cn/azure/ai-services/openai/chatgpt-quickstart?tabs=command-line%2Cjavascript-keyless%2Ctypescript-keyless%2Cpython&pivots=programming-language-python)| |**Zhipu**|`zhipu`|`ZHIPU_API_KEY`, `ZHIPU_MODEL`|`[Your Key]`, `glm-4-flash`|See [Zhipu](https://open.bigmodel.cn/dev/api/thirdparty-frame/openai-sdk)| | **ModelScope** | `modelscope` |`MODELSCOPE_API_KEY`, `MODELSCOPE_MODEL`|`[Your Key]`, `Qwen/Qwen2.5-Coder-32B-Instruct`| See [ModelScope](https://www.modelscope.cn/docs/model-service/API-Inference/intro)| |**Silicon**|`silicon`|`SILICON_API_KEY`, `SILICON_MODEL`|`[Your Key]`, `Qwen/Qwen2.5-7B-Instruct`|See [SiliconCloud](https://docs.siliconflow.cn/quickstart)| |**Gemini**|`gemini`|`GEMINI_API_KEY`, `GEMINI_MODEL`|`[Your Key]`, `gemini-1.5-flash`|See [Gemini](https://ai.google.dev/gemini-api/docs/openai)| |**Azure**|`azure`|`AZURE_ENDPOINT`, `AZURE_API_KEY`|`https://api.translator.azure.cn`, 
`[Your Key]`|See [Azure](https://docs.azure.cn/en-us/ai-services/translator/text-translation-overview)| |**Tencent**|`tencent`|`TENCENTCLOUD_SECRET_ID`, `TENCENTCLOUD_SECRET_KEY`|`[Your ID]`, `[Your Key]`|See [Tencent](https://www.tencentcloud.com/products/tmt?from_qcintl=122110104)| |**Dify**|`dify`|`DIFY_API_URL`, `DIFY_API_KEY`|`[Your DIFY URL]`, `[Your Key]`|See [Dify](https://github.com/langgenius/dify),Three variables, lang_out, lang_in, and text, need to be defined in Dify's workflow input.| |**AnythingLLM**|`anythingllm`|`AnythingLLM_URL`, `AnythingLLM_APIKEY`|`[Your AnythingLLM URL]`, `[Your Key]`|See [anything-llm](https://github.com/Mintplex-Labs/anything-llm)| |**Argos Translate**|`argos`| | |See [argos-translate](https://github.com/argosopentech/argos-translate)| |**Grok**|`grok`| `GORK_API_KEY`, `GORK_MODEL` | `[Your GORK_API_KEY]`, `grok-2-1212` |See [Grok](https://docs.x.ai/docs/overview)| |**DeepSeek**|`deepseek`| `DEEPSEEK_API_KEY`, `DEEPSEEK_MODEL` | `[Your DEEPSEEK_API_KEY]`, `deepseek-chat` |See [DeepSeek](https://www.deepseek.com/)| |**OpenAI-Liked**|`openailiked`| `OPENAILIKED_BASE_URL`, `OPENAILIKED_API_KEY`, `OPENAILIKED_MODEL` | `url`, `[Your Key]`, `model name` | None | (need Japenese translation) For large language models that are compatible with the OpenAI API but not listed in the table above, you can set environment variables using the same method outlined for OpenAI in the table. `-s service` または `-s service:model` を使用してサービスを指定します: ```bash pdf2zh example.pdf -s openai:gpt-4o-mini ``` または環境変数でモデルを指定します: ```bash set OPENAI_MODEL=gpt-4o-mini pdf2zh example.pdf -s openai ``` For PowerShell user: ```shell $env:OPENAI_MODEL = gpt-4o-mini pdf2zh example.pdf -s openai ```

例外を指定して翻訳

正規表現を使用して保持する必要がある数式フォントと文字を指定します: ```bash pdf2zh example.pdf -f "(CM[^RT].*|MS.*|.*Ital)" -c "(\(|\||\)|\+|=|\d|[\u0080-\ufaff])" ``` デフォルトで `Latex`、`Mono`、`Code`、`Italic`、`Symbol` および `Math` フォントを保持します: ```bash pdf2zh example.pdf -f "(CM[^R]|MS.M|XY|MT|BL|RM|EU|LA|RS|LINE|LCIRCLE|TeX-|rsfs|txsy|wasy|stmary|.*Mono|.*Code|.*Ital|.*Sym|.*Math)" ```

スレッド数を指定

`-t` を使用して翻訳に使用するスレッド数を指定します: ```bash pdf2zh example.pdf -t 1 ```

カスタム プロンプト

`--prompt`を使用して、LLMで使用するプロンプトを指定します: ```bash pdf2zh example.pdf -pr prompt.txt ``` `prompt.txt`の例: ```txt [ { "role": "system", "content": "You are a professional,authentic machine translation engine.", }, { "role": "user", "content": "Translate the following markdown source text to ${lang_out}. Keep the formula notation {{v*}} unchanged. Output translation directly without any additional text.\nSource Text: ${text}\nTranslated Text:", }, ] ``` カスタムプロンプトファイルでは、以下の3つの変数が使用できます。 |**変数**|**内容**| |-|-| |`lang_in`|ソース言語| |`lang_out`|ターゲット言語| |`text`|翻訳するテキスト|

API

### Python ```python from pdf2zh import translate, translate_stream params = {"lang_in": "en", "lang_out": "zh", "service": "google", "thread": 4} file_mono, file_dual = translate(files=["example.pdf"], **params)[0] with open("example.pdf", "rb") as f: stream_mono, stream_dual = translate_stream(stream=f.read(), **params) ``` ### HTTP ```bash pip install pdf2zh[backend] pdf2zh --flask pdf2zh --celery worker ``` ```bash curl http://localhost:11008/v1/translate -F "file=@example.pdf" -F "data={\"lang_in\":\"en\",\"lang_out\":\"zh\",\"service\":\"google\",\"thread\":4}" {"id":"d9894125-2f4e-45ea-9d93-1a9068d2045a"} curl http://localhost:11008/v1/translate/d9894125-2f4e-45ea-9d93-1a9068d2045a {"info":{"n":13,"total":506},"state":"PROGRESS"} curl http://localhost:11008/v1/translate/d9894125-2f4e-45ea-9d93-1a9068d2045a {"state":"SUCCESS"} curl http://localhost:11008/v1/translate/d9894125-2f4e-45ea-9d93-1a9068d2045a/mono --output example-mono.pdf curl http://localhost:11008/v1/translate/d9894125-2f4e-45ea-9d93-1a9068d2045a/dual --output example-dual.pdf curl http://localhost:11008/v1/translate/d9894125-2f4e-45ea-9d93-1a9068d2045a -X DELETE ```

謝辞

- ドキュメントのマージ:[PyMuPDF](https://github.com/pymupdf/PyMuPDF) - ドキュメントの解析:[Pdfminer.six](https://github.com/pdfminer/pdfminer.six) - ドキュメントの抽出:[MinerU](https://github.com/opendatalab/MinerU) - ドキュメントプレビュー:[Gradio PDF](https://github.com/freddyaboulton/gradio-pdf) - マルチスレッド翻訳:[MathTranslate](https://github.com/SUSYUSTC/MathTranslate) - レイアウト解析:[DocLayout-YOLO](https://github.com/opendatalab/DocLayout-YOLO) - ドキュメント標準:[PDF Explained](https://zxyle.github.io/PDF-Explained/)、[PDF Cheat Sheets](https://pdfa.org/resource/pdf-cheat-sheets/) - 多言語フォント:[Go Noto Universal](https://github.com/satbyy/go-noto-universal)

貢献者

![Alt](https://repobeats.axiom.co/api/embed/dfa7583da5332a11468d686fbd29b92320a6a869.svg "Repobeats analytics image")

スター履歴

Star History Chart ## /docs/README_ko-KR.md # Create new file
[English](../README.md) | [简体中文](README_zh-CN.md) | [繁體中文](README_zh-TW.md) | [日本語](README_ja-JP.md) | 한국어 PDF2ZH

PDFMathTranslate

과학 PDF 문서 번역 및 이중 언어 비교 도구 - 📊 수식, 차트, 목차, 주석 유지 _([미리보기](#preview))_ - 🌐 [다양한 언어](#language)와 [다양한 번역 서비스](#services) 지원 - 🤖 [커맨드라인 도구](#usage), [대화형 사용자 인터페이스](#gui), 및 [Docker](#docker) 제공 피드백은 [GitHub Issues](https://github.com/Byaidu/PDFMathTranslate/issues) 또는 [Telegram 그룹](https://t.me/+Z9_SgnxmsmA5NzBl)에서 해주세요.

최근 업데이트

- [2024년 12월 24일] [Xinference](https://github.com/xorbitsai/inference) 실행 로컬 LLM 지원 추가 _(by [@imClumsyPanda](https://github.com/imClumsyPanda))_ - [2024년 11월 26일] CLI가 온라인 파일을 지원하게 되었습니다 _(by [@reycn](https://github.com/reycn))_ - [2024년 11월 24일] 의존성 크기를 줄이기 위해 [ONNX](https://github.com/onnx/onnx) 지원 추가 _(by [@Wybxc](https://github.com/Wybxc))_ - [2024년 11월 23일] 🌟 [무료 공공 서비스](#demo) 온라인! _(by [@Byaidu](https://github.com/Byaidu))_ - [2024년 11월 23일] 웹 봇을 방지하기 위한 방화벽 추가 _(by [@Byaidu](https://github.com/Byaidu))_ - [2024년 11월 22일] GUI가 이탈리아어를 지원하고 개선되었습니다 _(by [@Byaidu](https://github.com/Byaidu), [@reycn](https://github.com/reycn))_ - [2024년 11월 22일] 배포된 서비스를 다른 사람과 공유할 수 있게 되었습니다 _(by [@Zxis233](https://github.com/Zxis233))_ - [2024년 11월 22일] Tencent 번역 지원 _(by [@hellofinch](https://github.com/hellofinch))_ - [2024년 11월 21일] GUI가 이중 언어 문서 다운로드를 지원하게 되었습니다 _(by [@reycn](https://github.com/reycn))_ - [2024년 11월 20일] 🌟 [데모](#demo)가 온라인이 되었습니다! _(by [@reycn](https://github.com/reycn))_

미리보기

공공 서비스 🌟

### 무료 서비스 () 설치 없이 [무료 공공 서비스](https://pdf2zh.com/)를 온라인으로 사용해 볼 수 있습니다. ### 데모 설치 없이 [HuggingFace의 데모](https://huggingface.co/spaces/reycn/PDFMathTranslate-Docker)와 [ModelScope의 데모](https://www.modelscope.cn/studios/AI-ModelScope/PDFMathTranslate)를 사용해 볼 수 있습니다. 데모의 컴퓨팅 리소스가 제한되어 있으므로 남용하지 말아주세요.

설치 및 사용법

이 프로젝트를 사용하는 4가지 방법을 제공합니다: [커맨드라인 도구](#cmd), [포터블](#portable), [GUI](#gui), 및 [Docker](#docker).

pdf2zh 실행에는 추가 모델(`wybxc/DocLayout-YOLO-DocStructBench-onnx`)이 필요합니다. 이 모델은 ModelScope에서도 찾을 수 있습니다. 시작할 때 이 모델 다운로드에 문제가 있다면 다음 환경 변수를 사용하세요:

```shell
set HF_ENDPOINT=https://hf-mirror.com
```

PowerShell 사용자의 경우:

```shell
$env:HF_ENDPOINT = "https://hf-mirror.com"
```

방법 1. 커맨드라인 도구

1. Python이 설치되어 있어야 합니다 (버전 3.10 <= 버전 <= 3.12) 2. 패키지를 설치합니다: ```bash pip install pdf2zh ``` 3. 번역을 실행하고 [현재 작업 디렉토리](https://chatgpt.com/share/6745ed36-9acc-800e-8a90-59204bd13444)에 파일을 생성합니다: ```bash pdf2zh document.pdf ```

방법 2. 포터블

Python 환경을 미리 설치할 필요가 없습니다. [setup.bat](https://raw.githubusercontent.com/Byaidu/PDFMathTranslate/refs/heads/main/script/setup.bat)을 다운로드하고 더블클릭하여 실행합니다.

방법 3. GUI

1. Python이 설치되어 있어야 합니다 (버전 3.10 <= 버전 <= 3.12) 2. 패키지를 설치합니다: ```bash pip install pdf2zh ``` 3. 브라우저에서 사용을 시작합니다: ```bash pdf2zh -i ``` 4. 브라우저가 자동으로 시작되지 않으면 다음 URL을 엽니다: ```bash http://localhost:7860/ ``` 자세한 내용은 [GUI 문서](./README_GUI.md)를 참조하세요.

방법 4. Docker

1. 풀하고 실행합니다: ```bash docker pull byaidu/pdf2zh docker run -d -p 7860:7860 byaidu/pdf2zh ``` 2. 브라우저에서 엽니다: ``` http://localhost:7860/ ``` 클라우드 서비스에서 Docker 배포용:

Advanced options

Run the translation command from the command line to generate the translated document `example-mono.pdf` and the bilingual document `example-dual.pdf` in the current working directory. The Google translation service is used by default. More supported translation services are listed [here](https://github.com/Byaidu/PDFMathTranslate/blob/main/docs/ADVANCED.md#services).

The following table lists all advanced options for reference:

| Option | Function | Example |
| --- | --- | --- |
| files | Local files | `pdf2zh ~/local.pdf` |
| links | Online files | `pdf2zh http://arxiv.org/paper.pdf` |
| `-i` | [Enter the GUI](#gui) | `pdf2zh -i` |
| `-p` | [Partial document translation](#partial) | `pdf2zh example.pdf -p 1` |
| `-li` | [Source language](#languages) | `pdf2zh example.pdf -li en` |
| `-lo` | [Target language](#languages) | `pdf2zh example.pdf -lo zh` |
| `-s` | [Translation service](#services) | `pdf2zh example.pdf -s deepl` |
| `-t` | [Multi-threading](#threads) | `pdf2zh example.pdf -t 1` |
| `-o` | Output directory | `pdf2zh example.pdf -o output` |
| `-f`, `-c` | [Exceptions](#exceptions) | `pdf2zh example.pdf -f "(MS.*)"` |
| `--share` | Get a gradio public link | `pdf2zh -i --share` |
| `--authorized` | [Add web authentication and a custom authentication page](https://github.com/Byaidu/PDFMathTranslate/blob/main/docs/ADVANCED.) | `pdf2zh -i --authorized users.txt [auth.html]` |
| `--prompt` | Use a custom LLM prompt | `pdf2zh --prompt [prompt.txt]` |
| `--onnx` | Use a custom DocLayout-YOLO ONNX model | `pdf2zh --onnx [onnx/model/path]` |
| `--serverport` | Use a custom WebUI port | `pdf2zh --serverport 7860` |
| `--dir` | Batch translation | `pdf2zh --dir /path/to/translate/` |
| `--config` | [Configuration file](https://github.com/Byaidu/PDFMathTranslate/blob/main/docs/ADVANCED.md#cofig) | `pdf2zh --config /path/to/config/config.json` |

Translating a full or partial document

- **Full translation**

  ```bash
  pdf2zh example.pdf
  ```

- **Partial translation**

  ```bash
  pdf2zh example.pdf -p 1-3,5
  ```
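To make the `-p 1-3,5` syntax concrete, here is a minimal Python sketch of how such a page specification could be expanded. The helper name `parse_page_spec` and the conversion to zero-based indices are assumptions for illustration, not necessarily what pdf2zh does internally.

```python
def parse_page_spec(spec: str) -> list[int]:
    """Expand a spec such as "1-3,5" into zero-based page indices.

    Hypothetical helper: page numbers on the command line are assumed
    to be one-based and ranges are assumed to be inclusive.
    """
    pages: list[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start, end = part.split("-", 1)
            # "1-3" covers pages 1, 2 and 3, i.e. indices 0, 1 and 2
            pages.extend(range(int(start) - 1, int(end)))
        else:
            pages.append(int(part) - 1)
    return pages
```

Under these assumptions, `parse_page_spec("1-3,5")` yields the indices of pages 1, 2, 3 and 5.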

Specifying source and target languages

See [Google Languages Codes](https://developers.google.com/admin-sdk/directory/v1/languages) and [DeepL Languages Codes](https://developers.deepl.com/docs/resources/supported-languages).

```bash
pdf2zh example.pdf -li en -lo ko
```

Translating with different services

The following table shows the [environment variables](https://chatgpt.com/share/6734a83d-9d48-800e-8a46-f57ca6e8bcb4) required by each translation service. Set these variables before using the corresponding service.

| **Translator** | **Service** | **Environment variables** | **Default values** | **Notes** |
| --- | --- | --- | --- | --- |
| **Google (default)** | `google` | None | N/A | None |
| **Bing** | `bing` | None | N/A | None |
| **DeepL** | `deepl` | `DEEPL_AUTH_KEY` | `[Your Key]` | See [DeepL](https://support.deepl.com/hc/en-us/articles/360020695820-API-Key-for-DeepL-s-API) |
| **DeepLX** | `deeplx` | `DEEPLX_ENDPOINT` | `https://api.deepl.com/translate` | See [DeepLX](https://github.com/OwO-Network/DeepLX) |
| **Ollama** | `ollama` | `OLLAMA_HOST`, `OLLAMA_MODEL` | `http://127.0.0.1:11434`, `gemma2` | See [Ollama](https://github.com/ollama/ollama) |
| **OpenAI** | `openai` | `OPENAI_BASE_URL`, `OPENAI_API_KEY`, `OPENAI_MODEL` | `https://api.openai.com/v1`, `[Your Key]`, `gpt-4o-mini` | See [OpenAI](https://platform.openai.com/docs/overview) |
| **AzureOpenAI** | `azure-openai` | `AZURE_OPENAI_BASE_URL`, `AZURE_OPENAI_API_KEY`, `AZURE_OPENAI_MODEL` | `[Your Endpoint]`, `[Your Key]`, `gpt-4o-mini` | See [Azure OpenAI](https://learn.microsoft.com/zh-cn/azure/ai-services/openai/chatgpt-quickstart?tabs=command-line%2Cjavascript-keyless%2Ctypescript-keyless%2Cpython&pivots=programming-language-python) |
| **Zhipu** | `zhipu` | `ZHIPU_API_KEY`, `ZHIPU_MODEL` | `[Your Key]`, `glm-4-flash` | See [Zhipu](https://open.bigmodel.cn/dev/api/thirdparty-frame/openai-sdk) |
| **ModelScope** | `modelscope` | `MODELSCOPE_API_KEY`, `MODELSCOPE_MODEL` | `[Your Key]`, `Qwen/Qwen2.5-Coder-32B-Instruct` | See [ModelScope](https://www.modelscope.cn/docs/model-service/API-Inference/intro) |
| **Silicon** | `silicon` | `SILICON_API_KEY`, `SILICON_MODEL` | `[Your Key]`, `Qwen/Qwen2.5-7B-Instruct` | See [SiliconCloud](https://docs.siliconflow.cn/quickstart) |
| **Gemini** | `gemini` | `GEMINI_API_KEY`, `GEMINI_MODEL` | `[Your Key]`, `gemini-1.5-flash` | See [Gemini](https://ai.google.dev/gemini-api/docs/openai) |
| **Azure** | `azure` | `AZURE_ENDPOINT`, `AZURE_API_KEY` | `https://api.translator.azure.cn`, `[Your Key]` | See [Azure](https://docs.azure.cn/en-us/ai-services/translator/text-translation-overview) |
| **Tencent** | `tencent` | `TENCENTCLOUD_SECRET_ID`, `TENCENTCLOUD_SECRET_KEY` | `[Your ID]`, `[Your Key]` | See [Tencent](https://www.tencentcloud.com/products/tmt?from_qcintl=122110104) |
| **Dify** | `dify` | `DIFY_API_URL`, `DIFY_API_KEY` | `[Your DIFY URL]`, `[Your Key]` | See [Dify](https://github.com/langgenius/dify). Three variables, lang_out, lang_in, and text, must be defined in the Dify workflow input. |
| **AnythingLLM** | `anythingllm` | `AnythingLLM_URL`, `AnythingLLM_APIKEY` | `[Your AnythingLLM URL]`, `[Your Key]` | See [anything-llm](https://github.com/Mintplex-Labs/anything-llm) |
| **Argos Translate** | `argos` | | | See [argos-translate](https://github.com/argosopentech/argos-translate) |
| **Grok** | `grok` | `GORK_API_KEY`, `GORK_MODEL` | `[Your GORK_API_KEY]`, `grok-2-1212` | See [Grok](https://docs.x.ai/docs/overview) |
| **DeepSeek** | `deepseek` | `DEEPSEEK_API_KEY`, `DEEPSEEK_MODEL` | `[Your DEEPSEEK_API_KEY]`, `deepseek-chat` | See [DeepSeek](https://www.deepseek.com/) |
| **OpenAI-Liked** | `openailiked` | `OPENAILIKED_BASE_URL`, `OPENAILIKED_API_KEY`, `OPENAILIKED_MODEL` | `url`, `[Your Key]`, `model name` | None |

For large language models that are compatible with the OpenAI API but are not listed in the table above, set the environment variables in the same way as for OpenAI.

Use `-s service` or `-s service:model` to specify the translation service:

```bash
pdf2zh example.pdf -s openai:gpt-4o-mini
```

Or specify the model with an environment variable:

```bash
set OPENAI_MODEL=gpt-4o-mini
pdf2zh example.pdf -s openai
```

For PowerShell users:

```shell
$env:OPENAI_MODEL = "gpt-4o-mini"
pdf2zh example.pdf -s openai
```
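The two ways of choosing a model described above (`-s service:model` or a `*_MODEL` environment variable) can be sketched in Python as follows. The function `resolve_service` and the `<SERVICE>_MODEL` naming rule are illustrative assumptions; the table shows the rule does not hold for every service (Grok uses `GORK_MODEL`), so this is not a description of pdf2zh's own lookup.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_service(
    spec: str, env: Optional[Mapping[str, str]] = None
) -> Tuple[str, Optional[str]]:
    """Split "service[:model]" and fall back to an environment variable.

    Hypothetical helper: the fallback variable name is derived as
    <SERVICE>_MODEL, which is a simplification of the table above.
    """
    env = os.environ if env is None else env
    # Split on the first colon only, so model tags such as "gemma2:9b" survive.
    service, _, model = spec.partition(":")
    if not model:
        var_name = service.upper().replace("-", "_") + "_MODEL"
        model = env.get(var_name, "")
    return service, (model or None)
```

An explicit `service:model` takes precedence here; the environment is consulted only when no model is given on the command line.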

Specifying exceptions

Use regular expressions to specify the formula fonts and characters that should be preserved:

```bash
pdf2zh example.pdf -f "(CM[^RT].*|MS.*|.*Ital)" -c "(\(|\||\)|\+|=|\d|[\u0080-\ufaff])"
```

By default, `Latex`, `Mono`, `Code`, `Italic`, `Symbol`, and `Math` fonts are preserved:

```bash
pdf2zh example.pdf -f "(CM[^R]|MS.M|XY|MT|BL|RM|EU|LA|RS|LINE|LCIRCLE|TeX-|rsfs|txsy|wasy|stmary|.*Mono|.*Code|.*Ital|.*Sym|.*Math)"
```
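To see what the `-f` and `-c` patterns from the first command select, the sketch below applies them with Python's `re` module. The helper names are hypothetical, and matching from the start of the string with `re.match` is an assumption about how the patterns are applied.

```python
import re

# Patterns copied from the example command above.
FONT_PATTERN = re.compile(r"(CM[^RT].*|MS.*|.*Ital)")
CHAR_PATTERN = re.compile(r"(\(|\||\)|\+|=|\d|[\u0080-\ufaff])")


def is_formula_font(font_name: str) -> bool:
    """Return True if the font name matches the -f pattern (assumed prefix match)."""
    return FONT_PATTERN.match(font_name) is not None


def is_formula_char(char: str) -> bool:
    """Return True if the character matches the -c pattern."""
    return CHAR_PATTERN.match(char) is not None
```

With these patterns, Computer Modern math fonts such as `CMMI10` match, while the roman text font `CMR10` does not, because `[^RT]` excludes the `R` and `T` variants.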

Specifying the number of threads

Use `-t` to specify the number of threads used for translation:

```bash
pdf2zh example.pdf -t 1
```

Custom prompt

Use `--prompt` to specify the prompt used by the LLM:

```bash
pdf2zh example.pdf -pr prompt.txt
```

Example `prompt.txt`:

```txt
[
    {
        "role": "system",
        "content": "You are a professional,authentic machine translation engine.",
    },
    {
        "role": "user",
        "content": "Translate the following markdown source text to ${lang_out}. Keep the formula notation {{v*}} unchanged. Output translation directly without any additional text.\nSource Text: ${text}\nTranslated Text:",
    },
]
```

The following three variables can be used in a custom prompt file:

| **Variable** | **Content** |
| ------------ | ----------------- |
| `lang_in` | Source language |
| `lang_out` | Target language |
| `text` | Text to translate |
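The `${lang_out}` and `${text}` placeholders use the same syntax as Python's `string.Template`. The sketch below shows one way such a prompt could be filled in. The shortened template text and the `render_prompt` helper are illustrative assumptions, not pdf2zh's own code.

```python
from string import Template

# Shortened version of the user message from the prompt.txt example.
PROMPT_TEMPLATE = Template(
    "Translate the following markdown source text to ${lang_out}. "
    "Keep the formula notation {{v*}} unchanged. "
    "Source Text: ${text}"
)


def render_prompt(lang_in: str, lang_out: str, text: str) -> str:
    """Fill in the three documented variables.

    safe_substitute leaves unknown placeholders untouched instead of raising,
    and braces without a leading "$" (such as {{v*}}) are never interpreted.
    """
    return PROMPT_TEMPLATE.safe_substitute(
        lang_in=lang_in, lang_out=lang_out, text=text
    )
```

Because only `$`-prefixed names are substituted, formula markers such as `{{v1}}` in the source text pass through unchanged.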

API

### Python

```python
from pdf2zh import translate, translate_stream

params = {"lang_in": "en", "lang_out": "ko", "service": "google", "thread": 4}
file_mono, file_dual = translate(files=["example.pdf"], **params)[0]
with open("example.pdf", "rb") as f:
    stream_mono, stream_dual = translate_stream(stream=f.read(), **params)
```

### HTTP

```bash
pip install pdf2zh[backend]
pdf2zh --flask
pdf2zh --celery worker
```

```bash
curl http://localhost:11008/v1/translate -F "file=@example.pdf" -F "data={\"lang_in\":\"en\",\"lang_out\":\"ko\",\"service\":\"google\",\"thread\":4}"
{"id":"d9894125-2f4e-45ea-9d93-1a9068d2045a"}

curl http://localhost:11008/v1/translate/d9894125-2f4e-45ea-9d93-1a9068d2045a
{"info":{"n":13,"total":506},"state":"PROGRESS"}

curl http://localhost:11008/v1/translate/d9894125-2f4e-45ea-9d93-1a9068d2045a
{"state":"SUCCESS"}

curl http://localhost:11008/v1/translate/d9894125-2f4e-45ea-9d93-1a9068d2045a/mono --output example-mono.pdf

curl http://localhost:11008/v1/translate/d9894125-2f4e-45ea-9d93-1a9068d2045a/dual --output example-dual.pdf

curl http://localhost:11008/v1/translate/d9894125-2f4e-45ea-9d93-1a9068d2045a -X DELETE
```
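The HTTP workflow above (submit, poll, download) can be wrapped in small client-side helpers. This is a minimal sketch using only the standard library; `task_url` and `describe_status` are hypothetical names, and the JSON shapes are taken from the sample responses shown above.

```python
import json
from typing import Optional

BASE_URL = "http://localhost:11008/v1/translate"


def task_url(task_id: str, result: Optional[str] = None) -> str:
    """Build the status URL, or the download URL when result is "mono" or "dual"."""
    url = f"{BASE_URL}/{task_id}"
    return f"{url}/{result}" if result else url


def describe_status(body: str) -> str:
    """Summarize a status response such as the PROGRESS and SUCCESS samples."""
    status = json.loads(body)
    if status["state"] == "PROGRESS":
        info = status["info"]
        return f"{info['n']}/{info['total']}"
    return status["state"]
```

A polling loop would fetch `task_url(task_id)` until the state is no longer `PROGRESS`, then download `task_url(task_id, "mono")` or `task_url(task_id, "dual")`.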

Acknowledgements

- Document merging: [PyMuPDF](https://github.com/pymupdf/PyMuPDF)
- Document parsing: [Pdfminer.six](https://github.com/pdfminer/pdfminer.six)
- Document extraction: [MinerU](https://github.com/opendatalab/MinerU)
- Document preview: [Gradio PDF](https://github.com/freddyaboulton/gradio-pdf)
- Multi-threaded translation: [MathTranslate](https://github.com/SUSYUSTC/MathTranslate)
- Layout parsing: [DocLayout-YOLO](https://github.com/opendatalab/DocLayout-YOLO)
- Document standard: [PDF Explained](https://zxyle.github.io/PDF-Explained/), [PDF Cheat Sheets](https://pdfa.org/resource/pdf-cheat-sheets/)
- Multilingual font: [Go Noto Universal](https://github.com/satbyy/go-noto-universal)

Contributors

![Alt](https://repobeats.axiom.co/api/embed/dfa7583da5332a11468d686fbd29b92320a6a869.svg "Repobeats analytics image")

Star history

Star History Chart

## /docs/README_zh-CN.md
[English](../README.md) | 简体中文 | [繁體中文](README_zh-TW.md) | [日本語](README_ja-JP.md) PDF2ZH

PDFMathTranslate

A tool for translating scientific PDF documents with side-by-side bilingual comparison.

- 📊 Preserves formulas, charts, tables of contents, and annotations *([preview](#preview))*
- 🌐 Supports [multiple languages](./ADVANCED.md#language) and [many translation services](./ADVANCED.md#services)
- 🤖 Provides a [command-line tool](#usage), a [graphical interface](#gui), and [containerized deployment](#docker)

Feedback is welcome in [GitHub Issues](https://github.com/Byaidu/PDFMathTranslate/issues) or the [Telegram user group](https://t.me/+Z9_SgnxmsmA5NzBl).

For details on how to contribute, see the [Contribution Guide](https://github.com/Byaidu/PDFMathTranslate/wiki/Contribution-Guide---%E8%B4%A1%E7%8C%AE%E6%8C%87%E5%8D%97).

Updates

- [Feb. 22, 2025] Better release CI and a well-packaged windows-amd64 exe _(by [@awwaawwa](https://github.com/awwaawwa))_
- [Dec. 24, 2024] The translator now supports local models on [Xinference](https://github.com/xorbitsai/inference) _(by [@imClumsyPanda](https://github.com/imClumsyPanda))_
- [Dec. 19, 2024] Non-PDF/A documents are now supported using `-cp` _(by [@reycn](https://github.com/reycn))_
- [Dec. 13, 2024] Additional backend support _(by [@YadominJinta](https://github.com/YadominJinta))_
- [Dec. 10, 2024] The translator now supports OpenAI models on Azure _(by [@yidasanqian](https://github.com/yidasanqian))_

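Preview

Online demo 🌟

Online service 🌟

You can try the application through any of the following demos:

- The [free public service](https://pdf2zh.com/), usable online without installation _(recommended)_.
- [Immersive Translate - BabelDOC](https://app.immersivetranslate.com/babel-doc/), with 1000 free pages per month _(recommended)_.
- The [demo hosted on HuggingFace](https://huggingface.co/spaces/reycn/PDFMathTranslate-Docker)
- The [demo hosted on ModelScope](https://www.modelscope.cn/studios/AI-ModelScope/PDFMathTranslate), with no installation required.

Note that the demos run on limited computing resources, so please avoid abusing them.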

Installation and usage

### Methods

We provide different ways to use the program for different use cases:

1. UV install

   1. Install Python (3.10 <= version <= 3.12)
   2. Install the package:

      ```bash
      pip install uv
      uv tool install --python 3.12 pdf2zh
      ```

   3. Run the translation; the files are generated in the [current working directory](https://chatgpt.com/share/6745ed36-9acc-800e-8a90-59204bd13444):

      ```bash
      pdf2zh document.pdf
      ```

2. Windows exe

   1. Download pdf2zh-version-win64.zip from the [releases page](https://github.com/Byaidu/PDFMathTranslate/releases)
   2. Unzip it and double-click `pdf2zh.exe` to run.

3. Graphical user interface

   1. Install Python (3.10 <= version <= 3.12)
   2. Install the package:

      ```bash
      pip install pdf2zh
      ```

   3. Start using it in your browser:

      ```bash
      pdf2zh -i
      ```

   4. If your browser does not open automatically, go to

      ```bash
      http://localhost:7860/
      ```

   For more details, see the [GUI documentation](./README_GUI.md).

4. Docker

   1. Pull the image and run it:

      ```bash
      docker pull byaidu/pdf2zh
      docker run -d -p 7860:7860 byaidu/pdf2zh
      ```

   2. Open it in your browser:

      ```
      http://localhost:7860/
      ```

   For Docker deployment on cloud services:

5. Zotero plugin

   For more details, see [Zotero PDF2zh](https://github.com/guaguastandup/zotero-pdf2zh).

6. Command line

   1. Python must be installed (3.10 <= version <= 3.12)
   2. Install the package:

      ```bash
      pip install pdf2zh
      ```

   3. Run the translation; the files are generated in the [current working directory](https://chatgpt.com/share/6745ed36-9acc-800e-8a90-59204bd13444):

      ```bash
      pdf2zh document.pdf
      ```

> [!TIP]
>
> - If you use Windows and cannot open the file after downloading it, install [vc_redist.x64.exe](https://aka.ms/vs/17/release/vc_redist.x64.exe) and try again.
>
> - If you cannot access Docker Hub, try the image on the [GitHub Container Registry](https://github.com/Byaidu/PDFMathTranslate/pkgs/container/pdfmathtranslate).
> ```bash
> docker pull ghcr.io/byaidu/pdfmathtranslate
> docker run -d -p 7860:7860 ghcr.io/byaidu/pdfmathtranslate
> ```

### Unable to install?

The program needs an AI model (`wybxc/DocLayout-YOLO-DocStructBench-onnx`) before it can work, and some users cannot download it because of network issues. If you have trouble downloading this model, the following environment variable provides a workaround:

```shell
set HF_ENDPOINT=https://hf-mirror.com
```

For PowerShell users:

```shell
$env:HF_ENDPOINT = "https://hf-mirror.com"
```

If this workaround does not help or you run into other problems, see the [FAQ](https://github.com/Byaidu/PDFMathTranslate/wiki#-faq--%E5%B8%B8%E8%A7%81%E9%97%AE%E9%A2%98).

Advanced options

Run the translation command from the command line to generate the translated document `example-mono.pdf` and the bilingual document `example-dual.pdf` in the current working directory. The Google translation service is used by default; more supported services are listed [here](https://github.com/Byaidu/PDFMathTranslate/blob/main/docs/ADVANCED.md#services).

The table below lists all advanced options for reference:

| Option | Function | Example |
| --- | --- | --- |
| files | Local files | `pdf2zh ~/local.pdf` |
| links | Online files | `pdf2zh http://arxiv.org/paper.pdf` |
| `-i` | [Enter the GUI](#gui) | `pdf2zh -i` |
| `-p` | [Partial document translation](https://github.com/Byaidu/PDFMathTranslate/blob/main/docs/ADVANCED.md#partial) | `pdf2zh example.pdf -p 1` |
| `-li` | [Source language](https://github.com/Byaidu/PDFMathTranslate/blob/main/docs/ADVANCED.md#languages) | `pdf2zh example.pdf -li en` |
| `-lo` | [Target language](https://github.com/Byaidu/PDFMathTranslate/blob/main/docs/ADVANCED.md#languages) | `pdf2zh example.pdf -lo zh` |
| `-s` | [Translation service](https://github.com/Byaidu/PDFMathTranslate/blob/main/docs/ADVANCED.md#services) | `pdf2zh example.pdf -s deepl` |
| `-t` | [Multi-threading](https://github.com/Byaidu/PDFMathTranslate/blob/main/docs/ADVANCED.md#threads) | `pdf2zh example.pdf -t 1` |
| `-o` | Output directory | `pdf2zh example.pdf -o output` |
| `-f`, `-c` | [Exceptions](https://github.com/Byaidu/PDFMathTranslate/blob/main/docs/ADVANCED.md#exceptions) | `pdf2zh example.pdf -f "(MS.*)"` |
| `-cp` | Compatibility mode | `pdf2zh example.pdf --compatible` |
| `--share` | Public link | `pdf2zh -i --share` |
| `--authorized` | [Authorization](https://github.com/Byaidu/PDFMathTranslate/blob/main/docs/ADVANCED.md#auth) | `pdf2zh -i --authorized users.txt [auth.html]` |
| `--prompt` | [Custom prompt](https://github.com/Byaidu/PDFMathTranslate/blob/main/docs/ADVANCED.md#prompt) | `pdf2zh --prompt [prompt.txt]` |
| `--onnx` | Use a custom DocLayout-YOLO ONNX model | `pdf2zh --onnx [onnx/model/path]` |
| `--serverport` | Use a custom WebUI (gradio server) port | `pdf2zh --serverport 7860` |
| `--dir` | Batch translation | `pdf2zh --dir /path/to/translate/` |
| `--config` | [Configuration file](https://github.com/Byaidu/PDFMathTranslate/blob/main/docs/ADVANCED.md#cofig) | `pdf2zh --config /path/to/config/config.json` |
| `--babeldoc` | Translate with the experimental backend [BabelDOC](https://funstory-ai.github.io/BabelDOC/) | `pdf2zh --babeldoc -s openai example.pdf` |

For detailed explanations and a full list of each option, see [Advanced Usage](./ADVANCED.md).

Secondary development (API)

The current pdf2zh API is temporarily deprecated. The API will be made available again after the release of [pdf2zh 2.0](https://github.com/Byaidu/PDFMathTranslate/issues/586). Users who need programmatic access should use the `babeldoc.high_level.async_translate` function from [BabelDOC](https://github.com/funstory-ai/BabelDOC).

Temporary deprecation means that the related code will not be removed for now, but no technical support will be provided and no bugs will be fixed.

TODO

- [ ] Parse layout with DocLayNet-based models: [PaddleX](https://github.com/PaddlePaddle/PaddleX/blob/17cc27ac3842e7880ca4aad92358d3ef8555429a/paddlex/repo_apis/PaddleDetection_api/object_det/official_categories.py#L81), [PaperMage](https://github.com/allenai/papermage/blob/9cd4bb48cbedab45d0f7a455711438f1632abebe/README.md?plain=1#L102), [SAM2](https://github.com/facebookresearch/sam2)
- [ ] Fix page rotation, table of contents, and list formatting
- [ ] Fix pixel formulas in old papers
- [ ] Async retry, except for KeyboardInterrupt
- [ ] Knuth–Plass algorithm for Western languages
- [ ] Support non-PDF/A files
- [ ] Plugins for [Zotero](https://github.com/zotero/zotero) and [Obsidian](https://github.com/obsidianmd/obsidian-releases)

Acknowledgements

- [Immersive Translation](https://immersivetranslate.com) provides monthly Pro membership redemption codes to active contributors of this project; for details, see [CONTRIBUTOR_REWARD.md](https://github.com/funstory-ai/BabelDOC/blob/main/docs/CONTRIBUTOR_REWARD.md)
- Document merging: [PyMuPDF](https://github.com/pymupdf/PyMuPDF)
- Document parsing: [Pdfminer.six](https://github.com/pdfminer/pdfminer.six)
- Document extraction: [MinerU](https://github.com/opendatalab/MinerU)
- Document preview: [Gradio PDF](https://github.com/freddyaboulton/gradio-pdf)
- Multi-threaded translation: [MathTranslate](https://github.com/SUSYUSTC/MathTranslate)
- Layout parsing: [DocLayout-YOLO](https://github.com/opendatalab/DocLayout-YOLO)
- Document standard: [PDF Explained](https://zxyle.github.io/PDF-Explained/), [PDF Cheat Sheets](https://pdfa.org/resource/pdf-cheat-sheets/)
- Multilingual font: [Go Noto Universal](https://github.com/satbyy/go-noto-universal)

Contributors

![Alt](https://repobeats.axiom.co/api/embed/dfa7583da5332a11468d686fbd29b92320a6a869.svg "Repobeats analytics image")

Star history

Star history chart

## /docs/README_zh-TW.md
[English](../README.md) | [简体中文](README_zh-CN.md) | 繁體中文 | [日本語](README_ja-JP.md) PDF2ZH

PDFMathTranslate

A tool for translating scientific PDF documents with side-by-side bilingual comparison.

- 📊 Preserves formulas, charts, tables of contents, and annotations *([preview](#preview))*
- 🌐 Supports [multiple languages](#language) and [many translation services](#services)
- 🤖 Provides a [command-line tool](#usage), a [graphical user interface](#gui), and [containerized deployment](#docker)

Feedback is welcome in [GitHub Issues](https://github.com/Byaidu/PDFMathTranslate/issues) or the [Telegram user group](https://t.me/+Z9_SgnxmsmA5NzBl) (https://qm.qq.com/q/DixZCxQej0).

For details on how to contribute, see the [Contribution Guide](https://github.com/Byaidu/PDFMathTranslate/wiki/Contribution-Guide---%E8%B4%A1%E7%8C%AE%E6%8C%87%E5%8D%97).

Recent updates

- [Dec. 24 2024] The translator now supports local LLMs served by [Xinference](https://github.com/xorbitsai/inference) _(by [@imClumsyPanda](https://github.com/imClumsyPanda))_
- [Nov. 26 2024] The CLI now supports (multiple) online PDF files *(by [@reycn](https://github.com/reycn))*
- [Nov. 24 2024] [ONNX](https://github.com/onnx/onnx) support to reduce dependency size *(by [@Wybxc](https://github.com/Wybxc))*
- [Nov. 23 2024] 🌟 The [free public service](#demo) is online! *(by [@Byaidu](https://github.com/Byaidu))*
- [Nov. 23 2024] Added a firewall against web crawlers *(by [@Byaidu](https://github.com/Byaidu))*
- [Nov. 22 2024] The GUI now supports Italian and has received some updates *(by [@Byaidu](https://github.com/Byaidu), [@reycn](https://github.com/reycn))*
- [Nov. 22 2024] You can now share your self-deployed service with friends *(by [@Zxis233](https://github.com/Zxis233))*
- [Nov. 22 2024] Tencent translation is supported *(by [@hellofinch](https://github.com/hellofinch))*
- [Nov. 21 2024] The GUI now supports downloading the bilingual document *(by [@reycn](https://github.com/reycn))*
- [Nov. 20 2024] 🌟 An [online demo](#demo) is available! *(by [@reycn](https://github.com/reycn))*

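Preview

Online demo 🌟

### Free service

You can try the [free public service](https://pdf2zh.com/) right away without installing anything.

### Online demo

You can try the [online demo on HuggingFace](https://huggingface.co/spaces/reycn/PDFMathTranslate-Docker) and the [online demo on ModelScope](https://www.modelscope.cn/studios/AI-ModelScope/PDFMathTranslate) directly, with no installation required. Note that the demos run on limited computing resources, so please do not abuse them.

Installation and usage

We provide four ways to use this project: the [command-line tool](#cmd), the [portable install](#portable), the [graphical user interface](#gui), and [containerized deployment](#docker).

pdf2zh needs to download an additional model (`wybxc/DocLayout-YOLO-DocStructBench-onnx`) at runtime; the model is also available on ModelScope. If you have trouble downloading it at startup, set the following environment variable:

```shell
set HF_ENDPOINT=https://hf-mirror.com
```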

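Method 1. Command-line tool

1. Make sure Python is installed (3.10 <= version <= 3.12)
2. Install the program:

   ```bash
   pip install pdf2zh
   ```

3. Run the translation; the generated files are placed in the [current working directory](https://chatgpt.com/share/6745ed36-9acc-800e-8a90-59204bd13444):

   ```bash
   pdf2zh document.pdf
   ```

Method 2. Portable install

No pre-installed Python environment is required.

Download [setup.bat](https://raw.githubusercontent.com/Byaidu/PDFMathTranslate/refs/heads/main/script/setup.bat) and double-click it to run.

Method 3. Graphical user interface

1. Make sure Python is installed (3.10 <= version <= 3.12)
2. Install the program:

   ```bash
   pip install pdf2zh
   ```

3. Start using it in your browser:

   ```bash
   pdf2zh -i
   ```

4. If your browser does not open and navigate automatically, open this address manually:

   ```bash
   http://localhost:7860/
   ```

See the [documentation for GUI](./README_GUI.md) for detailed instructions.

Method 4. Containerized deployment

1. Pull the Docker image and run it:

   ```bash
   docker pull byaidu/pdf2zh
   docker run -d -p 7860:7860 byaidu/pdf2zh
   ```

2. Open it in your browser:

   ```
   http://localhost:7860/
   ```

For deploying the container image on cloud services: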

Advanced options

Run the translation command from the command line to generate the translated file `example-mono.pdf` and the bilingual file `example-dual.pdf` in the current working directory. The Google translation service is used by default.

The following table lists all advanced options for reference:

| Option | Function | Example |
| -------- | ------- | ------- |
| files | Local files | `pdf2zh ~/local.pdf` |
| links | Online files | `pdf2zh http://arxiv.org/paper.pdf` |
| `-i` | [Enter the GUI](#gui) | `pdf2zh -i` |
| `-p` | [Translate only part of the document](#partial) | `pdf2zh example.pdf -p 1` |
| `-li` | [Source language](#language) | `pdf2zh example.pdf -li en` |
| `-lo` | [Target language](#language) | `pdf2zh example.pdf -lo zh` |
| `-s` | [Translation service](#services) | `pdf2zh example.pdf -s deepl` |
| `-t` | [Multi-threading](#threads) | `pdf2zh example.pdf -t 1` |
| `-o` | Output directory | `pdf2zh example.pdf -o output` |
| `-f`, `-c` | [Exception rules](#exceptions) | `pdf2zh example.pdf -f "(MS.*)"` |
| `--share` | Get a gradio public link | `pdf2zh -i --share` |
| `--authorized` | [Add web authentication and a custom authentication page](https://github.com/Byaidu/PDFMathTranslate/blob/main/docs/ADVANCED.) | `pdf2zh -i --authorized users.txt [auth.html]` |
| `--prompt` | Use a custom LLM prompt | `pdf2zh --prompt [prompt.txt]` |
| `--onnx` | Use a custom DocLayout-YOLO ONNX model | `pdf2zh --onnx [onnx/model/path]` |
| `--serverport` | Use a custom WebUI port | `pdf2zh --serverport 7860` |
| `--dir` | Translate a folder | `pdf2zh --dir /path/to/translate/` |

Translating a full or partial document

- **Full translation**

  ```bash
  pdf2zh example.pdf
  ```

- **Partial translation**

  ```bash
  pdf2zh example.pdf -p 1-3,5
  ```

Specifying source and target languages

See [Google language codes](https://developers.google.com/admin-sdk/directory/v1/languages) and [DeepL language codes](https://developers.deepl.com/docs/resources/supported-languages).

```bash
pdf2zh example.pdf -li en -lo ja
```

Using different translation services

The following table lists the [environment variables](https://chatgpt.com/share/6734a83d-9d48-800e-8a46-f57ca6e8bcb4) required by each translation service. Make sure the corresponding variables are set before use.

|**Translator**|**Service**|**Environment Variables**|**Default Values**|**Notes**|
|-|-|-|-|-|
|**Google (Default)**|`google`|None|N/A|None|
|**Bing**|`bing`|None|N/A|None|
|**DeepL**|`deepl`|`DEEPL_AUTH_KEY`|`[Your Key]`|See [DeepL](https://support.deepl.com/hc/en-us/articles/360020695820-API-Key-for-DeepL-s-API)|
|**DeepLX**|`deeplx`|`DEEPLX_ENDPOINT`|`https://api.deepl.com/translate`|See [DeepLX](https://github.com/OwO-Network/DeepLX)|
|**Ollama**|`ollama`|`OLLAMA_HOST`, `OLLAMA_MODEL`|`http://127.0.0.1:11434`, `gemma2`|See [Ollama](https://github.com/ollama/ollama)|
|**OpenAI**|`openai`|`OPENAI_BASE_URL`, `OPENAI_API_KEY`, `OPENAI_MODEL`|`https://api.openai.com/v1`, `[Your Key]`, `gpt-4o-mini`|See [OpenAI](https://platform.openai.com/docs/overview)|
|**AzureOpenAI**|`azure-openai`|`AZURE_OPENAI_BASE_URL`, `AZURE_OPENAI_API_KEY`, `AZURE_OPENAI_MODEL`|`[Your Endpoint]`, `[Your Key]`, `gpt-4o-mini`|See [Azure OpenAI](https://learn.microsoft.com/zh-cn/azure/ai-services/openai/chatgpt-quickstart?tabs=command-line%2Cjavascript-keyless%2Ctypescript-keyless%2Cpython&pivots=programming-language-python)|
|**Zhipu**|`zhipu`|`ZHIPU_API_KEY`, `ZHIPU_MODEL`|`[Your Key]`, `glm-4-flash`|See [Zhipu](https://open.bigmodel.cn/dev/api/thirdparty-frame/openai-sdk)|
|**ModelScope**|`modelscope`|`MODELSCOPE_API_KEY`, `MODELSCOPE_MODEL`|`[Your Key]`, `Qwen/Qwen2.5-Coder-32B-Instruct`|See [ModelScope](https://www.modelscope.cn/docs/model-service/API-Inference/intro)|
|**Silicon**|`silicon`|`SILICON_API_KEY`, `SILICON_MODEL`|`[Your Key]`, `Qwen/Qwen2.5-7B-Instruct`|See [SiliconCloud](https://docs.siliconflow.cn/quickstart)|
|**Gemini**|`gemini`|`GEMINI_API_KEY`, `GEMINI_MODEL`|`[Your Key]`, `gemini-1.5-flash`|See [Gemini](https://ai.google.dev/gemini-api/docs/openai)|
|**Azure**|`azure`|`AZURE_ENDPOINT`, `AZURE_API_KEY`|`https://api.translator.azure.cn`, `[Your Key]`|See [Azure](https://docs.azure.cn/en-us/ai-services/translator/text-translation-overview)|
|**Tencent**|`tencent`|`TENCENTCLOUD_SECRET_ID`, `TENCENTCLOUD_SECRET_KEY`|`[Your ID]`, `[Your Key]`|See [Tencent](https://www.tencentcloud.com/products/tmt?from_qcintl=122110104)|
|**Dify**|`dify`|`DIFY_API_URL`, `DIFY_API_KEY`|`[Your DIFY URL]`, `[Your Key]`|See [Dify](https://github.com/langgenius/dify). Three variables, lang_out, lang_in, and text, must be defined in the Dify workflow input.|
|**AnythingLLM**|`anythingllm`|`AnythingLLM_URL`, `AnythingLLM_APIKEY`|`[Your AnythingLLM URL]`, `[Your Key]`|See [anything-llm](https://github.com/Mintplex-Labs/anything-llm)|
|**Argos Translate**|`argos`| | |See [argos-translate](https://github.com/argosopentech/argos-translate)|
|**Grok**|`grok`|`GORK_API_KEY`, `GORK_MODEL`|`[Your GORK_API_KEY]`, `grok-2-1212`|See [Grok](https://docs.x.ai/docs/overview)|
|**DeepSeek**|`deepseek`|`DEEPSEEK_API_KEY`, `DEEPSEEK_MODEL`|`[Your DEEPSEEK_API_KEY]`, `deepseek-chat`|See [DeepSeek](https://www.deepseek.com/)|
|**OpenAI-Liked**|`openailiked`|`OPENAILIKED_BASE_URL`, `OPENAILIKED_API_KEY`, `OPENAILIKED_MODEL`|`url`, `[Your Key]`, `model name`|None|

For large language models that are compatible with the OpenAI API but are not listed in the table above, set the environment variables in the same way as for OpenAI.

Use `-s service` or `-s service:model` to specify the translation service:

```bash
pdf2zh example.pdf -s openai:gpt-4o-mini
```

Or specify the model with an environment variable:

```bash
set OPENAI_MODEL=gpt-4o-mini
pdf2zh example.pdf -s openai
```

Specifying exception rules

Use regular expressions to specify the formula fonts and characters that should be preserved:

```bash
pdf2zh example.pdf -f "(CM[^RT].*|MS.*|.*Ital)" -c "(\(|\||\)|\+|=|\d|[\u0080-\ufaff])"
```

By default, `Latex`, `Mono`, `Code`, `Italic`, `Symbol`, and `Math` fonts are preserved:

```bash
pdf2zh example.pdf -f "(CM[^R]|MS.M|XY|MT|BL|RM|EU|LA|RS|LINE|LCIRCLE|TeX-|rsfs|txsy|wasy|stmary|.*Mono|.*Code|.*Ital|.*Sym|.*Math)"
```

Specifying the number of threads

Use the `-t` option to specify the number of threads used for translation:

```bash
pdf2zh example.pdf -t 1
```
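As a rough illustration of what a thread count means for translation, the sketch below fans paragraphs out to a pool of worker threads and collects the results in their original order. `translate_all` and the `translate_one` callable are hypothetical stand-ins, not pdf2zh's implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def translate_all(
    paragraphs: Sequence[str],
    translate_one: Callable[[str], str],
    threads: int = 1,
) -> List[str]:
    """Translate paragraphs with up to `threads` concurrent workers.

    Executor.map returns results in input order, so the output lines up
    with the input even when calls finish out of order.
    """
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(translate_one, paragraphs))
```

A larger thread count mainly helps when each call waits on a remote translation service; a service with strict rate limits may call for a lower value such as `-t 1`.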

Custom LLM prompt

Use `--prompt` to specify the prompt file used when translating with an LLM.

```bash
pdf2zh example.pdf -pr prompt.txt
```

Example contents of `prompt.txt`:

```
[
    {
        "role": "system",
        "content": "You are a professional,authentic machine translation engine.",
    },
    {
        "role": "user",
        "content": "Translate the following markdown source text to ${lang_out}. Keep the formula notation {{v*}} unchanged. Output translation directly without any additional text.\nSource Text: ${text}\nTranslated Text:",
    },
]
```

The following three built-in variables can be used in a custom prompt file to pass parameters:

|**Variable**|**Description**|
|-|-|
|`lang_in`|Input language|
|`lang_out`|Output language|
|`text`|Text to translate|

API

### Python

```python
from pdf2zh import translate, translate_stream

params = {"lang_in": "en", "lang_out": "zh", "service": "google", "thread": 4}
file_mono, file_dual = translate(files=["example.pdf"], **params)[0]
with open("example.pdf", "rb") as f:
    stream_mono, stream_dual = translate_stream(stream=f.read(), **params)
```

### HTTP

```bash
pip install pdf2zh[backend]
pdf2zh --flask
pdf2zh --celery worker
```

```bash
curl http://localhost:11008/v1/translate -F "file=@example.pdf" -F "data={\"lang_in\":\"en\",\"lang_out\":\"zh\",\"service\":\"google\",\"thread\":4}"
{"id":"d9894125-2f4e-45ea-9d93-1a9068d2045a"}

curl http://localhost:11008/v1/translate/d9894125-2f4e-45ea-9d93-1a9068d2045a
{"info":{"n":13,"total":506},"state":"PROGRESS"}

curl http://localhost:11008/v1/translate/d9894125-2f4e-45ea-9d93-1a9068d2045a
{"state":"SUCCESS"}

curl http://localhost:11008/v1/translate/d9894125-2f4e-45ea-9d93-1a9068d2045a/mono --output example-mono.pdf

curl http://localhost:11008/v1/translate/d9894125-2f4e-45ea-9d93-1a9068d2045a/dual --output example-dual.pdf

curl http://localhost:11008/v1/translate/d9894125-2f4e-45ea-9d93-1a9068d2045a -X DELETE
```

Acknowledgements

- Document merging: [PyMuPDF](https://github.com/pymupdf/PyMuPDF)
- Document parsing: [Pdfminer.six](https://github.com/pdfminer/pdfminer.six)
- Document extraction: [MinerU](https://github.com/opendatalab/MinerU)
- Document preview: [Gradio PDF](https://github.com/freddyaboulton/gradio-pdf)
- Multi-threaded translation: [MathTranslate](https://github.com/SUSYUSTC/MathTranslate)
- Layout parsing: [DocLayout-YOLO](https://github.com/opendatalab/DocLayout-YOLO)
- PDF standard: [PDF Explained](https://zxyle.github.io/PDF-Explained/), [PDF Cheat Sheets](https://pdfa.org/resource/pdf-cheat-sheets/)
- Multilingual font: [Go Noto Universal](https://github.com/satbyy/go-noto-universal)

Contributors

![Alt](https://repobeats.axiom.co/api/embed/dfa7583da5332a11468d686fbd29b92320a6a869.svg "Repobeats analytics image")

Star history

Star History Chart

## /docs/images/after.png

Binary file available at https://raw.githubusercontent.com/Byaidu/PDFMathTranslate/refs/heads/main/docs/images/after.png

## /docs/images/banner.png

Binary file available at https://raw.githubusercontent.com/Byaidu/PDFMathTranslate/refs/heads/main/docs/images/banner.png

## /docs/images/before.png

Binary file available at https://raw.githubusercontent.com/Byaidu/PDFMathTranslate/refs/heads/main/docs/images/before.png

## /docs/images/cmd.explained.png

Binary file available at https://raw.githubusercontent.com/Byaidu/PDFMathTranslate/refs/heads/main/docs/images/cmd.explained.png

## /docs/images/cmd.explained.zh.png

Binary file available at https://raw.githubusercontent.com/Byaidu/PDFMathTranslate/refs/heads/main/docs/images/cmd.explained.zh.png

## /docs/images/gui.gif

Binary file available at https://raw.githubusercontent.com/Byaidu/PDFMathTranslate/refs/heads/main/docs/images/gui.gif

## /docs/images/preview.gif

Binary file available at https://raw.githubusercontent.com/Byaidu/PDFMathTranslate/refs/heads/main/docs/images/preview.gif

## /pdf2zh/__init__.py

```py path="/pdf2zh/__init__.py"
import logging
from pdf2zh.high_level import translate, translate_stream

log = logging.getLogger(__name__)

__version__ = "1.9.6"
__author__ = "Byaidu"
__all__ = ["translate", "translate_stream"]
```

## /pdf2zh/backend.py

```py path="/pdf2zh/backend.py"
from flask import Flask, request, send_file
from celery import Celery, Task
from celery.result import AsyncResult
from pdf2zh import translate_stream
import tqdm
import json
import io
from pdf2zh.doclayout import ModelInstance
from pdf2zh.config import ConfigManager

flask_app = Flask("pdf2zh")
flask_app.config.from_mapping(
    CELERY=dict(
        broker_url=ConfigManager.get("CELERY_BROKER", "redis://127.0.0.1:6379/0"),
        result_backend=ConfigManager.get("CELERY_RESULT", "redis://127.0.0.1:6379/0"),
    )
)


def celery_init_app(app: Flask) -> Celery:
    class FlaskTask(Task):
        def __call__(self, *args, **kwargs):
            with app.app_context():
                return self.run(*args, **kwargs)

    celery_app = Celery(app.name)
    celery_app.config_from_object(app.config["CELERY"])
    celery_app.Task = FlaskTask
    celery_app.set_default()
    celery_app.autodiscover_tasks()
    app.extensions["celery"] = celery_app
    return celery_app


celery_app = celery_init_app(flask_app)


@celery_app.task(bind=True)
def translate_task(
    self: Task,
    stream: bytes,
    args: dict,
):
    def progress_bar(t: tqdm.tqdm):
        self.update_state(state="PROGRESS", meta={"n": t.n, "total": t.total})  # noqa
        print(f"Translating {t.n} / {t.total} pages")

    doc_mono, doc_dual = translate_stream(
        stream,
        callback=progress_bar,
        model=ModelInstance.value,
        **args,
    )
    return doc_mono, doc_dual


@flask_app.route("/v1/translate", methods=["POST"])
def create_translate_tasks():
    file = request.files["file"]
    stream = file.stream.read()
    print(request.form.get("data"))
    args = json.loads(request.form.get("data"))
    task = translate_task.delay(stream, args)
    return {"id": task.id}


# The <id> and <format> URL variables feed the view-function parameters below.
@flask_app.route("/v1/translate/<id>", methods=["GET"])
def get_translate_task(id: str):
    result: AsyncResult = celery_app.AsyncResult(id)
    if str(result.state) == "PROGRESS":
        return {"state": str(result.state), "info": result.info}
    else:
        return {"state": str(result.state)}


@flask_app.route("/v1/translate/<id>", methods=["DELETE"])
def delete_translate_task(id: str):
    result: AsyncResult = celery_app.AsyncResult(id)
    result.revoke(terminate=True)
    return {"state": str(result.state)}


@flask_app.route("/v1/translate/<id>/<format>")
def get_translate_result(id: str, format: str):
    result = celery_app.AsyncResult(id)
    if not result.ready():
        return {"error": "task not finished"}, 400
    if not result.successful():
        return {"error": "task failed"}, 400
    doc_mono, doc_dual = result.get()
    to_send = doc_mono if format == "mono" else doc_dual
    return send_file(io.BytesIO(to_send), "application/pdf")


if __name__ == "__main__":
    flask_app.run()
```

## /pdf2zh/cache.py

```py path="/pdf2zh/cache.py"
import logging
import os
import json
from peewee
import Model, SqliteDatabase, AutoField, CharField, TextField, SQL from typing import Optional # we don't init the database here db = SqliteDatabase(None) logger = logging.getLogger(__name__) class _TranslationCache(Model): id = AutoField() translate_engine = CharField(max_length=20) translate_engine_params = TextField() original_text = TextField() translation = TextField() class Meta: database = db constraints = [ SQL( """ UNIQUE ( translate_engine, translate_engine_params, original_text ) ON CONFLICT REPLACE """ ) ] class TranslationCache: @staticmethod def _sort_dict_recursively(obj): if isinstance(obj, dict): return { k: TranslationCache._sort_dict_recursively(v) for k in sorted(obj.keys()) for v in [obj[k]] } elif isinstance(obj, list): return [TranslationCache._sort_dict_recursively(item) for item in obj] return obj def __init__(self, translate_engine: str, translate_engine_params: dict = None): assert ( len(translate_engine) < 20 ), "current cache require translate engine name less than 20 characters" self.translate_engine = translate_engine self.replace_params(translate_engine_params) # The program typically starts multi-threaded translation # only after cache parameters are fully configured, # so thread safety doesn't need to be considered here. def replace_params(self, params: dict = None): if params is None: params = {} self.params = params params = self._sort_dict_recursively(params) self.translate_engine_params = json.dumps(params) def update_params(self, params: dict = None): if params is None: params = {} self.params.update(params) self.replace_params(self.params) def add_params(self, k: str, v): self.params[k] = v self.replace_params(self.params) # Since peewee and the underlying sqlite are thread-safe, # get and set operations don't need locks. 
def get(self, original_text: str) -> Optional[str]: result = _TranslationCache.get_or_none( translate_engine=self.translate_engine, translate_engine_params=self.translate_engine_params, original_text=original_text, ) return result.translation if result else None def set(self, original_text: str, translation: str): try: _TranslationCache.create( translate_engine=self.translate_engine, translate_engine_params=self.translate_engine_params, original_text=original_text, translation=translation, ) except Exception as e: logger.debug(f"Error setting cache: {e}") def init_db(remove_exists=False): cache_folder = os.path.join(os.path.expanduser("~"), ".cache", "pdf2zh") os.makedirs(cache_folder, exist_ok=True) # The current version does not support database migration, so add the version number to the file name. cache_db_path = os.path.join(cache_folder, "cache.v1.db") if remove_exists and os.path.exists(cache_db_path): os.remove(cache_db_path) db.init( cache_db_path, pragmas={ "journal_mode": "wal", "busy_timeout": 1000, }, ) db.create_tables([_TranslationCache], safe=True) def init_test_db(): import tempfile cache_db_path = tempfile.mktemp(suffix=".db") test_db = SqliteDatabase( cache_db_path, pragmas={ "journal_mode": "wal", "busy_timeout": 1000, }, ) test_db.bind([_TranslationCache], bind_refs=False, bind_backrefs=False) test_db.connect() test_db.create_tables([_TranslationCache], safe=True) return test_db def clean_test_db(test_db): test_db.drop_tables([_TranslationCache]) test_db.close() db_path = test_db.database if os.path.exists(db_path): os.remove(test_db.database) wal_path = db_path + "-wal" if os.path.exists(wal_path): os.remove(wal_path) shm_path = db_path + "-shm" if os.path.exists(shm_path): os.remove(shm_path) init_db() ``` ## /pdf2zh/config.py ```py path="/pdf2zh/config.py" import json from pathlib import Path from threading import RLock # 改成 RLock import os import copy class ConfigManager: _instance = None _lock = RLock() # 用 RLock 替换 Lock,允许在同一个线程中重复获取锁 
@classmethod def get_instance(cls): """Return the singleton instance.""" # Check for an existing instance first; take the lock and initialize only if there is none if cls._instance is None: with cls._lock: if cls._instance is None: cls._instance = cls() return cls._instance def __init__(self): # Guard against repeated initialization if hasattr(self, "_initialized") and self._initialized: return self._initialized = True self._config_path = Path.home() / ".config" / "PDFMathTranslate" / "config.json" self._config_data = {} # No extra locking here: the caller (get_instance) may already hold the lock, though with an RLock it would be harmless self._ensure_config_exists() def _ensure_config_exists(self, isInit=True): """Ensure the config file exists; create a default one if it does not.""" # No explicit locking is needed here either, for the same reason: the body calls _load_config(), # which takes the lock itself. RLock is reentrant, so this does not block. if not self._config_path.exists(): if isInit: self._config_path.parent.mkdir(parents=True, exist_ok=True) self._config_data = {} # default config content self._save_config() else: raise ValueError(f"config file {self._config_path} not found!") else: self._load_config() def _load_config(self): """Load the config from config.json.""" with self._lock: # lock for thread safety with self._config_path.open("r", encoding="utf-8") as f: self._config_data = json.load(f) def _save_config(self): """Save the config to config.json.""" with self._lock: # lock for thread safety # Strip circular references, then write cleaned_data = self._remove_circular_references(self._config_data) with self._config_path.open("w", encoding="utf-8") as f: json.dump(cleaned_data, f, indent=4, ensure_ascii=False) def _remove_circular_references(self, obj, seen=None): """Recursively remove circular references.""" if seen is None: seen = set() obj_id = id(obj) if obj_id in seen: return None # an object seen before is treated as a circular reference seen.add(obj_id) if isinstance(obj, dict): return { k: self._remove_circular_references(v, seen) for k, v in obj.items() } elif isinstance(obj, list): return [self._remove_circular_references(i, seen) for i in obj] return obj @classmethod def custome_config(cls, file_path): """Load the config file from a custom path.""" custom_path = Path(file_path) if not custom_path.exists(): raise ValueError(f"Config file {custom_path} not found!") # Take the lock with cls._lock: instance = cls() instance._config_path = custom_path
# Pass isInit=False here: raise if the file is missing; otherwise run the normal _load_config() instance._ensure_config_exists(isInit=False) cls._instance = instance @classmethod def get(cls, key, default=None): """Get a config value.""" instance = cls.get_instance() # Reads work with or without the lock, but for consistency every modification is locked. # Whenever get ends up saving, it takes the lock via _save_config() if key in instance._config_data: return instance._config_data[key] # If the key exists as an environment variable, use it and write it back to the config if key in os.environ: value = os.environ[key] instance._config_data[key] = value instance._save_config() return value # If default is not None, store it and save if default is not None: instance._config_data[key] = default instance._save_config() return default # Raise if the key is not found (disabled below; default is returned instead) # raise KeyError(f"{key} is not found in config file or environment variables.") return default @classmethod def set(cls, key, value): """Set a config value and save.""" instance = cls.get_instance() with instance._lock: instance._config_data[key] = value instance._save_config() @classmethod def get_translator_by_name(cls, name): """Get the translator config matching name.""" instance = cls.get_instance() translators = instance._config_data.get("translators", []) for translator in translators: if translator.get("name") == name: return translator["envs"] return None @classmethod def set_translator_by_name(cls, name, new_translator_envs): """Set or update the translator config matching name.""" instance = cls.get_instance() with instance._lock: translators = instance._config_data.get("translators", []) for translator in translators: if translator.get("name") == name: translator["envs"] = copy.deepcopy(new_translator_envs) instance._save_config() return translators.append( {"name": name, "envs": copy.deepcopy(new_translator_envs)} ) instance._config_data["translators"] = translators instance._save_config() @classmethod def get_env_by_translatername(cls, translater_name, name, default=None): """Get the value of env `name` for the given translator, storing `default` if it is unset.""" instance = cls.get_instance() translators = instance._config_data.get("translators", []) for translator in translators: if translator.get("name") == translater_name.name: if
translator["envs"][name]: return translator["envs"][name] else: with instance._lock: translator["envs"][name] = default instance._save_config() return default with instance._lock: translators = instance._config_data.get("translators", []) for translator in translators: if translator.get("name") == translater_name.name: translator["envs"][name] = default instance._save_config() return default translators.append( { "name": translater_name.name, "envs": copy.deepcopy(translater_name.envs), } ) instance._config_data["translators"] = translators instance._save_config() return default @classmethod def delete(cls, key): """Delete a config value and save.""" instance = cls.get_instance() with instance._lock: if key in instance._config_data: del instance._config_data[key] instance._save_config() @classmethod def clear(cls): """Clear all config values and save.""" instance = cls.get_instance() with instance._lock: instance._config_data = {} instance._save_config() @classmethod def all(cls): """Return all config entries.""" instance = cls.get_instance() # Read-only, so locking is usually unnecessary; a lock could be added to be safe. return instance._config_data @classmethod def remove(cls): instance = cls.get_instance() with instance._lock: os.remove(instance._config_path) ``` ## /pdf2zh/converter.py ```py path="/pdf2zh/converter.py" import concurrent.futures import logging import re import unicodedata from enum import Enum from string import Template from typing import Dict import numpy as np from pdfminer.converter import PDFConverter from pdfminer.layout import LTChar, LTFigure, LTLine, LTPage from pdfminer.pdffont import PDFCIDFont, PDFUnicodeNotDefined from pdfminer.pdfinterp import PDFGraphicState, PDFResourceManager from pdfminer.utils import apply_matrix_pt, mult_matrix from pymupdf import Font from tenacity import retry, wait_fixed from pdf2zh.translator import ( AnythingLLMTranslator, ArgosTranslator, AzureOpenAITranslator, AzureTranslator, BaseTranslator, BingTranslator, DeepLTranslator, DeepLXTranslator, DeepseekTranslator, DifyTranslator, GeminiTranslator, GoogleTranslator,
GrokTranslator, GroqTranslator, ModelScopeTranslator, OllamaTranslator, OpenAIlikedTranslator, OpenAITranslator, QwenMtTranslator, SiliconTranslator, TencentTranslator, XinferenceTranslator, ZhipuTranslator, ) log = logging.getLogger(__name__) class PDFConverterEx(PDFConverter): def __init__( self, rsrcmgr: PDFResourceManager, ) -> None: PDFConverter.__init__(self, rsrcmgr, None, "utf-8", 1, None) def begin_page(self, page, ctm) -> None: # Override: use the cropbox instead (x0, y0, x1, y1) = page.cropbox (x0, y0) = apply_matrix_pt(ctm, (x0, y0)) (x1, y1) = apply_matrix_pt(ctm, (x1, y1)) mediabox = (0, 0, abs(x0 - x1), abs(y0 - y1)) self.cur_item = LTPage(page.pageno, mediabox) def end_page(self, page): # Override: return the instruction stream return self.receive_layout(self.cur_item) def begin_figure(self, name, bbox, matrix) -> None: # Override: set pageid self._stack.append(self.cur_item) self.cur_item = LTFigure(name, bbox, mult_matrix(matrix, self.ctm)) self.cur_item.pageid = self._stack[-1].pageid def end_figure(self, _: str) -> None: # Override: return the instruction stream fig = self.cur_item assert isinstance(self.cur_item, LTFigure), str(type(self.cur_item)) self.cur_item = self._stack.pop() self.cur_item.add(fig) return self.receive_layout(fig) def render_char( self, matrix, font, fontsize: float, scaling: float, rise: float, cid: int, ncs, graphicstate: PDFGraphicState, ) -> float: # Override: set cid and font try: text = font.to_unichr(cid) assert isinstance(text, str), str(type(text)) except PDFUnicodeNotDefined: text = self.handle_undefined_char(font, cid) textwidth = font.char_width(cid) textdisp = font.char_disp(cid) item = LTChar( matrix, font, fontsize, scaling, rise, text, textwidth, textdisp, ncs, graphicstate, ) self.cur_item.add(item) item.cid = cid # hack: attach the original character code item.font = font # hack: attach the original character font return item.adv class Paragraph: def __init__(self, y, x, x0, x1, y0, y1, size, brk): self.y: float = y # initial y coordinate self.x: float = x # initial x coordinate self.x0: float = x0 # left boundary self.x1: float = x1 # right boundary self.y0: float = y0 # top boundary self.y1: float = y1 # bottom boundary self.size:
float = size # font size self.brk: bool = brk # line-break flag # fmt: off class TranslateConverter(PDFConverterEx): def __init__( self, rsrcmgr, vfont: str = None, vchar: str = None, thread: int = 0, layout={}, lang_in: str = "", lang_out: str = "", service: str = "", noto_name: str = "", noto: Font = None, envs: Dict = None, prompt: Template = None, ignore_cache: bool = False, ) -> None: super().__init__(rsrcmgr) self.vfont = vfont self.vchar = vchar self.thread = thread self.layout = layout self.noto_name = noto_name self.noto = noto self.translator: BaseTranslator = None # e.g. "ollama:gemma2:9b" -> ["ollama", "gemma2:9b"] param = service.split(":", 1) service_name = param[0] service_model = param[1] if len(param) > 1 else None if not envs: envs = {} for translator in [GoogleTranslator, BingTranslator, DeepLTranslator, DeepLXTranslator, OllamaTranslator, XinferenceTranslator, AzureOpenAITranslator, OpenAITranslator, ZhipuTranslator, ModelScopeTranslator, SiliconTranslator, GeminiTranslator, AzureTranslator, TencentTranslator, DifyTranslator, AnythingLLMTranslator, ArgosTranslator, GrokTranslator, GroqTranslator, DeepseekTranslator, OpenAIlikedTranslator, QwenMtTranslator,]: if service_name == translator.name: self.translator = translator(lang_in, lang_out, service_model, envs=envs, prompt=prompt, ignore_cache=ignore_cache) if not self.translator: raise ValueError("Unsupported translation service") def receive_layout(self, ltpage: LTPage): # Paragraphs sstk: list[str] = [] # paragraph text stack pstk: list[Paragraph] = [] # paragraph attribute stack vbkt: int = 0 # bracket count for formulas within a paragraph # Formula group vstk: list[LTChar] = [] # formula glyph group vlstk: list[LTLine] = [] # formula line group vfix: float = 0 # formula vertical offset # Formula group stacks var: list[list[LTChar]] = [] # stack of formula glyph groups varl: list[list[LTLine]] = [] # stack of formula line groups varf: list[float] = [] # stack of formula vertical offsets vlen: list[float] = [] # stack of formula widths # Global lstk: list[LTLine] = [] # global line stack xt: LTChar = None # previous character xt_cls: int = -1 # class of the previous character's paragraph; -1 guarantees that the first character starts a new paragraph whatever its class vmax: float = ltpage.width / 4 # maximum width of an inline formula ops: str = "" # rendered result def vflag(font: str, char:
str): # Match formula (and sub/superscript) fonts if isinstance(font, bytes): # may not be decodable; convert to str directly try: font = font.decode('utf-8') # try decoding as UTF-8 except UnicodeDecodeError: font = "" font = font.split("+")[-1] # truncate the font name if re.match(r"\(cid:", char): return True # Decide by font-name rules if self.vfont: if re.match(self.vfont, font): return True else: if re.match( # LaTeX fonts r"(CM[^R]|MS.M|XY|MT|BL|RM|EU|LA|RS|LINE|LCIRCLE|TeX-|rsfs|txsy|wasy|stmary|.*Mono|.*Code|.*Ital|.*Sym|.*Math)", font, ): return True # Decide by character-set rules if self.vchar: if re.match(self.vchar, char): return True else: if ( char and char != " " # not a space and ( unicodedata.category(char[0]) in ["Lm", "Mn", "Sk", "Sm", "Zl", "Zp", "Zs"] # modifiers, math symbols, separators or ord(char[0]) in range(0x370, 0x400) # Greek letters ) ): return True return False ############################################################ # A. Parse the original document for child in ltpage: if isinstance(child, LTChar): cur_v = False layout = self.layout[ltpage.pageid] # ltpage.height may be the height inside a figure, so always use layout.shape h, w = layout.shape # Look up the class of the current character in the layout cx, cy = np.clip(int(child.x0), 0, w - 1), np.clip(int(child.y0), 0, h - 1) cls = layout[cy, cx] # Anchor the position of bullets in the document if child.get_text() == "•": cls = 0 # Decide whether the current character belongs to a formula if ( # decide whether the current character belongs to a formula cls == 0 # 1. the class is a reserved region or (cls == xt_cls and len(sstk[-1].strip()) > 1 and child.size < pstk[-1].size * 0.79) # 2. sub/superscript font: there are 0.76 scripts and 0.799 capitals, so 0.79 is used as a midpoint; this also accounts for drop caps or vflag(child.fontname, child.get_text()) # 3. formula font or (child.matrix[0] == 0 and child.matrix[3] == 0) # 4. vertical font ): cur_v = True # Decide whether a bracket group belongs to a formula if not cur_v: if vstk and child.get_text() == "(": cur_v = True vbkt += 1 if vbkt and child.get_text() == ")": cur_v = True vbkt -= 1 if ( # decide whether the current formula has ended not cur_v # 1. the current character is not part of a formula or cls != xt_cls # 2. the current character and the previous one are not in the same paragraph # or (abs(child.x0 - xt.x0) > vmax and cls != 0) # 3. line break within a paragraph: could be a long italic passage or a fraction wrapping inside the paragraph; a threshold tells them apart # Forbid line breaks in pure-formula (code) paragraphs until text begins and a text paragraph is reopened, so that only two cases exist # A. pure-formula (code) paragraph (anchored at an absolute position) sstk[-1]=="" -> sstk[-1]=="{v*}" # B.
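paragraph starting with text (laid out at a relative position) sstk[-1]!="" or (sstk[-1] != "" and abs(child.x0 - xt.x0) > vmax) # cls==xt_cls==0 always implies sstk[-1]=="", so cls!=0 need not be checked again here ): if vstk: if ( # correct the formula's vertical offset using the text to its right not cur_v # 1. the current character is not part of a formula and cls == xt_cls # 2. the current character and the previous one are in the same paragraph and child.x0 > max([vch.x0 for vch in vstk]) # 3. the current character is to the right of the formula ): vfix = vstk[0].y0 - child.y0 if sstk[-1] == "": xt_cls = -1 # prevent anything from being appended to a pure-formula paragraph (sstk[-1]=="{v*}"); the new character must still be able to join later ones, so the previous character's class is changed instead sstk[-1] += f"{{v{len(var)}}}" var.append(vstk) varl.append(vlstk) varf.append(vfix) vstk = [] vlstk = [] vfix = 0 # The current character is not part of a formula, or it is the first character of one if not vstk: if cls == xt_cls: # the current character and the previous one are in the same paragraph if child.x0 > xt.x1 + 1: # add an inline space sstk[-1] += " " elif child.x1 < xt.x0: # add a line-break space and mark that the source paragraph wraps sstk[-1] += " " pstk[-1].brk = True else: # start a new paragraph from the current character sstk.append("") pstk.append(Paragraph(child.y0, child.x0, child.x0, child.x0, child.y0, child.y1, child.size, False)) if not cur_v: # push text if ( # update paragraph attributes from the current character child.size > pstk[-1].size # 1. the current character is larger than the paragraph font or len(sstk[-1].strip()) == 1 # 2. the current character is the second character of the paragraph (handles drop caps) ) and child.get_text() != " ": # 3. the current character is not a space pstk[-1].y -= child.size - pstk[-1].size # correct the paragraph's initial y coordinate, assuming the tops of the two differently sized characters align pstk[-1].size = child.size sstk[-1] += child.get_text() else: # push formula if ( # correct the formula's vertical offset using the text to its left not vstk # 1. the current character is the first character of the formula and cls == xt_cls # 2. the current character and the previous one are in the same paragraph and child.x0 > xt.x0 # 3.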
the previous character is to the left of the formula ): vfix = child.y0 - xt.y0 vstk.append(child) # Update the paragraph bounds; a formula may start right after an in-paragraph line break, so this is handled outside the branches pstk[-1].x0 = min(pstk[-1].x0, child.x0) pstk[-1].x1 = max(pstk[-1].x1, child.x1) pstk[-1].y0 = min(pstk[-1].y0, child.y0) pstk[-1].y1 = max(pstk[-1].y1, child.y1) # Update the previous character xt = child xt_cls = cls elif isinstance(child, LTFigure): # figure pass elif isinstance(child, LTLine): # line layout = self.layout[ltpage.pageid] # ltpage.height may be the height inside a figure, so always use layout.shape h, w = layout.shape # Look up the class of the current line in the layout cx, cy = np.clip(int(child.x0), 0, w - 1), np.clip(int(child.y0), 0, h - 1) cls = layout[cy, cx] if vstk and cls == xt_cls: # formula line vlstk.append(child) else: # global line lstk.append(child) else: pass # Handle the tail if vstk: # pop the formula sstk[-1] += f"{{v{len(var)}}}" var.append(vstk) varl.append(vlstk) varf.append(vfix) log.debug("\n==========[VSTACK]==========\n") for id, v in enumerate(var): # compute formula widths l = max([vch.x1 for vch in v]) - v[0].x0 log.debug(f'< {l:.1f} {v[0].x0:.1f} {v[0].y0:.1f} {v[0].cid} {v[0].fontname} {len(varl[id])} > v{id} = {"".join([ch.get_text() for ch in v])}') vlen.append(l) ############################################################ # B. Translate paragraphs log.debug("\n==========[SSTACK]==========\n") @retry(wait=wait_fixed(1)) def worker(s: str): # multi-threaded translation if not s.strip() or re.match(r"^\{v\d+\}$", s): # skip whitespace and pure formulas return s try: new = self.translator.translate(s) return new except BaseException as e: if log.isEnabledFor(logging.DEBUG): log.exception(e) else: log.exception(e, exc_info=False) raise e with concurrent.futures.ThreadPoolExecutor( max_workers=self.thread ) as executor: news = list(executor.map(worker, sstk)) ############################################################ # C.
Lay out the new document def raw_string(fcur: str, cstk: str): # encode the string if fcur == self.noto_name: return "".join(["%04x" % self.noto.has_glyph(ord(c)) for c in cstk]) elif isinstance(self.fontmap[fcur], PDFCIDFont): # pick the code length return "".join(["%04x" % ord(c) for c in cstk]) else: return "".join(["%02x" % ord(c) for c in cstk]) # Default line height by target language LANG_LINEHEIGHT_MAP = { "zh-cn": 1.4, "zh-tw": 1.4, "zh-hans": 1.4, "zh-hant": 1.4, "zh": 1.4, "ja": 1.1, "ko": 1.2, "en": 1.2, "ar": 1.0, "ru": 0.8, "uk": 0.8, "ta": 0.8 } default_line_height = LANG_LINEHEIGHT_MAP.get(self.translator.lang_out.lower(), 1.1) # other languages default to 1.1 _x, _y = 0, 0 ops_list = [] def gen_op_txt(font, size, x, y, rtxt): return f"/{font} {size:f} Tf 1 0 0 1 {x:f} {y:f} Tm [<{rtxt}>] TJ " def gen_op_line(x, y, xlen, ylen, linewidth): return f"ET q 1 0 0 1 {x:f} {y:f} cm [] 0 d 0 J {linewidth:f} w 0 0 m {xlen:f} {ylen:f} l S Q BT " for id, new in enumerate(news): x: float = pstk[id].x # paragraph initial x coordinate y: float = pstk[id].y # paragraph initial y coordinate x0: float = pstk[id].x0 # paragraph left boundary x1: float = pstk[id].x1 # paragraph right boundary height: float = pstk[id].y1 - pstk[id].y0 # paragraph height size: float = pstk[id].size # paragraph font size brk: bool = pstk[id].brk # paragraph line-break flag cstk: str = "" # current text buffer fcur: str = None # current font ID lidx = 0 # number of line breaks so far tx = x fcur_ = fcur ptr = 0 log.debug(f"< {y} {x} {x0} {x1} {size} {brk} > {sstk[id]} | {new}") ops_vals: list[dict] = [] while ptr < len(new): vy_regex = re.match( r"\{\s*v([\d\s]+)\}", new[ptr:], re.IGNORECASE ) # match a {vn} formula marker mod = 0 # text modifier if vy_regex: # load a formula ptr += len(vy_regex.group(0)) try: vid = int(vy_regex.group(1).replace(" ", "")) adv = vlen[vid] except Exception: continue # the translator may invent an out-of-range formula marker if var[vid][-1].get_text() and unicodedata.category(var[vid][-1].get_text()[0]) in ["Lm", "Mn", "Sk"]: # text modifier mod = var[vid][-1].width else: # load text ch = new[ptr] fcur_ = None try: if fcur_ is None and self.fontmap["tiro"].to_unichr(ord(ch)) == ch: fcur_ = "tiro" # default Latin font except Exception: pass if fcur_ is None: fcur_ = self.noto_name # default non-Latin font if fcur_ ==
self.noto_name: # FIXME: change to CONST adv = self.noto.char_lengths(ch, size)[0] else: adv = self.fontmap[fcur_].char_width(ord(ch)) * size ptr += 1 if ( # flush the text buffer fcur_ != fcur # 1. the font changed or vy_regex # 2. a formula is inserted or x + adv > x1 + 0.1 * size # 3. the right boundary is reached (a whole line may have been turned into symbols, so allow for floating-point error) ): if cstk: ops_vals.append({ "type": OpType.TEXT, "font": fcur, "size": size, "x": tx, "dy": 0, "rtxt": raw_string(fcur, cstk), "lidx": lidx }) cstk = "" if brk and x + adv > x1 + 0.1 * size: # the right boundary is reached and the source paragraph wraps x = x0 lidx += 1 if vy_regex: # insert the formula fix = 0 if fcur is not None: # apply the vertical-offset correction to in-paragraph formulas fix = varf[vid] for vch in var[vid]: # lay out formula glyphs vc = chr(vch.cid) ops_vals.append({ "type": OpType.TEXT, "font": self.fontid[vch.font], "size": vch.size, "x": x + vch.x0 - var[vid][0].x0, "dy": fix + vch.y0 - var[vid][0].y0, "rtxt": raw_string(self.fontid[vch.font], vc), "lidx": lidx }) if log.isEnabledFor(logging.DEBUG): lstk.append(LTLine(0.1, (_x, _y), (x + vch.x0 - var[vid][0].x0, fix + y + vch.y0 - var[vid][0].y0))) _x, _y = x + vch.x0 - var[vid][0].x0, fix + y + vch.y0 - var[vid][0].y0 for l in varl[vid]: # lay out formula lines if l.linewidth < 5: # hack: some documents use thick lines as image backgrounds ops_vals.append({ "type": OpType.LINE, "x": l.pts[0][0] + x - var[vid][0].x0, "dy": l.pts[0][1] + fix - var[vid][0].y0, "linewidth": l.linewidth, "xlen": l.pts[1][0] - l.pts[0][0], "ylen": l.pts[1][1] - l.pts[0][1], "lidx": lidx }) else: # append to the text buffer if not cstk: # start of a line tx = x if x == x0 and ch == " ": # drop the space left by a paragraph line break adv = 0 else: cstk += ch else: cstk += ch adv -= mod # text modifier fcur = fcur_ x += adv if log.isEnabledFor(logging.DEBUG): lstk.append(LTLine(0.1, (_x, _y), (x, y))) _x, _y = x, y # Handle the tail if cstk: ops_vals.append({ "type": OpType.TEXT, "font": fcur, "size": size, "x": tx, "dy": 0, "rtxt": raw_string(fcur, cstk), "lidx": lidx }) line_height = default_line_height while (lidx + 1) * size * line_height > height and line_height >= 1: line_height -= 0.05 for vals in ops_vals: if vals["type"] == OpType.TEXT: ops_list.append(gen_op_txt(vals["font"], vals["size"],
vals["x"], vals["dy"] + y - vals["lidx"] * size * line_height, vals["rtxt"])) elif vals["type"] == OpType.LINE: ops_list.append(gen_op_line(vals["x"], vals["dy"] + y - vals["lidx"] * size * line_height, vals["xlen"], vals["ylen"], vals["linewidth"])) for l in lstk: # lay out global lines if l.linewidth < 5: # hack: some documents use thick lines as image backgrounds ops_list.append(gen_op_line(l.pts[0][0], l.pts[0][1], l.pts[1][0] - l.pts[0][0], l.pts[1][1] - l.pts[0][1], l.linewidth)) ops = f"BT {''.join(ops_list)}ET " return ops class OpType(Enum): TEXT = "text" LINE = "line" ``` ## /pdf2zh/doclayout.py ```py path="/pdf2zh/doclayout.py" import abc import os.path import cv2 import numpy as np import ast from babeldoc.assets.assets import get_doclayout_onnx_model_path try: import onnx import onnxruntime except ImportError as e: if "DLL load failed" in str(e): raise OSError( "Microsoft Visual C++ Redistributable is not installed. " "Download it at https://aka.ms/vs/17/release/vc_redist.x64.exe" ) from e raise from huggingface_hub import hf_hub_download from pdf2zh.config import ConfigManager class DocLayoutModel(abc.ABC): @staticmethod def load_onnx(): model = OnnxModel.from_pretrained() return model @staticmethod def load_available(): return DocLayoutModel.load_onnx() @property @abc.abstractmethod def stride(self) -> int: """Stride of the model input.""" pass @abc.abstractmethod def predict(self, image, imgsz=1024, **kwargs) -> list: """ Predict the layout of a document page. Args: image: The image of the document page. imgsz: Resize the image to this size. Must be a multiple of the stride. **kwargs: Additional arguments.
""" pass class YoloResult: """Helper class to store detection results from ONNX model.""" def __init__(self, boxes, names): self.boxes = [YoloBox(data=d) for d in boxes] self.boxes.sort(key=lambda x: x.conf, reverse=True) self.names = names class YoloBox: """Helper class to store detection results from ONNX model.""" def __init__(self, data): self.xyxy = data[:4] self.conf = data[-2] self.cls = data[-1] class OnnxModel(DocLayoutModel): def __init__(self, model_path: str): self.model_path = model_path model = onnx.load(model_path) metadata = {d.key: d.value for d in model.metadata_props} self._stride = ast.literal_eval(metadata["stride"]) self._names = ast.literal_eval(metadata["names"]) self.model = onnxruntime.InferenceSession(model.SerializeToString()) @staticmethod def from_pretrained(): pth = get_doclayout_onnx_model_path() return OnnxModel(pth) @property def stride(self): return self._stride def resize_and_pad_image(self, image, new_shape): """ Resize and pad the image to the specified size, ensuring dimensions are multiples of stride. 
Parameters: - image: Input image - new_shape: Target size (integer or (height, width) tuple) - stride: Padding alignment stride, default 32 Returns: - Processed image """ if isinstance(new_shape, int): new_shape = (new_shape, new_shape) h, w = image.shape[:2] new_h, new_w = new_shape # Calculate scaling ratio r = min(new_h / h, new_w / w) resized_h, resized_w = int(round(h * r)), int(round(w * r)) # Resize image image = cv2.resize( image, (resized_w, resized_h), interpolation=cv2.INTER_LINEAR ) # Calculate padding size and align to stride multiple pad_w = (new_w - resized_w) % self.stride pad_h = (new_h - resized_h) % self.stride top, bottom = pad_h // 2, pad_h - pad_h // 2 left, right = pad_w // 2, pad_w - pad_w // 2 # Add padding image = cv2.copyMakeBorder( image, top, bottom, left, right, cv2.BORDER_CONSTANT, value=(114, 114, 114) ) return image def scale_boxes(self, img1_shape, boxes, img0_shape): """ Rescales bounding boxes (in the format of xyxy by default) from the shape of the image they were originally specified in (img1_shape) to the shape of a different image (img0_shape). Args: img1_shape (tuple): The shape of the image that the bounding boxes are for, in the format of (height, width). boxes (torch.Tensor): the bounding boxes of the objects in the image, in the format of (x1, y1, x2, y2) img0_shape (tuple): the shape of the target image, in the format of (height, width). 
Returns: boxes (torch.Tensor): The scaled bounding boxes, in the format of (x1, y1, x2, y2) """ # Calculate scaling ratio gain = min(img1_shape[0] / img0_shape[0], img1_shape[1] / img0_shape[1]) # Calculate padding size pad_x = round((img1_shape[1] - img0_shape[1] * gain) / 2 - 0.1) pad_y = round((img1_shape[0] - img0_shape[0] * gain) / 2 - 0.1) # Remove padding and scale boxes boxes[..., :4] = (boxes[..., :4] - [pad_x, pad_y, pad_x, pad_y]) / gain return boxes def predict(self, image, imgsz=1024, **kwargs): # Preprocess input image orig_h, orig_w = image.shape[:2] pix = self.resize_and_pad_image(image, new_shape=imgsz) pix = np.transpose(pix, (2, 0, 1)) # CHW pix = np.expand_dims(pix, axis=0) # BCHW pix = pix.astype(np.float32) / 255.0 # Normalize to [0, 1] new_h, new_w = pix.shape[2:] # Run inference preds = self.model.run(None, {"images": pix})[0] # Postprocess predictions preds = preds[preds[..., 4] > 0.25] preds[..., :4] = self.scale_boxes( (new_h, new_w), preds[..., :4], (orig_h, orig_w) ) return [YoloResult(boxes=preds, names=self._names)] class ModelInstance: value: OnnxModel = None ```
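The box rescaling in `scale_boxes` (in `/pdf2zh/doclayout.py` above) can be checked in isolation. The following is a minimal sketch, not the repository's code: `unletterbox_box` is a hypothetical helper that repeats the same arithmetic on a plain tuple instead of an array, and the example shapes assume a stride of 32 and a target size of 400, neither of which comes from the source.

```python
def unletterbox_box(padded_shape, box, orig_shape):
    """Map one (x1, y1, x2, y2) box from the padded image back to the original.

    Hypothetical helper: same arithmetic as scale_boxes, on a plain tuple.
    Shapes are (height, width).
    """
    # The image was scaled by the smaller of the two axis ratios.
    gain = min(padded_shape[0] / orig_shape[0], padded_shape[1] / orig_shape[1])
    # Padding is assumed to be split evenly between opposite sides.
    pad_x = round((padded_shape[1] - orig_shape[1] * gain) / 2 - 0.1)
    pad_y = round((padded_shape[0] - orig_shape[0] * gain) / 2 - 0.1)
    x1, y1, x2, y2 = box
    # Remove the padding first, then undo the scaling.
    return (
        (x1 - pad_x) / gain,
        (y1 - pad_y) / gain,
        (x2 - pad_x) / gain,
        (y2 - pad_y) / gain,
    )


# Assumed scenario: a 100x200 (h, w) page scaled by 2 to 200x400, then padded
# to 208x400, i.e. 4 px above and below and none on the sides.
full_page = unletterbox_box((208, 400), (0, 4, 400, 204), (100, 200))
print(full_page)
```

A box covering the whole unpadded area maps back to the full original page, which is a quick way to confirm that the padding offsets and the gain are applied in the right order.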