2026 edition
Last verified: 2026-04-15
36 PDF skills / tools
Agent-ready

2026 PDF Skills Report: A Complete Installation and Selection Guide for Codex, Claude Code, OpenClaw, and Trae

In 2026, evaluating PDF skills means looking beyond raw capabilities and asking how they can be installed, packaged, governed, and productionized inside Codex, Claude Code, OpenClaw, and Trae.

Most important finding

Whether an agent can use a PDF skill depends first on packaging form, not on the model name.

Strongest native support

Claude Code has the clearest public documentation across skills, MCP, and plugins; OpenClaw is also strong on workspace skills.

Most practical open-source base

OCRmyPDF, Docling, MinerU, PyMuPDF, and qpdf form the strongest local open-source baseline in 2026.

High-value closed layer

Reflo, DeepL, Adobe, OpenAI, Anthropic, Mistral, Google, Azure, and AWS fit teams that prioritize accuracy or enterprise governance.

Key takeaway

As of April 15, 2026, the practical question is no longer 'which PDF app is best' but 'which PDF skills can be installed, audited, and productionized inside real agent stacks.'

Core judgment
  • As of April 15, 2026, OpenClaw and Trae Agent clearly ship open-source code; Claude Code now has an official GitHub repo; Codex CLI is open-source, while the Codex app and cloud remain managed OpenAI product surfaces.
  • For PDF workflows, CLI / Python / Java libraries are the most portable install form across all four agent families. MCP is best-documented in Claude Code and Trae, while Codex and OpenClaw benefit from treating MCP as a wrapper layer rather than the only dependency.
  • Desktop GUI PDF tools remain usable by agents, but they depend more on browser or desktop automation and are typically less stable and less auditable than CLI or API routes.

Agent Install

How OpenClaw, Claude Code, Codex, and Trae install PDF skills

Everything below is tied to what was publicly verifiable on April 15, 2026, and distinguishes native support from wrapper-based support and UI-first workarounds.

Codex

Install the CLI or IDE first, then package PDF tooling with AGENTS.md and skills

Source model: Hybrid
Best for: PDF automation inside codebases, batch scripts, document understanding, and translation-delivery workflows

Installation pattern

  • Install Codex CLI with `npm install -g @openai/codex` or `brew install --cask codex`.
  • Add `AGENTS.md` at the repo root and document the PDF workflow, test commands, and permission boundaries.
  • Package PDF tooling as repo scripts such as `tools/pdf/run_ocr.sh` or `tools/pdf/parse_docling.py`.
  • If you use Codex skills in the app, CLI, or IDE, bundle instructions, resources, and scripts as reusable skills.
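
The repo-script pattern above can be sketched in Python. This is a minimal illustration, not an official Codex layout: the file name `tools/pdf/run_ocr.py` and the function `build_ocr_command` are hypothetical, and the only external assumption is that the `ocrmypdf` CLI is on `PATH`. The function only builds the argument list, so it can be reviewed and tested without running OCR.

```python
"""Hypothetical repo script, e.g. tools/pdf/run_ocr.py (name is an assumption)."""
from pathlib import Path


def build_ocr_command(src: Path, dst: Path, language: str = "eng") -> list[str]:
    """Return an argv list for OCRmyPDF without executing it."""
    if src.suffix.lower() != ".pdf":
        raise ValueError(f"expected a PDF input, got {src.name}")
    return [
        "ocrmypdf",
        "--skip-text",   # leave pages that already carry a text layer untouched
        "-l", language,  # Tesseract language code(s), e.g. "eng" or "eng+deu"
        str(src),
        str(dst),
    ]
```

To execute the command, pass the returned list to `subprocess.run(cmd, check=True)`. Documenting that one entry point in `AGENTS.md` gives the agent a single, auditable way to run OCR.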

Capabilities and limits

  • Public documentation now confirms both Skills and AGENTS.md.
  • The CLI is open-source, while the app and cloud remain managed product surfaces.
  • The safest integration path for PDF skills is still repo-level scripts plus AGENTS.md.

Claude Code

Four official extension paths: skills, plugins, MCP, and CLAUDE.md

Source model: Hybrid
Best for: Teams that want the clearest official extension model and reusable PDF skills as durable team assets

Installation pattern

  • Install Claude Code with `curl -fsSL https://claude.ai/install.sh | bash`, or use Homebrew / WinGet.
  • Native skills live in `~/.claude/skills/<skill>/SKILL.md` or project-level `.claude/skills/<skill>/SKILL.md`.
  • Use `claude mcp add ...` to connect local stdio servers, remote HTTP servers, or OAuth-backed tools.
  • Bundle skills, agents, hooks, and MCP servers into plugins when you need a sharable team distribution format.
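
As a rough illustration of the project-level skill path above, the following sketch scaffolds a `SKILL.md` under `.claude/skills/<skill>/`. The skill name `pdf-ocr`, the template body, and the referenced script `tools/pdf/run_ocr.py` are hypothetical; the `name` and `description` frontmatter fields are assumed here and should be checked against the current Claude Code skills documentation.

```python
from pathlib import Path

# Hypothetical template; the frontmatter fields are an assumption to verify.
SKILL_TEMPLATE = """\
---
name: {name}
description: {description}
---

# {title}

1. Run `python tools/pdf/run_ocr.py <input.pdf> <output.pdf>` on scanned inputs.
2. Report which files were processed and flag any that failed.
"""


def scaffold_skill(project_root: Path, name: str, description: str) -> Path:
    """Create .claude/skills/<name>/SKILL.md and return its path."""
    skill_dir = project_root / ".claude" / "skills" / name
    skill_dir.mkdir(parents=True, exist_ok=True)
    skill_file = skill_dir / "SKILL.md"
    title = name.replace("-", " ").title()
    skill_file.write_text(
        SKILL_TEMPLATE.format(name=name, description=description, title=title),
        encoding="utf-8",
    )
    return skill_file
```

Calling `scaffold_skill(Path("."), "pdf-ocr", "OCR scanned PDFs before parsing")` from the repo root would produce a skill file that can be committed and reviewed like any other team asset.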

Capabilities and limits

  • Its public documentation is the most explicit across skills, plugins, and MCP.
  • It works well for both native skills and MCP wrappers around OCR, parsing, translation, and RAG services.
  • The core CLI has an official GitHub repo, while the model and service layer remain proprietary.

OpenClaw

Workspace skills, plugins, ClawHub, and a gateway make it the closest thing to a personal agent OS

Source model: Open source
Best for: Personal AI assistants, multi-channel automation, and mixed browser / desktop / shell PDF workflows

Installation pattern

  • Install OpenClaw from the official repo or install script; the main runtime entry is the Gateway.
  • Workspace skills live in `~/.openclaw/workspace/skills/<skill>/SKILL.md`.
  • Use plugins when a PDF workflow also needs channels, host integrations, or system capabilities.
  • If ClawHub is enabled, agents can search and fetch skills, but production setups should still whitelist and review them.
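
One way to make the whitelist-and-review step concrete is to pin each reviewed skill to a content hash. The sketch below is an assumption about how a team might do this, not an OpenClaw or ClawHub feature: `skill_digest`, `is_approved`, and the allowlist format are all hypothetical.

```python
import hashlib
from pathlib import Path


def skill_digest(skill_file: Path) -> str:
    """SHA-256 of the skill file's bytes, as a hex string."""
    return hashlib.sha256(skill_file.read_bytes()).hexdigest()


def is_approved(skill_name: str, skill_file: Path, allowlist: dict[str, str]) -> bool:
    """True only if the skill is listed and its content matches the reviewed digest."""
    pinned = allowlist.get(skill_name)
    return pinned is not None and pinned == skill_digest(skill_file)
```

A fetched skill whose `SKILL.md` changes after review no longer matches its pinned digest, so it is rejected until someone reviews it again.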

Capabilities and limits

  • The official README documents workspace roots, skills paths, and ClawHub behavior clearly.
  • It is stronger than pure IDE agents when browser, desktop, or host-command automation matters.
  • It is also the most open, which means permission control and supply-chain review matter more.

Trae

Trae IDE uses Agent Skills, @Agent, and MCP; the OSS Trae Agent uses YAML plus MCP

Source model: Hybrid
Best for: IDE collaboration, multi-agent orchestration, and research-oriented or extensible coding-agent setups

Installation pattern

  • Install Trae IDE or SOLO from the official download page when you want the desktop product surface.
  • The official Trae blog already documents Agent Skills creation, import, usage, and MCP support through `@Agent`.
  • For the open-source agent path, use `git clone https://github.com/bytedance/trae-agent.git && uv sync --all-extras`.
  • Add `mcp_servers` in the `trae-agent` config to attach external PDF skills and document tools.

Capabilities and limits

  • Trae Agent has an official MIT-licensed GitHub repo.
  • The Trae IDE and SOLO product surfaces publicly point to Agent Skills and MCP usage.
  • Use the open agent when you need tighter control; add the IDE when visual workflows matter.

Compatibility

Which agents can really use which PDF skill packaging patterns

Do not confuse 'supports PDFs' with 'supports every PDF skill.' Real compatibility depends on packaging form: native skills, repo rules, CLI/libraries, MCP, SaaS APIs, or desktop GUI automation.

| Install form | Codex | Claude Code | OpenClaw | Trae | Verdict |
| --- | --- | --- | --- | --- | --- |
| Native skills / commands | Direct | Native | Native | Direct | Claude Code is the clearest today; OpenClaw has workspace skills; Trae has Agent Skills; Codex publicly confirms Skills but exposes fewer file-system details. |
| Repo rules files (AGENTS.md / CLAUDE.md / Rules) | Native | Direct | Direct | Direct | All four agent families can consume this layer; it is the most portable and least coupled way to inject team knowledge. |
| CLI / Python / Java libraries | Direct | Direct | Direct | Direct | This is the most reusable packaging form across agent families and the best first layer to deploy. |
| MCP server | Wrapper | Native | Wrapper | Direct | Claude Code is strongest natively; Trae also points clearly to MCP; Codex and OpenClaw usually benefit from MCP through wrappers, plugins, or gateways. |
| SaaS API / cloud service | Direct | Direct | Direct | Direct | All four agent families can use this layer reliably when API keys are governed and packaged as tools or scripts. |
| Desktop GUI / RPA | Limited | Limited | Direct | Wrapper | OpenClaw is friendlier to browser and desktop control; Codex and Claude Code should not treat GUI automation as the primary path. |
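
The matrix above can also be encoded as data so a team can query it when choosing a packaging form. The support levels below are copied from the table; treating "Native" and "Direct" as first-class support (and "Wrapper" and "Limited" as not) is an assumption made for this sketch, as is the helper name `portable_forms`.

```python
AGENTS = ("Codex", "Claude Code", "OpenClaw", "Trae")

# Support level per install form, in the same agent order as AGENTS.
COMPATIBILITY = {
    "Native skills / commands": ("Direct", "Native", "Native", "Direct"),
    "Repo rules files": ("Native", "Direct", "Direct", "Direct"),
    "CLI / Python / Java libraries": ("Direct", "Direct", "Direct", "Direct"),
    "MCP server": ("Wrapper", "Native", "Wrapper", "Direct"),
    "SaaS API / cloud service": ("Direct", "Direct", "Direct", "Direct"),
    "Desktop GUI / RPA": ("Limited", "Limited", "Direct", "Wrapper"),
}

# Assumption for this sketch: these two levels count as first-class support.
FIRST_CLASS = {"Native", "Direct"}


def portable_forms(agents=AGENTS) -> list[str]:
    """Install forms that every listed agent supports at a first-class level."""
    columns = [AGENTS.index(agent) for agent in agents]
    return [
        form
        for form, levels in COMPATIBILITY.items()
        if all(levels[i] in FIRST_CLASS for i in columns)
    ]
```

Under that assumption, `portable_forms()` excludes MCP servers and desktop GUI automation for a four-agent fleet, while `portable_forms(("Claude Code", "Trae"))` keeps MCP.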

Catalog

36 PDF skills / tools with open-vs-closed status, GitHub, and installability

Here, 'skills' are concrete installable building blocks: open-source libraries, CLIs, MCP servers, SaaS APIs, and desktop products. Open-source rows include GitHub.

| Skill / Tool | Category | Open vs closed | Install form | GitHub / official | Best for | Note |
| --- | --- | --- | --- | --- | --- | --- |
| Tesseract OCR | OCR | Open | CLI / library |  | General OCR; Multilingual OCR | Strong local open-source base; Depends on preprocessing quality |
| OCRmyPDF | OCR | Open | CLI / library |  | Searchable PDF output; Agent preprocessing | Strong local open-source base; Common in production pipelines |
| PaddleOCR | OCR | Open | CLI / library |  | Multilingual OCR; Enterprise forms and contracts | Strong in Chinese-heavy workflows; Common in production pipelines |
| docTR | OCR | Open | CLI / library |  | General OCR; Enterprise forms and contracts | Research-friendly; Depends on preprocessing quality |
| Docling | PDF parsing | Open | CLI / library |  | LLM-ready structuring; Complex layouts | Useful as pipeline infrastructure; Works especially well with MCP |
| docling-mcp | PDF parsing | Open | MCP |  | MCP integration; LLM-ready structuring | Works especially well with MCP; Useful as pipeline infrastructure |
| GROBID | PDF parsing | Open | CLI / library |  | Academic papers; Research and technical PDFs | Research-friendly; Common in production pipelines |
| Nougat | PDF parsing | Open | CLI / library |  | Academic papers; Formula-heavy documents | Research-friendly; Not a general-purpose OCR tool |
| MinerU | PDF parsing | Open | CLI / library |  | Complex layouts; Formula-heavy documents | Strong on complex layouts; Common in production pipelines |
| PyMuPDF | PDF operations | Open | CLI / library |  | High-performance runtime; Lightweight PDF operations | Common in production pipelines; Useful as pipeline infrastructure |
| PyMuPDF4LLM | PDF operations | Open | CLI / library |  | Agent preprocessing; LLM-ready structuring | Useful as pipeline infrastructure; Common in production pipelines |
| pypdf | PDF operations | Open | CLI / library |  | Lightweight PDF operations; PDF structure operations | Pure Python friendly; Useful as pipeline infrastructure |
| pdfplumber | Table extraction | Open | CLI / library |  | Table debugging; Text-based tables | Good for debugging; Useful as pipeline infrastructure |
| Unstructured | Document ETL | Open | CLI / library |  | Document chunking; Agent preprocessing | Useful as pipeline infrastructure; Good for team workflows |
| unstructured-api | Document ETL | Open | SaaS API |  | Internal API layer; Document chunking | API-first; Good for team workflows |
| Tabula | Table extraction | Open | CLI / library |  | Text-based tables; Batch table extraction | Weak on noisy scans; Common in production pipelines |
| tabula-java | Table extraction | Open | CLI / library |  | Batch table extraction; Java enterprise stacks | Common in production pipelines; Useful as pipeline infrastructure |
| qpdf | PDF operations | Open | CLI / library |  | PDF structure operations; Batch post-processing | Common in production pipelines; Useful as pipeline infrastructure |
| pdfcpu | PDF operations | Open | CLI / library |  | Batch post-processing; PDF structure operations | Common in production pipelines; Useful as pipeline infrastructure |
| Apache PDFBox | PDF operations | Open | CLI / library |  | Java enterprise stacks; PDF structure operations | Common in production pipelines; Good for team workflows |
| OpenAI PDF Files | RAG / reasoning | Closed | SaaS API |  | PDF reasoning; Cross-document search | API-first; Better for reasoning than layout fidelity |
| OpenAI File Search | RAG / reasoning | Closed | SaaS API |  | Cross-document search; Team knowledge search | API-first; Good for team workflows |
| Claude PDF Support | RAG / reasoning | Closed | SaaS API |  | PDF reasoning; Research and technical PDFs | API-first; Better for reasoning than layout fidelity |
| Claude Citations | Knowledge Q&A | Closed | SaaS API |  | Grounded answers; Team knowledge search | API-first; Good for team workflows |
| Mistral OCR | Enterprise document AI | Closed | SaaS API |  | Cloud OCR API; Complex layouts | API-first; Adds vendor cost and dependency |
| Mathpix PDF to Markdown | PDF parsing | Closed | SaaS API |  | Formula-heavy documents; Academic papers | Research-friendly; Adds vendor cost and dependency |
| Google Document AI | Enterprise document AI | Closed | SaaS API |  | Enterprise forms and contracts; Internal API layer | Enterprise-oriented; API-first |
| Azure Document Intelligence | Enterprise document AI | Closed | SaaS API |  | Enterprise forms and contracts; Cloud OCR API | Enterprise-oriented; API-first |
| Amazon Textract | Enterprise document AI | Closed | SaaS API |  | Enterprise forms and contracts; Cloud OCR API | Enterprise-oriented; API-first |
| Adobe Acrobat AI Assistant | Desktop PDF | Closed | Desktop GUI / RPA |  | Desktop review; Team knowledge search | GUI-first; Often needs wrapper automation |
| Adobe Translate PDF | Translation | Closed | Desktop GUI / RPA |  | Desktop translation workflows; Multilingual delivery | GUI-first; High-value translation layer |
| ABBYY FineReader PDF | Desktop PDF | Closed | Desktop GUI / RPA |  | Desktop OCR and review; Searchable PDF output | GUI-first; Enterprise-oriented |
| Nanonets | Invoice automation | Closed | SaaS API |  | Invoices and receipts; Internal API layer | API-first; Enterprise-oriented |
| Rossum | Invoice automation | Closed | SaaS API |  | Invoices and receipts; Enterprise forms and contracts | Enterprise-oriented; API-first |
| Parseur | Template extraction | Closed | SaaS API |  | Template-driven extraction; Internal API layer | API-first; Common in production pipelines |
| Reflo | Translation | Closed | SaaS API |  | Multilingual delivery; Desktop translation workflows | High-value translation layer; Strong on complex layouts |
| DeepL Files + Glossary | Translation | Closed | SaaS API |  | Termbase-driven translation; Multilingual delivery | High-value translation layer; Good for team workflows |
| Smallpdf Translate PDF | Translation | Closed | Desktop GUI / RPA |  | Quick consumer translation; Desktop translation workflows | GUI-first; Often needs wrapper automation |
| iLovePDF Translate PDF | Translation | Closed | Desktop GUI / RPA |  | Quick consumer translation; Desktop translation workflows | GUI-first; Often needs wrapper automation |
| PDFgear ChatPDF | Knowledge Q&A | Closed | Desktop GUI / RPA |  | Desktop chat with PDF; PDF reasoning | GUI-first; Often needs wrapper automation |
| UPDF Chat with PDF | Knowledge Q&A | Closed | Desktop GUI / RPA |  | Desktop chat with PDF; PDF reasoning | GUI-first; Often needs wrapper automation |
| AskYourPDF | Knowledge Q&A | Closed | SaaS API |  | PDF reasoning; Team knowledge search | API-first; Better for reasoning than layout fidelity |
| Humata | Knowledge Q&A | Closed | SaaS API |  | Team knowledge search; Cross-document search | API-first; Good for team workflows |

Solution

A production solution is a stack, not a shopping list

A deployable PDF-agent solution combines an agent, PDF skills, a packaging layer, permission controls, and sample-document regression tests. Buying isolated tools is not enough.

Blueprint A: Local-first open-source PDF agent baseline

Best fit: Privacy-sensitive teams that want control, lower vendor dependency, and are willing to operate their own stack

Recommended stack

  • Agent: Claude Code or OpenClaw, with Trae Agent OSS as a strong alternative
  • OCR: Tesseract + OCRmyPDF + PaddleOCR
  • Parsing: Docling / MinerU / GROBID / Nougat
  • Operations: PyMuPDF + pypdf + qpdf + pdfcpu
  • Tables: pdfplumber + Tabula / tabula-java

Implementation steps

  • Install PDF capabilities first as CLI tools and Python scripts instead of starting with GUI products.
  • Package those scripts as reusable skills for each agent family: `.claude/skills`, OpenClaw workspace skills, Trae Agent Skills or YAML, and Codex repo scripts plus AGENTS.md.
  • Prepare 5 to 10 sample documents per document type and run regression checks for OCR, tables, formulas, and reading order.
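
The regression step above could look roughly like the following. This is a minimal sketch under stated assumptions: it compares extracted text against hand-checked golden text with `difflib` from the standard library, and the function names and the 0.95 threshold are hypothetical choices rather than recommendations from any of the listed tools.

```python
from difflib import SequenceMatcher


def _normalize(text: str) -> str:
    """Collapse whitespace so layout-only differences do not count as errors."""
    return " ".join(text.split())


def similarity(expected: str, actual: str) -> float:
    """Similarity ratio in [0, 1] between golden and extracted text."""
    return SequenceMatcher(None, _normalize(expected), _normalize(actual)).ratio()


def regression_failures(
    golden: dict[str, str], extracted: dict[str, str], threshold: float = 0.95
) -> list[str]:
    """Return IDs of sample documents whose extraction fell below the threshold."""
    failures = []
    for doc_id, expected in golden.items():
        actual = extracted.get(doc_id, "")  # a missing document counts as a failure
        if similarity(expected, actual) < threshold:
            failures.append(doc_id)
    return sorted(failures)
```

Running this after every upgrade of an OCR or parsing tool gives an early signal when reading order or character accuracy degrades on a document type. Tables and formulas would need structure-aware comparisons that this text-only sketch does not attempt.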

Main risks

  • Self-hosted stacks cost more to maintain than SaaS layers.
  • Accuracy can drop on complex layouts and low-resource languages.
  • Permissions, logging, and regression governance remain your responsibility.

Blueprint B: Enterprise API-centered PDF agent platform

Best fit: Enterprises that already run cloud infrastructure and care about SLA, auditability, identity, and compliance

Recommended stack

  • Agent: Claude Code or Trae, with Codex covering the code and automation layer
  • OCR / extraction: Google Document AI / Azure Document Intelligence / Amazon Textract
  • Knowledge layer: OpenAI PDF Files + File Search or Claude PDF + citations
  • Business flow tools: Nanonets / Rossum / Parseur
  • Post-processing: qpdf / pypdf / PyMuPDF

Implementation steps

  • Wrap closed cloud services behind internal APIs or MCP wrappers instead of wiring every vendor directly into the agent.
  • Route contracts, invoices, research PDFs, and branded collateral through different queues rather than sharing a single prompt chain.
  • Put permissions and audit controls in the orchestration layer, not inside prompts.
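
A minimal sketch of the last two steps, assuming a simple in-process router: document classes map to separate queues, and the permission check lives in the routing code rather than in any prompt. The queue names, role names, and policy table are hypothetical placeholders.

```python
from dataclasses import dataclass

# Hypothetical queue names, one per document class.
QUEUES = {
    "contract": "contracts-queue",
    "invoice": "invoices-queue",
    "research": "research-queue",
    "collateral": "collateral-queue",
}

# Hypothetical policy: which document classes each agent role may submit.
PERMISSIONS = {
    "legal-agent": {"contract"},
    "finance-agent": {"invoice"},
    "content-agent": {"research", "collateral"},
}


@dataclass(frozen=True)
class Job:
    role: str
    doc_class: str
    path: str


def route(job: Job) -> str:
    """Return the queue for a job, enforcing permissions before routing."""
    if job.doc_class not in QUEUES:
        raise ValueError(f"unknown document class: {job.doc_class}")
    if job.doc_class not in PERMISSIONS.get(job.role, set()):
        raise PermissionError(f"{job.role} may not submit {job.doc_class} documents")
    return QUEUES[job.doc_class]
```

Because the check raises before any vendor API is called, a prompt that talks an agent into submitting the wrong document class still fails at the orchestration layer, and the failure can be logged for audit.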

Main risks

  • Vendor lock-in and cost growth remain real risks.
  • API output structures may drift after model or service upgrades.
  • Cross-border data flow and compliance boundaries must be reviewed in advance.

Blueprint C: Multilingual PDF delivery stack

Best fit: Teams handling papers, manuals, contracts, overseas sales assets, and multilingual branded materials

Recommended stack

  • Agent: Codex or Claude Code for orchestration, batching, review, and download flows
  • Delivery translation layer: Reflo
  • Terminology layer: DeepL Glossary or an internal termbase
  • Post-processing ecosystem: Adobe Acrobat / Adobe Translate PDF
  • Quality control: PyMuPDF / qpdf / pdfcpu

Implementation steps

  • Define termbases, language pairs, and document classes before letting the agent run batch orchestration.
  • Route high-value files through the Reflo / DeepL / Adobe combination and reserve lighter products for lower-risk content.
  • Keep a human side-by-side review step before any customer-facing delivery.
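
Before the human review step, a simple automated check can flag translations that ignore the termbase. The sketch below assumes the termbase is a plain mapping from source term to required target term and uses case-insensitive substring matching, which is a deliberate simplification; `missing_terms` is a hypothetical helper, not part of Reflo, DeepL, or Adobe.

```python
def missing_terms(source: str, translation: str, termbase: dict[str, str]) -> list[str]:
    """Source terms present in the source whose required target term is absent."""
    source_text = source.casefold()
    target_text = translation.casefold()
    return [
        term
        for term, required in termbase.items()
        if term.casefold() in source_text and required.casefold() not in target_text
    ]
```

Any non-empty result is a reason to send the file to a reviewer with the offending terms highlighted; an empty result does not replace the side-by-side review.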

Main risks

  • The closed translation layer costs more than a purely open-source stack.
  • Complex PDFs still require sampled human QA.
  • Errors in branded materials and contracts are expensive, so review gates remain mandatory.

Methodology

Method and evidence model

Verification date: 2026-04-15
Source types: Official product pages, official GitHub repos, help centers, dev docs, install docs
Research objects: 4 agent platforms, 36 PDF skills / tools, 6 install forms, and 3 deployable solution blueprints
  • The evidence layer includes only official product pages, official GitHub repos, official help centers, and official developer docs. Secondary media coverage was excluded from scoring.
  • Installability was divided into six forms: native skills, repo rule files, CLI/libraries, MCP, SaaS APIs, and GUI/RPA.
  • Agent compatibility was judged by whether the official product exposes skills, commands, plugins, MCP, workspace files, CLIs, or APIs, not by marketing language alone.
  • Public OpenAI materials now confirm both Skills and AGENTS.md for Codex, but the public install spec for native skills is still less explicit than Claude Code's. Where needed, the report marks those Codex recommendations as implementation guidance rather than full official file-by-file specification.

Sources

Official source list

To stay aligned with E-E-A-T (experience, expertise, authoritativeness, trustworthiness), this report prioritizes official domains, official GitHub repos, help centers, and official developer docs. Any inference is explicitly marked instead of being mixed with verified facts.

FAQ

Common questions

Is Claude Code open-source or closed-source?

As of April 15, 2026, Claude Code has an official GitHub repo. The practical classification is hybrid: an open CLI surface with proprietary models and hosted service layers.

Can Codex install PDF skills the same way Claude Code does?

Yes in principle, but the safest publicly documented path is still AGENTS.md plus repo scripts and PDF CLI/API tools. OpenAI has publicly confirmed Skills, but the public file-system specification is still less explicit than Claude Code's.

Is OpenClaw a good fit for GUI-style PDF tools?

Yes, especially when browser, desktop, and messaging channels matter. But GUI automation is generally less stable and less auditable than CLI or API approaches.

Is Trae open-source or closed-source?

It depends on which surface you mean. Trae Agent has an official open-source MIT repo; Trae IDE and SOLO remain closer to closed hosted product surfaces.

What is the minimum viable stack for a reliable PDF agent?

Start with OCRmyPDF, Docling or MinerU, PyMuPDF or pypdf, and qpdf. Then add OpenAI, Claude, Reflo, or DeepL only where your workflow truly needs those layers.
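
A small preflight check can confirm that this baseline is actually installed before an agent is pointed at it. The sketch assumes the executables are named `ocrmypdf`, `docling`, and `qpdf`, and that the Python libraries import as `pymupdf` and `pypdf`; adjust the names to match your installation.

```python
import importlib.util
import shutil

# Assumed executable and module names for the baseline stack.
BASELINE_TOOLS = ("ocrmypdf", "docling", "qpdf")
BASELINE_MODULES = ("pymupdf", "pypdf")


def missing_tools(tools=BASELINE_TOOLS) -> list[str]:
    """Executables from the list that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def missing_modules(modules=BASELINE_MODULES) -> list[str]:
    """Top-level Python modules from the list that cannot be located."""
    return [name for name in modules if importlib.util.find_spec(name) is None]
```

Running both functions at the start of a pipeline, and failing fast when either list is non-empty, avoids half-processed batches caused by a missing dependency.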

Final recommendation

Choose install form first, PDF skill second, and model brand third

In 2026, successful PDF-agent systems are decided more by CLI/API/MCP installability, auditability, and permission design than by model branding alone. For multilingual PDF delivery, Reflo plus DeepL or Adobe is strong on the closed-source side; for local open-source baselines, OCRmyPDF, Docling, MinerU, PyMuPDF, and qpdf remain the practical core.
