Installing Anthropic Document Processing Skills in OpenClaw
The Anthropic skills repository provides high-quality document processing skills for PDF, PPTX, XLSX, and DOCX. These are knowledge-based skills (SKILL.md guides + helper scripts) that teach AI agents best practices for document handling. They do not require an Anthropic API key — they use standard open-source libraries.
What You Get
| Skill | Capabilities |
|---|---|
| PDF | Extract text/tables (pdfplumber), merge/split/rotate (pypdf), create PDFs (reportlab), OCR scanned docs (tesseract), fill forms |
| PPTX | Read/extract text (markitdown), create slides (pptxgenjs), edit XML directly, convert to PDF/images (LibreOffice) |
| XLSX | Create/edit spreadsheets (openpyxl), data analysis (pandas), formula recalculation (LibreOffice) |
| DOCX | Create documents (docx npm package), read with pandoc, edit XML directly, handle tracked changes |
Step 1: Install System Dependencies
# System tools
sudo apt install -y poppler-utils qpdf tesseract-ocr libreoffice pandoc imagemagick
# Python libraries
pip install pypdf pdfplumber reportlab pytesseract pdf2image openpyxl pandas "markitdown[pptx]" Pillow
# Node.js packages
npm install -g pptxgenjs docx
Dependency Matrix
| Dependency | Used By | Purpose |
|---|---|---|
| poppler-utils | PDF, PPTX | pdftotext, pdfimages, pdftoppm |
| qpdf | PDF | PDF linearization, repair |
| tesseract-ocr | PDF | OCR for scanned documents |
| libreoffice | PPTX, XLSX, DOCX | Format conversion, formula recalc, accept tracked changes |
| pandoc | DOCX | Read content, extract tracked changes |
| imagemagick | PDF | Image processing |
| pypdf | PDF | Merge, split, rotate, encrypt |
| pdfplumber | PDF | Text and table extraction |
| reportlab | PDF | Create new PDFs |
| openpyxl | XLSX | Read/write Excel with formatting |
| pandas | XLSX | Data analysis |
| pptxgenjs (npm) | PPTX | Create presentations from scratch |
| docx (npm) | DOCX | Create Word documents from scratch |
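To confirm that the dependencies in the matrix are actually present before moving on, a small check script can help. The following is a minimal sketch, not part of the Anthropic repository. The executable names (for example `soffice` for LibreOffice and `convert` for ImageMagick) are assumptions about what the apt packages install on a typical Debian/Ubuntu system; on ImageMagick 7 the binary may be `magick` instead. The Node.js packages are not covered here.

```python
import importlib.util
import shutil

# Executable names assumed to be provided by the apt packages above
# (binary names, not package names). Adjust for your distribution.
SYSTEM_TOOLS = ["pdftotext", "qpdf", "tesseract", "soffice", "pandoc", "convert"]

# Import names for the pip packages. Some differ from the package name,
# e.g. Pillow is imported as PIL.
PYTHON_MODULES = ["pypdf", "pdfplumber", "reportlab", "pytesseract",
                  "pdf2image", "openpyxl", "pandas", "markitdown", "PIL"]


def check_dependencies(tools=SYSTEM_TOOLS, modules=PYTHON_MODULES):
    """Return {name: True/False} for each executable and importable module."""
    status = {}
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        status[tool] = shutil.which(tool) is not None
    for module in modules:
        # find_spec returns None when a top-level module cannot be located.
        status[module] = importlib.util.find_spec(module) is not None
    return status


if __name__ == "__main__":
    for name, found in check_dependencies().items():
        print(f"{'ok     ' if found else 'MISSING'}  {name}")
```

Anything reported as MISSING only matters if you plan to use the skill that depends on it; see the "Used By" column above.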
Step 2: Clone and Copy Skills
cd ~/.openclaw/workspace
# Clone the repo
git clone https://github.com/anthropics/skills.git anthropic-skills
# Create skill directories and copy
mkdir -p skills/pdf skills/pptx skills/xlsx skills/docx
cp -r anthropic-skills/skills/pdf/* skills/pdf/
cp -r anthropic-skills/skills/pptx/* skills/pptx/
cp -r anthropic-skills/skills/xlsx/* skills/xlsx/
cp -r anthropic-skills/skills/docx/* skills/docx/
# Clean up — remove the cloned repo
rm -rf anthropic-skills
Each skill directory should contain at minimum a SKILL.md and a scripts/ folder.
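If you want to check that layout without opening each folder, the requirement above can be sketched as a short script. The function name `find_layout_problems` is hypothetical, and the path assumes the workspace location used in this guide.

```python
from pathlib import Path

SKILLS = ("pdf", "pptx", "xlsx", "docx")


def find_layout_problems(skills_root, skills=SKILLS):
    """Return a list of readable problems; an empty list means all good."""
    root = Path(skills_root).expanduser()
    problems = []
    for name in skills:
        skill_dir = root / name
        if not (skill_dir / "SKILL.md").is_file():
            problems.append(f"{name}: missing SKILL.md")
        if not (skill_dir / "scripts").is_dir():
            problems.append(f"{name}: missing scripts/ folder")
    return problems


if __name__ == "__main__":
    # Workspace path taken from Step 2 of this guide.
    issues = find_layout_problems("~/.openclaw/workspace/skills")
    print("\n".join(issues) if issues else "All four skill directories look complete.")
```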
Step 3: Verify
openclaw skills list
You should see all four skills with status ✓ ready:
│ ✓ ready │ 📦 pdf │ ... │ openclaw-workspace │
│ ✓ ready │ 📦 pptx │ ... │ openclaw-workspace │
│ ✓ ready │ 📦 xlsx │ ... │ openclaw-workspace │
│ ✓ ready │ 📦 docx │ ... │ openclaw-workspace │
Step 4: Restart OpenClaw
openclaw gateway restart
How It Works
These skills are not standalone tools — they are agent knowledge files. When your OpenClaw agent encounters a document task, it:
- Reads the relevant SKILL.md for best practices and tool selection
- Executes Python/Node.js commands via exec using the installed libraries
- Follows the quality assurance steps defined in the skill
For example, when asked to “extract tables from a PDF”:
Agent reads: skills/pdf/SKILL.md
Agent runs: python3 -c "import pdfplumber; ..."
Agent returns: extracted table data
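The elided command in that example could look roughly like the following. This is an illustrative sketch of table extraction with pdfplumber, not the code the skill itself prescribes; the helper names `clean_table` and `extract_tables` and the file name `report.pdf` are hypothetical.

```python
def clean_table(table):
    """Replace empty (None) cells with "" and strip surrounding whitespace."""
    return [[(cell or "").strip() for cell in row] for row in table]


def extract_tables(pdf_path):
    """Return every table found in the PDF as a list of rows of strings."""
    import pdfplumber  # imported lazily so clean_table works without it

    tables = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            # extract_tables() yields one list of rows per detected table;
            # cells that pdfplumber could not read come back as None.
            for table in page.extract_tables():
                tables.append(clean_table(table))
    return tables


# Example usage ("report.pdf" is a placeholder):
# for index, table in enumerate(extract_tables("report.pdf"), start=1):
#     print(f"Table {index}: {len(table)} rows")
```

How well tables are detected depends on the PDF; scanned documents need the OCR path (tesseract) first, since pdfplumber only reads embedded text.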
Updating Skills
To pull the latest from Anthropic:
cd ~/.openclaw/workspace
git clone https://github.com/anthropics/skills.git anthropic-skills
cp -r anthropic-skills/skills/pdf/* skills/pdf/
cp -r anthropic-skills/skills/pptx/* skills/pptx/
cp -r anthropic-skills/skills/xlsx/* skills/xlsx/
cp -r anthropic-skills/skills/docx/* skills/docx/
rm -rf anthropic-skills
Notes
- No Anthropic API key needed — these skills use only open-source tools
- Fully compatible with Linux: all dependencies are standard packages
- LibreOffice is the heaviest dependency — required for PPTX/XLSX/DOCX format conversion and formula recalculation; can be skipped if you only need PDF
- The shared scripts/office/soffice.py wrapper handles LibreOffice headless mode automatically
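For orientation, a headless LibreOffice conversion called directly looks roughly like the sketch below. This is not the implementation of the soffice.py wrapper, whose interface is not described here; it only illustrates the kind of command such a wrapper runs. The function names and the 120-second timeout are assumptions, and `soffice` is assumed to be on PATH.

```python
import subprocess
from pathlib import Path


def build_convert_command(source, out_dir, target_format="pdf"):
    """Build the argument list for a headless LibreOffice conversion."""
    return [
        "soffice",
        "--headless",                    # run without opening a GUI
        "--convert-to", target_format,   # e.g. "pdf"
        "--outdir", str(out_dir),        # where the converted file is written
        str(source),
    ]


def convert(source, out_dir, target_format="pdf", timeout=120):
    """Run the conversion and return the path LibreOffice is expected to write."""
    subprocess.run(build_convert_command(source, out_dir, target_format),
                   check=True, timeout=timeout)
    return Path(out_dir) / (Path(source).stem + "." + target_format)


# Example usage ("deck.pptx" is a placeholder):
# pdf_path = convert("deck.pptx", "out")
```

Keeping the command construction separate from the subprocess call makes it easy to inspect or log the exact command before running it.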