🧪 Skills
Doc Ocr Skills
--- name: doc-ocr-skills description: OCR documents (PDFs and images) using Gemini 2.5 Flash, PaddleOCR (local), or RapidOCR (local). --- # Document OCR Skill (docr) Uses **Gemini 2.5 Flash**, **Pad
v0.1.0
Description
name: doc-ocr-skills description: OCR documents (PDFs and images) using Gemini 2.5 Flash, PaddleOCR (local), or RapidOCR (local).
Document OCR Skill (docr)
Uses Gemini 2.5 Flash, PaddleOCR, or RapidOCR (local) to recognize text from scanned PDFs and images. Compiled as a single Go binary.
Prerequisites
- API Key configured in
~/.ocr/config(not needed for Paddle/Rapid) - For RapidOCR engine:
pip install rapidocr_onnxruntime - For PaddleOCR engine:
pip install paddleocr paddlepaddle
API Key Configuration
Create the config file:
mkdir -p ~/.ocr
cat > ~/.ocr/config << EOF
# Google Gemini API Key
gemini_api_key=your_gemini_key
EOF
Quick Start
Path Variable: All commands below use
$DOCR. Before running any command, set this variable:SKILL_DIR="$(cd "$(dirname "<path-to-this-SKILL.md>")" && pwd)" DOCR="$SKILL_DIR/scripts/docr/docr"
# OCR a single document using RapidOCR (default)
$DOCR document.pdf
$DOCR image.jpg
# Use Gemini engine
$DOCR -engine gemini document.pdf
# Use PaddleOCR local engine
$DOCR -engine paddle document.pdf
# Specify output file
$DOCR document.pdf -o result.txt
# Batch process all supported files in a directory
$DOCR -batch ./docs/ -o ./outputs/
Engines
| Engine | Flag | API Key Config | Doc Handling |
|---|---|---|---|
| RapidOCR (default) | -engine rapid |
None | Local OCR |
| Gemini | -engine gemini |
gemini_api_key |
Cloud Vision API |
| PaddleOCR (local) | -engine paddle |
None | Local OCR |
CLI Reference
docr [options] <file or directory>
Options:
-engine string OCR engine: rapid (default) / gemini / paddle
-e string Engine (short flag)
-o string Output file path or directory (batch mode)
-output string Output path (long flag)
-batch Batch mode: process all files in directory
-prompt string Custom recognition prompt (gemini)
Installation
We provide pre-compiled binaries to get you started quickly.
cd doc-ocr-skills/scripts
./install.sh
This script will detect your OS (darwin/linux) and architecture (amd64/arm64) and download the appropriate version of docr.
Building from Source (Optional)
If you prefer to build from source, ensure you have Go 1.21+ installed:
cd doc-ocr-skills/scripts/docr
go build -o docr .
Error Handling
| Error | Solution |
|---|---|
config file not found |
Create ~/.ocr/config with API keys |
gemini_api_key not found |
Add gemini_api_key=VALUE to config |
file not found |
Verify the document file path |
| API timeout | Retry; large files may need longer |
Reviews (0)
Sign in to write a review.
No reviews yet. Be the first to review!
Comments (0)
No comments yet. Be the first to share your thoughts!