Super Ocr
Production-grade OCR with intelligent engine selection. Tesseract (lightweight, fast) and PaddleOCR (high accuracy, Chinese-optimized). Use when extracting t...
Description
name: super-ocr description: "Production-grade OCR with intelligent engine selection. Tesseract (lightweight, fast) and PaddleOCR (high accuracy, Chinese-optimized). Use when extracting text from images, processing Chinese documents, needing confidence scores, or working with mixed Chinese/English content."
Super OCR
Overview
Super OCR is a production-grade optical character recognition tool that intelligently selects the best engine for your needs:
- Tesseract Engine: Lightweight, fast (~200-500ms), perfect for simple text extraction
- PaddleOCR Engine: High accuracy (98%+), optimized for Chinese, ideal for complex documents
Engine Selection Strategy
Auto Mode (Default)
The skill automatically selects the optimal engine:
| Scenario | Selected Engine | Why |
|---|---|---|
| Simple text, English only | Tesseract | Faster, lighter dependency |
| Chinese content, high accuracy needed | PaddleOCR | Better Chinese support, 98%+ accuracy |
| Low confidence from Tesseract | PaddleOCR (fallback) | Quality assurance |
Force Mode
Users can explicitly choose an engine:
--engine tesseract- Use Tesseract only--engine paddle- Use PaddleOCR only--engine auto- Auto-select (default)
Quick Start
Installation
This skill requires the following dependencies:
- PaddleOCR (for Chinese text recognition - 98%+ accuracy)
- Tesseract (for fast English text recognition)
- OpenCV (for image preprocessing)
Option 1: Install with pip (all-in-one)
pip install paddleocr paddlepaddle pytesseract pillow opencv-python numpy
Option 2: Install dependencies manually
macOS:
# Tesseract
brew install tesseract
# PaddleOCR
pip install paddleocr paddlepaddle
Ubuntu/Debian:
# Tesseract
sudo apt update && sudo apt install tesseract-ocr
# PaddleOCR
pip install paddleocr paddlepaddle
Windows:
# Download Tesseract from: https://github.com/UB-Mannheim/tesseract/wiki
pip install paddleocr paddlepaddle pytesseract pillow opencv-python numpy
Usage
# Auto mode (recommended) - runs all available engines
cd path/to/super-ocr
python scripts/main.py --image path/to/image.png
# Force Tesseract only
python scripts/main.py --image document.jpg --engine tesseract
# Force PaddleOCR (high accuracy Chinese)
python scripts/main.py --image chinese_menu.png --engine paddle
# Run all engines (macOS only: Tesseract + PaddleOCR + MacVision)
python scripts/main.py --image complex_doc.png --engine all
# Batch processing with output directory
python scripts/main.py --images ./images/*.png --output ./results --verbose
# Check dependencies and auto-install
python scripts/dependencies.py --check --install
Structuring This Skill
This skill uses a capabilities-based structure with multiple execution modes:
- Engine Selection Logic - Intelligent decision making
- OCR Execution - Unified interface for different engines
- Post-processing - Standardized output formatting
- Validation & Fallback - Quality assurance
Core Capabilities
1. Intelligent Engine Selection
The skill includes a decision tree that analyzes:
- Image characteristics (contrast, text size)
- Language patterns (Chinese character detection)
- User requirements (speed vs accuracy)
See scripts/engine_selector.py for implementation details.
2. Dual Engine Support
Tesseract Engine (scripts/tesseract_ocr.py):
- Fast preprocessing pipeline
- PSM mode 6 for uniform text blocks
- Confidence scoring per word
- Language detection
PaddleOCR Engine (scripts/paddle_ocr.py):
- State-of-art? SN (East text detection)
- Crnn recognition with LSTM
- Confidence scores per character
- Table detection support
3. Output Formats
Supports multiple output formats:
| Format | Content | Use Case |
|---|---|---|
| Text only | Clean extracted text | Simple search/grep |
| Structured | Text + positions | Data extraction |
| JSON | Full metadata + confidence | API integration |
| Verbose | Debug info | Quality assurance |
4. Quality Guarantees
- Confidence thresholds (configurable, default 80%)
- Low-confidence alerts for manual review
- \Fallback processing for failed OCRs
Resources
scripts/
main.py- Main entry point, CLI interface (supports multi-engine)dependencies.py- Auto-install and validationoutput_formatter.py- Multiple output format supportengine/- OCR engine implementationsselector.py- Intelligent engine selection logictesseract.py- Tesseract engine wrapperpaddle.py- PaddleOCR engine wrappermacvision.py- macOS Vision OCR (macOS only)
preprocessing/- Image preprocessing utilitiespreprocessor.py- Denoising, enhancement, binarization
dependencies.py (Key Feature)
The dependencies.py module handles:
- Dependency detection (
paddleocr,paddlepaddle,pytesseract,cv2) - Auto-install on missing dependencies
- version checking
- OS-specific installation commands
- Clear error messages with troubleshooting steps
Use this when setting up a new environment with python scripts/dependencies.py --check --install
Advanced Features
Custom Configuration
Create config.yaml for persistent settings:
default_engine: auto
confidence_threshold: 0.8
output_format: json
preprocess:
denoise: true
enhance_contrast: true
Batch Processing
Process multiple images:
python scripts/ocr.py --images ./images/*.png --output ./results
API Mode
Use as a Python library:
from super_ocr import OCRProcessor
processor = OCRProcessor(engine='auto')
result = processor.extract('image.png')
print(result.text)
print(result.confidence)
Anti-Patterns
- ❌ Using PaddleOCR for every image (overhead for simple cases)
- ❌ ignoring confidence scores (quality matters)
- ❌ Biases (always prefering one engine)
- ❌ Skipping preprocessing (quality impact)
Performance Notes
| Engine | Init Time | Per-Image | Memory | Best For |
|---|---|---|---|---|
| Tesseract | ~200ms | ~50ms | ~100MB | Quick extraction |
| PaddleOCR | ~3s | ~500ms | ~500MB | High accuracy |
Initialize once, reuse processor for batch processing.
Reviews (0)
No reviews yet. Be the first to review!
Comments (0)
No comments yet. Be the first to share your thoughts!