Vector Text Fixer
v0.1.0
Description
name: vector-text-fixer
description: Fix garbled text in PDF/SVG vector graphics for final editing in AI. Detect, replace and repair garbled text in vector graphic files while maintaining original formatting and layout.
version: 1.0.0
category: Visual
tags:
- svg
- vector
- text-fix
- garbled-text
- document-repair
- encoding
author: AIPOCH
license: MIT
status: Draft
risk_level: Medium
skill_type: Tool/Script
owner: AIPOCH
reviewer: ''
last_updated: '2026-02-06'
Vector Text Fixer
Fixes garbled text in PDF/SVG vector graphics to make them editable in AI tools.
Features
- Garbled Text Detection: Automatically identifies garbled text in PDF/SVG files
- Smart Repair: Infers original text content based on context
- Batch Processing: Supports batch processing of multiple files in a folder
- Format Preservation: Repaired files maintain original vector format and layout
- AI-assisted Editing: Outputs an intermediate JSON format that can be imported into AI editors
Supported Scenarios
1. PDF Garbled Text Repair
- Box/question mark issues caused by font embedding problems
- Garbled text caused by encoding conversion errors
- Abnormal characters generated by missing font substitution
- Multi-language mixed encoding issues
2. SVG Garbled Text Repair
- Text entity encoding errors
- Special character escaping issues
- Display abnormalities caused by invalid font references
- XML encoding declaration errors
Usage
Command Line
# Fix a single PDF file
python scripts/main.py --input document.pdf --output fixed.pdf
# Fix a single SVG file
python scripts/main.py --input diagram.svg --output fixed.svg
# Batch process folder
python scripts/main.py --batch ./input_folder --output ./output_folder
# Interactive repair (manually specify replacement content)
python scripts/main.py --input doc.pdf --interactive
# Export as editable format (JSON)
python scripts/main.py --input doc.pdf --export-json editable.json
Python API
from scripts.main import VectorTextFixer
# Create fixer instance
fixer = VectorTextFixer()
# Fix PDF
result = fixer.fix_pdf("input.pdf", "output.pdf")
# Fix SVG
result = fixer.fix_svg("input.svg", "output.svg")
# Batch processing
results = fixer.batch_fix("./input_folder", "./output_folder")
# Get text map (for AI editing)
text_map = fixer.extract_text_map("input.pdf")
Input Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| --input | str | Yes* | Input file path (PDF or SVG) |
| --batch | str | No | Batch processing input folder |
| --output | str | Yes* | Output file/folder path |
| --interactive | bool | No | Enable interactive repair mode |
| --export-json | str | No | Export editable JSON format |
| --encoding | str | No | Specify source file encoding (default: auto-detect) |
| --font-substitution | dict | No | Font replacement mapping |
| --repair-level | str | No | Repair level: minimal, standard, aggressive (default: standard) |

*At least one of --input and --batch is required
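For readers who want to see how these parameters could be wired up, here is a minimal argparse sketch. It is not the actual contents of scripts/main.py. The helper names `parse_font_map` and `parse_args` are hypothetical, and the `OLD=NEW,OLD2=NEW2` syntax for `--font-substitution` is an assumption, since the source only says the value is a dict.

```python
import argparse


def parse_font_map(value: str) -> dict:
    """Parse 'Old=New,Other=Replacement' into a dict (assumed syntax)."""
    mapping = {}
    for pair in value.split(","):
        if not pair.strip():
            continue
        old, sep, new = pair.partition("=")
        if not sep or not old.strip() or not new.strip():
            raise argparse.ArgumentTypeError(f"expected OLD=NEW, got {pair!r}")
        mapping[old.strip()] = new.strip()
    return mapping


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("--input", help="Input file path (PDF or SVG)")
    parser.add_argument("--batch", help="Batch processing input folder")
    parser.add_argument("--output", help="Output file/folder path")
    parser.add_argument("--interactive", action="store_true",
                        help="Enable interactive repair mode")
    parser.add_argument("--export-json", metavar="PATH",
                        help="Export editable JSON format")
    parser.add_argument("--encoding", default=None,
                        help="Source file encoding (default: auto-detect)")
    parser.add_argument("--font-substitution", type=parse_font_map, default={},
                        help="Font replacement mapping, e.g. SimSun=Arial")
    parser.add_argument("--repair-level", default="standard",
                        choices=["minimal", "standard", "aggressive"])
    args = parser.parse_args(argv)
    # The table's footnote: at least one of --input and --batch is required.
    if not (args.input or args.batch):
        parser.error("at least one of --input and --batch is required")
    return args
```

Calling `parse_args(["--batch", "./in", "--font-substitution", "SimSun=Arial"])` would yield a namespace whose `font_substitution` is `{"SimSun": "Arial"}` and whose `repair_level` falls back to `standard`.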
Output Format
Repaired PDF/SVG
- Maintains original vector format
- Garbled text replaced with readable content
- Fonts and layout remain unchanged
JSON Export Format
{
"file_type": "pdf",
"pages": [
{
"page_num": 1,
"text_blocks": [
{
"id": "tb_001",
"bbox": [100, 200, 300, 220],
"original_text": "�����",
"detected_encoding": "UTF-8",
"confidence": 0.3,
"suggested_fix": "Sample Text"
}
]
}
],
"fonts_used": ["Arial", "SimSun"],
"repair_summary": {
"total_blocks": 15,
"fixed_blocks": 12,
"skipped_blocks": 3
}
}
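As an illustration of how this export might be consumed, the sketch below walks the structure and lists the blocks that probably need a manual look. The function name `blocks_needing_review` and the 0.5 threshold are hypothetical. The source does not define what `confidence` measures, so the sketch assumes that a low value means the block should be reviewed.

```python
def blocks_needing_review(export: dict, threshold: float = 0.5) -> list:
    """Return (page_num, block_id, suggested_fix) for low-confidence blocks."""
    flagged = []
    for page in export.get("pages", []):
        for block in page.get("text_blocks", []):
            # Assumption: low confidence means the block needs human review.
            if block.get("confidence", 0.0) < threshold:
                flagged.append(
                    (page["page_num"], block["id"], block.get("suggested_fix", ""))
                )
    return flagged


# A trimmed copy of the example export, written as a Python dict.
sample_export = {
    "file_type": "pdf",
    "pages": [
        {
            "page_num": 1,
            "text_blocks": [
                {
                    "id": "tb_001",
                    "bbox": [100, 200, 300, 220],
                    "original_text": "\ufffd" * 5,
                    "detected_encoding": "UTF-8",
                    "confidence": 0.3,
                    "suggested_fix": "Sample Text",
                }
            ],
        }
    ],
}

for page_num, block_id, fix in blocks_needing_review(sample_export):
    print(f"page {page_num}: {block_id} -> {fix!r}")
```

For a file on disk, the same function can be applied to the result of `json.load` on the path passed to `--export-json`.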
Garbled Text Detection Rules
The tool uses the following rules to detect garbled text:
- Replacement Character Detection: Identifies U+FFFD (�) and box characters
- Control Character Filtering: Excludes non-printing control characters
- Encoding Consistency: Detects anomalies caused by mixed encodings
- Font Fallback Detection: Identifies substitution characters generated due to missing fonts
- Probability Model: Estimates the probability that text is garbled from its character frequencies
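Some of the rules above can be approximated in a few lines. The following is a minimal sketch, not the tool's actual detector: it counts replacement characters, a small hand-picked set of box glyphs, and control or unassigned code points, then reports the share of suspicious characters. The names `garbled_score` and `is_garbled`, the box-character set, and the 0.3 threshold are all assumptions made for illustration.

```python
import unicodedata

REPLACEMENT_CHAR = "\ufffd"
# Assumed set of "box" glyphs often shown for missing characters.
BOX_CHARS = {"\u25a1", "\u25af", "\u2395"}
# Unicode categories treated as suspicious: control, private use, unassigned.
SUSPICIOUS_CATEGORIES = {"Cc", "Co", "Cn"}
ALLOWED_CONTROLS = {"\t", "\n", "\r"}


def garbled_score(text: str) -> float:
    """Return the fraction of characters that look garbled (0.0 to 1.0)."""
    if not text:
        return 0.0
    suspicious = 0
    for ch in text:
        if ch == REPLACEMENT_CHAR or ch in BOX_CHARS:
            suspicious += 1
        elif (unicodedata.category(ch) in SUSPICIOUS_CATEGORIES
              and ch not in ALLOWED_CONTROLS):
            suspicious += 1
    return suspicious / len(text)


def is_garbled(text: str, threshold: float = 0.3) -> bool:
    """Flag text whose suspicious-character share reaches the threshold."""
    return garbled_score(text) >= threshold
```

A ratio like this is only a crude stand-in for the probability model mentioned in the list. Mojibake made of ordinary printable letters would score 0.0 here and needs a separate check.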
Repair Strategies
Minimal
- Only repairs obvious errors (replacement characters, null bytes)
- Maintains maximum integrity of original text
- Suitable for minor garbled text issues
Standard
- Repairs common encoding issues
- Smart font replacement
- Balances repair rate and accuracy
Aggressive
- Comprehensive text re-encoding
- Uses OCR-assisted recognition
- Suitable for severely garbled documents
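To make the difference between the levels concrete, here is a small sketch of two repairs of the kind described. It is an illustration under assumptions, not the tool's implementation: `strip_null_bytes` mirrors the "null bytes" case from the Minimal level, and `repair_mojibake` handles one common encoding error, UTF-8 bytes that were decoded as Latin-1. Both function names are hypothetical.

```python
def strip_null_bytes(text: str) -> str:
    """Minimal-style repair: drop NUL characters and change nothing else."""
    return text.replace("\x00", "")


def repair_mojibake(text: str, wrong: str = "latin-1", right: str = "utf-8") -> str:
    """Undo a wrong decode by re-encoding with `wrong` and decoding with `right`.

    If the round trip fails, the text was probably not damaged this way,
    so it is returned unchanged.
    """
    try:
        return text.encode(wrong).decode(right)
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


damaged = "Caf\u00e9".encode("utf-8").decode("latin-1")  # simulates the error
print(repr(damaged), "->", repr(repair_mojibake(damaged)))
```

The round trip leaves already-correct text alone in the common cases: pure ASCII survives unchanged, and correctly decoded non-ASCII text usually fails one of the two steps and is returned as-is.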
Examples
Fix Single Page PDF
Input:
python scripts/main.py --input report.pdf --output fixed_report.pdf
Output:
✓ Processing: report.pdf
✓ Detected 5 garbled text blocks
✓ Fixed 4 blocks automatically
⚠ 1 block requires manual review
✓ Output saved: fixed_report.pdf
✓ Report saved: fixed_report_repair_log.json
Export Editable JSON
Input:
python scripts/main.py --input diagram.svg --export-json editable.json
Output JSON Structure:
{
"file_type": "svg",
"svg_info": {
"width": 800,
"height": 600,
"viewBox": "0 0 800 600"
},
"text_elements": [
{
"id": "text_1",
"x": 100,
"y": 200,
"font_family": "Arial",
"font_size": 14,
"original": "�����",
"user_editable": "",
"confidence": 0.25
}
]
}
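One way the edited JSON could be applied back to the SVG is sketched below with the standard library's `xml.etree.ElementTree`. This is a hypothetical round trip, not a documented feature of the tool: it assumes each entry's `id` matches the `id` attribute of a `text` element in the SVG, and that a non-empty `user_editable` value is the intended replacement. The function name `apply_text_edits` is made up for this example.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def apply_text_edits(svg_source: str, text_elements: list) -> str:
    """Replace the content of <text> elements using 'user_editable' values."""
    ET.register_namespace("", SVG_NS)  # serialize without an ns0: prefix
    root = ET.fromstring(svg_source)
    # Index <text> elements by id, matching on the local tag name.
    by_id = {
        el.get("id"): el
        for el in root.iter()
        if el.tag.rsplit("}", 1)[-1] == "text" and el.get("id")
    }
    for entry in text_elements:
        replacement = entry.get("user_editable")
        target = by_id.get(entry.get("id"))
        if not replacement or target is None:
            continue  # nothing to apply for this entry
        if len(target):
            continue  # has child elements such as <tspan>; left for manual editing
        target.text = replacement
    return ET.tostring(root, encoding="unicode")
```

Only the text content changes. Attributes such as `x`, `y`, and font settings are left as they were, which is in keeping with the format-preservation goal stated earlier.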
Dependencies
pdfplumber>=0.10.0 # PDF parsing
PyMuPDF>=1.23.0 # PDF processing (fitz)
cairosvg>=2.7.0 # SVG conversion
beautifulsoup4>=4.12.0 # SVG parsing
fonttools>=4.40.0 # Font processing
chardet>=5.0.0 # Encoding detection
Pillow>=10.0.0 # Image processing
Limitations
- Encrypted PDFs require password unlock before processing
- Severely damaged vector files may not be fully repairable
- Some rare fonts may not map correctly
- Scanned PDFs must be run through OCR first
Version Information
- Version: 1.0.0
- Last Updated: 2026-02-06
- Status: Ready for use
Risk Assessment
| Risk Indicator | Assessment | Level |
|---|---|---|
| Code Execution | Python scripts executed locally | Medium |
| Network Access | No external API calls | Low |
| File System Access | Read input files, write output files | Medium |
| Instruction Tampering | Standard prompt guidelines | Low |
| Data Exposure | Output files saved to workspace | Low |
Security Checklist
- No hardcoded credentials or API keys
- No unauthorized file system access (../)
- Output does not expose sensitive information
- Prompt injection protections in place
- Input file paths validated (no ../ traversal)
- Output directory restricted to workspace
- Script execution in sandboxed environment
- Error messages sanitized (no stack traces exposed)
- Dependencies audited
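The path-related items in this checklist can be illustrated with a short check. This is a sketch of one common approach, assuming a single workspace directory. The function name `resolve_inside` is hypothetical and the source does not say how the tool itself validates paths.

```python
from pathlib import Path


def resolve_inside(workspace: str, candidate: str) -> Path:
    """Resolve `candidate` against `workspace` and reject paths that escape it."""
    root = Path(workspace).resolve()
    # Resolving collapses any ".." segments before the containment check.
    target = (root / candidate).resolve()
    if target != root and root not in target.parents:
        raise ValueError(f"path escapes workspace: {candidate}")
    return target
```

With this check, `resolve_inside("/tmp/ws_example", "out/fixed.pdf")` returns a path under the workspace, while `"../secret.pdf"` or an absolute path outside the workspace raises `ValueError`.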
Prerequisites
# Python dependencies
pip install -r requirements.txt
Evaluation Criteria
Success Metrics
- Successfully executes main functionality
- Output meets quality standards
- Handles edge cases gracefully
- Performance is acceptable
Test Cases
- Basic Functionality: Standard input → Expected output
- Edge Case: Invalid input → Graceful error handling
- Performance: Large dataset → Acceptable processing time
Lifecycle Status
- Current Stage: Draft
- Next Review Date: 2026-03-06
- Known Issues: None
- Planned Improvements:
- Performance optimization
- Additional feature support