🧪 Skills
Office To Md V2
Convert PDF, DOC, DOCX, and PPTX office documents to Markdown, supporting legacy .doc files with text extraction and basic formatting preservation.
v0.1.0
Description
Office to Markdown Converter Skill (v2)
Description
Convert office documents (PDF, DOC, DOCX, PPTX) to Markdown format. This skill uses the word-extractor library for .doc support and provides full OpenClaw integration.
When to Use
- When you need to extract text from office documents
- When you want to convert documents to readable Markdown format
- When analyzing document content in OpenClaw
- Specifically when dealing with legacy .doc format files
Supported Formats
- PDF (.pdf): Text extraction using pdf-parse
- Word (.docx): Formatting preservation using mammoth + turndown
- Legacy Word (.doc): Text extraction using word-extractor (supports Chinese encoding)
- PowerPoint (.pptx): Basic text extraction using python-pptx
Dependencies
- Node.js with npm packages: pdf-parse, mammoth, turndown, word-extractor
- Python3 with python-pptx (for PPTX conversion, optional)
- OpenClaw exec tool permission
Installation
1. Copy the skill to your workspace:
cp -r /root/.openclaw/workspace/office-to-md-v2/office-to-md /path/to/your/workspace/
2. Install dependencies:
cd /path/to/your/workspace/office-to-md
npm install
3. For PPTX support (optional):
pip3 install python-pptx
Usage in OpenClaw
Method 1: Direct exec call
// Convert any supported document
const result = await exec(
'node /path/to/office-to-md/openclaw-skill.js /path/to/document.doc',
{ workdir: '/path/to/workspace', timeout: 60000 }
);
if (result.exitCode === 0) {
console.log('✅ Document converted successfully');
// Output file: /path/to/document.md
} else {
console.error('❌ Conversion failed:', result.stderr);
}
Method 2: Using the wrapper function
// Import the converter
const { convertOfficeToMarkdown } = require('/path/to/office-to-md/openclaw-skill.js');
// Convert document
const conversionResult = await convertOfficeToMarkdown('/path/to/document.pdf');
if (conversionResult.success) {
console.log(`Output: ${conversionResult.outputPath}`);
console.log(`Preview: ${conversionResult.preview}`);
} else {
console.error(`Error: ${conversionResult.error}`);
}
Method 3: Complete OpenClaw integration function
async function convertDocumentToMarkdown(filePath) {
// Validate file exists
try {
await read(filePath);
} catch (error) {
return { success: false, error: `File not found: ${filePath}` };
}
// Check file extension
const ext = filePath.toLowerCase().slice(-5);
const supported = ['.pdf', '.doc', '.docx', '.pptx'];
if (!supported.some(s => ext.endsWith(s))) {
return {
success: false,
error: `Unsupported file type. Supported: ${supported.join(', ')}`
};
}
// Convert using the skill
const cmd = `node /path/to/office-to-md/openclaw-skill.js "${filePath}"`;
const result = await exec(cmd, {
workdir: '/path/to/workspace',
timeout: 120000 // 2 minutes for large files
});
if (result.exitCode === 0) {
const outputPath = filePath.replace(/\.[^/.]+$/, '.md');
return {
success: true,
outputPath: outputPath,
message: `Converted to: ${outputPath}`
};
} else {
return {
success: false,
error: result.stderr || 'Conversion failed'
};
}
}
// Usage example
const result = await convertDocumentToMarkdown('/path/to/document.doc');
if (result.success) {
const markdown = await read(result.outputPath);
console.log(markdown.substring(0, 1000));
}
Examples
Example 1: Convert and analyze a document
// Convert a .doc file and analyze its content
const docPath = '/path/to/document.doc';
const convertResult = await exec(
`node /path/to/office-to-md/openclaw-skill.js "${docPath}"`,
{ workdir: '/path/to/workspace' }
);
if (convertResult.exitCode === 0) {
const mdPath = docPath.replace('.doc', '.md');
const content = await read(mdPath);
// Analyze the content
const wordCount = content.split(/\s+/).length;
const lines = content.split('\n').length;
const hasChinese = /[\u4e00-\u9fff]/.test(content);
console.log(`Document analysis:`);
console.log(`- Word count: ${wordCount}`);
console.log(`- Lines: ${lines}`);
console.log(`- Contains Chinese: ${hasChinese}`);
console.log(`- Preview: ${content.substring(0, 200)}...`);
}
Example 2: Batch conversion
// Convert multiple documents of different formats
const documents = [
'/path/to/report.pdf',
'/path/to/legacy.doc',
'/path/to/modern.docx',
'/path/to/presentation.pptx'
];
const results = [];
for (const doc of documents) {
console.log(`Converting ${doc}...`);
const result = await exec(
`node /path/to/office-to-md/openclaw-skill.js "${doc}"`,
{ workdir: '/path/to/workspace', timeout: 90000 }
);
const success = result.exitCode === 0;
results.push({
file: doc,
success: success,
error: success ? null : result.stderr
});
console.log(success ? '✅ Success' : '❌ Failed');
}
// Summary
const successful = results.filter(r => r.success).length;
console.log(`\nConversion summary: ${successful}/${results.length} successful`);
API Reference
convertOfficeToMarkdown(filePath)
Returns a Promise that resolves to:
{
success: boolean,
outputPath?: string,
markdown?: string,
preview?: string,
fileType?: string,
message?: string,
stats?: {
lines: number,
characters: number,
words: number
},
error?: string,
stack?: string
}
Configuration
Timeout Settings
- Small files (<1MB): 30 seconds
- Medium files (1-10MB): 60 seconds
- Large files (>10MB): 120 seconds
Memory Limits
- Default Node.js memory limit is sufficient for most documents
- For very large files, you may need to increase memory:
node --max-old-space-size=4096 openclaw-skill.js large-file.doc
Troubleshooting
Common Issues
-
"File not found"
- Check file path and permissions
- Use absolute paths for reliability
-
"Unsupported file type"
- Ensure file has correct extension
- Check if file is actually the claimed format
-
Conversion errors with .doc files
- The file may be corrupted or in an unusual format
- Try opening in Word and saving as .docx first
-
Chinese text appears as gibberish
- word-extractor should handle Chinese encoding automatically
- If issues persist, the file may use unusual encoding
-
Timeout errors
- Increase timeout for large files
- Check system resources
Debug Mode
Enable debug logging by setting environment variable:
DEBUG=office-to-md node openclaw-skill.js document.doc
Performance
- PDF: Fast, depends on file size
- DOCX: Fast to medium, good formatting preservation
- DOC: Medium, requires binary parsing
- PPTX: Slow, requires Python and external library
Limitations
- Images in documents are not extracted
- Complex formatting may not be fully preserved
- Tables may convert imperfectly to Markdown
- Very old or corrupted .doc files may fail
- Password-protected files are not supported
Changelog
v2.0.0 (2026-02-15)
- Added full .doc support using word-extractor
- Fixed ESM compatibility issues with pptConverter
- Added comprehensive OpenClaw integration
- Improved Chinese text extraction
- Added structured output with statistics
v1.0.0 (Initial)
- Basic PDF, DOCX, PPTX support
- Simple conversion without .doc support
License
This skill is provided as-is. The underlying libraries have their own licenses:
- pdf-parse: MIT
- mammoth: BSD-2-Clause
- turndown: MIT
- word-extractor: MIT
- python-pptx: MIT
Reviews (0)
Sign in to write a review.
No reviews yet. Be the first to review!
Comments (0)
No comments yet. Be the first to share your thoughts!