Desktop Automation Skill

Desktop automation: mouse, keyboard, window control, OCR, and image recognition, all local.

🎯 About

OpenClaw skill to automate desktop interactions on Windows/macOS/Linux using Python. Uses PyAutoGUI for basic actions, OpenCV for image recognition, and pytesseract for OCR.

Perfect for:

Automating legacy apps without an API
Filling repetitive forms
Launching GUI workflows from commands
Recording and replaying macros

🔐 Security & Privacy

⚠️ Keyboard Recording Warning

The macro recorder captures ALL keyboard events, including:

Passwords
Credit card numbers
Personal identification numbers
Private messages

Never record macros while entering sensitive credentials. Only use macro recording for non-sensitive workflows. Recorded macro files store raw keystrokes — store them securely.

Data Handling

No network access: All actions execute locally. No data is sent to external servers.
No credential storage: The skill does not store or transmit passwords unless you explicitly record them in a macro.
Local execution: Dependencies are standard Python packages (PyAutoGUI, OpenCV, etc.) — no external API calls.

Best Practices

Verify coordinates before clicking on sensitive windows.
Test macros in dry-run or at slow speed first.
Use activate_window with precise titles to avoid misclicks.
Restrict file permissions on recorded macro JSON files (they may contain sensitive input).
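
The last point can be sketched in a few lines of Python. This is a minimal illustration, not part of the skill: `restrict_macro_file` is a hypothetical helper name, and it assumes a POSIX system, since `os.chmod` on Windows only toggles the read-only flag.

```python
import os
import stat
from pathlib import Path


def restrict_macro_file(path: str) -> int:
    """Make a macro file readable and writable by its owner only.

    Hypothetical helper, not part of the skill. Returns the resulting
    permission bits. On Windows, os.chmod can only toggle the read-only
    flag, so NTFS ACLs are the better tool there.
    """
    macro = Path(path)
    os.chmod(macro, stat.S_IRUSR | stat.S_IWUSR)  # equivalent to 0o600
    return stat.S_IMODE(macro.stat().st_mode)
```

Calling it on each file in recorded_macro/ right after recording keeps other local accounts from reading captured keystrokes.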

📦 Installation

Prerequisites

Python 3.10+
pip

Install dependencies

pip install -r requirements.txt

Enable the skill

The folder desktop-automation-100per100-local must be in your OpenClaw skills directory, for example C:\Users\Noham iA\.openclaw\workspace\skills\. Then restart OpenClaw:

openclaw gateway restart

🚀 Available Actions

Basic

Action	Parameters	Description
`click`	`x`, `y`, `button?` ("left"/"right"/"middle")	Click at coordinates
`type`	`text`, `interval?` (float)	Type with optional interval
`screenshot`	`path?` (default: `~/Desktop/screenshot.png`)	Capture screen
`get_active_window`	—	Returns `{title, x, y, width, height}` of active window
`list_windows`	—	List all windows (`[{title, x, y, width, height, is_active}]`)
`activate_window`	`title_substring`	Activate first matching window
`move_mouse`	`x`, `y`	Move cursor
`press_key`	`key` (e.g. 'enter', 'tab', 'escape', 'space')	Press a single key
`scroll`	`amount` (positive=up, negative=down)	Scroll
`copy_to_clipboard`	`text`	Copy to clipboard (requires `pyperclip`)
`paste_from_clipboard`	—	Paste (Ctrl+V)
`drag`	`start_x`, `start_y`, `end_x`, `end_y`, `duration?`, `button?`	Drag-and-drop
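
Because `activate_window` activates the first window that matches, it can help to check how many windows a substring matches before using it. The sketch below works on sample data in the shape documented for `list_windows`. `matching_windows` is a hypothetical helper, and the case-insensitive comparison is an assumption; the skill's own matching rules may differ.

```python
from typing import Dict, List


def matching_windows(windows: List[Dict], title_substring: str) -> List[Dict]:
    """Return every window whose title contains the substring.

    Hypothetical helper. Case-insensitive matching is an assumption
    made for this sketch.
    """
    needle = title_substring.lower()
    return [w for w in windows if needle in w.get("title", "").lower()]


# Sample data in the shape documented for list_windows.
windows = [
    {"title": "Untitled - Notepad", "x": 0, "y": 0, "width": 800, "height": 600, "is_active": True},
    {"title": "notes.txt - Notepad", "x": 50, "y": 50, "width": 800, "height": 600, "is_active": False},
    {"title": "Calculator", "x": 900, "y": 0, "width": 320, "height": 480, "is_active": False},
]

hits = matching_windows(windows, "Notepad")
if len(hits) > 1:
    # More than one match: pick a longer, more specific substring.
    print("Ambiguous title, matches:", [w["title"] for w in hits])
```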

Advanced

Action	Parameters	Description
`find_image`	`template_path`, `confidence?` (0.0-1.0)	Find image on screen (OpenCV). Returns `{x, y, confidence}` or `not_found`
`wait_for_image`	`template_path`, `timeout?`, `interval?`, `confidence?`	Wait for an image to appear
`find_text_on_screen`	`text`, `lang?` ('fra' default)	OCR: search text on screen (requires Tesseract)
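
The `timeout` and `interval` parameters of `wait_for_image` suggest a polling loop. The sketch below shows one way such a loop could work. It is not the skill's implementation: `wait_for` and its default values are assumptions, and the search itself is passed in as a callable so the loop can be read and tested on its own.

```python
import time
from typing import Callable, Optional


def wait_for(finder: Callable[[], Optional[dict]],
             timeout: float = 10.0,
             interval: float = 0.5) -> Optional[dict]:
    """Call finder() until it returns a match or the timeout elapses.

    Hypothetical helper; the default timeout and interval are assumptions.
    """
    deadline = time.monotonic() + timeout
    while True:
        match = finder()
        if match is not None:
            return match
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return None
        # Never sleep past the deadline.
        time.sleep(min(interval, remaining))
```

A shorter interval reacts faster but takes more screenshots per second, which matters when each search scans the full screen.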

📝 Usage Examples

1. Click and type

sessions_spawn({ task: 'click {"x":100,"y":200}', label: 'desktop-automation-100per100-local' });
sessions_spawn({ task: 'type {"text":"Hello World"}', label: 'desktop-automation-100per100-local' });
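
The task strings passed to `sessions_spawn` appear to follow the pattern of an action name followed by a JSON object of parameters. When generating them from a script, building the JSON with a serializer avoids quoting mistakes. `build_task` below is a hypothetical helper written for illustration, not part of the skill.

```python
import json
from typing import Optional


def build_task(action: str, params: Optional[dict] = None) -> str:
    """Format an action and its parameters as '<action> <compact JSON>'.

    Hypothetical helper; actions without parameters are returned as-is.
    """
    if not params:
        return action
    return f"{action} {json.dumps(params, separators=(',', ':'))}"


print(build_task("click", {"x": 100, "y": 200}))  # click {"x":100,"y":200}
print(build_task("list_windows"))                 # list_windows
```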

2. Screenshot and list windows

sessions_spawn({ task: 'list_windows', label: 'desktop-automation-100per100-local' });
sessions_spawn({ task: 'screenshot {"path":"~/Desktop/my_screen.png"}', label: 'desktop-automation-100per100-local' });

3. Activate window

sessions_spawn({ task: 'activate_window {"title_substring":"Notepad"}', label: 'desktop-automation-100per100-local' });

4. Image search (with OpenCV)

sessions_spawn({
  task: 'find_image {"template_path":"C:/path/button.png","confidence":0.9}',
  label: 'desktop-automation-100per100-local'
});

5. Full macro (record/play)

// Start GUI recording
sessions_spawn({ task: 'record_macro', label: 'desktop-automation-100per100-local' });
// Stop → file generated in recorded_macro/
// Replay
sessions_spawn({
  task: 'play_macro {"macro_path":"C:/.../macro_2026-03-14_22-00-00.json"}',
  label: 'desktop-automation-100per100-local'
});

⚙️ Macro Recording

The skill includes a Tkinter GUI to record macros:

Launch: python scripts/record_macro.py
Records: mouse, clicks, scrolling, keyboard, window changes
Saves JSON in recorded_macro/ (timestamped name)
Playback: python scripts/play_macro.py <file.json> [speed]

CLI example:

python scripts/record_macro.py
python scripts/play_macro.py recorded_macro/macro_2026-03-14_22-00-00.json 1.0
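
Playback can also be launched from another Python script. The sketch below only wraps the command line shown above. The function names are hypothetical, and the meaning of the speed value (for example, whether a value below 1.0 slows playback) is not confirmed here, so check the script before relying on it.

```python
import subprocess
import sys
from typing import List


def play_macro_command(macro_path: str, speed: float = 1.0) -> List[str]:
    """Build the argv for: python scripts/play_macro.py <file.json> [speed]."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    return [sys.executable, "scripts/play_macro.py", macro_path, str(speed)]


def play_macro(macro_path: str, speed: float = 1.0) -> int:
    """Run the playback script and return its exit code."""
    result = subprocess.run(play_macro_command(macro_path, speed), check=False)
    return result.returncode
```

Passing the arguments as a list, rather than one shell string, keeps paths with spaces intact.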

🔧 Configuration

OCR (pytesseract)

Install Tesseract on your system: https://github.com/tesseract-ocr/tesseract
On Windows: add to PATH or set pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
Language packs: fra, eng, etc.
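
A quick way to check the setup is to look for the Tesseract binary before running any OCR action. The sketch below uses only the standard library. `find_tesseract` is a hypothetical helper, and the fallback is simply the Windows path mentioned above.

```python
import shutil
from pathlib import Path
from typing import Iterable, Optional

# Windows install location mentioned in the configuration notes.
WINDOWS_DEFAULT = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def find_tesseract(fallbacks: Iterable[str] = (WINDOWS_DEFAULT,)) -> Optional[str]:
    """Return the path to the tesseract binary, or None if it is not found."""
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    for candidate in fallbacks:
        if Path(candidate).is_file():
            return candidate
    return None


path = find_tesseract()
if path is None:
    print("Tesseract not found; install it or add it to PATH")
else:
    print("Tesseract found at", path)
    # With pytesseract installed, point it at this binary:
    #   import pytesseract
    #   pytesseract.pytesseract.tesseract_cmd = path
```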

Pyperclip (copy/paste)

On Linux: xclip or xsel required
On Windows/macOS: usually works out of the box
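
The Linux requirement can be checked up front. This is a minimal sketch based only on the two helpers named above; `missing_clipboard_helpers` is a hypothetical function, and pyperclip may accept other backends that this check does not know about.

```python
import shutil
import sys
from typing import List


def missing_clipboard_helpers() -> List[str]:
    """On Linux, return the helper names to install if neither is present.

    Returns an empty list on other platforms or when a helper is found.
    """
    if not sys.platform.startswith("linux"):
        return []
    helpers = ["xclip", "xsel"]
    if any(shutil.which(name) for name in helpers):
        return []
    return helpers


missing = missing_clipboard_helpers()
if missing:
    print("Install one of:", ", ".join(missing))
```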

⚠️ Safety & Best Practices

Full control: the skill controls mouse and keyboard — use carefully
No network access: isolated, no external data
Coordinates: verify before clicking (use screenshot for reference)
Windows: activate_window assumes a unique title — be precise
Macros: always test in dry-run or at slow speed first
Permissions: on Windows, may require admin rights for some apps
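
Coordinate checks can be automated before a click is sent. The sketch below rejects points that are off screen or inside a small square at any screen corner, where PyAutoGUI's fail-safe would stop the script. `is_safe_target` and its `margin` value are hypothetical; in a real script the width and height could come from `pyautogui.size()`.

```python
def is_safe_target(x: int, y: int, width: int, height: int, margin: int = 5) -> bool:
    """Return True if (x, y) is on screen and outside the corner squares.

    Hypothetical helper. `margin` is the side length, in pixels, of the
    square excluded at each of the four corners.
    """
    if not (0 <= x < width and 0 <= y < height):
        return False
    near_x_edge = x < margin or x >= width - margin
    near_y_edge = y < margin or y >= height - margin
    # A point is in a corner only when it is near both an x and a y edge.
    return not (near_x_edge and near_y_edge)


# Example with a 1920x1080 screen.
print(is_safe_target(100, 200, 1920, 1080))  # True
print(is_safe_target(0, 0, 1920, 1080))      # False
```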

🐛 Troubleshooting

Problem	Solution
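`pyautogui.FailSafeException`	Raised when the cursor reaches a screen corner such as (0,0); this is PyAutoGUI's emergency stop. Keep automated moves away from the corners. Setting `pyautogui.FAILSAFE = False` disables the check but removes that safety net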
OCR finds nothing	Check Tesseract installation + language. `pytesseract.get_tesseract_version()`
`find_image` fails	Ensure template matches exactly (same scale/color). Adjust `confidence` (0.7-0.95)
`activate_window` not found	Use `list_windows` to see exact titles
ImportError (missing module)	`pip install -r requirements.txt` (check Python 3.10+)
Tkinter GUI won't open	On Linux: `apt-get install python3-tk`. On Windows: usually present

🤝 Contributing

This skill is open-source under the MIT license. Contributions are welcome!

How to contribute

Fork the repository
Create a branch (git checkout -b feature/improvement)
Commit (git commit -am 'Add X')
Push (git push origin feature/improvement)
Open a Pull Request

Guidelines

Follow existing code style (PEP 8, 4 spaces)
Add docstrings in English
Test locally
Update README.md if needed

See CONTRIBUTING.md for more details.

📄 License

MIT License — see LICENSE file.

🙏 Authors

Original creator: Jordane Guemara — https://github.com/JordaneParis

See AUTHORS.md for full contributors list.

🔗 Links

Version: 2.0.0 — ultra-robust, thread-safe, with Tkinter logging
