GPU Keepalive with KeepGPU
Description

name: gpu-keepalive-with-keepgpu
description: Install and operate KeepGPU for GPU keep-alive with both blocking CLI and non-blocking service workflows. Use when users ask for keep-gpu command construction, start/status/stop session control, dashboard usage, tuning (--vram, --interval, --busy-threshold), installation from this repository, or troubleshooting keep sessions; do not use for repository development, code refactoring, or unrelated Python tooling.
KeepGPU CLI Operator
Use this workflow to run keep-gpu safely and effectively.
Prerequisites
- Confirm at least one GPU is visible: `python -c "import torch; print(torch.cuda.device_count())"`.
- Run commands in a shell where CUDA/ROCm drivers are already available.
- Use `Ctrl+C` to stop KeepGPU and release memory cleanly.
Install KeepGPU
Install PyTorch first for your platform, then install KeepGPU.
Option A: Install from package index
```shell
# CUDA example (change cu121 to your CUDA version)
pip install --index-url https://download.pytorch.org/whl/cu121 torch
pip install keep-gpu

# ROCm example (change rocm6.1 to your ROCm version)
pip install --index-url https://download.pytorch.org/whl/rocm6.1 torch
pip install "keep-gpu[rocm]"
```
Option B: Install directly from Git URL (no local clone)
Prefer this option when users only need the CLI and do not need local source edits. This avoids checkout directory and cleanup overhead.
```shell
pip install "git+https://github.com/Wangmerlyn/KeepGPU.git"
```

If SSH access is configured:

```shell
pip install "git+ssh://git@github.com/Wangmerlyn/KeepGPU.git"
```

ROCm variant from Git URL:

```shell
pip install "keep_gpu[rocm] @ git+https://github.com/Wangmerlyn/KeepGPU.git"
```
Option C: Install from a local source checkout (explicit path)
Use this option only when users already have a local checkout or plan to edit source.
```shell
git clone https://github.com/Wangmerlyn/KeepGPU.git
cd KeepGPU
pip install -e .
```

If the checkout already exists somewhere else, install by absolute path:

```shell
pip install -e /absolute/path/to/KeepGPU
```

For ROCm users installing from a local checkout:

```shell
pip install -e ".[rocm]"
```

Verify the installation:

```shell
keep-gpu --help
```
Command model
KeepGPU supports two execution modes.
Blocking mode (compatibility)
```shell
keep-gpu --gpu-ids 0 --vram 1GiB --interval 60 --busy-threshold 25
```

Use this mode when users intentionally want a single foreground process that they stop manually with `Ctrl+C`.
Non-blocking mode (recommended for agents)
```shell
keep-gpu start --gpu-ids 0 --vram 1GiB --interval 60 --busy-threshold 25
keep-gpu status
keep-gpu stop --all
keep-gpu service-stop
```

`keep-gpu start` automatically launches the local service if it is not already running.

`Ctrl+C` stops only foreground blocking runs. For service-mode sessions started with `keep-gpu start`, use `keep-gpu status`, `keep-gpu stop`, and `keep-gpu service-stop`.
CLI options to tune:

- `--gpu-ids`: comma-separated IDs (`0,0,1`). If omitted, KeepGPU uses all visible GPUs.
- `--vram`: VRAM to hold per GPU (`512MB`, `1GiB`, or raw bytes).
- `--interval`: seconds between keep-alive cycles.
- `--busy-threshold` (alias `--util-threshold`): if utilization is above this percentage, KeepGPU backs off.
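For illustration, a size string such as `512MB` or `1GiB` can be normalized to raw bytes before being passed to `--vram`. The helper below is a sketch and not part of KeepGPU; it assumes binary suffixes (KiB/MiB/GiB) are powers of 1024 and decimal suffixes (KB/MB/GB) are powers of 1000.

```shell
#!/bin/sh
# Sketch (not a KeepGPU function): normalize a --vram size string to bytes.
# Binary suffixes use powers of 1024; decimal suffixes use powers of 1000;
# a plain digit string is treated as raw bytes and passed through.
vram_to_bytes() {
  case "$1" in
    *GiB) echo $(( ${1%GiB} * 1024 * 1024 * 1024 )) ;;
    *MiB) echo $(( ${1%MiB} * 1024 * 1024 )) ;;
    *KiB) echo $(( ${1%KiB} * 1024 )) ;;
    *GB)  echo $(( ${1%GB} * 1000000000 )) ;;
    *MB)  echo $(( ${1%MB} * 1000000 )) ;;
    *KB)  echo $(( ${1%KB} * 1000 )) ;;
    *)    echo "$1" ;;   # already raw bytes
  esac
}

vram_to_bytes 1GiB    # 1073741824
vram_to_bytes 512MB   # 512000000
```

Whether KeepGPU itself interprets `MB` as decimal or binary is not documented here; when in doubt, pass raw bytes.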
Legacy compatibility:

- `--threshold` is deprecated but still accepted.
- A numeric `--threshold` maps to the busy threshold.
- A string `--threshold` maps to VRAM.
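The numeric-vs-string dispatch for the deprecated flag can be sketched as a small shell helper; `classify_threshold` is a hypothetical name for illustration, not a KeepGPU function:

```shell
#!/bin/sh
# Sketch of the deprecated --threshold dispatch: a purely numeric value is
# treated as the busy threshold (percent); anything else as a VRAM size.
classify_threshold() {
  case "$1" in
    ''|*[!0-9]*) echo "vram=$1" ;;            # string like 1GiB -> VRAM
    *)           echo "busy-threshold=$1" ;;  # integer like 25 -> busy %
  esac
}

classify_threshold 25     # busy-threshold=25
classify_threshold 1GiB   # vram=1GiB
```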
Agent workflow
- Collect workload intent: target GPUs, hold duration, and whether the node is shared.
- Choose the mode:
  - blocking mode for manual shell sessions,
  - non-blocking mode for agent pipelines (the default recommendation).
- Choose safe defaults when unspecified: `--vram 1GiB`, `--interval 60-120`, `--busy-threshold 25`.
- Provide the command sequence with verification and a stop command.
- For non-blocking mode, include `status`, `stop`, and daemon shutdown (`service-stop`).
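The workflow above can be sketched as a script that first prints the planned non-blocking command sequence instead of executing it; the `run` wrapper and the environment-variable defaults are illustrative, and swapping `echo` for direct execution turns the dry run into a real session.

```shell
#!/bin/sh
# Sketch: emit the non-blocking keep-gpu session plan using the safe
# defaults suggested above. Replace the echo in run() with "$@" to execute.
GPU_IDS="${GPU_IDS:-0}"
VRAM="${VRAM:-1GiB}"
INTERVAL="${INTERVAL:-60}"
BUSY="${BUSY:-25}"

run() { echo "+ $*"; }   # dry-run: print the command instead of running it

run keep-gpu start --gpu-ids "$GPU_IDS" --vram "$VRAM" \
    --interval "$INTERVAL" --busy-threshold "$BUSY"
run keep-gpu status          # verify the session registered
run keep-gpu stop --all      # release VRAM when work is done
run keep-gpu service-stop    # shut down the background daemon
```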
Command templates
Single GPU while preprocessing (blocking):

```shell
keep-gpu --gpu-ids 0 --vram 1GiB --interval 60 --busy-threshold 25
```

All visible GPUs with a lighter load (blocking):

```shell
keep-gpu --vram 512MB --interval 180
```

Agent-friendly non-blocking sequence:

```shell
keep-gpu start --gpu-ids 0 --vram 1GiB --interval 60 --busy-threshold 25
keep-gpu status
keep-gpu stop --job-id <job_id>
keep-gpu service-stop
```

Open the dashboard at `http://127.0.0.1:8765/`.
Remote sessions (preferred: tmux for visibility and control):
```shell
tmux new -s keepgpu
keep-gpu --gpu-ids 0 --vram 1GiB --interval 300
# Detach with Ctrl+b then d; reattach with: tmux attach -t keepgpu
```

Fallback when tmux is unavailable:

```shell
nohup keep-gpu --gpu-ids 0 --vram 1GiB --interval 300 > keepgpu.log 2>&1 &
echo $! > keepgpu.pid
# Monitor: tail -f keepgpu.log
# Stop: kill "$(cat keepgpu.pid)"
```
Troubleshooting
- Invalid `--gpu-ids`: ensure comma-separated integers only.
- Allocation failure / OOM: reduce `--vram` or free memory first.
- No utilization telemetry: ensure `nvidia-ml-py` works and `nvidia-smi` is available.
- No GPUs detected: verify drivers, the CUDA/ROCm runtime, and `torch.cuda.device_count()`.
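For the first item, a quick validity check on a `--gpu-ids` string (digits and commas only, no spaces or empty fields) might look like the sketch below; `valid_gpu_ids` is a hypothetical helper, not part of KeepGPU.

```shell
#!/bin/sh
# Sketch: accept only comma-separated non-negative integers (e.g. 0,0,1).
valid_gpu_ids() {
  case "$1" in
    ''|*[!0-9,]*) return 1 ;;   # empty, spaces, or non-digit characters
    ,*|*,|*,,*)   return 1 ;;   # leading, trailing, or empty fields
    *)            return 0 ;;
  esac
}

valid_gpu_ids "0,0,1" && echo ok     # ok
valid_gpu_ids "0, 1"  || echo bad    # bad (space is not allowed)
```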
Example
User request: "Install KeepGPU from GitHub and keep GPU 0 alive while I preprocess."

Suggested response shape:

- Install: `pip install "git+https://github.com/Wangmerlyn/KeepGPU.git"`
- Run: `keep-gpu start --gpu-ids 0 --vram 1GiB --interval 60 --busy-threshold 25`
- Verify: `keep-gpu status` or the dashboard at `http://127.0.0.1:8765/`; stop the session with `keep-gpu stop --job-id <job_id>` and the daemon with `keep-gpu service-stop`.
Limitations
- KeepGPU is not a scheduler; it only keeps already accessible GPUs active.
- KeepGPU behavior depends on cluster policy; some schedulers require higher VRAM or tighter intervals.