GitLab Batch Cloner
Batch clone GitLab group projects with branch checkout and Excel indexing. Use when user needs to clone all projects from a GitLab group, organize by group s...
Description
name: pf-clone-gitlab description: Batch clone GitLab group projects with branch checkout and Excel indexing. Use when user needs to clone all projects from a GitLab group, organize by group structure, checkout multiple branches (master/default + latest + release/prod), and generate an Excel index file with project metadata and branch information.
Clone GitLab Skill
Batch clone all projects from GitLab group(s), maintain folder hierarchy, checkout key branches, and generate an Excel index.
When to Use
- User wants to batch clone all projects from one or more GitLab top-level groups
- Need to maintain original group/subgroup folder hierarchy locally
- Need to checkout multiple branches (default + latest active + release/prod)
- Need an Excel index file (
01.Index.xlsx) with per-group sheets
Requirements
Before starting, collect from user:
| Parameter | Required | Default | Notes |
|---|---|---|---|
| GitLab URL | ✅ | — | e.g. https://gitlab.company.com |
| Personal Access Token | ✅ | — | Needs read_api + read_repository scopes |
| Target Group(s) | ✅ | — | Group/sub-group/project paths, comma-separated. Supports: top-level group (myGroup), sub-group path (myGroup/mySubGroup), or direct project path (myGroup/mySubGroup/my-project) |
| Local Storage Path | ❌ | ~/Desktop/Code |
Where repos are stored |
| Auth Method | ❌ | HTTPS+Token | Or SSH if key is configured |
| Mode | ❌ | clone | clone (default), update (pull only), or sync (clone+pull+cleanup stale Excel rows) |
| Workers | ❌ | 4 | Parallel clone workers (GITLAB_WORKERS) |
| Total Timeout | ❌ | 0 (none) | Global timeout in seconds (GITLAB_TOTAL_TIMEOUT). 0=no limit |
Workflow
Step 1: Collect Input
Ask user for the 4 parameters above. Pass them as environment variables to the script:
GITLAB_URL, GITLAB_TOKEN, GITLAB_GROUPS (comma-separated), GITLAB_BASE_DIR
Step 2: Run the Script
cd <skill-dir>/scripts
python3 clone_and_index.py
The script handles everything:
- Resolves each input path — auto-detects if it's a top-level group, sub-group, or direct project
- For top-level groups: fetches all subgroups recursively (skips if access denied)
- For sub-group paths (e.g.
myGroup/mySubGroup): directly resolves and syncs that sub-group and its descendants only - For direct project paths: syncs only that specific project
- Clones new projects / pulls existing ones (with 5-min timeout per project)
- Uses multiprocessing for parallel clone/pull (process-group kill on timeout to prevent orphan git processes)
- Checks out branches: default + latest active + release/prod
- Incrementally updates
01.Index.xlsxevery 50 projects (so partial results survive crashes) - On SIGTERM/SIGINT, emergency-flushes pending results to Excel before exit
- Prints per-group project counts during discovery and real-time progress with elapsed time
- In
syncmode, the final Excel write removes rows for deleted/archived projects and handles cross-group migrations
Step 3: Report Results
After the script finishes, report:
- Total projects cloned/updated
- Any failed/timed-out projects (the script prints a summary table)
Modes
Update Only
If projects already exist and user just wants to update:
GITLAB_MODE=update python3 clone_and_index.py
This skips clone, only does git fetch --all && git pull on existing repos, re-checkouts branches, and refreshes the Excel.
Sync (Full Sync with Cleanup)
For a complete sync that also cleans up stale data in the Excel:
GITLAB_MODE=sync python3 clone_and_index.py
This behaves like clone mode (new repos are cloned, existing ones are pulled), but additionally:
- Removes Excel rows for projects that no longer exist on GitLab (deleted/archived)
- Handles projects that moved between groups (updates path, removes old row)
- Only cleans up sheets belonging to the groups specified in
GITLAB_GROUPS(won't touch other groups' data)
Excel Specification
File: 01.Index.xlsx inside <local-path>/ (e.g. ~/Desktop/Code/01.Index.xlsx)
Sheets: One sheet per top-level group (sheet name = group name, e.g. "myGroup")
Columns:
| Col | Field | Content |
|---|---|---|
| A | 主Group名称 | Top-level group (e.g. myGroup) |
| B | 子Group路径 | Full group path without project name |
| C | Project路径 | Full path (e.g. myGroup/mySubGroup/myProject) |
| D | Project名称 | Project name |
| E | Project描述 | GitLab description |
| F | 已checkout分支 | All local branches, one per line |
| G | 分支最新提交时间 | Corresponding commit times, one per line |
| H | SSH Git链接 | ssh_url_to_repo |
| I | 下载时间 | Clone/update timestamp |
| J | Project ID | GitLab project ID (hidden column, used for matching) |
Sort: A (asc) → B (asc) → D (asc)
Formatting: Frozen header row, thin borders on all cells, F/G columns left-aligned with wrap, other columns center-aligned, UTF-8 encoding.
Security
- Token is passed via environment variable, never logged
- After clone, remote URL is reset to remove embedded token
- If clone times out or crashes, a cleanup step removes token from
.git/config
Output Structure
~/Desktop/Code/
├── 01.Index.xlsx
├── myGroup/
│ ├── SubGroup1/
│ │ ├── project-a/
│ │ └── project-b/
│ └── SubGroup2/
│ └── project-c/
└── AnotherGroup/
└── ...
Reviews (0)
No reviews yet. Be the first to review!
Comments (0)
No comments yet. Be the first to share your thoughts!