Zotero Vectorize
Description
name: zotero-vectorize
description: Build and maintain a cross-platform local Zotero semantic index using metadata embeddings and PDF full-text chunk embeddings. Use when the user asks to vectorize a Zotero library, create or refresh metadata_vectors.json or fulltext_vectors.json, check for new Zotero items missing from the vector store, incrementally update a Zotero semantic/RAG index, verify vector store counts and sizes, or reproduce this workflow on Windows, macOS, or Linux.
Build and maintain a local-first, cross-platform Zotero vector store for semantic search and RAG over bibliographic metadata and PDF full text.
Keep SKILL.md focused on workflow. Read the reference files only when needed:
- references/config.md: paths, environment variables, output layout
- references/data-format.md: JSON schemas and file naming
- references/windows.md / macos.md / linux.md: platform-specific path defaults and notes
- references/troubleshooting.md: common failures and recovery
Core rules
- Treat Zotero as read-only input. Never modify the user’s Zotero database or attachment storage.
- Prefer creating a database snapshot before reading.
- For incremental updates: check first, report missing items, wait for user confirmation, then apply.
- Before any update that rewrites store files: back up first, then write.
- Backup retention for this skill is fixed: keep only the latest and previous backup per file.
- Default output filenames are:
  - metadata_vectors.json
  - fulltext_vectors.json
  - vector_store_metadata.json
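The back-up-then-write rule with two-copy retention can be sketched as follows. This is a minimal illustration and not the contents of scripts/backup_with_retention.py; the function name and the `.bak1` / `.bak2` suffix scheme are assumptions made for the example.

```python
import shutil
from pathlib import Path
from typing import Optional


def backup_with_retention(store_file: Path) -> Optional[Path]:
    """Back up store_file, keeping only the latest and the previous copy.

    Hypothetical naming scheme: <name>.bak1 is the latest backup and
    <name>.bak2 is the previous one. Older copies are overwritten.
    """
    if not store_file.exists():
        return None  # nothing to back up yet, e.g. before the first build

    latest = store_file.with_name(store_file.name + ".bak1")
    previous = store_file.with_name(store_file.name + ".bak2")

    if latest.exists():
        # Demote the current latest backup; this overwrites any older .bak2.
        latest.replace(previous)

    shutil.copy2(store_file, latest)  # copy contents and file metadata
    return latest
```

Calling this once per store file before each rewrite leaves at most two backups next to the file, however many updates have been applied.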
Workflow decision tree
1) Detect or confirm paths
If the Zotero data directory, database path, or storage path is unknown:
- Read references/config.md
- Read the platform-specific reference (windows.md, macos.md, or linux.md)
- Run:
python scripts/detect_zotero_paths.py
If the detected paths are wrong, ask the user to open Zotero and use Show Data Directory, then rerun with explicit --data-dir, --db, or --storage-dir.
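As a rough idea of what path detection involves, the sketch below resolves the three paths from a data directory. It assumes the standard Zotero layout (a `Zotero` folder in the user's home directory containing `zotero.sqlite` and a `storage` subfolder). The function name and the `ZOTERO_DATA_DIR` environment variable are hypothetical and are not taken from scripts/detect_zotero_paths.py.

```python
import os
from pathlib import Path
from typing import Dict, List, Optional


def resolve_zotero_paths(data_dir: Optional[str] = None) -> Dict[str, Path]:
    """Resolve data directory, database, and storage paths.

    Precedence (an assumption for this sketch): explicit argument, then the
    hypothetical ZOTERO_DATA_DIR variable, then ~/Zotero.
    """
    base = Path(data_dir or os.environ.get("ZOTERO_DATA_DIR") or Path.home() / "Zotero")
    return {
        "data_dir": base,
        "db": base / "zotero.sqlite",
        "storage_dir": base / "storage",
    }


def missing_paths(paths: Dict[str, Path]) -> List[str]:
    """Return the names of resolved paths that do not exist on disk."""
    return [name for name, path in paths.items() if not path.exists()]
```

If `missing_paths` returns anything, that is the point at which to ask the user for the directory shown by Show Data Directory.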
2) Create a database snapshot
Before full builds or incremental checks, snapshot the Zotero database:
python scripts/snapshot_zotero_db.py --output-dir <store-dir>
If snapshotting fails because SQLite is locked, ask the user to close Zotero and retry.
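One way to take such a snapshot without touching the original file is Python's built-in SQLite backup API, sketched below. Whether scripts/snapshot_zotero_db.py works this way is not stated here; the function name and the snapshot file name are assumptions.

```python
import sqlite3
from pathlib import Path


def snapshot_db(db_path: Path, output_dir: Path) -> Path:
    """Copy a SQLite database into output_dir using the online backup API."""
    output_dir.mkdir(parents=True, exist_ok=True)
    target = output_dir / "zotero_snapshot.sqlite"  # hypothetical file name

    # Open the source read-only so the Zotero database is never modified.
    src = sqlite3.connect(db_path.resolve().as_uri() + "?mode=ro", uri=True)
    dst = sqlite3.connect(str(target))
    try:
        # If Zotero holds a lock, this typically fails with
        # sqlite3.OperationalError ("database is locked").
        src.backup(dst)
    finally:
        dst.close()
        src.close()
    return target
```

All later reads can then go against the returned snapshot path rather than the live database.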
3) Build the metadata vector store
Use this when the user asks to create or rebuild metadata embeddings for the Zotero library.
python scripts/build_metadata_vectors.py --output-dir <store-dir>
This writes metadata_vectors.json and refreshes vector_store_metadata.json and README.md.
4) Build the full-text vector store
Use this when the user asks to create or rebuild PDF full-text embeddings.
python scripts/build_fulltext_vectors.py --output-dir <store-dir>
This scans Zotero PDF attachments, extracts text, chunks it, embeds each chunk, and writes fulltext_vectors.json.
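The chunking step in that pipeline might look like the sketch below: fixed-size character windows with overlap. The chunk size, the overlap, and the character-based strategy are assumptions for illustration; the actual values used by scripts/build_fulltext_vectors.py are not given here. PDF extraction and embedding are left out because the source does not name the libraries or model involved.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> List[str]:
    """Split text into overlapping character windows.

    Each chunk starts (chunk_size - overlap) characters after the previous
    one, so neighbouring chunks share `overlap` characters of context.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk.strip():  # skip whitespace-only windows
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks
```

Each returned chunk would then be embedded and stored alongside the item ID of its parent item.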
5) Check incremental updates
Use this when the user asks whether Zotero contains new items not yet added to the vector store.
python scripts/check_incremental_updates.py --output-dir <store-dir>
Report:
- total top-level Zotero items
- total PDF-parent items
- current metadata/fulltext vector counts
- missing metadata items
- missing fulltext items
Do not update the store yet.
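At its core, the check is a set difference between item IDs in Zotero and item IDs already in the store. The sketch below shows that comparison and a report with the fields listed above. The function names and report keys are hypothetical, and how the ID lists are read from the snapshot and the JSON files is deliberately left out.

```python
from typing import Dict, Iterable, List


def find_missing(zotero_ids: Iterable[int], store_ids: Iterable[int]) -> List[int]:
    """Return IDs present in Zotero but absent from the vector store."""
    return sorted(set(zotero_ids) - set(store_ids))


def build_report(
    top_level_ids: Iterable[int],
    pdf_parent_ids: Iterable[int],
    metadata_ids: Iterable[int],
    fulltext_ids: Iterable[int],
) -> Dict[str, object]:
    """Summarize what an incremental update would add. Read-only: nothing is written."""
    top, pdf = set(top_level_ids), set(pdf_parent_ids)
    meta, full = set(metadata_ids), set(fulltext_ids)
    return {
        "total_top_level_items": len(top),
        "total_pdf_parent_items": len(pdf),
        "metadata_vector_count": len(meta),
        "fulltext_vector_count": len(full),
        "missing_metadata_items": find_missing(top, meta),
        "missing_fulltext_items": find_missing(pdf, full),
    }
```

The two `missing_*` lists are what gets shown to the user before asking for confirmation.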
6) Apply incremental updates
Only run this after the user confirms the update.
python scripts/apply_incremental_updates.py --output-dir <store-dir>
This script:
- snapshots the DB
- backs up store files
- appends missing metadata/fulltext entries
- keeps only the latest and previous backup per file
- updates store metadata and README
Use --item-id to limit the update to specific items if the user wants a partial apply.
7) Verify the finished store
After any build or incremental update, verify counts and sizes:
python scripts/verify_vector_store.py --output-dir <store-dir>
Always report:
- metadata item count
- fulltext item count
- fulltext chunk count
- metadata file size
- fulltext file size
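The five figures above could be gathered along the lines of the sketch below. It assumes that both JSON files hold a top-level list of records and that each record carries an `itemID` key; that schema is an assumption for this example, and the authoritative format is the one described in references/data-format.md.

```python
import json
from pathlib import Path
from typing import Dict


def summarize_store(store_dir: Path) -> Dict[str, int]:
    """Report item counts, chunk count, and file sizes for a vector store.

    Assumed schema (hypothetical): each file is a JSON list of objects,
    one per metadata item or per full-text chunk, each with an "itemID".
    """
    meta_path = store_dir / "metadata_vectors.json"
    full_path = store_dir / "fulltext_vectors.json"

    metadata = json.loads(meta_path.read_text(encoding="utf-8"))
    chunks = json.loads(full_path.read_text(encoding="utf-8"))

    return {
        "metadata_item_count": len({rec["itemID"] for rec in metadata}),
        "fulltext_item_count": len({rec["itemID"] for rec in chunks}),
        "fulltext_chunk_count": len(chunks),
        "metadata_file_bytes": meta_path.stat().st_size,
        "fulltext_file_bytes": full_path.stat().st_size,
    }
```

Comparing these counts with the totals from the incremental check is a quick way to confirm that an update appended what it was supposed to.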
Scripts
- scripts/detect_zotero_paths.py: resolve default/current Zotero paths
- scripts/snapshot_zotero_db.py: create a safe SQLite snapshot
- scripts/build_metadata_vectors.py: full rebuild of metadata vectors
- scripts/build_fulltext_vectors.py: full rebuild of PDF full-text vectors
- scripts/check_incremental_updates.py: compare Zotero against the current vector store
- scripts/apply_incremental_updates.py: append missing items after user confirmation
- scripts/backup_with_retention.py: back up store files and retain only the latest two states
- scripts/verify_vector_store.py: report counts, sizes, and store metadata
Output expectations
When using this skill successfully, return concise operational summaries such as:
- detected paths
- snapshot path used
- number of items/chunks written
- current file sizes
- whether any items are missing
- which itemIDs were appended during incremental update
Escalation notes
Read references/troubleshooting.md when:
- SQLite snapshot fails
- HuggingFace/model download or local model loading fails
- PDFs are missing or unreadable
- full-text extraction is incomplete
- file paths differ from defaults on the current OS