🧪 Skills
Clinical Data Cleaner
Use when cleaning clinical trial data, preparing data for FDA/EMA submission, standardizing SDTM datasets, handling missing values in clinical studies, detec...
v0.1.1
Description
name: clinical-data-cleaner description: Use when cleaning clinical trial data, preparing data for FDA/EMA submission, standardizing SDTM datasets, handling missing values in clinical studies, detecting outliers in lab results, or converting raw CRF data to CDISC format. Cleans and standardizes clinical trial data for regulatory compliance with audit trails. allowed-tools: "Read Write Bash Edit" license: MIT metadata: skill-author: AIPOCH version: "2.0"
Clinical Data Cleaner
Clean, validate, and standardize clinical trial data to meet CDISC SDTM standards for regulatory submissions to FDA or EMA.
Quick Start
from scripts.main import ClinicalDataCleaner
# Initialize for Demographics domain
cleaner = ClinicalDataCleaner(domain='DM')
# Clean data with default settings
cleaned = cleaner.clean(raw_data)
# Save with audit trail
cleaner.save_report('output.csv')
Core Capabilities
1. SDTM Domain Validation
cleaner = ClinicalDataCleaner(domain='DM') # or 'LB', 'VS'
is_valid, missing = cleaner.validate_domain(data)
Required Fields:
- DM: STUDYID, USUBJID, SUBJID, RFSTDTC, RFENDTC, SITEID, AGE, SEX, RACE
- LB: STUDYID, USUBJID, LBTESTCD, LBCAT, LBORRES, LBORRESU, LBSTRESC, LBDTC
- VS: STUDYID, USUBJID, VSTESTCD, VSORRES, VSORRESU, VSSTRESC, VSDTC
2. Missing Value Handling
cleaner = ClinicalDataCleaner(
domain='DM',
missing_strategy='median' # mean, median, mode, forward, drop
)
cleaned = cleaner.handle_missing_values(data)
3. Outlier Detection
cleaner = ClinicalDataCleaner(
domain='LB',
outlier_method='domain', # iqr, zscore, domain
outlier_action='flag' # flag, remove, cap
)
flagged = cleaner.detect_outliers(data)
Clinical Thresholds:
| Parameter | Range | Unit |
|---|---|---|
| Glucose | 50-500 | mg/dL |
| Hemoglobin | 5-20 | g/dL |
| Systolic BP | 70-220 | mmHg |
4. Date Standardization
standardized = cleaner.standardize_dates(data)
# Converts to ISO 8601: 2023-01-15T09:30:00
5. Complete Pipeline
cleaner = ClinicalDataCleaner(
domain='DM',
missing_strategy='median',
outlier_method='iqr',
outlier_action='flag'
)
cleaned_data = cleaner.clean(data)
cleaner.save_report('output.csv')
Output Files:
output.csv- Cleaned SDTM dataoutput.report.json- Audit trail for regulatory submission
CLI Usage
# Clean demographics
python scripts/main.py \
--input dm_raw.csv \
--domain DM \
--output dm_clean.csv \
--missing-strategy median \
--outlier-method iqr \
--outlier-action flag
# Clean lab data with clinical thresholds
python scripts/main.py \
--input lb_raw.csv \
--domain LB \
--output lb_clean.csv \
--outlier-method domain
Common Patterns
See references/common-patterns.md for detailed examples:
- Regulatory Submission Preparation
- Interim Analysis Data Preparation
- Database Migration Cleanup
- External Lab Data Integration
Troubleshooting
See references/troubleshooting.md for solutions to:
- Validation failures
- Date parsing errors
- Memory errors with large datasets
- Outlier detection issues
Quality Checklist
Pre-Cleaning:
- IACUC approval obtained (animal studies)
- Sample size adequately powered
- Randomization method documented
Post-Cleaning:
- Validate against CDISC SDTM IG
- Review all cleaning actions in audit trail
- Test import to analysis software
References
references/sdtm_ig_guide.md- CDISC SDTM Implementation Guidereferences/domain_specs.json- Domain-specific field requirementsreferences/outlier_thresholds.json- Clinical outlier thresholdsreferences/common-patterns.md- Detailed usage patternsreferences/troubleshooting.md- Problem-solving guide
Skill ID: 189 | Version: 2.0 | License: MIT
Reviews (0)
Sign in to write a review.
No reviews yet. Be the first to review!
Comments (0)
No comments yet. Be the first to share your thoughts!