Pipeline stages
Notebooks for data processing and analysis (notebooks) are grouped into stages by task type:
| Stage | Description |
|---|---|
| | Define folder structure, set preferences, lock administrative referencing |
| | Download, unzip, and stage data in cache |
| | Align entity datasets from multiple sources, create spine of entities |
| | Derive attributes for entities using geoprocessing and record linkage |
| | Create analysis-ready dataset (select, aggregate, snapshot) |
| | Fit models, cross-validate, derive standard errors, optimize |
| | Create artifacts from models: predictions, aggregated coefficients |
| | Creation of publication-ready figures, tables, and text |
| | Interactive display of results, demonstrating quality, issues, or functionality |

These notebooks can be converted to scripts and orchestrated across large computing clusters, for example to process data for entire world regions.
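
As a rough illustration of the notebook-to-script step, the sketch below extracts the code cells of a notebook and writes them to a plain Python file using only the standard library. It is not this project's conversion tooling: the function name `notebook_to_script` and the file names are hypothetical, and in practice a tool such as `jupyter nbconvert --to script` would typically be used instead.

```python
import json
from pathlib import Path


def notebook_to_script(notebook_path: Path, script_path: Path) -> int:
    """Write the code cells of a Jupyter notebook to a plain Python script.

    Hypothetical helper for illustration only.
    Returns the number of non-empty code cells written.
    """
    # A .ipynb file is a JSON document with a top-level "cells" list.
    notebook = json.loads(notebook_path.read_text(encoding="utf-8"))
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        # The notebook format stores source as a string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        if source.strip():
            chunks.append(source.rstrip() + "\n")
    # Separate cells with blank lines in the output script.
    script_path.write_text("\n\n".join(chunks), encoding="utf-8")
    return len(chunks)
```

A call such as `notebook_to_script(Path("example.ipynb"), Path("example.py"))` (hypothetical file names) would then produce a script that a cluster scheduler can run. This sketch copies cell contents verbatim, so notebooks that use IPython magics such as `%matplotlib inline` would need those lines removed or handled separately.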