harmonizer#
Recipe-driven harmonization: pipeline architecture where each step is a
standalone function that reads from and writes to a shared HarmonizeState.
The recipe’s pipeline section declares which steps to run and with what
parameters, making the process composable and entity-type-agnostic.
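The step-pipeline idea described above can be sketched in a few lines. This is only an illustration under stated assumptions: `SketchState`, the `STEPS` registry, the step names `load_spine` and `filter_small`, and the `{"step": ..., **params}` recipe layout are all hypothetical and are not taken from the openplaces source.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class SketchState:
    """Stand-in for HarmonizeState; holds only what this sketch needs."""
    recipe: dict
    metadata: dict = field(default_factory=dict)


# Hypothetical steps: each is a standalone function that reads from and
# writes to the shared state, taking its parameters from the recipe.
def load_spine(state: SketchState, source: str) -> None:
    state.metadata["spine_source"] = source


def filter_small(state: SketchState, min_area: float = 0.0) -> None:
    state.metadata["min_area"] = min_area


# Hypothetical registry mapping step names in the recipe to functions.
STEPS: Dict[str, Callable[..., None]] = {
    "load_spine": load_spine,
    "filter_small": filter_small,
}


def run_pipeline(recipe: dict) -> SketchState:
    """Run every declared step in order against one shared state."""
    state = SketchState(recipe=recipe)
    for entry in recipe["pipeline"]:
        params = dict(entry)       # copy so the recipe itself is not mutated
        name = params.pop("step")  # remaining keys are the step's parameters
        STEPS[name](state, **params)
    return state


recipe = {
    "pipeline": [
        {"step": "load_spine", "source": "footprints"},
        {"step": "filter_small", "min_area": 10.0},
    ]
}
final_state = run_pipeline(recipe)
```

Because the steps only share the state object, reordering or dropping a step is a recipe change rather than a code change, which is what makes the process composable.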
Submodules#
Classes#
SourceGeometryType | Source geometry classification (Lochhead et al. 2026, §2.3).
HarmonizeState | Mutable container passed through every pipeline step.
Harmonizer | Recipe-driven harmonization via a composable step pipeline.
Functions#
harmonize(recipe[, admin_ids, reprocess, verbose]) | Instantiate and run harmonization for recipe.
Package Contents#
- class openplaces.io.harmonizer.SourceGeometryType#
Bases: enum.StrEnum
Source geometry classification (Lochhead et al. 2026, §2.3).
A footprint is the 2-D ground-extent polygon of one or more structures. A building point represents a single insurable structure (e.g., NSI). A dwelling point represents a single unit within a larger structure.
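A string-valued enum of this kind might look like the sketch below. The class name `GeometryKind`, the member names, and their string values are assumptions made for illustration; the actual members of `SourceGeometryType` are not listed on this page.

```python
import enum

# enum.StrEnum needs Python 3.11+; on older versions fall back to a
# str-mixin Enum, which compares equal to plain strings in the same way.
if hasattr(enum, "StrEnum"):
    _Base = enum.StrEnum
else:
    class _Base(str, enum.Enum):
        pass


class GeometryKind(_Base):
    """Hypothetical stand-in for SourceGeometryType."""
    FOOTPRINT = "footprint"            # 2-D ground-extent polygon
    BUILDING_POINT = "building_point"  # one point per insurable structure
    DWELLING_POINT = "dwelling_point"  # one point per unit in a structure


# A recipe would carry the value as a plain string; the enum validates it.
declared = "building_point"
kind = GeometryKind(declared)
```

Looking a value up by string raises `ValueError` for anything that is not a member, so a mistyped value in a recipe fails early instead of being silently ignored.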
- class openplaces.io.harmonizer.HarmonizeState#
Mutable container passed through every pipeline step.
Fields are populated progressively as steps run.
- Parameters:
recipe (dict) – The loaded harmonization recipe driving this run.
admin_id (AdminId or None) – Current admin unit being processed, or None for global runs.
verbose (bool) – Print per-step progress messages.
timer (object or None) – Timing helper (from get_timer()).
spine (GeoDataFrame or None) – The primary entity GeoDataFrame being built (e.g., footprints).
references (dict[str, GeoDataFrame]) – Reference datasets keyed by resolved recipe_id.
crosswalks (dict[str, GeoDataFrame]) – Tabular spine ↔ reference join tables, keyed by resolved recipe_id.
overlays (dict[str, GeoDataFrame]) – Geometry-bearing overlay results keyed by resolved recipe_id. Populated by link_to_reference for spatial_overlay joins and made available to subsequent steps (e.g., link_to_reference for spatial_point that needs footprint-parcel geometries).
reference_types (dict[str, str]) – Maps resolved recipe_id → entity_type (e.g. {'US-MA_parcel-mapc-2024': 'parcel'}). Lets steps look up all crosswalks of a given entity type without knowing the exact recipe_id.
source_geometry_types (dict[str, SourceGeometryType]) – Maps resolved recipe_id → SourceGeometryType. Populated by link_to_reference when source_geometry_type is declared in the recipe step. Used by classify_footprint_role to identify which linked datasets are evidence of primary buildings.
simplified_geometry (GeoSeries or None) – Set by simplify_geometries; written as a sidecar by the save step.
metadata (dict) – Arbitrary step-specific intermediate data (e.g. discovered admin sources, parcel-inferred footprint DataFrames).
- get_crosswalks_by_type(entity_type: str) → dict[str, geopandas.GeoDataFrame]#
Return all crosswalks whose reference matches entity_type.
- get_references_by_type(entity_type: str) → dict[str, geopandas.GeoDataFrame]#
Return all reference GeoDataFrames matching entity_type.
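One way the two lookups could work is to filter each dictionary by the entity type recorded in `reference_types`. The sketch below assumes exactly that; it is not the library's implementation. `SketchState` is a stand-in class, plain strings replace the GeoDataFrames so the example runs without geopandas, and the second recipe ID is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class SketchState:
    """Stand-in for HarmonizeState with only the three relevant fields."""
    references: Dict[str, Any] = field(default_factory=dict)
    crosswalks: Dict[str, Any] = field(default_factory=dict)
    reference_types: Dict[str, str] = field(default_factory=dict)

    def get_crosswalks_by_type(self, entity_type: str) -> Dict[str, Any]:
        # Keep crosswalks whose recipe_id is registered under entity_type.
        return {
            rid: table
            for rid, table in self.crosswalks.items()
            if self.reference_types.get(rid) == entity_type
        }

    def get_references_by_type(self, entity_type: str) -> Dict[str, Any]:
        # Same filter, applied to the reference datasets.
        return {
            rid: frame
            for rid, frame in self.references.items()
            if self.reference_types.get(rid) == entity_type
        }


state = SketchState(
    references={
        "US-MA_parcel-mapc-2024": "parcel frame",
        "US-MA_address-example": "address frame",  # hypothetical recipe ID
    },
    crosswalks={
        "US-MA_parcel-mapc-2024": "parcel crosswalk",
        "US-MA_address-example": "address crosswalk",
    },
    reference_types={
        "US-MA_parcel-mapc-2024": "parcel",
        "US-MA_address-example": "address",
    },
)

parcel_crosswalks = state.get_crosswalks_by_type("parcel")
address_references = state.get_references_by_type("address")
```

A step that needs every parcel crosswalk can then ask for `"parcel"` without knowing which parcel recipe was linked.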
- class openplaces.io.harmonizer.Harmonizer(recipe: str | dict, admin_ids: str | list | None = None, verbose: bool = False)#
Recipe-driven harmonization via a composable step pipeline.
Reads the pipeline list from the recipe and executes each declared step in order, passing a shared HarmonizeState between them. All configuration (thresholds, sources, enrichment) lives in the recipe; there are no entity-type-specific subclasses.
For recipes with process_by.admin_level > 0, harmonization runs once per admin unit. For recipes with process_by.admin_level == 0 (e.g., global admin geometry), harmonization runs once globally.
- Parameters:
recipe (str or dict) – Harmonization recipe ID string or pre-loaded dict.
admin_ids (str, list, or None) – Admin IDs to harmonize. IDs coarser than process_by.admin_level are automatically expanded to matching children. None processes all children of the recipe’s admin_id at the process level.
verbose (bool) – Print progress messages.
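The per-unit versus global behaviour can be pictured as a small planning function. This sketch assumes the recipe is a nested dict with a `process_by` key holding `admin_level`. The helper name `plan_runs` is hypothetical, and the expansion of coarser IDs to their children is left out because the ID hierarchy is not described here.

```python
def plan_runs(recipe: dict, admin_ids: list) -> list:
    """Return one entry per harmonization run; None marks a global run."""
    level = recipe["process_by"]["admin_level"]
    if level == 0:
        # Global recipes run once, with no current admin unit.
        return [None]
    # Otherwise run once per admin unit, in the order given.
    return list(admin_ids)


global_plan = plan_runs({"process_by": {"admin_level": 0}}, ["US-MA"])
per_unit_plan = plan_runs({"process_by": {"admin_level": 1}}, ["US-MA", "US-NH"])
```

The `None` entry mirrors the `admin_id` field of `HarmonizeState`, which is documented as `None` for global runs.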
- harmonize(reprocess: bool = False) → None#
Run harmonization for all configured admin IDs.
- Parameters:
reprocess (bool) – If False (default), skip admin IDs whose output file already exists.
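The skip-if-present behaviour of `reprocess` amounts to a filter over the configured admin IDs. A minimal sketch follows, assuming one output file per admin ID. The `<admin_id>.parquet` naming, the output directory, and the helper name `ids_to_process` are hypothetical and do not reflect the library's actual file layout.

```python
import tempfile
from pathlib import Path


def ids_to_process(admin_ids, output_dir: Path, reprocess: bool = False) -> list:
    """Return the admin IDs that still need a harmonization run."""
    if reprocess:
        # Re-run everything, even where output already exists.
        return list(admin_ids)
    # Default: skip IDs whose (hypothetically named) output file exists.
    return [a for a in admin_ids if not (output_dir / f"{a}.parquet").exists()]


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    (out / "US-MA.parquet").touch()  # pretend US-MA was harmonized earlier
    pending = ids_to_process(["US-MA", "US-NH"], out)
    everything = ids_to_process(["US-MA", "US-NH"], out, reprocess=True)
```

With the default, an interrupted batch can be restarted and will pick up only the units that have no output yet.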
- openplaces.io.harmonizer.harmonize(recipe: str | dict, admin_ids: str | list | None = None, reprocess: bool = False, verbose: bool = False) → None#
Instantiate and run harmonization for recipe.
Convenience wrapper around Harmonizer(recipe, ...).harmonize().
- Parameters:
recipe (str or dict) – Recipe ID string or loaded recipe dict.
admin_ids (str, list, or None) – Admin IDs to process.
reprocess (bool) – If True, re-run even if output already exists.
verbose (bool) – Print progress messages.
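Based on the two signatures above, the wrapper plausibly passes `recipe`, `admin_ids`, and `verbose` to the constructor and `reprocess` to the method. The sketch below shows that split with a stand-in class. `SketchHarmonizer`, the `calls` list, and the recipe ID `"example-recipe-id"` are hypothetical, and nothing here imports or runs openplaces.

```python
calls = []  # records what the stand-in receives, for illustration only


class SketchHarmonizer:
    """Stand-in with the same constructor and method signatures."""

    def __init__(self, recipe, admin_ids=None, verbose=False):
        calls.append(("init", recipe, admin_ids, verbose))

    def harmonize(self, reprocess=False):
        calls.append(("harmonize", reprocess))


def harmonize(recipe, admin_ids=None, reprocess=False, verbose=False):
    """Construct the class, then run it; returns None like the original."""
    SketchHarmonizer(recipe, admin_ids=admin_ids, verbose=verbose).harmonize(
        reprocess=reprocess
    )


result = harmonize("example-recipe-id", admin_ids="US-MA", reprocess=True)
```

The function form suits one-off runs, while the class form fits callers that want to hold on to the configured object.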