harmonizer

Recipe-driven harmonization built on a pipeline architecture: each step is a standalone function that reads from and writes to a shared HarmonizeState. The recipe’s pipeline section declares which steps to run and with what parameters, which makes the process composable and entity-type-agnostic.
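The step-pipeline idea can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the State stand-in, the step names load_spine and filter_min, the STEP_REGISTRY lookup, and the "step"/"params" layout of each pipeline entry are hypothetical and are not taken from the package.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class State:
    """Stand-in for HarmonizeState: a mutable container shared by all steps."""
    recipe: dict
    spine: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)


# Hypothetical steps: each is a plain function taking the state plus its parameters.
def load_spine(state: State, values: list) -> None:
    state.spine = list(values)


def filter_min(state: State, threshold: float) -> None:
    state.spine = [v for v in state.spine if v >= threshold]


# Hypothetical name-to-function lookup used to resolve recipe entries.
STEP_REGISTRY: dict[str, Callable[..., None]] = {
    "load_spine": load_spine,
    "filter_min": filter_min,
}


def run_pipeline(recipe: dict) -> State:
    """Run each declared step in order against one shared state."""
    state = State(recipe=recipe)
    for entry in recipe["pipeline"]:
        step = STEP_REGISTRY[entry["step"]]
        step(state, **entry.get("params", {}))
    return state


recipe = {
    "pipeline": [
        {"step": "load_spine", "params": {"values": [3, 12, 7, 25]}},
        {"step": "filter_min", "params": {"threshold": 10}},
    ]
}
result = run_pipeline(recipe)
print(result.spine)
```

Because the steps only communicate through the shared state, reordering or dropping a step is a change to the recipe rather than to the code.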

Classes

SourceGeometryType

Source geometry classification (Lochhead et al. 2026, §2.3).

HarmonizeState

Mutable container passed through every pipeline step.

Harmonizer

Recipe-driven harmonization via a composable step pipeline.

Functions

harmonize(recipe, admin_ids=None, reprocess=False, verbose=False) → None

Instantiate and run harmonization for recipe.

Package Contents

class openplaces.io.harmonizer.SourceGeometryType

Bases: enum.StrEnum

Source geometry classification (Lochhead et al. 2026, §2.3).

A footprint is the 2-D ground-extent polygon of one or more structures. A building point represents a single insurable structure (e.g., NSI). A dwelling point represents a single unit within a larger structure.
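As a rough illustration of how a string-valued enum like this behaves, the sketch below defines a stand-in class. The member names and values (FOOTPRINT, BUILDING_POINT, DWELLING_POINT) are assumptions derived from the description above, not the package's actual members. enum.StrEnum is available from Python 3.11, so the sketch falls back to a str-mixin Enum on older interpreters.

```python
try:
    from enum import StrEnum  # Python 3.11+
except ImportError:  # older interpreters: emulate with a str-mixin Enum
    from enum import Enum

    class StrEnum(str, Enum):
        pass


class SourceGeometryTypeSketch(StrEnum):
    # Member names and values are assumptions for illustration only.
    FOOTPRINT = "footprint"
    BUILDING_POINT = "building_point"
    DWELLING_POINT = "dwelling_point"


# A plain string declared in a recipe step can be converted to a member by value.
declared = "building_point"
kind = SourceGeometryTypeSketch(declared)

print(kind is SourceGeometryTypeSketch.BUILDING_POINT)
print(kind == "building_point")  # members compare equal to their string value
```

A string-backed enum lets recipes carry plain strings while code compares against named members.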

class openplaces.io.harmonizer.HarmonizeState

Mutable container passed through every pipeline step.

Fields are populated progressively as steps run.

Parameters:
  • recipe (dict) – The loaded harmonization recipe driving this run.

  • admin_id (AdminId or None) – Current admin unit being processed, or None for global runs.

  • verbose (bool) – Print per-step progress messages.

  • timer (object or None) – Timing helper (from get_timer()).

  • spine (GeoDataFrame or None) – The primary entity GeoDataFrame being built (e.g., footprints).

  • references (dict[str, GeoDataFrame]) – Reference datasets keyed by resolved recipe_id.

  • crosswalks (dict[str, GeoDataFrame]) – Tabular spine ↔ reference join tables, keyed by resolved recipe_id.

  • overlays (dict[str, GeoDataFrame]) – Geometry-bearing overlay results keyed by resolved recipe_id. Populated by link_to_reference for spatial_overlay joins and made available to subsequent steps (e.g., link_to_reference for spatial_point that needs footprint-parcel geometries).

  • reference_types (dict[str, str]) – Maps resolved recipe_id → entity_type (e.g. {'US-MA_parcel-mapc-2024': 'parcel'}). Lets steps look up all crosswalks of a given entity type without knowing the exact recipe_id.

  • source_geometry_types (dict[str, SourceGeometryType]) – Maps resolved recipe_id → SourceGeometryType. Populated by link_to_reference when source_geometry_type is declared in the recipe step. Used by classify_footprint_role to identify which linked datasets are evidence of primary buildings.

  • simplified_geometry (GeoSeries or None) – Set by simplify_geometries; written as a sidecar by the save step.

  • metadata (dict) – Arbitrary step-specific intermediate data (e.g. discovered admin sources, parcel-inferred footprint DataFrames).

get_crosswalks_by_type(entity_type: str) → dict[str, geopandas.GeoDataFrame]

Return all crosswalks whose reference matches entity_type.

get_references_by_type(entity_type: str) → dict[str, geopandas.GeoDataFrame]

Return all reference GeoDataFrames matching entity_type.
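One plausible way these two lookups could work is to filter each dict by the entity type recorded in reference_types. The sketch below is an assumption about the mechanism, not the package's implementation. It uses plain strings as stand-ins for GeoDataFrames so it runs without geopandas, and the second recipe_id is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class StateSketch:
    """Stand-in holding only the fields the two lookups need."""
    references: dict = field(default_factory=dict)
    crosswalks: dict = field(default_factory=dict)
    reference_types: dict = field(default_factory=dict)

    def _by_type(self, tables: dict, entity_type: str) -> dict:
        # Keep entries whose recipe_id maps to the requested entity type.
        return {
            recipe_id: table
            for recipe_id, table in tables.items()
            if self.reference_types.get(recipe_id) == entity_type
        }

    def get_crosswalks_by_type(self, entity_type: str) -> dict:
        return self._by_type(self.crosswalks, entity_type)

    def get_references_by_type(self, entity_type: str) -> dict:
        return self._by_type(self.references, entity_type)


state = StateSketch(
    # Strings stand in for GeoDataFrames; the address recipe_id is hypothetical.
    references={
        "US-MA_parcel-mapc-2024": "<parcel reference>",
        "US-MA_address-example-2024": "<address reference>",
    },
    crosswalks={
        "US-MA_parcel-mapc-2024": "<parcel crosswalk>",
        "US-MA_address-example-2024": "<address crosswalk>",
    },
    reference_types={
        "US-MA_parcel-mapc-2024": "parcel",
        "US-MA_address-example-2024": "address",
    },
)

parcel_crosswalks = state.get_crosswalks_by_type("parcel")
print(sorted(parcel_crosswalks))
```

This shows how a step could request every parcel crosswalk without knowing the exact recipe_id.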

class openplaces.io.harmonizer.Harmonizer(recipe: str | dict, admin_ids: str | list | None = None, verbose: bool = False)

Recipe-driven harmonization via a composable step pipeline.

Reads the pipeline list from the recipe and executes each declared step in order, passing a shared HarmonizeState between them. All configuration (thresholds, sources, enrichment) lives in the recipe — there are no entity-type-specific subclasses.

For recipes with process_by.admin_level > 0, harmonization runs once per admin unit. For recipes with process_by.admin_level == 0 (e.g., global admin geometry), harmonization runs once globally.
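The per-unit versus global behavior can be sketched as a small planning function. The function name plan_runs and the nested recipe layout are assumptions; the sketch represents a global run with None, mirroring the admin_id field of HarmonizeState.

```python
def plan_runs(recipe: dict, admin_ids: list) -> list:
    """Return one entry per harmonization run (hypothetical helper)."""
    level = recipe["process_by"]["admin_level"]
    if level == 0:
        # Global recipe: a single run with no admin unit attached.
        return [None]
    # Otherwise: one run per admin unit at the process level.
    return list(admin_ids)


global_recipe = {"process_by": {"admin_level": 0}}
per_unit_recipe = {"process_by": {"admin_level": 2}}

global_plan = plan_runs(global_recipe, ["US-MA", "US-NH"])
unit_plan = plan_runs(per_unit_recipe, ["US-MA", "US-NH"])
print(global_plan)
print(unit_plan)
```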

Parameters:
  • recipe (str or dict) – Harmonization recipe ID string or pre-loaded dict.

  • admin_ids (str, list, or None) – Admin IDs to harmonize. IDs coarser than process_by.admin_level are automatically expanded to matching children. None processes all children of the recipe’s admin_id at the process level.

  • verbose (bool) – Print progress messages.
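The expansion of coarse admin IDs might look roughly like the sketch below. It is built entirely on assumptions: the toy hierarchy, the level numbers, the ID format, and the helper names expand and resolve_admin_ids are hypothetical and say nothing about how the package actually resolves admin units.

```python
# Toy hierarchy: IDs, levels, and parent-child links are all hypothetical.
LEVEL = {
    "US": 1,
    "US-MA": 2, "US-NH": 2,
    "US-MA-017": 3, "US-MA-025": 3, "US-NH-011": 3,
}
CHILDREN = {
    "US": ["US-MA", "US-NH"],
    "US-MA": ["US-MA-017", "US-MA-025"],
    "US-NH": ["US-NH-011"],
}


def expand(admin_id: str, target_level: int) -> list:
    """Descend from admin_id until reaching IDs at target_level."""
    if LEVEL[admin_id] >= target_level:
        return [admin_id]
    expanded = []
    for child in CHILDREN.get(admin_id, []):
        expanded.extend(expand(child, target_level))
    return expanded


def resolve_admin_ids(admin_ids, recipe_admin_id: str, target_level: int) -> list:
    """Normalize str | list | None into a flat list of IDs at target_level."""
    if admin_ids is None:
        admin_ids = [recipe_admin_id]  # None: all children of the recipe's admin_id
    elif isinstance(admin_ids, str):
        admin_ids = [admin_ids]
    resolved = []
    for admin_id in admin_ids:
        resolved.extend(expand(admin_id, target_level))
    return resolved


from_coarse = resolve_admin_ids("US-MA", "US", 3)        # coarse ID expanded
from_none = resolve_admin_ids(None, "US", 2)             # default: recipe's children
already_fine = resolve_admin_ids(["US-NH-011"], "US", 3)  # already at process level
print(from_coarse, from_none, already_fine)
```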

harmonize(reprocess: bool = False) → None

Run harmonization for all configured admin IDs.

Parameters:
  • reprocess (bool) – If False (default), skip admin IDs whose output file already exists.
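The skip rule for reprocess can be expressed as a one-line check on the output path. This is a sketch under assumptions: the should_run helper and the .parquet file names are hypothetical, and the real output location and format are not described here.

```python
import tempfile
from pathlib import Path


def should_run(output_path: Path, reprocess: bool) -> bool:
    """Run when reprocessing is forced or no output exists yet."""
    return reprocess or not output_path.exists()


with tempfile.TemporaryDirectory() as tmp:
    done = Path(tmp) / "US-MA.parquet"     # hypothetical existing output
    done.touch()
    pending = Path(tmp) / "US-NH.parquet"  # hypothetical missing output

    decisions = {
        "existing, reprocess=False": should_run(done, reprocess=False),
        "existing, reprocess=True": should_run(done, reprocess=True),
        "missing, reprocess=False": should_run(pending, reprocess=False),
    }

for case, run in decisions.items():
    print(f"{case}: {'run' if run else 'skip'}")
```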

openplaces.io.harmonizer.harmonize(recipe: str | dict, admin_ids: str | list | None = None, reprocess: bool = False, verbose: bool = False) → None

Instantiate and run harmonization for recipe.

Convenience wrapper around Harmonizer(recipe, ...).harmonize().

Parameters:
  • recipe (str or dict) – Recipe ID string or loaded recipe dict.

  • admin_ids (str, list, or None) – Admin IDs to process.

  • reprocess (bool) – If True, re-run even if output already exists. Defaults to False, which skips admin IDs whose output file already exists.

  • verbose (bool) – Print progress messages.
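The wrapper relationship described above can be sketched with a stub class standing in for Harmonizer, so the example runs without the package installed. HarmonizerStub, harmonize_sketch, the RUN_LOG list, and the recipe ID string are hypothetical; only the argument routing mirrors the documented signatures.

```python
RUN_LOG = []  # records what the stub was asked to do, for demonstration only


class HarmonizerStub:
    """Stand-in with the same constructor and method signatures as documented."""

    def __init__(self, recipe, admin_ids=None, verbose=False):
        self.recipe = recipe
        self.admin_ids = admin_ids
        self.verbose = verbose

    def harmonize(self, reprocess=False):
        RUN_LOG.append((self.recipe, self.admin_ids, reprocess, self.verbose))


def harmonize_sketch(recipe, admin_ids=None, reprocess=False, verbose=False):
    """Construct the class, then forward reprocess to its harmonize() method."""
    HarmonizerStub(recipe, admin_ids=admin_ids, verbose=verbose).harmonize(
        reprocess=reprocess
    )


# "example-recipe-id" is a placeholder, not a real recipe ID.
harmonize_sketch("example-recipe-id", admin_ids="US-MA", reprocess=True)
print(RUN_LOG)
```

Note how the arguments split: recipe, admin_ids, and verbose go to the constructor, while reprocess goes to the harmonize() call.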