attributes

Pipeline steps that attach attributes from reference datasets to the spine:
  • reconcile_attributes: aggregate columns from established crosswalks

  • infer_attributes: compute derived columns (area, value ratios, etc.)

Functions

reverse_occ_units(→ str)

Re-classify a summed unit count to the nearest occupancy_type label.

reconcile_attributes(...)

Aggregate reference attributes to the spine via established crosswalks.

classify_footprint_role(...)

Classify spine footprints as 'primary', 'secondary', or 'unknown'.

infer_attributes(→ openplaces.io.harmonizer.HarmonizeState)

Compute derived columns on the spine.

infer_occupancy_type(...)

Infer a coarse occupancy category from NSI, parcel, and footprint geometry.

Module Contents

openplaces.io.harmonizer.attributes.reverse_occ_units(total_units: float) → str

Re-classify a summed unit count to the nearest occupancy_type label.

Mirrors the map_to_units logic from Lochhead et al. (2026). Used when multiple NSI points link to the same footprint and their unit counts must be aggregated and re-classified.

openplaces.io.harmonizer.attributes.reconcile_attributes(state: openplaces.io.harmonizer.HarmonizeState, sources: list[dict] | None = None, priority: dict[str, list[str]] | None = None) → openplaces.io.harmonizer.HarmonizeState

Aggregate reference attributes to the spine via established crosswalks.

For each source in sources, looks up the crosswalk in state.crosswalks (resolved via recipe_id or entity_type) and aggregates the requested columns to the spine.

Parameters:
  • sources (list of dict) –

    Each dict describes one reference source and may contain:

    recipe_id (str, optional)

    Explicit crosswalk key in state.crosswalks.

    entity_type (str, optional)

    Selects all matching crosswalks via state.reference_types; used when recipe_id is absent.

    columns (list of str, optional)

    Columns to aggregate. Defaults to all available columns from the corresponding default column list.

  • priority (dict of {feature: [source_suffix, ...]}, optional) –

    Between-source priority for specific features (Lochhead et al. 2026, Step C). Each key is a bare feature name (e.g. 'year_built'); the value is a list of source suffixes (without the leading _), tried in the order given. The first suffixed column with a non-null value wins.

    Example:

    priority:
      purpose_subgroup: [nsi, parcel]
      year_built: [parcel, nsi]
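
The "first non-null suffixed column wins" rule can be illustrated on a single row. This is a minimal sketch, not the library's implementation: resolve_priority is a hypothetical helper, the {feature}_{suffix} column naming is an assumption based on the description above, None stands in for a null value, and the sample values are made up.

```python
from typing import Any


def resolve_priority(row: dict[str, Any], priority: dict[str, list[str]]) -> dict[str, Any]:
    """Pick one value per bare feature by trying source suffixes in order."""
    resolved: dict[str, Any] = {}
    for feature, suffixes in priority.items():
        value = None
        for suffix in suffixes:
            # Assumed naming: bare feature name + "_" + source suffix.
            candidate = row.get(f"{feature}_{suffix}")
            if candidate is not None:
                value = candidate  # first non-null source wins
                break
        resolved[feature] = value
    return resolved


# Illustrative row: the parcel source has no year_built, so NSI is used instead.
row = {
    "purpose_subgroup_nsi": "RES1",
    "purpose_subgroup_parcel": "residential",
    "year_built_parcel": None,
    "year_built_nsi": 1987,
}
priority = {
    "purpose_subgroup": ["nsi", "parcel"],
    "year_built": ["parcel", "nsi"],
}
resolved = resolve_priority(row, priority)
```

Here purpose_subgroup comes from the NSI column because nsi is listed first, while year_built falls through to NSI only because the parcel value is null.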
    

openplaces.io.harmonizer.attributes.classify_footprint_role(state: openplaces.io.harmonizer.HarmonizeState, entity_type: str | None = None, thresholds: dict | None = None, **_params) → openplaces.io.harmonizer.HarmonizeState

Classify spine footprints as 'primary', 'secondary', or 'unknown'.

Uses dwelling-point and building-point evidence to assign roles within each parcel (Lochhead et al. 2026, Table 4):

  1. If any footprint on the parcel has dwelling-point evidence (SourceGeometryType.single_dwelling_point), those footprints are 'primary'; all others on the same parcel are 'secondary'.

  2. Else if any footprint has single-building-point evidence (SourceGeometryType.single_building_point, e.g. NSI), those are 'primary'; all others are 'secondary'.

  3. If no footprint on a multi-footprint parcel has evidence, all are 'secondary'.

  4. Footprints that are the sole geometry on their parcel are always 'primary'.

  5. Footprints not linked to any parcel are 'unknown', unless they carry dwelling-point evidence, in which case they are promoted to 'primary'.

Parameters:
  • entity_type (str, optional) – Entity type used to locate the parcel crosswalk in state.crosswalks. Defaults to 'parcel'.

  • thresholds (dict, optional) – Not currently used; retained for recipe compatibility.
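
The five rules can be sketched in plain Python. This is an illustration under stated assumptions rather than the function's actual code: Footprint and classify_roles are hypothetical names, the two boolean flags stand in for SourceGeometryType evidence, and "sole geometry on their parcel" is read as "the only footprint linked to that parcel".

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Footprint:
    fid: str
    parcel_id: Optional[str]           # None when not linked to any parcel
    has_dwelling_point: bool = False   # stand-in for single_dwelling_point evidence
    has_building_point: bool = False   # stand-in for single_building_point evidence


def classify_roles(footprints: list[Footprint]) -> dict[str, str]:
    """Assign 'primary', 'secondary', or 'unknown' to each footprint id."""
    roles: dict[str, str] = {}
    by_parcel: dict[str, list[Footprint]] = defaultdict(list)
    for fp in footprints:
        if fp.parcel_id is None:
            # Rule 5: unlinked footprints are 'unknown' unless dwelling evidence exists.
            roles[fp.fid] = "primary" if fp.has_dwelling_point else "unknown"
        else:
            by_parcel[fp.parcel_id].append(fp)

    for group in by_parcel.values():
        if len(group) == 1:
            roles[group[0].fid] = "primary"  # Rule 4: sole footprint on its parcel
            continue
        if any(fp.has_dwelling_point for fp in group):
            # Rule 1: dwelling-point evidence takes precedence.
            primary = {fp.fid for fp in group if fp.has_dwelling_point}
        elif any(fp.has_building_point for fp in group):
            # Rule 2: fall back to building-point evidence.
            primary = {fp.fid for fp in group if fp.has_building_point}
        else:
            primary = set()  # Rule 3: no evidence, so every footprint is 'secondary'
        for fp in group:
            roles[fp.fid] = "primary" if fp.fid in primary else "secondary"
    return roles


footprints = [
    Footprint("a", "p1", has_dwelling_point=True),
    Footprint("b", "p1", has_building_point=True),
    Footprint("c", "p2", has_building_point=True),
    Footprint("d", "p2"),
    Footprint("e", "p3"),
    Footprint("f", "p3"),
    Footprint("g", "p4"),
    Footprint("h", None),
    Footprint("i", None, has_dwelling_point=True),
]
roles = classify_roles(footprints)
```

Note that footprint "b" ends up 'secondary' even though it has building-point evidence, because another footprint on the same parcel has dwelling-point evidence and rule 1 is checked first.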

openplaces.io.harmonizer.attributes.infer_attributes(state: openplaces.io.harmonizer.HarmonizeState, derived: list[str] | None = None, **_params) → openplaces.io.harmonizer.HarmonizeState

Compute derived columns on the spine.

Parameters:
  • derived (list of str, optional) –

    Names of derived columns to compute. Supported values:

    'area' / 'm2'

    Footprint area in square metres (stored as 'm2').

    'value_per_sqft' / 'value_per_area'

    improvement_value{suffix} / m2 and structure_value{suffix} / m2.

    'openplaces_group_combined'

    Combined group label reconciling polygon and point reference sources.

    'n_dwelling_units'

    Fill null n_dwelling_units values from occupancy-class mapping when a purpose_subgroup column is present on the spine.

    When derived is None or empty, all of the above are attempted.
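
The value-per-area ratios amount to dividing each value column by the footprint area. The sketch below shows that arithmetic for one row. value_per_m2 is a hypothetical helper, the output column names are invented for illustration (the real names are not documented here), and the suffix is assumed to include its leading underscore.

```python
from typing import Optional


def value_per_m2(row: dict, suffix: str = "") -> dict[str, Optional[float]]:
    """Divide the two value columns by footprint area for one spine row."""
    area = row.get("m2")
    ratios: dict[str, Optional[float]] = {}
    for base in ("improvement_value", "structure_value"):
        value = row.get(f"{base}{suffix}")
        # Hypothetical output name, chosen only for this example.
        key = f"{base}_per_m2{suffix}"
        if value is None or area is None or area <= 0:
            ratios[key] = None  # keep the ratio null instead of dividing by zero
        else:
            ratios[key] = value / area
    return ratios


# Illustrative row: a 200 m² footprint with only an improvement value attached.
row = {"m2": 200.0, "improvement_value_parcel": 300000.0}
ratios = value_per_m2(row, suffix="_parcel")
```

With these inputs the improvement ratio is 300000 / 200 = 1500 per square metre, and the structure ratio stays null because no structure value is present.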

openplaces.io.harmonizer.attributes.infer_occupancy_type(state: openplaces.io.harmonizer.HarmonizeState, thresholds: dict | None = None, **_params) → openplaces.io.harmonizer.HarmonizeState

Infer a coarse occupancy category from NSI, parcel, and footprint geometry.

Populates occupancy_type (categorical) on the spine using a three-step cascade:

  1. occupancy_type_building_nsi — NSI occupancy label mapped to coarse class (Single-Family, Multi-Family, Mobile Home).

  2. Footprint geometry — elongated small footprints flagged as Mobile Home when NSI and parcel evidence are absent.

  3. n_dwelling_units — fills remaining residential gaps (n==1 → Single-Family, n≥2 → Multi-Family).

Parameters:
  • thresholds (dict, optional) –

    May contain:

    mobile_home_aspect_min (float, default 2.5)

    Minimum oriented-bounding-box aspect ratio (length/width) for a footprint to be considered elongated.

    mobile_home_area_max_m2 (float, default 185)

    Maximum footprint area (m²) for the mobile-home geometry signal (about 2,000 sqft).
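
The cascade can be sketched for a single footprint as follows. This is a simplified illustration, not the library's code: infer_occupancy and its arguments are hypothetical, the NSI label is assumed to be already mapped to a coarse class, parcel evidence is reduced to a boolean flag, and treating both thresholds as inclusive bounds is an assumption.

```python
from typing import Optional

# Defaults taken from the parameter description above.
DEFAULT_THRESHOLDS = {"mobile_home_aspect_min": 2.5, "mobile_home_area_max_m2": 185.0}


def infer_occupancy(
    nsi_class: Optional[str],
    aspect_ratio: Optional[float],
    area_m2: Optional[float],
    n_dwelling_units: Optional[float],
    has_parcel_evidence: bool = False,
    thresholds: Optional[dict] = None,
) -> Optional[str]:
    """Return a coarse occupancy label for one footprint, or None if undetermined."""
    t = {**DEFAULT_THRESHOLDS, **(thresholds or {})}

    # Step 1: an NSI-derived coarse class wins when present.
    if nsi_class is not None:
        return nsi_class

    # Step 2: geometry signal, used only when NSI and parcel evidence are absent.
    if (
        not has_parcel_evidence
        and aspect_ratio is not None
        and area_m2 is not None
        and aspect_ratio >= t["mobile_home_aspect_min"]
        and area_m2 <= t["mobile_home_area_max_m2"]
    ):
        return "Mobile Home"

    # Step 3: fill remaining gaps from the dwelling-unit count.
    if n_dwelling_units is not None:
        if n_dwelling_units == 1:
            return "Single-Family"
        if n_dwelling_units >= 2:
            return "Multi-Family"
    return None


# A long, small footprint with no other evidence is flagged by the geometry step.
label = infer_occupancy(None, aspect_ratio=3.0, area_m2=100.0, n_dwelling_units=None)
```

Passing thresholds={"mobile_home_aspect_min": 1.5} to this sketch would loosen the elongation test while leaving the default area cap in place, which mirrors how a partial thresholds dict is expected to override individual defaults.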