api

Public API for openplaces.

Re-exports data access, ingest, harmonize, and aggregation functions.

Functions

aggregate_files(recipe, admin_level[, output_dir, ...])

Aggregate per-process-unit intermediate files into save-level files.

harmonize(recipe[, admin_ids, reprocess, verbose])

Instantiate and run harmonization for recipe.

ingest(recipe[, admin_ids, partition_ids, reprocess, ...])

Instantiate and run ingestion for recipe.

get_admin([admin_id, level, recipe, geom, columns, ...])

Get admin units of any administrative level.

get_admin_ids(admin_level[, admin_id, admin_recipe])

Get a list of administrative unit IDs.

get_dataset(recipe[, admin_id, partition_id, geom])

Load a processed dataset by recipe.

get_entities(recipe[, admin_id, geom, layer, partition_id])

Load a processed Parquet table for entities.

Module Contents

openplaces.api.aggregate_files(recipe, admin_level, output_dir=None, admin_ids_to_save=None, admin_ids_to_aggregate=None, keep_original=False, combined=False, verbose=False)

Aggregate per-process-unit intermediate files into save-level files.

Used when the desired output level is coarser than the level at which files were written (process_by.admin_level). Reads the intermediate Parquet files, concatenates them into one file per save-level unit, and deletes the originals (unless keep_original is True).
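The read, concatenate, and delete sequence can be sketched with the standard library alone. This is an illustration under stated assumptions, not the openplaces implementation: JSON Lines files stand in for the intermediate Parquet files, and concat_intermediate_files is a hypothetical name.

```python
import json
import os
from pathlib import Path


def concat_intermediate_files(inputs, output, keep_original=False):
    """Concatenate intermediate files into one output file (hypothetical sketch).

    JSON Lines files stand in for Parquet here. Returns the number of rows
    written. Inputs are deleted afterwards unless keep_original is True.
    """
    rows = []
    for path in inputs:
        with open(path, encoding="utf-8") as f:
            # Each non-empty line holds one record.
            rows.extend(json.loads(line) for line in f if line.strip())

    output = Path(output)
    output.parent.mkdir(parents=True, exist_ok=True)
    with open(output, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")

    if not keep_original:
        # Remove inputs only after the aggregated file has been written.
        for path in inputs:
            os.remove(path)
    return len(rows)
```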

Parameters:
  • recipe (str or dict) – Recipe ID string (e.g. 'US_footprint-cheer-2026') or a pre-loaded recipe dict.

  • admin_level (int) – Target admin level for output files (e.g. 2 for state-level). Required explicitly because the recipe’s own save_to.admin_level may differ from the intended aggregation target.

  • output_dir (str, optional) – Directory for the aggregated output files (e.g. 'share'). Does not affect where intermediate input files are looked up; those are always resolved from the recipe’s original save_to.data_dir. Uses the recipe default if omitted.

  • admin_ids_to_save (str, AdminId, or list, optional) – Save-level admin ID(s) for which to write output files. Accepts a single value or a list. Defaults to all admin IDs at admin_level that are children of recipe['admin_id'].

  • admin_ids_to_aggregate (str, AdminId, or list, optional) – Process-level admin ID(s) whose intermediate files should be included as input. Accepts a single value or a list. Defaults to all process-level children of each admin_ids_to_save entry.

  • keep_original (bool) – If True, do not delete the intermediate files after aggregation.

  • combined (bool) – If True, write the aggregated output as a single geoparquet file (attributes and geometry together) rather than the default split layout of an attribute table plus a _geo sidecar. Passed through to save_parquet().

  • verbose (bool) – If True, print a summary line for each aggregated file.
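How the two admin_ids_* parameters combine can be sketched as a planning step that maps each save-level unit to the process-level units feeding it. The sketch assumes a hypothetical ID format in which a child ID extends its parent's ID with a hyphen (e.g. 'US-25-001' under 'US-25'), a hypothetical levels lookup table, and that an explicit admin_ids_to_aggregate restricts the defaults rather than extending them. The actual AdminId handling in openplaces may differ.

```python
def as_list(value):
    """Accept a single value or a list, as the admin_ids_* parameters do."""
    if value is None:
        return None
    if isinstance(value, (list, tuple, set)):
        return list(value)
    return [value]


def plan_aggregation(levels, root_id, save_level, process_level,
                     admin_ids_to_save=None, admin_ids_to_aggregate=None):
    """Map each save-level admin ID to the process-level IDs merged into it.

    levels: hypothetical dict of admin ID -> admin level.
    Assumes a child ID starts with its parent's ID followed by '-'.
    """
    def children(parent, level):
        return sorted(i for i, lvl in levels.items()
                      if lvl == level and i.startswith(parent + "-"))

    # Default: every save-level unit under the recipe's root admin_id.
    save_ids = as_list(admin_ids_to_save) or children(root_id, save_level)
    restrict = as_list(admin_ids_to_aggregate)

    plan = {}
    for save_id in save_ids:
        # Default: all process-level children of this save-level unit.
        members = children(save_id, process_level)
        if restrict is not None:
            members = [m for m in members if m in restrict]
        plan[save_id] = members
    return plan
```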

openplaces.api.harmonize(recipe: str | dict, admin_ids: str | list | None = None, reprocess: bool = False, verbose: bool = False) → None

Instantiate and run harmonization for recipe.

Convenience wrapper around Harmonizer(recipe, ...).harmonize().

Parameters:
  • recipe (str or dict) – Recipe ID string or loaded recipe dict.

  • admin_ids (str, list, or None) – Admin IDs to process.

  • reprocess (bool) – If True, re-run even if output already exists.

  • verbose (bool) – Print progress messages.

openplaces.api.ingest(recipe: str | dict, admin_ids: str | list | None = None, partition_ids: str | list | None = None, reprocess: bool = False, redownload: bool = False, keep_unzipped: bool = False, verbose: bool = False) → None

Instantiate and run ingestion for recipe.

Convenience wrapper around Ingester(recipe, ...).ingest().

Parameters:
  • recipe (str or dict) – Recipe ID string or loaded recipe dict.

  • admin_ids (str, list, or None) – Admin IDs to process (passed to the Ingester constructor).

  • partition_ids (str, list, or None) – Partition IDs to process (passed to the Ingester constructor).

  • reprocess (bool) – If True, re-run even if output already exists.

  • redownload (bool) – If True, re-download even if source file already exists. Also sets reprocess to True.

  • keep_unzipped (bool) – If True, keep unzipped files in the heap folder after processing.

  • verbose (bool) – Print progress messages.
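The interaction between the selection arguments and the reprocessing flags can be summarized in a short sketch. It assumes a single string is wrapped into a one-element list; resolve_ingest_flags is a hypothetical helper, not part of openplaces.

```python
def resolve_ingest_flags(admin_ids=None, partition_ids=None,
                         reprocess=False, redownload=False):
    """Normalize ingest() arguments as documented above (illustrative only)."""
    def as_list(value):
        if value is None:
            return None  # None means "no explicit selection"
        return [value] if isinstance(value, str) else list(value)

    return {
        "admin_ids": as_list(admin_ids),
        "partition_ids": as_list(partition_ids),
        # A fresh download invalidates existing output, so it forces reprocessing.
        "reprocess": reprocess or redownload,
        "redownload": redownload,
    }
```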

openplaces.api.get_admin(admin_id=None, level=None, recipe=None, geom=False, columns=None, all_columns=False, silent=True)

Get admin units of any administrative level.

Parameters:
  • admin_id (str, list, or openplaces.core.schema.AdminId) – Identifier(s) of the admin units to return. May include higher-level admin IDs, which select all lower-level units they contain.

  • level (int) – Admin level for which to return units. If None, the level of admin_id is used (the deepest level if a list is passed).

  • recipe (str) – Recipe from which to import geometries and additional attributes.

  • geom (bool or 'simplified') – If False or None, return a DataFrame without geometries. If True, return a GeoDataFrame with full geometries. If 'simplified', return a GeoDataFrame with simplified geometries from the _geo_simplified companion file written by AdminHarmonizer.

  • columns (list of str or None) – If given, the columns to select.

  • all_columns (bool) – If True, return all columns rather than only the most important ones.

  • silent (bool) – If True (the default), silence warnings.
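The defaulting of level and the three accepted geom values can be sketched as follows. level_of is a hypothetical callable standing in for however openplaces derives a unit's level, the '_geo' suffix for full geometries is assumed from the sidecar naming used by aggregate_files, and raising on other geom values is a choice of this sketch.

```python
def resolve_level(admin_id, level, level_of):
    """Return the admin level get_admin() would query (illustrative sketch).

    level_of is a hypothetical callable mapping an admin ID to its level.
    """
    if level is not None:
        return level
    if admin_id is None:
        raise ValueError("pass admin_id or level")
    ids = admin_id if isinstance(admin_id, (list, tuple)) else [admin_id]
    # With a list, the deepest (highest-numbered) level wins.
    return max(level_of(i) for i in ids)


def geometry_source(geom):
    """Map the geom argument to the geometry companion to read, if any.

    The '_geo' suffix for full geometries is an assumption of this sketch.
    """
    if geom is None or geom is False:
        return None  # plain DataFrame, no geometry file read
    if geom is True:
        return "_geo"
    if geom == "simplified":
        return "_geo_simplified"
    raise ValueError(f"geom must be False, True, or 'simplified', got {geom!r}")
```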

openplaces.api.get_admin_ids(admin_level, admin_id=None, admin_recipe=None)

Get a list of administrative unit IDs.

openplaces.api.get_dataset(recipe, admin_id=None, partition_id=None, geom=False)

Load a processed dataset by recipe.

Handles both raster and tabular dataset recipes. For raster datasets (Cloud Optimized GeoTIFFs written by fetch_rasters_by_admin), returns the path to the .tif file so the caller controls resource management (e.g. with rasterio or xarray). For tabular datasets, returns a DataFrame or GeoDataFrame exactly as get_entities does.

Parameters:
  • recipe (str or dict) – Recipe that defines the dataset. Can be a loaded recipe (dict) or a string recipe ID.

  • admin_id (str or AdminId, optional) – Administrative unit for which to load the data. If None, uses the admin_id from the recipe.

  • partition_id (str, optional) – Partition value to locate a specific partition file, e.g. ‘2020’ for a year-partitioned recipe. Pass None (default) for recipes without partitioning.

  • geom (bool) – If True, include geometries and return a GeoDataFrame. Ignored for raster datasets.

Returns:

  • Path – Path to the .tif file, for raster datasets.

  • pandas.DataFrame or geopandas.GeoDataFrame – Loaded tabular data, for non-raster datasets.

Raises:

  • ValueError – If the recipe does not have a ‘dataset’ key. Use get_entities for entity recipes.
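The raster/tabular dispatch can be sketched with hypothetical callables resolve_path and read_table standing in for the internal file lookup and Parquet reader. Detecting rasters by the .tif suffix is an assumption of this sketch; the library may decide from the recipe itself.

```python
from pathlib import Path


def load_dataset(recipe, resolve_path, read_table, admin_id=None,
                 partition_id=None, geom=False):
    """Dispatch between raster and tabular outputs (illustrative sketch).

    resolve_path(recipe, admin_id, partition_id) and read_table(path, geom=...)
    are hypothetical stand-ins, not openplaces functions.
    """
    if "dataset" not in recipe:
        raise ValueError("recipe has no 'dataset' key; use get_entities for entity recipes")
    # Fall back to the recipe's own admin unit, as documented above.
    if admin_id is None:
        admin_id = recipe.get("admin_id")
    path = Path(resolve_path(recipe, admin_id, partition_id))
    if path.suffix == ".tif":
        # Raster: return the path so the caller controls opening and closing.
        return path
    return read_table(path, geom=geom)
```

Because a raster path is returned rather than an open dataset, the caller can open it with, for example, rasterio.open() in a with block and release it when done.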

openplaces.api.get_entities(recipe, admin_id=None, geom=False, layer=None, partition_id=None)

Load a processed Parquet table for entities.

Entities are administrative units (admin), parcels, buildings, transactions, etc., as defined by the recipe.

Parameters:
  • recipe (str or dict) – Recipe that defines the entity. Can be a loaded recipe (dict) or a recipe ID string (which includes the admin_id).

  • admin_id (str or AdminId) – Administrative unit for which to load the data. If None, the recipe’s admin_id is used.

  • geom (bool) – If True, include geometries and return a GeoDataFrame.

  • layer (str, optional) – Entity type (e.g. ‘property’) or full entity string (e.g. ‘property-massgis-2025’) of a secondary layer defined in additional_layers. If given, load that layer instead of the primary entity.

  • partition_id (str, optional) – Partition value to read a specific per-partition file, e.g. ‘032012’ for a tile-partitioned recipe. Pass None (default) to read the final merged output.
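Resolving layer can be sketched as an exact match on the full entity string, falling back to a match on the entity type. The sketch assumes additional_layers is a list of entity strings and that the entity type is the text before the first hyphen, as in 'property-massgis-2025'; resolve_layer is hypothetical.

```python
def resolve_layer(layer, additional_layers):
    """Pick the secondary layer named by layer (illustrative sketch).

    Returns None when no layer is requested (the primary entity is loaded).
    """
    if layer is None:
        return None
    if layer in additional_layers:
        return layer  # full entity string given
    # Otherwise treat layer as an entity type, e.g. 'property'.
    matches = [e for e in additional_layers if e.split("-", 1)[0] == layer]
    if len(matches) != 1:
        raise ValueError(f"layer {layer!r} matches {len(matches)} additional layers")
    return matches[0]
```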