ingester

Orchestrates download, unzip, and processing of a recipe into output parquet files.

Classes

Ingester

Smart data ingester for openplaces ingestion recipes.

Functions

ingest(recipe, ...) → None

Instantiate and run ingestion for recipe.

Module Contents

class openplaces.io.ingester.Ingester(recipe: str | dict = None, admin_ids: str | list | None = None, partition_ids: str | list | None = None, timer: openplaces.timing.Timer | None = None, verbose: bool = False)

Smart data ingester for openplaces ingestion recipes.

Handles downloads, unzipping, loading, and preprocessing.
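The constructor accepts admin_ids and partition_ids as a single string, a list, or None. The sketch below shows one common way to normalize such an argument before use. normalize_ids is a hypothetical helper, not part of openplaces, and it assumes that a bare string names one ID and that None means "no filter".

```python
from typing import List, Optional, Union


def normalize_ids(ids: Optional[Union[str, List[str]]]) -> Optional[List[str]]:
    """Hypothetical helper: coerce an admin_ids/partition_ids argument to a list.

    Assumption: a bare string is a single ID, and None means no filtering
    (the caller would then process every unit the recipe defines).
    """
    if ids is None:
        return None  # no filter requested
    if isinstance(ids, str):
        return [ids]  # a single ID given as a plain string
    return list(ids)  # copy, so later changes cannot affect the caller's list
```

Checking for str before treating the argument as an iterable matters here: a string is itself iterable, so list("US") would silently split it into ["U", "S"].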

ingest(reprocess=False, redownload=False, keep_unzipped=False, target_recipe_id: str | None = None)

Run the full data ingestion: download, unzip, and process the recipe into output parquet files.

Parameters:
  • reprocess (bool) – If True, re-runs the data ingestion from the original file even if the output data already exists.

  • redownload (bool) – If True, re-downloads the original data file even if it already exists. Also sets reprocess to True.

  • keep_unzipped (bool) – If True, keeps unzipped files in ‘heap’ folder after the download partition has been processed.

  • target_recipe_id (str, optional) – For image recipes only: recipe ID of the harmonized entity to photograph (e.g. 'US_building-nsi-2022'). Overrides entity_recipe in the image recipe YAML.
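The interaction between reprocess and redownload can be summarized as a small decision function. The sketch below is hypothetical (plan_ingestion and Plan are not openplaces names). It encodes the two documented rules: existing outputs are skipped unless reprocess is set, and redownload forces reprocess. It also assumes, which the documentation does not state, that the source file is downloaded only when processing will actually run.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    download: bool  # fetch the original data file
    process: bool   # (re)build the output parquet files


def plan_ingestion(source_exists: bool, output_exists: bool,
                   reprocess: bool = False, redownload: bool = False) -> Plan:
    """Hypothetical planner mirroring the documented flag semantics."""
    if redownload:
        reprocess = True  # documented: redownload also sets reprocess
    # Existing output is kept unless a re-run is requested.
    process = reprocess or not output_exists
    # Assumption: only download when processing will run and the file is
    # missing or a fresh copy was requested.
    download = process and (redownload or not source_exists)
    return Plan(download=download, process=process)
```

For example, with both the source file and the output already on disk, the default call does nothing, reprocess=True rebuilds the output from the cached file, and redownload=True fetches a fresh copy and rebuilds.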

show_ingested_geometries(**kwargs)

Plot the last ingested layer for visual inspection.

Delegates to openplaces.viz.maps.show_ingested_geometries(). See that function for the full list of keyword arguments.

show_random_entity()

Plot a random entity from the last ingested admin unit with its attributes.

Delegates to openplaces.viz.maps.show_random_entity().

sample_layer(n=5)

Return a transposed sample of the principal entity DataFrame.

Parameters:

n (int) – Number of rows to sample.

Returns:

Transposed sample of the principal entity table.

Return type:

pd.DataFrame
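A transposed sample puts each column of a wide entity table on its own row, which keeps it readable in a notebook. The sketch below shows the usual pandas idiom, assuming sample_layer behaves like DataFrame.sample followed by .T. transposed_sample, its seed parameter, and the column names in the example are hypothetical.

```python
from typing import Optional

import pandas as pd


def transposed_sample(df: pd.DataFrame, n: int = 5,
                      seed: Optional[int] = None) -> pd.DataFrame:
    """Return up to n random rows of df, transposed so columns become rows."""
    # DataFrame.sample raises ValueError if n exceeds the row count
    # (without replace=True), so cap n for small tables.
    n = min(n, len(df))
    return df.sample(n=n, random_state=seed).T


df = pd.DataFrame({"id": [1, 2, 3], "height_m": [3.0, 6.5, 9.0]})
print(transposed_sample(df, n=2, seed=0))  # rows: id, height_m; one column per sampled entity
```

After the transpose, the index holds the original column names and each column is one sampled entity. The same pattern applies to sample_additional_layer below.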

sample_additional_layer(n=5)

Return a transposed sample of the first additional layer.

Parameters:

n (int) – Number of rows to sample.

Returns:

Transposed sample of the additional layer table.

Return type:

pd.DataFrame

openplaces.io.ingester.ingest(recipe: str | dict, admin_ids: str | list | None = None, partition_ids: str | list | None = None, reprocess: bool = False, redownload: bool = False, keep_unzipped: bool = False, verbose: bool = False) → None

Instantiate and run ingestion for recipe.

Convenience wrapper around Ingester(recipe, ...).ingest().

Parameters:
  • recipe (str or dict) – Recipe ID string or loaded recipe dict.

  • admin_ids (str, list, or None) – Admin IDs to process (passed to the Ingester constructor).

  • partition_ids (str, list, or None) – Partition IDs to process (passed to the Ingester constructor).

  • reprocess (bool) – If True, re-run even if output already exists.

  • redownload (bool) – If True, re-download even if source file already exists. Also sets reprocess to True.

  • keep_unzipped (bool) – If True, keep unzipped files in the heap folder after processing.

  • verbose (bool) – If True, print progress messages.
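The wrapper splits its arguments between the two documented calls. Unit selection (admin_ids, partition_ids) and verbose go to the Ingester constructor. The per-run flags (reprocess, redownload, keep_unzipped) go to Ingester.ingest(). The sketch below illustrates that routing with StubIngester, a hypothetical stand-in that only records its arguments. It is not the openplaces implementation, and the admin ID "US" in the example is made up.

```python
from typing import List, Optional, Union

IdArg = Optional[Union[str, list]]

# Every stub instance is recorded here so the routing can be inspected.
created: List["StubIngester"] = []


class StubIngester:
    """Hypothetical stand-in for Ingester: records arguments, does no I/O."""

    def __init__(self, recipe, admin_ids: IdArg = None,
                 partition_ids: IdArg = None, verbose: bool = False) -> None:
        self.init_args = {"recipe": recipe, "admin_ids": admin_ids,
                          "partition_ids": partition_ids, "verbose": verbose}
        self.ingest_args: dict = {}
        created.append(self)

    def ingest(self, reprocess: bool = False, redownload: bool = False,
               keep_unzipped: bool = False) -> None:
        self.ingest_args = {"reprocess": reprocess, "redownload": redownload,
                            "keep_unzipped": keep_unzipped}


def ingest(recipe, admin_ids: IdArg = None, partition_ids: IdArg = None,
           reprocess: bool = False, redownload: bool = False,
           keep_unzipped: bool = False, verbose: bool = False) -> None:
    # Unit selection and verbosity configure the instance...
    ingester = StubIngester(recipe, admin_ids=admin_ids,
                            partition_ids=partition_ids, verbose=verbose)
    # ...while the re-run and cleanup flags are options of a single run.
    ingester.ingest(reprocess=reprocess, redownload=redownload,
                    keep_unzipped=keep_unzipped)


ingest("US_building-nsi-2022", admin_ids="US", reprocess=True)
```

Note that the documented wrapper does not expose target_recipe_id. An image recipe that needs it would call Ingester(recipe, ...).ingest(target_recipe_id=...) directly.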