ingester
Orchestrates download, unzip, and processing of a recipe into output parquet files.
Classes
Ingester – Smart data ingester for openplaces ingestion recipes.
Functions
ingest – Instantiate and run ingestion for recipe.
Module Contents
- class openplaces.io.ingester.Ingester(recipe: str | dict = None, admin_ids: str | list | None = None, partition_ids: str | list | None = None, timer: openplaces.timing.Timer | None = None, verbose: bool = False)
Smart data ingester for openplaces ingestion recipes.
Handles downloads, unzipping, loading, and preprocessing.
- ingest(reprocess=False, redownload=False, keep_unzipped=False, target_recipe_id: str | None = None)
Run the full data ingestion.
- Parameters:
reprocess (bool) – If True, re-runs the data ingestion from the original file even if the output data already exists.
redownload (bool) – If True, re-downloads the original data file even if it already exists. Also sets reprocess to True.
keep_unzipped (bool) – If True, keeps unzipped files in ‘heap’ folder after the download partition has been processed.
target_recipe_id (str, optional) – For image recipes only: recipe ID of the harmonized entity to photograph (e.g. 'US_building-nsi-2022'). Overrides entity_recipe in the image recipe YAML.
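The interaction between redownload and reprocess described above can be sketched as follows. This is only an illustration of the documented rule that redownload=True also sets reprocess to True; resolve_ingest_flags is a hypothetical helper and not part of openplaces, whose internal implementation may differ.

```python
def resolve_ingest_flags(reprocess: bool = False, redownload: bool = False) -> tuple:
    """Return the effective (reprocess, redownload) pair.

    Hypothetical helper, not part of openplaces: it only mirrors the
    documented rule that redownload=True implies reprocess=True.
    """
    # Per the parameter description, requesting a re-download forces reprocessing.
    effective_reprocess = reprocess or redownload
    return effective_reprocess, redownload


# Show the effective flags for a few combinations.
for flags in [(False, False), (True, False), (False, True)]:
    print(flags, "->", resolve_ingest_flags(*flags))
```

In other words, passing redownload=True alone should be equivalent to passing both flags as True.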
- show_ingested_geometries(**kwargs)
Plot the last ingested layer for visual inspection.
Delegates to openplaces.viz.maps.show_ingested_geometries(). See that function for the full list of keyword arguments.
- show_random_entity()
Plot a random entity from the last ingested admin unit with its attributes.
Delegates to openplaces.viz.maps.show_random_entity().
- sample_layer(n=5)
Return a transposed sample of the principal entity DataFrame.
- Parameters:
n (int) – Number of rows to sample.
- Returns:
Transposed sample of the principal entity table.
- Return type:
pd.DataFrame
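To show what a "transposed sample" looks like, here is a minimal pandas sketch. The table, its column names, and the transposed_sample helper are all made up for illustration. Clamping n to the table length is an assumption, not documented behavior of sample_layer.

```python
import pandas as pd

# Made-up stand-in for the principal entity table; column names are illustrative only.
entities = pd.DataFrame(
    {
        "entity_id": [f"e{i}" for i in range(10)],
        "height_m": [3.0 + i for i in range(10)],
        "use": ["residential", "commercial"] * 5,
    }
)


def transposed_sample(df: pd.DataFrame, n: int = 5, seed: int = 0) -> pd.DataFrame:
    """Sample up to n rows and transpose so each sampled row becomes a column."""
    # DataFrame.sample raises if n exceeds the row count, so clamp it (an assumption).
    n = min(n, len(df))
    return df.sample(n=n, random_state=seed).T


sample = transposed_sample(entities, n=5)
print(sample.shape)  # (3, 5): 3 attributes as rows, 5 sampled entities as columns
```

Transposing puts attribute names on the index, which makes wide tables with many columns easier to read in a notebook.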
- sample_additional_layer(n=5)
Return a transposed sample of the first additional layer.
- Parameters:
n (int) – Number of rows to sample.
- Returns:
Transposed sample of the additional layer table.
- Return type:
pd.DataFrame
- openplaces.io.ingester.ingest(recipe: str | dict, admin_ids: str | list | None = None, partition_ids: str | list | None = None, reprocess: bool = False, redownload: bool = False, keep_unzipped: bool = False, verbose: bool = False) -> None
Instantiate and run ingestion for recipe.
Convenience wrapper around Ingester(recipe, ...).ingest().
- Parameters:
recipe (str or dict) – Recipe ID string or loaded recipe dict.
admin_ids (str, list, or None) – Admin IDs to process (passed to the Ingester constructor).
partition_ids (str, list, or None) – Partition IDs to process (passed to the Ingester constructor).
reprocess (bool) – If True, re-run even if output already exists.
redownload (bool) – If True, re-download even if source file already exists. Also sets reprocess to True.
keep_unzipped (bool) – If True, keep unzipped files in the heap folder after processing.
verbose (bool) – If True, print progress messages.
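The wrapper pattern described above, where constructor arguments go to the class and run-time flags go to its ingest() method, can be sketched with a stand-in class. StubIngester, ingest_stub, and CALL_LOG are hypothetical names used so the example runs without openplaces installed. The stub only records its arguments and performs no download or processing, and "example-admin-id" is a placeholder value.

```python
CALL_LOG = []  # records what the stub was asked to do, for demonstration only


class StubIngester:
    """Hypothetical stand-in mirroring the documented Ingester signature."""

    def __init__(self, recipe, admin_ids=None, partition_ids=None, timer=None, verbose=False):
        self.recipe = recipe
        self.admin_ids = admin_ids
        self.partition_ids = partition_ids
        self.verbose = verbose

    def ingest(self, reprocess=False, redownload=False, keep_unzipped=False):
        # Record the request instead of downloading or processing anything.
        CALL_LOG.append(
            {
                "recipe": self.recipe,
                "admin_ids": self.admin_ids,
                "partition_ids": self.partition_ids,
                "reprocess": reprocess,
                "redownload": redownload,
                "keep_unzipped": keep_unzipped,
            }
        )


def ingest_stub(recipe, admin_ids=None, partition_ids=None, reprocess=False,
                redownload=False, keep_unzipped=False, verbose=False) -> None:
    """Split arguments between the constructor and the ingest() call."""
    ingester = StubIngester(
        recipe, admin_ids=admin_ids, partition_ids=partition_ids, verbose=verbose
    )
    ingester.ingest(
        reprocess=reprocess, redownload=redownload, keep_unzipped=keep_unzipped
    )


ingest_stub("US_building-nsi-2022", admin_ids=["example-admin-id"], redownload=True)
print(CALL_LOG[-1])
```

With the real module, the equivalent one-liner would presumably be a call to openplaces.io.ingester.ingest with the same keyword arguments. Instantiating Ingester directly is the better choice when the inspection methods (such as sample_layer) are needed afterwards, since the wrapper returns None.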