recipe
Functions to handle recipes for data ingestion and harmonization.
Read and validate recipes, find recipes, build derivatives, get output paths, etc.
Functions
| get_recipe() | Load a recipe (.yaml, .csv or .xlsx). |
| get_recipe_dict() | Read a recipe .yaml file as a dictionary and cast it to the schema. |
| get_recipe_by_id() | Shortcut to load a recipe by its recipe_id. |
| build_table_recipe() | Merge a primary recipe with an additional_layers spec. |
| get_table_recipe() | Return the merged recipe for a secondary layer identified by entity. |
| find_recipe_id() | Find a recipe ID by admin_id and entity/dataset identifier. |
| find_admin_recipe_id() | Find the ID of an administrative data ingestion recipe. |
| find_entity_recipe_id() | Find the ID of an entity data ingestion recipe. |
| get_layers() | Return the layer names available for a recipe's 'additional_layers'. |
| get_output_path() | Return the path where recipe output is written. |
| get_save_admin_level() | Return the admin level at which output files are split. |
| get_process_admin_level() | Return the admin level at which data is chunked for processing. |
| get_download_admin_level() | Return the admin level at which downloads are partitioned. |
| get_partition_ids() | Return the list of valid partition ID strings for a recipe. |
Module Contents
- openplaces.recipe.get_recipe(*args, **kwargs)
Load a recipe (.yaml, .csv or .xlsx).
- Parameters:
args (tuple) – Arguments for openplaces.path.recipe_path.
kwargs (dict) – Keyword arguments. Those accepted by openplaces.path.OpenPlacesReference and openplaces.path.recipe_path are used to find the path. The remainder is passed to the reading function for the file type: yaml.safe_load(), pd.read_csv() or pd.read_excel().
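The keyword split described above can be illustrated with `inspect.signature`. This is a minimal sketch, not the library's code. `split_kwargs` is a hypothetical helper. `recipe_path` below is a hypothetical stand-in whose parameters may not match openplaces.path.recipe_path, and OpenPlacesReference is left out.

```python
import inspect
from typing import Any, Callable, Optional


def split_kwargs(func: Callable[..., Any], kwargs: dict) -> tuple[dict, dict]:
    """Split kwargs into those `func` accepts and the remainder."""
    accepted = set(inspect.signature(func).parameters)
    used = {k: v for k, v in kwargs.items() if k in accepted}
    rest = {k: v for k, v in kwargs.items() if k not in accepted}
    return used, rest


# Hypothetical stand-in for openplaces.path.recipe_path (parameters assumed).
def recipe_path(admin_id: str, entity: str, filename: Optional[str] = None) -> str:
    parts = [p for p in (admin_id, entity, filename) if p]
    return "_".join(parts) + ".yaml"


path_kwargs, reader_kwargs = split_kwargs(
    recipe_path,
    {"admin_id": "US-NC-BS", "entity": "footprint-obm-2025", "sheet_name": 0},
)
# path_kwargs locate the file; reader_kwargs would go to e.g. pd.read_excel().
```

With this split, `sheet_name` is left over for the reader. That is the argument you would want forwarded to pd.read_excel() when loading an .xlsx recipe.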
- openplaces.recipe.get_recipe_dict(filepath, *args, **kwargs)
Read a recipe .yaml file as a dictionary and cast it to the schema.
- Parameters:
filepath (pathlib.Path) – Filepath to .yaml file
args (list) – Passed on from get_recipe
kwargs (dict) – Passed on from get_recipe
- openplaces.recipe.get_recipe_by_id(recipe_id, **kwargs)
Shortcut to load a recipe by its recipe_id, which is split into its parts.
Assumes the syntax {admin_id}_{entity}_{filename}.{extension}; admin_id or filename can be omitted.
(Datasets that are not entities are not yet supported.)
- Parameters:
recipe_id (str) – Identifier of a recipe.
kwargs (dict) – Keyword arguments will be passed on to get_recipe()
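Below is a minimal sketch of splitting a recipe_id into the parts named in the syntax above. The docstring does not say how a two-part ID is disambiguated. This sketch assumes entity strings follow the `{type}-{source}-{year}` pattern seen in examples such as 'footprint-obm-2025', and uses that pattern to locate the entity. `parse_recipe_id` and `ENTITY_RE` are hypothetical names. The documented function also loads the recipe via get_recipe(); only the parsing is shown.

```python
import re
from pathlib import PurePath
from typing import Optional

# Assumption: entities look like '{type}-{source}-{year}', e.g. 'footprint-obm-2025'.
ENTITY_RE = re.compile(r"^[a-z]+-[a-z0-9]+-\d{4}$")


def parse_recipe_id(recipe_id: str) -> dict[str, Optional[str]]:
    """Split '{admin_id}_{entity}_{filename}.{extension}' into its parts."""
    path = PurePath(recipe_id)
    parts = path.stem.split("_")
    # The entity is the first part matching ENTITY_RE; an admin_id may precede it.
    idx = next((i for i, p in enumerate(parts) if ENTITY_RE.match(p)), None)
    if idx is None or idx > 1:
        raise ValueError(f"Cannot locate an entity in {recipe_id!r}")
    return {
        "admin_id": parts[0] if idx == 1 else None,
        "entity": parts[idx],
        # Everything after the entity is the filename (it may contain '_').
        "filename": "_".join(parts[idx + 1:]) or None,
        "extension": path.suffix.lstrip(".") or None,
    }
```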
- openplaces.recipe.build_table_recipe(primary_recipe: dict, layer_spec: dict) -> dict
Merge a primary recipe with an additional_layers spec.
Per-table keys (entity, layer, columns, index config, etc.) are taken from layer_spec when present and are otherwise removed, so that primary-only values do not bleed into the secondary table. process_by is inherited from the primary unless layer_spec sets it explicitly (use ‘process_by: null’ in the YAML to disable chunking for a specific additional table).
- Parameters:
primary_recipe (dict) – Loaded primary recipe dictionary.
layer_spec (dict) – One entry from the primary recipe’s ‘additional_layers’ list.
- Returns:
Merged recipe dict for the layer.
- Return type:
dict
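The merge rule can be sketched as below, assuming recipes are plain dicts as loaded from YAML. `build_table_recipe_sketch` and `PER_TABLE_KEYS` are hypothetical. The docstring names only some per-table keys ("etc."), and `index` stands in for the index config key, whose real name is not given. The detail worth copying is the `in` test on process_by. It separates an absent key (inherit from the primary) from `process_by: null`, which YAML loads as None (disable chunking).

```python
import copy

# Assumed subset of per-table keys; 'index' is a placeholder name.
PER_TABLE_KEYS = {"entity", "layer", "columns", "index"}


def build_table_recipe_sketch(primary_recipe: dict, layer_spec: dict) -> dict:
    merged = copy.deepcopy(primary_recipe)  # never mutate the primary recipe
    for key in PER_TABLE_KEYS:
        if key in layer_spec:
            merged[key] = copy.deepcopy(layer_spec[key])
        else:
            # Drop primary-only values so they do not bleed into this table.
            merged.pop(key, None)
    # Inherit process_by unless the layer sets it, even to None (null in YAML).
    if "process_by" in layer_spec:
        merged["process_by"] = copy.deepcopy(layer_spec["process_by"])
    return merged
```

For example, a layer spec `{"entity": "property-massgis-2025", "process_by": None}` yields a recipe with chunking disabled. A spec without process_by keeps the primary's value.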
- openplaces.recipe.get_table_recipe(recipe: str | dict, layer: str) -> dict
Return the merged recipe for a secondary layer identified by entity.
- Parameters:
recipe (str or dict) – Primary recipe (ID string or loaded dict).
layer (str) – Entity type (e.g. ‘property’) or full entity string (e.g. ‘property-massgis-2025’) of the additional layer.
- Returns:
Merged recipe dict for the requested layer.
- Return type:
dict
- Raises:
KeyError – If no additional_layers entry matching layer is found.
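Here is one way the `layer` lookup could work. It assumes each additional_layers entry stores its entity string under an 'entity' key. It also assumes the entity type is the text before the first hyphen, as in 'property-massgis-2025'. `find_layer_spec` is hypothetical and only locates the entry; the documented function goes on to return the merged recipe.

```python
def find_layer_spec(recipe: dict, layer: str) -> dict:
    """Return the additional_layers entry whose entity matches `layer`."""
    for spec in recipe.get("additional_layers", []):
        entity = spec["entity"]  # assumed key name
        # Accept either the full entity string or just its type prefix.
        if layer == entity or layer == entity.split("-", 1)[0]:
            return spec
    raise KeyError(f"No additional_layers entry matches {layer!r}")
```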
- openplaces.recipe.find_recipe_id(admin_id, entity_or_dataset, filename=None, silent=False)
Find a recipe ID by admin_id and entity/dataset identifier.
- Parameters:
admin_id (str) – Administrative unit identifier.
entity_or_dataset (str) – Entity or dataset identifier string; may contain glob wildcards (e.g. ‘parcel-*-*’, ‘admin-census-2021’).
filename (str, optional) – Filename stem to match within the recipe directory. When None (default), matches any .yaml file in the entity directory. A .yaml extension is appended automatically if absent.
silent (bool) – If True, suppress the message printed when multiple recipes are found.
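The glob matching and the .yaml handling described above can be illustrated with `fnmatch` over a list of candidate file names. Where the real function looks on disk is not documented here. This sketch assumes the {admin_id}_{entity}_{filename}.yaml naming from get_recipe_by_id. `match_recipes` and the candidate names are hypothetical. The documented function also reports when several recipes match (unless silent=True); that part is not modelled.

```python
from fnmatch import fnmatchcase
from typing import Optional


def match_recipes(candidates: list[str], admin_id: str,
                  entity_or_dataset: str,
                  filename: Optional[str] = None) -> list[str]:
    """Return candidate recipe file names matching the given parts."""
    if filename is None:
        # Any .yaml recipe for this admin unit and entity pattern.
        pattern = f"{admin_id}_{entity_or_dataset}*.yaml"
    else:
        if not filename.endswith(".yaml"):
            filename += ".yaml"  # extension appended automatically
        pattern = f"{admin_id}_{entity_or_dataset}_{filename}"
    # fnmatchcase: case-sensitive on every platform.
    return [c for c in candidates if fnmatchcase(c, pattern)]
```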
- openplaces.recipe.find_admin_recipe_id(admin_id, admin_level, silent=False)
Find the ID of an administrative data ingestion recipe.
- Parameters:
admin_id (str) – Administrative unit identifier.
admin_level (int) – Administrative level for which a recipe is sought.
silent (bool) – If True, suppress the message printed when multiple recipes are found.
- openplaces.recipe.find_entity_recipe_id(admin_id, entity_type, **kwargs)
Find the ID of an entity data ingestion recipe.
- Parameters:
admin_id (str) – Administrative unit identifier.
entity_type (str) – Entity type (e.g. ‘parcel’, ‘building’, ‘footprint’).
**kwargs – Passed to find_recipe_id() (filename, silent).
- openplaces.recipe.get_layers(recipe: str | dict) -> list[str]
Return the layer names available for a recipe’s ‘additional_layers’.
These are the values accepted by the layer argument of ‘get_entities’ and ‘get_output_path’.
- Parameters:
recipe (str or dict) – Recipe dict or recipe ID string.
- Returns:
Entity type strings (e.g. ‘property’, ‘transaction’) for each entry in ‘additional_layers’.
- Return type:
list of str
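Here is a minimal sketch of how the layer names could be derived. It assumes each additional_layers entry has an 'entity' key holding a string such as 'property-massgis-2025'. It also assumes the type is the text before the first hyphen. `get_layers_sketch` is hypothetical and accepts only a loaded dict, not a recipe ID string.

```python
def get_layers_sketch(recipe: dict) -> list[str]:
    """Entity type of each additional_layers entry (text before the first '-')."""
    return [spec["entity"].split("-", 1)[0]
            for spec in recipe.get("additional_layers", [])]
```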
- openplaces.recipe.get_output_path(recipe, admin_id=None, partition_id=None, geo=False, layer=None)
Return the path where recipe output is written.
Mirrors Ingester._get_output_path without instantiating an Ingester. The output root is determined by ‘save_to’: ‘data_dir’ in the recipe (default: ‘cache’), which must name a directory registered in STANDARD_DIRS.
- Parameters:
recipe (str or dict) – Recipe identifier (as accepted by get_recipe_by_id) or a pre-loaded recipe dict.
admin_id (str or AdminId, optional) – Administrative unit for which to resolve the output path. Pass None for recipes not split by admin unit.
partition_id (str, optional) – Partition value appended to the filename stem, e.g. ‘US-NC-BS_footprint-obm-2025_032012.parquet’ for a tile partition with id ‘032012’. Pass None (default) to obtain the final, merged output path.
geo (bool, optional) – If True, return the path to the companion ‘_geo.parquet’ file instead of the attribute parquet file.
layer (str, optional) – Entity type (e.g. ‘property’) or full entity string (e.g. ‘property-massgis-2025’) of a secondary layer defined in additional_layers. If given, the path for that layer is returned instead of the primary entity’s path.
- Returns:
Resolved output path for the recipe data file.
- Return type:
pathlib.Path
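The file name pattern from the partition_id example can be reproduced as follows. Only two things come from the text: the stem layout ('US-NC-BS_footprint-obm-2025_032012.parquet') and the '_geo.parquet' suffix. The rest are assumptions: the root is a plain `root` argument instead of the save_to/STANDARD_DIRS lookup, sub-directories are ignored, and '_geo' is placed after the partition ID. `output_path_sketch` is hypothetical.

```python
from pathlib import Path
from typing import Optional


def output_path_sketch(root: Path, entity: str, admin_id: Optional[str] = None,
                       partition_id: Optional[str] = None, geo: bool = False) -> Path:
    # Stem: {admin_id}_{entity}_{partition_id}, skipping parts that are None.
    stem = "_".join(p for p in (admin_id, entity, partition_id) if p is not None)
    # The geometry companion file gets a '_geo' suffix before '.parquet'.
    suffix = "_geo.parquet" if geo else ".parquet"
    return root / f"{stem}{suffix}"
```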
- openplaces.recipe.get_save_admin_level(recipe, operation_keys=('download_by', 'process_by', 'save_to'))
Return the admin level at which output files are split.
When save_to: admin_level is explicitly set, it defines the output granularity directly; process_by or download_by may be finer (aggregation) or coarser than this level. When save_to: admin_level is absent, the level is the maximum found across the given operation keys, falling back to the recipe’s own admin ID depth.
- Parameters:
recipe (dict) – Loaded recipe dictionary.
operation_keys (tuple of str) – Recipe section keys to inspect for ‘admin_level’. ‘save_to’ is included by default since save_to: admin_level controls output granularity. Override when calling from other recipe runners.
- Returns:
Admin level for output files (0 = no admin split).
- Return type:
int
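The precedence rule above is sketched here for a loaded recipe dict. `save_admin_level_sketch` is hypothetical. It assumes each operation section is a mapping that may carry an 'admin_level' key, and that a section may also be missing or null. The documented function falls back to the depth of the recipe's admin ID. How that depth is computed is not documented here, so the sketch takes it as a hypothetical `fallback_level` argument.

```python
from typing import Iterable


def save_admin_level_sketch(recipe: dict,
                            operation_keys: Iterable[str] = ("download_by", "process_by", "save_to"),
                            fallback_level: int = 0) -> int:
    save_to = recipe.get("save_to") or {}
    # An explicit save_to: admin_level defines the output granularity directly.
    if save_to.get("admin_level") is not None:
        return save_to["admin_level"]
    levels = []
    for key in operation_keys:
        section = recipe.get(key) or {}  # missing or null sections count as empty
        if isinstance(section, dict) and section.get("admin_level") is not None:
            levels.append(section["admin_level"])
    # Otherwise the finest (largest) level wins; fall back if none is set.
    return max(levels) if levels else fallback_level
```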
- openplaces.recipe.get_process_admin_level(recipe)
Return the admin level at which data is chunked for processing.
- openplaces.recipe.get_download_admin_level(recipe)
Return the admin level at which downloads are partitioned.
- openplaces.recipe.get_partition_ids(recipe)
Return the list of valid partition ID strings for a recipe.
Returns [None] for recipes without a ‘download_by’: ‘partition’ key.
- Parameters:
recipe (dict) – Loaded recipe dictionary.
- Return type:
list of str or list of None
- Raises:
ValueError – If ‘download_by’: ‘partition’ is ‘year’ but ‘first’/‘last’ are not defined.
NotImplementedError – If ‘download_by’: ‘partition’ names an unrecognised partition type.
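The documented branches can be sketched as follows. The sketch assumes 'first' and 'last' sit in the download_by section next to 'partition', and that the year range includes both ends. `partition_ids_sketch` is hypothetical. Other partition types the library may support, such as the tile IDs in the get_output_path example, are not modelled and raise NotImplementedError here.

```python
from typing import Optional


def partition_ids_sketch(recipe: dict) -> list[Optional[str]]:
    download_by = recipe.get("download_by") or {}
    partition = download_by.get("partition") if isinstance(download_by, dict) else None
    if partition is None:
        return [None]  # no partitioning: a single, unpartitioned download
    if partition == "year":
        first, last = download_by.get("first"), download_by.get("last")
        if first is None or last is None:
            raise ValueError("partition 'year' requires 'first' and 'last'")
        # Assumption: both ends of the year range are included.
        return [str(year) for year in range(int(first), int(last) + 1)]
    raise NotImplementedError(f"Unrecognised partition type: {partition!r}")
```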