recipe

Functions to handle recipes for data ingestion and harmonization

Read and validate recipes, find recipes, build derivatives, get output paths, etc.

Functions

get_recipe(*args, **kwargs)

Load recipe (.yaml, .csv or .xlsx)

get_recipe_dict(filepath, *args, **kwargs)

Read a recipe .yaml file as a dictionary, cast it to schema

get_recipe_by_id(recipe_id, **kwargs)

Shortcut to get a recipe by its ID, which is split into its parts

build_table_recipe(→ dict)

Merge a primary recipe with an additional_layers spec.

get_table_recipe(→ dict)

Return the merged recipe for a secondary layer identified by entity.

find_recipe_id(admin_id, entity_or_dataset[, ...])

Find a recipe ID by admin_id and entity/dataset identifier.

find_admin_recipe_id(admin_id, admin_level[, silent])

Find the ID of an administrative data ingestion recipe

find_entity_recipe_id(admin_id, entity_type, **kwargs)

Find the ID of an entity data ingestion recipe.

get_layers(→ list[str])

Return the layer names available for a recipe's 'additional_layers'.

get_output_path(recipe[, admin_id, partition_id, geo, ...])

Return the path where recipe output is written.

get_save_admin_level(recipe[, operation_keys])

Return the admin level at which output files are split.

get_process_admin_level(recipe)

Return the admin level at which data is chunked for processing.

get_download_admin_level(recipe)

Return the admin level at which downloads are partitioned.

get_partition_ids(recipe)

Return the list of valid partition ID strings for a recipe.

Module Contents

openplaces.recipe.get_recipe(*args, **kwargs)

Load recipe (.yaml, .csv or .xlsx)

Parameters:
  • args (tuple) – Arguments for openplaces.path.recipe_path

  • kwargs (dict) – Keyword arguments. Those accepted by openplaces.path.OpenPlacesReference and openplaces.path.recipe_path are used to find the path; the remainder is passed to the reading function: yaml.safe_load(), pd.read_csv() or pd.read_excel().
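
The split of kwargs between path lookup and file reading can be pictured with the sketch below. It is only an illustration of the idea: recipe_path here is a hypothetical stand-in with an assumed signature, split_kwargs is not part of openplaces, and the argument values are made up.

```python
import inspect


def recipe_path(admin_id=None, entity=None, filename=None):
    """Hypothetical stand-in for a path-resolving function (assumed signature)."""
    return "/".join(part for part in (admin_id, entity, filename) if part)


def split_kwargs(func, kwargs):
    """Split kwargs into those named in func's signature and the remainder."""
    accepted = set(inspect.signature(func).parameters)
    path_kwargs = {k: v for k, v in kwargs.items() if k in accepted}
    reader_kwargs = {k: v for k, v in kwargs.items() if k not in accepted}
    return path_kwargs, reader_kwargs


# 'sep' is not a path argument, so it would be forwarded to the reader.
path_kwargs, reader_kwargs = split_kwargs(
    recipe_path,
    {"admin_id": "US-NC", "entity": "parcel-county-2024", "sep": ";"},
)
```

In this sketch, path_kwargs would locate the file and reader_kwargs would go to whichever reader matches the file extension.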

openplaces.recipe.get_recipe_dict(filepath, *args, **kwargs)

Read a recipe .yaml file as a dictionary, cast it to schema

Parameters:
  • filepath (pathlib.Path) – Filepath to .yaml file

  • args (list) – Passed on from get_recipe

  • kwargs (dict) – Passed on from get_recipe

openplaces.recipe.get_recipe_by_id(recipe_id, **kwargs)

Shortcut to get a recipe by its ID, which is split into its parts

Assumes syntax: {admin_id}_{entity}_{filename}.{extension}

admin_id or filename can be missing

(Datasets for non-entities aren’t yet supported)

Parameters:
  • recipe_id (str) – Identifier of a recipe

  • kwargs (dict) – Keyword arguments will be passed on to get_recipe()

openplaces.recipe.build_table_recipe(primary_recipe: dict, layer_spec: dict) → dict

Merge a primary recipe with an additional_layers spec.

Per-table keys (entity, layer, columns, index config, etc.) are taken from layer_spec when present, otherwise removed so that primary-only values do not bleed into the secondary table. process_by is inherited from the primary unless layer_spec sets it explicitly (use ‘process_by: null’ in the YAML to disable chunking for a specific additional table).

Parameters:
  • primary_recipe (dict) – Loaded primary recipe dictionary.

  • layer_spec (dict) – One entry from the primary recipe’s ‘additional_layers’ list.

Returns:

Merged recipe dict for the layer.

Return type:

dict
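
The merge rules described above can be sketched as follows. This is a minimal illustration, not the library's implementation: PER_TABLE_KEYS is a hypothetical, shortened key list, merge_layer is an illustrative name, and the recipe contents are made up.

```python
import copy

# Hypothetical subset of per-table keys; the real list is defined by the library.
PER_TABLE_KEYS = ("entity", "layer", "columns", "index")


def merge_layer(primary_recipe, layer_spec):
    """Sketch: combine a primary recipe with one additional_layers entry."""
    merged = copy.deepcopy(primary_recipe)
    for key in PER_TABLE_KEYS:
        if key in layer_spec:
            # Per-table value supplied by the layer wins.
            merged[key] = copy.deepcopy(layer_spec[key])
        else:
            # Drop primary-only values so they do not bleed into the layer.
            merged.pop(key, None)
    # process_by is inherited unless the layer sets it, even to None (YAML null).
    if "process_by" in layer_spec:
        merged["process_by"] = layer_spec["process_by"]
    return merged


primary = {
    "entity": "parcel-massgis-2025",
    "columns": {"id": "LOC_ID"},
    "process_by": {"admin_level": 2},
    "save_to": {"data_dir": "cache"},
}
unchunked = merge_layer(primary, {"entity": "property-massgis-2025", "process_by": None})
inherited = merge_layer(primary, {"entity": "property-massgis-2025"})
```

Checking for the presence of the process_by key, rather than its truthiness, is what lets an explicit null disable chunking while an omitted key inherits the primary setting.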

openplaces.recipe.get_table_recipe(recipe: str | dict, layer: str) → dict

Return the merged recipe for a secondary layer identified by entity.

Parameters:
  • recipe (str or dict) – Primary recipe (ID string or loaded dict).

  • layer (str) – Entity type (e.g. ‘property’) or full entity string (e.g. ‘property-massgis-2025’) of the additional layer.

Returns:

Merged recipe dict for the requested layer.

Return type:

dict

Raises:

KeyError – If no additional_layers entry matching layer is found.
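
A layer lookup that accepts either form of identifier might look like the sketch below. It assumes each additional_layers entry carries an 'entity' string whose first hyphen-separated part is the entity type; find_layer_spec and the sample entries are illustrative, not part of openplaces.

```python
def find_layer_spec(additional_layers, layer):
    """Sketch: return the entry whose entity, or entity type, equals layer."""
    for spec in additional_layers:
        entity = spec.get("entity", "")
        entity_type = entity.split("-", 1)[0]  # assumed 'type-source-year' layout
        if layer in (entity, entity_type):
            return spec
    raise KeyError(f"No additional_layers entry matching {layer!r}")


layers = [
    {"entity": "property-massgis-2025"},
    {"entity": "transaction-massgis-2025"},
]
by_type = find_layer_spec(layers, "property")
by_entity = find_layer_spec(layers, "transaction-massgis-2025")
```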

openplaces.recipe.find_recipe_id(admin_id, entity_or_dataset, filename=None, silent=False)

Find a recipe ID by admin_id and entity/dataset identifier.

Parameters:
  • admin_id (str) – Administrative unit identifier.

  • entity_or_dataset (str) – Entity or dataset identifier string, may contain glob wildcards (e.g. ‘parcel-*-*’, ‘admin-census-2021’).

  • filename (str, optional) – Filename stem to match within the recipe directory. When None (default), matches any .yaml file in the entity directory. A .yaml extension is appended automatically if absent.

  • silent (bool) – If True, suppress the message printed when multiple recipes are found.
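
The wildcard and filename rules can be illustrated with a small in-memory sketch. It matches against a list of ID strings using the {admin_id}_{entity}_{filename}.{extension} syntax, whereas the library searches recipe directories; match_recipe_ids and the sample IDs are hypothetical.

```python
from fnmatch import fnmatchcase


def match_recipe_ids(recipe_ids, admin_id, entity_or_dataset, filename=None):
    """Sketch: filter recipe IDs with a glob built from the three parts."""
    if filename is None:
        filename = "*.yaml"      # any .yaml file for the entity
    elif not filename.endswith(".yaml"):
        filename += ".yaml"      # extension appended when absent
    pattern = f"{admin_id}_{entity_or_dataset}_{filename}"
    return [rid for rid in recipe_ids if fnmatchcase(rid, pattern)]


ids = [
    "US-MA_parcel-massgis-2025_main.yaml",
    "US-MA_property-massgis-2025_main.yaml",
    "US-NC_parcel-county-2024_main.yaml",
]
any_file = match_recipe_ids(ids, "US-MA", "parcel-*-*")
named_file = match_recipe_ids(ids, "US-MA", "parcel-*-*", filename="main")
```

When a pattern matches several IDs, the caller has to choose one; the silent flag documented above only controls whether that situation is reported.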

openplaces.recipe.find_admin_recipe_id(admin_id, admin_level, silent=False)

Find the ID of an administrative data ingestion recipe

Parameters:
  • admin_id (str) – Administrative unit identifier

  • admin_level (int) – Administrative level for which a recipe is sought.

  • silent (bool) – If True, suppress the message printed when multiple recipes are found.

openplaces.recipe.find_entity_recipe_id(admin_id, entity_type, **kwargs)

Find the ID of an entity data ingestion recipe.

Parameters:
  • admin_id (str) – Administrative unit identifier.

  • entity_type (str) – Entity type (e.g. ‘parcel’, ‘building’, ‘footprint’).

  • **kwargs – Passed to find_recipe_id() (filename, silent).

openplaces.recipe.get_layers(recipe: str | dict) → list[str]

Return the layer names available for a recipe’s ‘additional_layers’.

These are the values accepted by the layer argument of ‘get_entities’ and ‘get_output_path’.

Parameters:

recipe (str or dict) – Recipe dict or recipe ID string.

Returns:

Entity type strings (e.g. ‘property’, ‘transaction’) for each entry in ‘additional_layers’.

Return type:

list of str

openplaces.recipe.get_output_path(recipe, admin_id=None, partition_id=None, geo=False, layer=None)

Return the path where recipe output is written.

Mirrors Ingester._get_output_path without instantiating an Ingester. The output root is determined by ‘save_to’: ‘data_dir’ in the recipe (default: ‘cache’), which must name a directory registered in STANDARD_DIRS.

Parameters:
  • recipe (str or dict) – Recipe identifier (as accepted by get_recipe_by_id) or a pre-loaded recipe dict.

  • admin_id (str or AdminId, optional) – Administrative unit for which to resolve the output path. Pass None for recipes not split by admin unit.

  • partition_id (str, optional) – Partition value appended to the filename stem, e.g. ‘US-NC-BS_footprint-obm-2025_032012.parquet’ for a tile partition with id ‘032012’. Pass None (default) to obtain the final, merged output path.

  • geo (bool, optional) – If True, return the path to the companion ‘_geo.parquet’ file instead of the attribute parquet file.

  • layer (str, optional) – Entity type (e.g. ‘property’) or full entity string (e.g. ‘property-massgis-2025’) of a secondary layer defined in additional_layers. If given, the path for that layer is returned instead of the primary entity’s path.

Returns:

Resolved output path for the recipe data file.

Return type:

pathlib.Path
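
The filename pattern implied by the partition example above can be sketched as follows. Only the partitioned name is taken from the documentation; the merged and '_geo' names, and the function output_filename itself, are assumptions extrapolated from it. Directory resolution via 'save_to' is left out.

```python
def output_filename(admin_id, entity, partition_id=None, geo=False):
    """Sketch: build '{admin_id}_{entity}[_{partition_id}]' plus a suffix."""
    stem = f"{admin_id}_{entity}"
    if partition_id is not None:
        stem += f"_{partition_id}"  # per-partition file
    # Assumption: the geometry companion replaces '.parquet' with '_geo.parquet'.
    return stem + ("_geo.parquet" if geo else ".parquet")


partitioned = output_filename("US-NC-BS", "footprint-obm-2025", partition_id="032012")
merged = output_filename("US-NC-BS", "footprint-obm-2025")
geometry = output_filename("US-NC-BS", "footprint-obm-2025", geo=True)
```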

openplaces.recipe.get_save_admin_level(recipe, operation_keys=('download_by', 'process_by', 'save_to'))

Return the admin level at which output files are split.

When save_to: admin_level is explicitly set it defines the output granularity directly — process_by or download_by may be finer (aggregation) or coarser than this level. When save_to: admin_level is absent the level is the maximum found across the given operation keys, falling back to the recipe’s own admin ID depth.

Parameters:
  • recipe (dict) – Loaded recipe dictionary.

  • operation_keys (tuple of str) – Recipe section keys to inspect for ‘admin_level’. ‘save_to’ is included by default since save_to: admin_level controls output granularity. Override when calling from other recipe runners.

Returns:

Admin level for output files (0 = no admin split).

Return type:

int
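
The resolution order described above can be sketched as below. This is a simplified reading of the rules, not the library's code: resolve_save_admin_level is an illustrative name, each operation section is assumed to be a dict with an optional 'admin_level', and the fallback to the recipe's admin ID depth is represented by a plain fallback_level argument.

```python
def resolve_save_admin_level(
    recipe,
    operation_keys=("download_by", "process_by", "save_to"),
    fallback_level=0,
):
    """Sketch: explicit save_to level wins, else the maximum level, else a fallback."""
    save_to = recipe.get("save_to") or {}
    if save_to.get("admin_level") is not None:
        return int(save_to["admin_level"])
    levels = []
    for key in operation_keys:
        section = recipe.get(key) or {}
        if section.get("admin_level") is not None:
            levels.append(int(section["admin_level"]))
    # fallback_level stands in for the recipe's own admin ID depth.
    return max(levels) if levels else fallback_level


explicit = resolve_save_admin_level(
    {"save_to": {"admin_level": 1}, "process_by": {"admin_level": 3}}
)
maximum = resolve_save_admin_level(
    {"download_by": {"admin_level": 1}, "process_by": {"admin_level": 2}}
)
fallback = resolve_save_admin_level({}, fallback_level=2)
```

In the first call the explicit save_to level is returned even though process_by is finer, which corresponds to the aggregation case mentioned above.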

openplaces.recipe.get_process_admin_level(recipe)

Return the admin level at which data is chunked for processing.

openplaces.recipe.get_download_admin_level(recipe)

Return the admin level at which downloads are partitioned.

openplaces.recipe.get_partition_ids(recipe)

Return the list of valid partition ID strings for a recipe.

Returns [None] for recipes without a ‘download_by’: ‘partition’ key.

Parameters:

recipe (dict) – Loaded recipe dictionary.

Return type:

list of str or list of None

Raises:
  • ValueError – If ‘download_by’: ‘partition’ is ‘year’ but ‘first’/‘last’ are not defined.

  • NotImplementedError – If ‘download_by’: ‘partition’ names an unrecognised partition type.
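
The return and error cases above can be sketched for the 'year' partition type. The sketch assumes 'first' and 'last' sit inside the 'download_by' section and bound an inclusive range; other partition types the library may support are not modelled, so they fall through to NotImplementedError here. partition_ids is an illustrative name.

```python
def partition_ids(recipe):
    """Sketch: list partition IDs for a 'year' partition, or [None] if unpartitioned."""
    download_by = recipe.get("download_by") or {}
    partition = download_by.get("partition")
    if partition is None:
        return [None]  # a single, unpartitioned download
    if partition == "year":
        first, last = download_by.get("first"), download_by.get("last")
        if first is None or last is None:
            raise ValueError("'year' partition requires 'first' and 'last'")
        # Assumption: both bounds are inclusive.
        return [str(year) for year in range(int(first), int(last) + 1)]
    raise NotImplementedError(f"Unrecognised partition type: {partition!r}")


unpartitioned = partition_ids({})
years = partition_ids({"download_by": {"partition": "year", "first": 2019, "last": 2021}})
```

Returning [None] rather than an empty list lets a caller loop over the result uniformly and pass each value as partition_id, where None means the merged output.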