aggregate
Functions to aggregate per-process-unit intermediate files into final save-level output files.
Functions
join_nonnull_strings(x) | Join non-null values of x as strings with ' + '; None when all null. |
aggregate_rows(df, by, ...) | Aggregate rows of df using per-column functions from the attribute registry. |
aggregate_files(recipe, admin_level, ...) | Aggregate per-process-unit intermediate files into save-level files. |
aggregate_to_admin_level(recipe, ...) | Aggregate per-process-unit intermediate files into save-level files. |
Module Contents
- openplaces.io.aggregate.join_nonnull_strings(x)
Join non-null values of x as strings with ' + '; None when all null.
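The behaviour described above can be illustrated with a minimal pure-Python sketch. This is not the library's implementation: the name `join_nonnull_strings_sketch` is hypothetical, and treating only `None` and float NaN as null is an assumption (the real function may rely on pandas' own null checks).

```python
import math


def join_nonnull_strings_sketch(values):
    """Join non-null values as strings with ' + '; return None if all are null."""
    kept = [
        str(v)
        for v in values
        # Assumption: None and float NaN count as null; everything else is kept.
        if v is not None and not (isinstance(v, float) and math.isnan(v))
    ]
    return " + ".join(kept) if kept else None


print(join_nonnull_strings_sketch(["residential", None, "retail"]))
# -> residential + retail
print(join_nonnull_strings_sketch([None, float("nan")]))
# -> None
```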
- openplaces.io.aggregate.aggregate_rows(df: pandas.DataFrame, by: str | list[str], aggregation_function=None, sort_by: str | None = None, list_columns: list[str] | None = None) → pandas.DataFrame | None
Aggregate rows of df using per-column functions from the attribute registry.
- Parameters:
df (pd.DataFrame) – Input DataFrame. Columns that appear in the attribute registry with a non-null aggregation function are included in the result.
by (str or list of str) – Column(s) to group by.
aggregation_function (None, callable, or dict, optional) – Controls which aggregation function is applied to each column.
- None
Use the function recorded in the attribute registry for each column.
- callable
Apply this single function to all aggregatable columns.
- dict
Map column names to callables; columns absent from the dict fall back to the registry default.
sort_by (str, optional) – Column to sort df by, in descending order, before grouping. When sort_by is omitted or the column is absent from df, and df is a GeoDataFrame, rows are sorted by geometry area in descending order.
list_columns (list of str, optional) – Column names for which an additional {col}_list column is added to the output, collecting all values per group into a Python list. The normal scalar aggregation for each column is still applied; these are extra columns alongside the registry-aggregated ones.
- Returns:
Aggregated DataFrame with by as the index, or None when no aggregatable columns are found in df.
- Return type:
pd.DataFrame or None
- Raises:
ValueError – When aggregation_function is not None, a callable, or a dict.
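The three accepted forms of aggregation_function can be pictured as a small dispatch step that resolves one callable per column. The sketch below is illustrative only: `resolve_aggregators` and the plain-dict `registry` are hypothetical stand-ins for the attribute registry, whose real structure is not shown in this reference.

```python
def resolve_aggregators(columns, registry, aggregation_function=None):
    """Return {column: callable} for columns with a non-null registry entry."""
    # Only columns the (hypothetical) registry knows with a non-null function.
    aggregatable = [c for c in columns if registry.get(c) is not None]
    if aggregation_function is None:
        # Registry default for every column.
        return {c: registry[c] for c in aggregatable}
    if callable(aggregation_function):
        # One function applied to all aggregatable columns.
        return {c: aggregation_function for c in aggregatable}
    if isinstance(aggregation_function, dict):
        # Per-column override, falling back to the registry default.
        return {c: aggregation_function.get(c, registry[c]) for c in aggregatable}
    raise ValueError("aggregation_function must be None, a callable, or a dict")


registry = {"height": max, "floors": sum, "name": None}  # assumed layout
columns = ["height", "floors", "name", "geometry"]
resolved = resolve_aggregators(columns, registry, {"height": min})
print(resolved["height"] is min, resolved["floors"] is sum, "name" in resolved)
# -> True True False
```

In this sketch a column with a null registry entry, or one missing from the registry, is simply left out, which mirrors the documented rule that only columns with a non-null aggregation function appear in the result.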
- openplaces.io.aggregate.aggregate_files(recipe, admin_level, output_dir=None, admin_ids_to_save=None, admin_ids_to_aggregate=None, keep_original=False, combined=False, verbose=False)
Aggregate per-process-unit intermediate files into save-level files.
Used when the desired output level is coarser than the level at which files were written (process_by.admin_level). Reads the intermediate parquet files, concatenates them into one file per save-level unit, and deletes the originals (unless keep_original is True).
- Parameters:
recipe (str or dict) – Recipe ID string (e.g. 'US_footprint-cheer-2026') or a pre-loaded recipe dict.
admin_level (int) – Target admin level for output files (e.g. 2 for state-level). Required explicitly because the recipe’s own save_to.admin_level may differ from the intended aggregation target.
output_dir (str, optional) – Directory for the aggregated output files (e.g. 'share'). Does not affect where intermediate input files are looked up; those are always resolved from the recipe’s original save_to.data_dir. Uses the recipe default if omitted.
admin_ids_to_save (str, AdminId, or list, optional) – Save-level admin ID(s) for which to write output files. Accepts a single value or a list. Defaults to all admin IDs at admin_level that are children of recipe['admin_id'].
admin_ids_to_aggregate (str, AdminId, or list, optional) – Process-level admin ID(s) whose intermediate files should be included as input. Accepts a single value or a list. Defaults to all process-level children of each admin_ids_to_save entry.
keep_original (bool) – If True, do not delete the intermediate files after aggregation.
combined (bool) – If True, write the aggregated output as a single geoparquet file (attributes and geometry together) rather than the default split layout of an attribute table plus a _geo sidecar. Passed through to save_parquet().
verbose (bool) – If True, print a summary line for each aggregated file.
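The read, concatenate, and delete-originals flow described above can be sketched with the standard library alone. This is a loose illustration, not the library's code: CSV files stand in for the parquet intermediates, and `concat_and_clean` and the file names are hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def concat_and_clean(inputs, output, keep_original=False):
    """Concatenate CSV files with identical headers into `output`.

    Assumes `inputs` is non-empty. Deletes the inputs afterwards unless
    keep_original is True. Returns the number of data rows written.
    """
    rows, header = [], None
    for path in inputs:
        with open(path, newline="") as fh:
            reader = csv.reader(fh)
            header = next(reader)  # every input carries the same header
            rows.extend(reader)
    with open(output, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(header)
        writer.writerows(rows)
    if not keep_original:
        for path in inputs:
            Path(path).unlink()
    return len(rows)


with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    parts = []
    # Two process-level "intermediate" files belonging to one save-level unit.
    for unit, value in [("unit_a", 1), ("unit_b", 2)]:
        part = tmp / f"{unit}.csv"
        part.write_text(f"id,value\n{unit},{value}\n")
        parts.append(part)
    n_rows = concat_and_clean(parts, tmp / "save_unit.csv")
    remaining = sorted(p.name for p in tmp.iterdir())

print(n_rows, remaining)
# -> 2 ['save_unit.csv']
```

Passing `keep_original=True` in the sketch would leave both input files in place next to the combined output, which corresponds to the documented keep_original flag.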
- openplaces.io.aggregate.aggregate_to_admin_level(recipe, admin_ids_to_process=None, keep_intermediates=False, combined=False, verbose=False)
Aggregate per-process-unit intermediate files into save-level files.
Wrapper around aggregate() that reads the save level from the recipe’s save_to.admin_level field.
- Parameters:
recipe (dict) – Loaded recipe dictionary. Must have an explicit save_to: admin_level that is coarser than the process_by / download_by level.
admin_ids_to_process (list of str, optional) – Admin IDs at the process level whose intermediate files should be read. Defaults to all IDs at the process level under the recipe’s save-level admin IDs.
keep_intermediates (bool) – If True, do not delete the intermediate files after aggregation.
combined (bool) – If True, write the aggregated output as a single geoparquet file. Passed through to aggregate().
verbose (bool) – If True, print a summary line for each aggregated file.
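The wrapper pattern described here, reading the save level from the recipe and delegating, might look roughly like the sketch below. Everything in it is an assumption for illustration: the nested `recipe["save_to"]["admin_level"]` layout is inferred from the field name, and `aggregate_to_save_level` and `fake_aggregate` are hypothetical stand-ins rather than the library's functions.

```python
def aggregate_to_save_level(recipe, aggregate, **kwargs):
    """Read recipe['save_to']['admin_level'] and delegate to `aggregate`."""
    save_to = recipe.get("save_to") or {}
    if "admin_level" not in save_to:
        # The documented wrapper requires an explicit save level.
        raise KeyError("recipe must set save_to.admin_level explicitly")
    return aggregate(recipe, save_to["admin_level"], **kwargs)


def fake_aggregate(recipe, admin_level, **kwargs):
    # Stand-in for the real aggregation call; just echoes what it received.
    return admin_level, kwargs


recipe = {"admin_id": "US", "save_to": {"admin_level": 2}}  # assumed layout
result = aggregate_to_save_level(recipe, fake_aggregate, verbose=True)
print(result)
# -> (2, {'verbose': True})
```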