transform
Transformation engine for applying variable transformations to DataFrames.
This module provides a flexible system for transforming variables based on YAML recipe specifications. It supports:
- Unary operations (log, arcsinh, power, etc.)
- Binary operations (arithmetic on two columns)
- Aggregate operations (sum, min, max across multiple columns)
- String remapping and reclassification
- Conditional transformations
- Date/time extractions
- Complex expressions
- Pattern-based transformations
Functions
- apply_transformations: Apply transformations from recipe to dataframe.
- apply_transformation: Apply a single transformation based on configuration.
- apply_transformation_pattern: Apply pattern-based transformation to multiple columns.
- get_crosswalk: Get a crosswalk (Series of default keys -> source keys)
- remap: Remap values in dataframe column using recipe table
- add_unique_suffix: Make string Series unique by appending unique integer suffixes.
- make_index_unique: Return a copy of a DataFrame / GeoDataFrame with a unique string index.
Module Contents
- openplaces.io.transform.apply_transformations(df: pandas.DataFrame | geopandas.GeoDataFrame, recipe: dict[str, Any], silent: bool = False) -> pandas.DataFrame | geopandas.GeoDataFrame
Apply transformations from recipe to dataframe.
- Parameters:
df (DataFrame or GeoDataFrame) – Input data to transform
recipe (dict) – Recipe dictionary containing ‘transformations’ and optionally ‘transformation_patterns’ keys
silent (bool, default False) – If True, suppress warnings
- Returns:
Transformed dataframe with new columns added
- Return type:
DataFrame or GeoDataFrame
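As a rough illustration of recipe-driven unary transformations, the sketch below mimics the idea with plain pandas and NumPy. It does not call openplaces, and the recipe layout is an assumption: only the ‘transformations’ key comes from the description above, while the per-entry keys `operation` and `source`, the `UNARY_OPS` table, and the function `apply_unary_sketch` are hypothetical names chosen for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical operation table; the real module's operation names may differ.
UNARY_OPS = {"log": np.log, "arcsinh": np.arcsinh}


def apply_unary_sketch(df, transformations):
    """Add one new column per entry, leaving the input frame untouched."""
    out = df.copy()
    for new_column, spec in transformations.items():
        func = UNARY_OPS[spec["operation"]]  # assumed key name
        out[new_column] = func(out[spec["source"]])  # assumed key name
    return out


# Assumed recipe layout: only the 'transformations' key is documented.
recipe = {
    "transformations": {
        "log_income": {"operation": "log", "source": "income"},
    }
}

df = pd.DataFrame({"income": [1.0, 10.0, 100.0]})
result = apply_unary_sketch(df, recipe["transformations"])
```

The sketch copies the frame before adding columns, matching the documented behaviour of returning a transformed dataframe with new columns added.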
- openplaces.io.transform.apply_transformation(df: pandas.DataFrame | geopandas.GeoDataFrame, config: dict[str, Any], silent: bool = False) -> pandas.DataFrame | geopandas.GeoDataFrame
Apply a single transformation based on configuration.
- openplaces.io.transform.apply_transformation_pattern(df: pandas.DataFrame | geopandas.GeoDataFrame, config: dict[str, Any], silent: bool = False) -> pandas.DataFrame | geopandas.GeoDataFrame
Apply pattern-based transformation to multiple columns.
- openplaces.io.transform.get_crosswalk(crosswalk_dict, flip=False)
Get a crosswalk (Series of default keys -> source keys)
- Parameters:
crosswalk_dict (dict) – Dictionary with crosswalk arguments
flip (bool) – Flips keys (index) and value column (usually for joining)
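To make the effect of `flip` concrete, the sketch below swaps the index and values of a crosswalk Series with plain pandas. It does not call `get_crosswalk`, because the structure of `crosswalk_dict` is not documented here. The helper `flip_crosswalk_sketch`, the Series names, and the example keys are assumptions made for illustration.

```python
import pandas as pd


def flip_crosswalk_sketch(crosswalk):
    """Swap index and values: default -> source becomes source -> default."""
    return pd.Series(
        crosswalk.index.to_numpy(),
        index=pd.Index(crosswalk.to_numpy(), name=crosswalk.name),
        name=crosswalk.index.name,
    )


# Hypothetical crosswalk: default keys in the index, source keys as values.
crosswalk = pd.Series(
    {"population": "POP_TOTAL", "area": "AREA_KM2"}, name="source_key"
)
crosswalk.index.name = "default_key"

flipped = flip_crosswalk_sketch(crosswalk)
```

With source keys in the index, the flipped Series can be used to look up default keys from source column names, which is the kind of join-oriented use the `flip` description hints at.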
- openplaces.io.transform.remap(df, recipe_id)
Remap values in dataframe column using recipe table
- Parameters:
df (DataFrame or GeoDataFrame) – Data
recipe_id (str) – ID of recipe table that contains the remapping
- openplaces.io.transform.add_unique_suffix(s)
Make string Series unique by appending unique integer suffixes.
All duplicate occurrences are suffixed (-1, -2, …), including the first one. Use make_index_unique when operating on a DataFrame index and the first (or largest) occurrence should keep the unsuffixed value.
- Parameters:
s (pd.Series) – String Series containing duplicate entries
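The suffixing rule described above can be sketched with plain pandas. This is an independent illustration, not the library's implementation: the function name `add_unique_suffix_sketch` is hypothetical, and the sketch assumes that values occurring only once are left unchanged and that the counter starts at 1.

```python
import pandas as pd


def add_unique_suffix_sketch(s):
    """Suffix every occurrence of a duplicated value, including the first."""
    is_duplicated = s.duplicated(keep=False)  # True for all copies of a repeated value
    counter = s.groupby(s).cumcount() + 1  # 1, 2, ... within each group of equal values
    suffixed = s + "-" + counter.astype(str)
    # Keep values that occur once; replace all duplicated ones.
    return s.where(~is_duplicated, suffixed)


names = pd.Series(["a", "b", "a", "c", "a"])
unique_names = add_unique_suffix_sketch(names)
```

Here every copy of "a" receives a suffix, while "b" and "c" stay as they are.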
- openplaces.io.transform.make_index_unique(df: pandas.DataFrame, sort_by: str | None = None, ascending: bool = False, separator: str = '-', *, sort_duplicates_by_area: bool = False, area_crs: str = 'EPSG:6933') -> pandas.DataFrame
Return a copy of a DataFrame / GeoDataFrame with a unique string index.
Duplicate index values are resolved so that the first occurrence keeps the original index value and later duplicates receive suffixes -1, -2, … Sorting controls which occurrence counts as “first”.
Unlike add_unique_suffix, which operates on a Series and suffixes every duplicate (including the first), this function preserves the unsuffixed value for the winning row.
- Parameters:
df (pd.DataFrame or gpd.GeoDataFrame) – Input frame whose index will be made unique.
sort_by (str, optional) – Column to sort the entire frame by before resolving duplicates.
ascending (bool) – Sort direction. Default False, so larger values sort first.
separator (str) – String inserted between the original index value and the counter.
sort_duplicates_by_area (bool) – If True, and df is a GeoDataFrame, compute equal-area geometry area for rows with duplicated index values and sort within each group so the largest polygon keeps the unsuffixed index.
area_crs (str) – Equal-area CRS used for area calculation. Default: EPSG:6933.
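The first-occurrence-wins behaviour can be sketched with plain pandas as follows. This is a simplified illustration under stated assumptions, not the library's code: `make_index_unique_sketch` is a hypothetical name, the area-based sorting is omitted entirely, and the sketch returns the frame in sorted order, which the real function may or may not do.

```python
import pandas as pd


def make_index_unique_sketch(df, sort_by=None, ascending=False, separator="-"):
    """First occurrence keeps its label; later duplicates get -1, -2, ..."""
    out = df.copy()
    if sort_by is not None:
        # Sorting decides which row counts as the first occurrence.
        out = out.sort_values(sort_by, ascending=ascending, kind="stable")
    labels = pd.Series(out.index.astype(str))
    counter = labels.groupby(labels).cumcount()  # 0 for the first occurrence
    unique = labels.where(counter == 0, labels + separator + counter.astype(str))
    out.index = pd.Index(unique.to_numpy(), name=df.index.name)
    return out


df = pd.DataFrame({"pop": [10, 30, 20]}, index=["x", "x", "y"])
unsorted = make_index_unique_sketch(df)
by_pop = make_index_unique_sketch(df, sort_by="pop")
```

Without sorting, the first "x" row in the frame keeps its label. With `sort_by="pop"` and the default descending order, the "x" row with the larger value keeps the unsuffixed label instead.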