overlay#
Spatial overlay operations on polygon datasets.
Depends on recipes, admin lookups, and DuckDB for fast parquet-based overlays.
Functions#
overlay_admin_ids | Add administrative unit IDs to GeoDataFrame using spatial joins
overlay_polygons_with_duckdb | Intersect two polygon datasets using DuckDB (parquet-streaming path).
benchmark_iou | Compare overlay_polygons (geopandas) vs overlay_polygons_with_duckdb for IOU.
benchmark_overlay | Benchmark overlay_polygons vs overlay_polygons_with_duckdb across all combinations of input_type × iou × geom.
Module Contents#
- openplaces.geo.overlay.overlay_admin_ids(gdf, admin_geometries=None, admin_level=2, admin_id=None, admin_recipe=None, include_overlays=False, timer=None)#
Add administrative unit IDs to a GeoDataFrame using spatial joins.
- Parameters:
gdf (GeoDataFrame) – GeoDataFrame to join admin IDs to
admin_geometries (gpd.GeoSeries) – GeoSeries of admin geometries with AdminID index. Pass either this or admin_level.
admin_level (int) – Administrative level for which administrative IDs are to be joined. Typically a lower level (larger number) than the level of admin_id.
admin_id (str) – Administrative unit of the GeoDataFrame. Determines which administrative units to consider.
admin_recipe (dict or str) – Recipe of the administrative unit dataset to be used. String identifier or resolved recipe (dictionary). If None, the default recipe for administrations is used.
include_overlays (bool) – If True, fall back to a spatial polygon overlay for polygons whose centroid-based spatial join returned no results.
timer (openplaces.timing.Timer or None) – Optional Timer instance.
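The centroid join with an optional overlay fallback described for include_overlays can be illustrated with a minimal, dependency-free sketch. It is not the openplaces implementation: it uses axis-aligned boxes instead of real polygons, the helper names (centroid, contains, overlap_area, assign_admin_ids) are hypothetical, and choosing the admin unit with the largest overlap area in the fallback step is an assumption, since the documentation above does not say how ties between several overlapping units are resolved.

```python
from typing import Optional

# Toy geometry: an axis-aligned box (xmin, ymin, xmax, ymax) stands in for a polygon.
Box = tuple[float, float, float, float]


def centroid(box: Box) -> tuple[float, float]:
    xmin, ymin, xmax, ymax = box
    return ((xmin + xmax) / 2, (ymin + ymax) / 2)


def contains(box: Box, point: tuple[float, float]) -> bool:
    xmin, ymin, xmax, ymax = box
    x, y = point
    return xmin <= x <= xmax and ymin <= y <= ymax


def overlap_area(a: Box, b: Box) -> float:
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return width * height if width > 0 and height > 0 else 0.0


def assign_admin_ids(
    polygons: dict[str, Box],
    admins: dict[str, Box],
    include_overlays: bool = False,
) -> dict[str, Optional[str]]:
    """Hypothetical helper: centroid join first, overlay fallback second."""
    result: dict[str, Optional[str]] = {}
    for name, box in polygons.items():
        point = centroid(box)
        # Step 1: centroid-in-admin-unit join.
        match = next(
            (aid for aid, abox in admins.items() if contains(abox, point)), None
        )
        # Step 2 (optional): fall back to the admin unit with the largest
        # overlap. "Largest overlap" is an assumption made for this sketch.
        if match is None and include_overlays:
            best_id, best_area = None, 0.0
            for aid, abox in admins.items():
                area = overlap_area(box, abox)
                if area > best_area:
                    best_id, best_area = aid, area
            match = best_id
        result[name] = match
    return result


admins = {"A": (0, 0, 10, 10), "B": (20, 0, 30, 10)}
polygons = {
    "inside": (1, 1, 3, 3),        # centroid falls inside A
    "straddling": (8, 0, 21, 2),   # centroid falls in the gap between A and B
    "outside": (40, 40, 42, 42),   # no contact with any admin unit
}

print(assign_admin_ids(polygons, admins))
print(assign_admin_ids(polygons, admins, include_overlays=True))
```

In this toy data the "straddling" box gets no ID from the centroid join alone and is only assigned once the overlay fallback is enabled; the "outside" box stays unassigned either way.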
- openplaces.geo.overlay.overlay_polygons_with_duckdb(layer1: pathlib.Path | geopandas.GeoDataFrame, layer2: pathlib.Path | geopandas.GeoDataFrame, columns: list[str] | None = None, geom: bool = False, iou: bool = False, suffixes: tuple[str, str] | None = None, how: str = 'intersection') pandas.DataFrame | geopandas.GeoDataFrame#
Intersect two polygon datasets using DuckDB (parquet-streaming path).
Each layer may be a GeoDataFrame or a path to an attribute parquet file saved with save_parquet(). A GeoDataFrame is written to a temporary parquet file automatically. For file-based layers, the corresponding _geo.parquet sidecar is inferred automatically.
- Parameters:
layer1 – First polygon layer — GeoDataFrame or path to attribute parquet file.
layer2 – Second polygon layer — GeoDataFrame or path to attribute parquet file.
columns – Columns to return from the attribute tables. Columns are auto-detected from both tables. If a column exists in both tables, suffixes must be provided.
geom – If True, return the clipped intersection geometry as a GeoDataFrame.
iou – If True, compute intersection area, union area, and intersection-over-union ratio. Areas are computed in EPSG:6933 (m²). Only meaningful for matched pairs; unmatched rows get NaN.
suffixes – Required when any requested column exists in both attribute tables, or when both tables share the same index name. Tuple of two strings, e.g. ('_tiles', '_admin'), appended to disambiguate column names, analogous to gpd.sjoin suffixes.
how ({'intersection', 'union', 'identity'}) –
Type of overlay operation:
'intersection': only overlapping pairs (inner join)
'union': all polygons from both sides; unmatched rows retain original geometry and get NaN for the missing index level
'identity': all polygons from layer1; unmatched layer1 polygons retain original geometry and get NaN for the layer2 index level
- Returns:
Table indexed by a MultiIndex of (index1, index2), with the index names detected from parquet metadata. Columns: those requested via columns, plus iou columns if iou=True, plus geometry if geom=True.
- Return type:
pd.DataFrame or gpd.GeoDataFrame
- Raises:
FileNotFoundError – If any of the expected parquet files are missing.
ValueError – If any requested column exists in both tables and suffixes is None, or if both tables share the same index name and suffixes is None, or if how is not one of the supported operations.
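The interplay of how and iou described above can be sketched without any geospatial dependencies. The following is an illustration only, not the DuckDB code path: it uses axis-aligned boxes in planar units rather than polygons reprojected to EPSG:6933, the function toy_overlay and the column names intersection_area, union_area, and iou are hypothetical, and None stands in for the NaN index level of unmatched rows.

```python
import math

# Toy geometry: an axis-aligned box (xmin, ymin, xmax, ymax) stands in for a polygon.
Box = tuple[float, float, float, float]


def area(box: Box) -> float:
    return (box[2] - box[0]) * (box[3] - box[1])


def intersection_area(a: Box, b: Box) -> float:
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return width * height if width > 0 and height > 0 else 0.0


def toy_overlay(layer1: dict[str, Box], layer2: dict[str, Box], how: str = "intersection"):
    """Hypothetical stand-in that mimics the documented row semantics."""
    if how not in ("intersection", "union", "identity"):
        raise ValueError(f"unsupported how: {how!r}")

    rows = {}
    matched1, matched2 = set(), set()
    for i, a in layer1.items():
        for j, b in layer2.items():
            inter = intersection_area(a, b)
            if inter > 0:
                union = area(a) + area(b) - inter
                rows[(i, j)] = {
                    "intersection_area": inter,
                    "union_area": union,
                    "iou": inter / union,
                }
                matched1.add(i)
                matched2.add(j)

    # Unmatched rows carry NaN metrics and a missing index level.
    nan_row = {"intersection_area": math.nan, "union_area": math.nan, "iou": math.nan}
    if how in ("union", "identity"):
        for i in layer1:
            if i not in matched1:
                rows[(i, None)] = dict(nan_row)
    if how == "union":
        for j in layer2:
            if j not in matched2:
                rows[(None, j)] = dict(nan_row)
    return rows


tiles = {"t1": (0, 0, 2, 2), "t2": (10, 10, 12, 12)}
admin = {"a1": (1, 1, 3, 3), "a2": (20, 20, 21, 21)}

for how in ("intersection", "identity", "union"):
    print(how, sorted(toy_overlay(tiles, admin, how=how), key=str))
```

Here t1 and a1 overlap in a 1 × 1 square, so their IoU is 1 / (4 + 4 − 1) = 1/7. With how='identity' the unmatched tile t2 is kept, and with how='union' the unmatched admin unit a2 is kept as well.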
- openplaces.geo.overlay.benchmark_iou(gdf_left, gdf_right, suffixes=None, how='intersection')#
Compare overlay_polygons (geopandas) vs overlay_polygons_with_duckdb for IOU.
Runs both approaches and prints wall-clock time for each.
- Parameters:
gdf_left (GeoDataFrame)
gdf_right (GeoDataFrame)
suffixes (tuple[str, str] or None) – Required when both GeoDataFrames share the same index name. Passed to both functions unchanged.
how (str) – Overlay type: 'intersection', 'union', or 'identity'.
- Returns:
result_mem (pd.DataFrame) – Result of overlay_polygons (geopandas).
result_disk (pd.DataFrame) – Result of overlay_polygons_with_duckdb.
- openplaces.geo.overlay.benchmark_overlay(gdf_left: geopandas.GeoDataFrame, gdf_right: geopandas.GeoDataFrame, suffixes: tuple[str, str] | None = None, how: str = 'intersection', n: int = 1)#
Benchmark overlay_polygons vs overlay_polygons_with_duckdb across all combinations of input_type × iou × geom.
- Parameters:
gdf_left (GeoDataFrame) – Representative input data (already in memory).
gdf_right (GeoDataFrame) – Representative input data (already in memory).
suffixes – Passed through to both functions.
how – Overlay type.
n – Number of repetitions per cell; median is reported.
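The benchmarking scheme described above (every combination of the option grid, n repetitions per cell, median reported) can be sketched with the standard library alone. This is a sketch under stated assumptions rather than the module's code: benchmark_grid and fake_overlay are hypothetical names, fake_overlay merely burns CPU time in place of a real overlay call, and the input_type values "gdf" and "path" are assumed from the two accepted layer types.

```python
import itertools
import statistics
import time


def benchmark_grid(func, grid: dict[str, list], n: int = 1) -> dict[tuple, float]:
    """Time func once per grid cell and repetition; return the median seconds per cell."""
    keys = list(grid)
    results = {}
    for values in itertools.product(*(grid[k] for k in keys)):
        kwargs = dict(zip(keys, values))
        timings = []
        for _ in range(n):
            start = time.perf_counter()
            func(**kwargs)
            timings.append(time.perf_counter() - start)
        results[values] = statistics.median(timings)
    return results


def fake_overlay(input_type: str, iou: bool, geom: bool) -> int:
    # Hypothetical workload: slightly more work when iou or geom is requested.
    return sum(range(1000 * (1 + iou + geom)))


grid = {
    "input_type": ["gdf", "path"],  # assumed labels for the two layer types
    "iou": [False, True],
    "geom": [False, True],
}
report = benchmark_grid(fake_overlay, grid, n=3)
for cell, seconds in report.items():
    print(cell, f"{seconds:.6f} s")
```

Reporting the median rather than the mean keeps a single slow repetition (for example a cold cache on the first run) from dominating a cell.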