polygon

polygon.py

Core geometry and polygon operations on GeoDataFrames and Shapely geometries. No recipe, admin, or heavy I/O dependencies.

Functions

fix_polygons(gdf)

Fix invalid geometries in a GeoDataFrame using make_valid.

has_geometry(gdf)

Get boolean Series identifying entries with valid geometries

clean_polygons(gdf)

Return GeoDataFrame with only clean and valid polygons

get_areas(gdf[, unit, crs])

Compute areas of polygons in a GeoDataFrame

crs_is_mea(crs)

Check if projection is an equal-area projection and uses meters.

get_lat_long_centroids(gdf[, crs, geom])

Get centroids (lat, long, geometry) in geographic projection.

get_pois(d[, how, precision_ratio, prec_min, crs, ...])

Get the Poles of Inaccessibility (PoI) for a polygon geodataframe

get_poi(geom[, precision_ratio, prec_min])

Get Pole of Inaccessibility (PoI) for a polygon geometry

get_poi_ortho(poly[, precision_ratio, prec_min, geom, ...])

Get POI with local ortho projection

get_polygon_xy(geom)

Get a list of x/y coordinates from a shapely Polygon

add_geometry_derivatives(gdf, timer, **kwargs)

Add standardized geometry derivatives to the geodataframe

points_from_coords(df[, x, y, crs, keep_columns]) → geopandas.GeoDataFrame

Convert a DataFrame with coordinate columns to a GeoDataFrame of points.

get_simplified_geometries(gdf, tolerance)

Returns a GeoDataFrame with simplified polygon geometries

get_proj4(proj[, lat, lon, ellps])

Get proj4 string for a projection from lat/long.

find_overlaps(gdf[, min_overlap_m2, area_crs, iou]) → pandas.DataFrame

Return pairs of row indices whose polygons overlap by more than a sliver.

resolve_overlapping_polygons(df[, keep, overlap_ratio_threshold, ...]) → geopandas.GeoDataFrame

Resolve substantially overlapping polygon pairs in a GeoDataFrame.

overlay_polygons(layer1, layer2[, columns, geom, iou, ...]) → pandas.DataFrame | gpd.GeoDataFrame

Intersect two polygon datasets in memory using geopandas.

Module Contents

openplaces.geo.polygon.fix_polygons(gdf)#

Fix invalid geometries in a GeoDataFrame using make_valid.

Parameters:

gdf (GeoDataFrame) – GeoDataFrame that may have invalid geometries.

openplaces.geo.polygon.has_geometry(gdf)#

Get boolean Series identifying entries with valid geometries

Returns True for entries with non-empty and non-null geometries.

Parameters:

gdf (GeoDataFrame or GeoSeries) – Input Geodataframe or Geoseries

openplaces.geo.polygon.clean_polygons(gdf)#

Return GeoDataFrame with only clean and valid polygons

Attempts to fix (zero-buffer) invalid polygons and drops empty polygons (and those with unfixable errors).

openplaces.geo.polygon.get_areas(gdf, unit='ha', crs='epsg:6933')#

Compute areas of polygons in a GeoDataFrame

Parameters:
  • gdf (GeoDataFrame) – Geodataframe with polygons

  • unit (str) – Area unit. Currently interpreted: ‘m2’, ‘ha’, ‘km2’, ‘ac’, ‘ft2’, ‘sqft’

  • crs (coordinate reference system) – Coordinate system in which the computation takes place. Has to be equal-area and use meters as its unit.
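
The unit argument implies a conversion from square meters (the unit of the equal-area CRS) to the requested unit. The following is a minimal sketch of such a conversion, not the library's implementation: the names M2_PER_UNIT and convert_area are hypothetical, and the factors assume the standard international definitions of the foot and the acre.

```python
# Conversion factors: how many square meters make up one unit.
# Assumption: standard international definitions; the library's own
# conversion table is not shown in this reference.
M2_PER_UNIT = {
    "m2": 1.0,
    "ha": 10_000.0,
    "km2": 1_000_000.0,
    "ac": 4046.8564224,
    "ft2": 0.09290304,
    "sqft": 0.09290304,  # alias of 'ft2'
}


def convert_area(area_m2: float, unit: str = "ha") -> float:
    """Convert an area given in square meters to the requested unit."""
    try:
        return area_m2 / M2_PER_UNIT[unit]
    except KeyError:
        raise ValueError(f"Unknown area unit: {unit!r}") from None


print(convert_area(25_000.0, "ha"))  # 2.5
```

Keeping the factors in one mapping makes it easy to see which unit strings are interpreted and to reject anything else explicitly.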

openplaces.geo.polygon.crs_is_mea(crs)#

Check if projection is an equal-area projection and uses meters.

Detects Albers Equal Area and Cylindrical Equal Area projections.

Parameters:

crs (pyproj CRS) – Coordinate Reference System

openplaces.geo.polygon.get_lat_long_centroids(gdf, crs='epsg:4326', geom=False)#

Get centroids (lat, long, geometry) in geographic projection.

Parameters:
  • gdf (GeoDataFrame) – Geodataframe

  • crs (projection) – Projection in which centroids will be computed. Should be geographic, as ‘lat’ and ‘long’ columns are returned.

  • geom (bool) – If True, returns GeoDataFrame with point geometries in same projection as gdf.

openplaces.geo.polygon.get_pois(d, how='points', precision_ratio=0.001, prec_min=0.5, crs='epsg:3395', orthogonal=False)#

Get the Poles of Inaccessibility (PoI) for a polygon geodataframe

Parameters:
  • d (GeoDataFrame or GeoSeries) – Geodataframe or GeoSeries containing polygons

  • how (str) – Determines how PoIs will be returned. Options include:

    • ‘tuples’: Series of tuples: (x, y, radius)

    • ‘dataframe’: DataFrame with columns: (‘x_poi’, ‘y_poi’, ‘r_poi’)

    • ‘points’: Geodataframe with points

    • ‘points_only’: Geoseries of points

    • ‘circles’: Geodataframe with circles / ellipses

    • ‘circles_only’: Geoseries of circles / ellipses

  • precision_ratio (float) – Precision ratio used to define the precisions for:

    1. the polygon simplification algorithm

    2. the algorithm which finds the largest inscribed circle

    The precisions for both algorithms will be computed as: precision_ratio * square root of polygon area

  • prec_min (float) – Minimum tolerance for both algorithms, defined in CRS units.

  • crs (CRS) – Coordinate reference system (CRS) for computation of PoIs. This argument is ignored if orthogonal is set to True. Mercator (‘epsg:3395’) is good for labels (more weight on width).

  • orthogonal (bool) – If True, uses orthogonal projection around centroid to find PoI. Slower, but closer to the real PoI than using any single projection.

openplaces.geo.polygon.get_poi(geom, precision_ratio=0.001, prec_min=0.5)#

Get Pole of Inaccessibility (PoI) for a polygon geometry

Returns the PoI as a pandas Series of (x, y, radius).

Parameters:
  • geom (Polygon or MultiPolygon) – Polygon for which largest inscribed circle is to be found

  • precision_ratio (float) – Precision ratio used to define the precisions for:

    1. the polygon simplification algorithm

    2. the algorithm which finds the largest inscribed circle

    The precisions for both algorithms will be computed as: precision_ratio * square root of polygon area

  • prec_min (float) – Minimum tolerance for both algorithms, defined in CRS units.

Notes

The precision has a major influence on performance. For geohashing, a precision_ratio of 0.05 appears to strike a good balance between uniqueness (0.1 is already unique), computation speed (0.01 is much slower) and correctness (0.1 can be quite off in some locations).

No PoI is computed for polygons whose area is smaller than the square of the polygon-specific tolerance; in that case the function returns (None, None, None).
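
The interplay of precision_ratio, prec_min and the small-polygon cutoff can be sketched as follows. This is an illustration under assumptions, not the library's code: the helper names poi_tolerance and is_too_small are hypothetical, and combining the two values with max() is an assumed reading of "minimum tolerance".

```python
import math


def poi_tolerance(area, precision_ratio=0.001, prec_min=0.5):
    """Polygon-specific tolerance in CRS units.

    Computed as precision_ratio * sqrt(area), but never below prec_min
    (assumption: prec_min acts as a lower bound applied with max()).
    """
    return max(precision_ratio * math.sqrt(area), prec_min)


def is_too_small(area, precision_ratio=0.001, prec_min=0.5):
    """True if the area is below the squared tolerance, so no PoI is computed."""
    return area < poi_tolerance(area, precision_ratio, prec_min) ** 2


# A 2 km x 2 km square: the ratio-based tolerance dominates.
print(poi_tolerance(4_000_000.0))
# A 10 m x 10 m square: the ratio-based value is tiny, so prec_min applies.
print(poi_tolerance(100.0))
# A 0.2 m2 polygon is smaller than 0.5 ** 2 and would be skipped.
print(is_too_small(0.2))
```

Because the tolerance scales with the square root of the area, a larger precision_ratio coarsens the search in proportion to polygon size, which is where the speed difference described in the notes comes from.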

openplaces.geo.polygon.get_poi_ortho(poly, precision_ratio=0.001, prec_min=0.5, geom=None, crs_orig='epsg:4326')#

Get POI with local ortho projection

Parameters:
  • poly (Polygon) – Polygon geometry

  • precision_ratio (float) – Precision ratio used to define the precisions for:

    1. the polygon simplification algorithm

    2. the algorithm which finds the largest inscribed circle

    The precisions for both algorithms will be computed as: precision_ratio * square root of polygon area

  • prec_min (float) – Minimum tolerance for both algorithms, defined in CRS units.

  • geom (str) – Geometry to add to the result:

    • ‘point’: adds point geometry

    • ‘circle’: adds circle geometry

  • crs_orig (str) – Original projection (of Polygon). Defaults to WGS84

openplaces.geo.polygon.get_polygon_xy(geom)#

Get a list of x/y coordinates from a shapely Polygon

Returns a list of lists, one for each ring (exterior or interiors), each containing lists of points (x, y).

MultiPolygons are not accepted.

Parameters:

geom (Polygon) – Polygon

openplaces.geo.polygon.add_geometry_derivatives(gdf, timer, **kwargs)#

Add standardized geometry derivatives to the geodataframe

Parameters:
  • gdf (geopandas.GeoDataFrame) – GeoDataFrame

  • timer (openplaces.timing.Timer) – Timer for data processing

  • kwargs (dict) – Keyword arguments, assumed to come from an openplaces.recipes.recipe

openplaces.geo.polygon.points_from_coords(df: pandas.DataFrame, x: str = 'long', y: str = 'lat', crs='epsg:4326', keep_columns: bool = False) → geopandas.GeoDataFrame

Convert a DataFrame with coordinate columns to a GeoDataFrame of points.

Parameters:
  • df – DataFrame with x and y coordinate columns.

  • x – Name of the x (longitude) column.

  • y – Name of the y (latitude) column.

  • crs – Coordinate reference system of the x/y coordinates.

  • keep_columns – If False (default), drop the x and y columns after conversion.

openplaces.geo.polygon.get_simplified_geometries(gdf, tolerance)#

Returns a GeoDataFrame with simplified polygon geometries

Uses .simplify_coverage() from geopandas to preserve topology

Parameters:
  • gdf (geopandas.GeoDataFrame) – GeoDataFrame

  • tolerance (float) – Simplification tolerance in CRS units

openplaces.geo.polygon.get_proj4(proj, lat=0, lon=0, ellps='WGS84')#

Get proj4 string for a projection from lat/long.

Parameters:
  • proj (str) – Projection type. Currently: ‘ortho’ or ‘nsper’.

  • lat (numeric) – Latitude

  • lon (numeric) – Longitude

  • ellps (str) – Ellipsoid
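
As an illustration of what a proj4 string centered on a lat/long position looks like, here is a minimal sketch. The function build_proj4 is hypothetical and the exact string returned by get_proj4 is not shown in this reference. The h argument is also an assumption: PROJ's ‘nsper’ projection needs a perspective height, which the documented signature does not expose.

```python
def build_proj4(proj, lat=0, lon=0, ellps="WGS84", h=None):
    """Assemble a PROJ string for a projection centered on (lat, lon).

    h is a hypothetical extra argument: the perspective height in meters
    that PROJ requires for the 'nsper' projection.
    """
    if proj not in ("ortho", "nsper"):
        raise ValueError(f"Unsupported projection type: {proj!r}")
    parts = [
        f"+proj={proj}",
        f"+lat_0={lat}",
        f"+lon_0={lon}",
        f"+ellps={ellps}",
    ]
    if proj == "nsper":
        if h is None:
            raise ValueError("'nsper' needs a perspective height h")
        parts.append(f"+h={h}")
    parts += ["+units=m", "+no_defs"]
    return " ".join(parts)


print(build_proj4("ortho", lat=46.5, lon=7.5))
# +proj=ortho +lat_0=46.5 +lon_0=7.5 +ellps=WGS84 +units=m +no_defs
```

A string like this can be passed wherever a CRS definition is accepted, which is how a per-polygon local projection can be set up around a centroid.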

openplaces.geo.polygon.find_overlaps(gdf: geopandas.GeoDataFrame, min_overlap_m2: float = 1.0, area_crs: str = 'EPSG:6933', iou: bool = False) → pandas.DataFrame

Return pairs of row indices whose polygons overlap by more than a sliver.

Uses an STRtree spatial index for fast candidate detection, then computes exact intersection areas with vectorised Shapely only for candidate pairs.

Slivers (shared edges, floating-point artefacts) are excluded via min_overlap_m2. Both overlapping and fully-contained pairs are detected (i.e., the test is intersection area > threshold, not shapely ‘overlaps’).

Parameters:
  • gdf (GeoDataFrame) – Input GeoDataFrame with polygon geometries.

  • min_overlap_m2 (float) – Minimum intersection area in m² to count as a real overlap. Default 1 m² filters boundary slivers.

  • area_crs (str) – Equal-area CRS for area computation. Default: EPSG:6933.

  • iou (bool) – If True, also return area_left_m2, area_right_m2, iou (intersection-over-union), and overlap_ratio (overlap as a fraction of the smaller polygon’s area) columns.

Returns:

One row per overlapping pair with columns {index_name}_left, {index_name}_right, and overlap_m2. If iou=True, also includes area_left_m2, area_right_m2, iou, and overlap_ratio. Returns an empty DataFrame if no overlaps exceed the threshold.

Return type:

pd.DataFrame
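
The two derived columns returned with iou=True follow directly from the three areas. A minimal sketch of the arithmetic, using a hypothetical helper overlap_metrics that mirrors the column names above (it is not part of the module):

```python
def overlap_metrics(area_left_m2, area_right_m2, overlap_m2):
    """Compute IoU and overlap ratio for one pair of polygons.

    iou           = overlap / (area_left + area_right - overlap)
    overlap_ratio = overlap / min(area_left, area_right)
    """
    union_m2 = area_left_m2 + area_right_m2 - overlap_m2
    smaller_m2 = min(area_left_m2, area_right_m2)
    return {
        "iou": overlap_m2 / union_m2,
        "overlap_ratio": overlap_m2 / smaller_m2,
    }


# A 100 m2 polygon that lies mostly inside a 400 m2 polygon:
metrics = overlap_metrics(400.0, 100.0, 80.0)
print(metrics["overlap_ratio"])  # 0.8
print(round(metrics["iou"], 3))  # 0.19
```

The example shows why the two measures differ: the small polygon is 80% covered, yet the IoU stays low because the larger polygon dominates the union.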

openplaces.geo.polygon.resolve_overlapping_polygons(df: geopandas.GeoDataFrame, keep=None, overlap_ratio_threshold: float = 0.5, iou_threshold: float | None = None, compare_cols: list | None = None, snippet_cols: list | None = None) → geopandas.GeoDataFrame

Resolve substantially overlapping polygon pairs in a GeoDataFrame.

For each overlapping pair, non-ID and non-geometry columns are compared:

  • If identical: the second polygon is dropped (likely a data duplicate).

  • If different: behaviour is controlled by keep:

    • None (default): warn and keep both polygons.

    • True: keep both polygons silently (suppresses warning).

    • False: drop the smaller polygon of each pair.

    • ‘fewest_nulls’: drop the polygon with more null values (None, NaN, ‘’) per pair; fall back to area if tied.

    • {‘prefer_higher’: ‘<column>’}: drop the polygon with the lower value in the named column per pair; fall back to area if tied. Works with ordered categoricals, numerics, or any comparable type.

Parameters:
  • df (GeoDataFrame) – Input GeoDataFrame with polygon geometries.

  • keep (bool or str or dict) – How to resolve overlapping pairs with differing attributes. See above.

  • overlap_ratio_threshold (float or None) – A pair counts as a substantial overlap when the intersection area is at least this fraction of the smaller polygon’s area (i.e. overlap / min(area_left, area_right) >= threshold). This ensures that a small polygon largely covered by a larger one is flagged as an overlap problem. Default 0.5. Set to None to disable.

  • iou_threshold (float or None) – Minimum intersection-over-union to treat two polygons as overlapping (i.e. overlap / (area_left + area_right - overlap) >= threshold). IoU is symmetric and size-agnostic, making it well suited for establishing identity between two polygon datasets (e.g. matching a predicted footprint to a reference). Not applied by default (None).

  • compare_cols (list of str, optional) – Columns used to detect identical vs differing pairs. If None, all columns except geometry and columns containing ‘_id’ are used.

  • snippet_cols (list of str, optional) – Columns shown in the warning data snippet (first 5). If None, the first 5 of compare_cols are used.

Returns:

GeoDataFrame with overlapping duplicates removed (when applicable).

Return type:

GeoDataFrame
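
The ‘fewest_nulls’ rule described above can be sketched for a single pair as follows. This is an illustration on plain dictionaries rather than the library's implementation: count_nulls and pick_row_to_drop are hypothetical names, and dropping the smaller polygon on a tie is an assumed reading of "fall back to area".

```python
import math


def count_nulls(record):
    """Count values that are None, NaN, or an empty string."""
    n = 0
    for value in record.values():
        if value is None or value == "":
            n += 1
        elif isinstance(value, float) and math.isnan(value):
            n += 1
    return n


def pick_row_to_drop(left, right, area_left, area_right):
    """Return 'left' or 'right': the row to drop from an overlapping pair."""
    nulls_left, nulls_right = count_nulls(left), count_nulls(right)
    if nulls_left != nulls_right:
        # Drop the record that carries less information.
        return "left" if nulls_left > nulls_right else "right"
    # Tie: fall back to area (assumption: the smaller polygon is dropped).
    return "left" if area_left < area_right else "right"


left = {"name": "Central Park", "type": None, "operator": ""}
right = {"name": "Central Park", "type": "park", "operator": float("nan")}
print(pick_row_to_drop(left, right, area_left=500.0, area_right=450.0))  # left
```

Here the left record has two null-like values and the right record has one, so the left row is dropped even though its polygon is larger.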

openplaces.geo.polygon.overlay_polygons(layer1, layer2, columns: list[str] | None = None, geom: bool = False, iou: bool = False, suffixes: tuple[str, str] | None = None, how: str = 'intersection') → pandas.DataFrame | gpd.GeoDataFrame

Intersect two polygon datasets in memory using geopandas.

Parameters:
  • layer1 – GeoDataFrame or path to an attribute parquet file (read with openplaces.io.read_parquet).

  • layer2 – GeoDataFrame or path to an attribute parquet file (read with openplaces.io.read_parquet).

  • columns – Extra columns to carry from the attribute tables into the result.

  • geom – If True, return intersection geometry.

  • iou – If True, compute intersection-over-union. Areas are in m² (EPSG:6933). Unmatched/leftover rows get NaN.

  • suffixes – Required when both tables share the same index name, or when a requested column exists in both tables.

  • how ({'intersection', 'union', 'identity'}) – Overlay type.