io

Input/output utilities

Submodules

Functions

download(from_url, to_path[, chunk_size, timeout, ...])

Download file from URL with progress bar.

unzip(in_path[, out_dir, members, verbose])

Extract files from a zip archive.

to_parquet(df, filepath, **kwargs) → None

Save dataframe to Parquet format.

to_csv(df, filepath[, index, ...]) → None

Save dataframe as CSV file.

to_gpkg(gdf, filepath[, layer, ...]) → None

Save geodataframe as GeoPackage.

to_kmz(gdf, filepath) → None

Save geodataframe as KMZ file (zipped KML).

save(df, filepath, **kwargs) → None

Save dataframe with format auto-detected from file extension.

save_parquet(gdf, parquet_path[, simplified_geometry, ...])

Save Parquet file (with geometries in a joinable GeoParquet sidecar file).

read_parquet(parquet_path[, geom, drop_join_id, ...])

Read parquet file from filesystem (with optional geometries).

compress(filepaths[, zip_filepath, delete_original]) → None

Compress one or more files.

to_drive(filepath, directory[, remote, verbose])

Copy file to Google Drive.

share(df, filepath[, drive_dir, delete_original, verbose])

Shortcut for saving, compressing, and uploading to Drive.

Package Contents

openplaces.io.download(from_url, to_path, chunk_size=8192, timeout=None, verify_ssl=True)

Download file from URL with progress bar.

Parameters:
  • from_url (str) – Source URL

  • to_path (str or Path) – Target file path or directory. If a directory is passed (.suffix == ‘’), the filename is inferred from the response headers or the URL.

  • chunk_size (int, default 8192) – Download chunk size in bytes

  • timeout (int, optional) – Request timeout in seconds (uses cfg.download_timeout if None)

  • verify_ssl (bool, default True) – Whether to verify SSL certificates

Returns:

Path to downloaded file

Return type:

Path

Raises:

requests.RequestException – If download fails

openplaces.io.unzip(in_path, out_dir=None, members=None, verbose=True)

Extract files from a zip archive.

Supports standard ZIP (deflate) and Deflate64 ZIP files. Deflate64 extraction requires 7z to be installed (see dev.py ensure_7zip()).

Parameters:
  • in_path (str or Path) – Path to input zip file

  • out_dir (str or Path, optional) – Output directory. If None, extracts to directory named after the zip file (without extension) in the same location. Example: ‘data.zip’ -> ‘data/’

  • members (list of str, optional) – Specific files to extract. If None, extracts all files. Note: ignored when falling back to 7z.

  • verbose (bool, default True) – If True, may print warnings, e.g. when falling back to 7z.

Returns:

Path to output directory

Return type:

Path

Examples

>>> unzip('data/raw/parcels.zip')  # -> data/raw/parcels/
>>> unzip('data.zip', 'data/heap')  # -> data/heap/
>>> unzip('data.zip', members=['file1.txt', 'file2.csv'])
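
The default out_dir rule (‘data.zip’ -> ‘data/’) and the members selection map directly onto the standard library's zipfile module. The sketch below is an assumption about how this could be done, not openplaces' implementation: unzip_sketch is a hypothetical name, and the 7z fallback for Deflate64 is only marked at the point where zipfile raises NotImplementedError for compression methods it cannot decode.

```python
import zipfile
from pathlib import Path


def unzip_sketch(in_path, out_dir=None, members=None) -> Path:
    """Hypothetical stand-in for unzip(): standard ZIP only, no 7z fallback."""
    in_path = Path(in_path)
    # Default: a sibling directory named after the archive, e.g. data.zip -> data/
    out_dir = Path(out_dir) if out_dir is not None else in_path.with_suffix("")
    out_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(in_path) as zf:
        try:
            # members=None extracts everything; otherwise only the named entries.
            zf.extractall(out_dir, members=members)
        except NotImplementedError as exc:
            # zipfile cannot decompress Deflate64 (method 9); per the docs,
            # unzip() would fall back to 7z here. This sketch only reports it.
            raise RuntimeError(f"{in_path} needs 7z (unsupported compression)") from exc
    return out_dir
```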

openplaces.io.to_parquet(df: pandas.DataFrame | geopandas.GeoDataFrame, filepath: str | pathlib.Path, **kwargs) → None

Save dataframe to Parquet format.

Parameters:
  • df (DataFrame or GeoDataFrame) – Data to save

  • filepath (str or Path) – Output parquet path (should end in .parquet)

  • **kwargs – Additional arguments passed to DataFrame.to_parquet() or GeoDataFrame.to_parquet()

openplaces.io.to_csv(df: pandas.DataFrame | geopandas.GeoDataFrame, filepath: str | pathlib.Path, index: bool = False, **kwargs) → None

Save dataframe as CSV file.

Automatically drops ‘geometry’ column if present.

Parameters:
  • df (DataFrame or GeoDataFrame) – DataFrame to save

  • filepath (str or Path) – Output CSV path

  • index (bool, default False) – Whether to write row index

  • **kwargs – Additional arguments passed to df.to_csv()

openplaces.io.to_gpkg(gdf: geopandas.GeoDataFrame, filepath: str | pathlib.Path, layer: str = None, **kwargs) → None

Save geodataframe as GeoPackage.

Removes existing file before writing (GeoPackage format requirement).

Parameters:
  • gdf (GeoDataFrame) – Geodataframe to save

  • filepath (str or Path) – Output geopackage path

  • layer (str, optional) – Layer name within geopackage

  • **kwargs – Additional arguments passed to GeoDataFrame.to_file()

openplaces.io.to_kmz(gdf: geopandas.GeoDataFrame, filepath: str | pathlib.Path) → None

Save geodataframe as KMZ file (zipped KML).

Parameters:
  • gdf (GeoDataFrame) – Geodataframe to save

  • filepath (str or Path) – Output KMZ path
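
A KMZ file is a ZIP archive whose root KML document is conventionally named doc.kml. The sketch below covers only that packaging step with the standard library; kml_to_kmz is a hypothetical helper that assumes the KML text has already been produced (for example by GDAL's KML driver), which is not necessarily how to_kmz works internally.

```python
import zipfile
from pathlib import Path


def kml_to_kmz(kml_text: str, filepath) -> Path:
    """Hypothetical helper: package an existing KML document as a KMZ archive."""
    filepath = Path(filepath)
    with zipfile.ZipFile(filepath, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # Google Earth reads the first .kml entry in a KMZ; doc.kml is the usual name.
        zf.writestr("doc.kml", kml_text)
    return filepath
```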

openplaces.io.save(df: pandas.DataFrame | geopandas.GeoDataFrame, filepath: str | pathlib.Path, **kwargs) → None

Save dataframe with format auto-detected from file extension.

Supported formats:

  • .parquet: Parquet (or GeoParquet if GeoDataFrame with geometry)

  • .gpkg: GeoPackage (GeoDataFrame only)

  • .csv: CSV (geometry dropped if present)

  • .kmz: KMZ (GeoDataFrame only)

Parameters:
  • df (DataFrame or GeoDataFrame) – Data to save

  • filepath (str or Path) – Output path with extension

  • **kwargs – Additional arguments passed to format-specific save function

Examples

>>> save(gdf, 'data/core/parcels.parquet')
>>> save(df, 'data/out/results.csv', index=True)
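
The extension-based dispatch can be sketched as a lookup table from suffix to writer. In this hypothetical save_sketch the writers are passed in as a mapping so it runs without pandas; in openplaces they would presumably correspond to to_parquet, to_gpkg, to_csv and to_kmz. Case-insensitive suffix matching and the ValueError for unknown extensions are assumptions.

```python
from pathlib import Path
from typing import Any, Callable

Writer = Callable[..., None]


def save_sketch(df: Any, filepath, writers: dict[str, Writer], **kwargs) -> None:
    """Hypothetical dispatcher: choose a writer from the file extension."""
    suffix = Path(filepath).suffix.lower()
    try:
        writer = writers[suffix]
    except KeyError:
        supported = ", ".join(sorted(writers))
        raise ValueError(f"Unsupported extension {suffix!r}; expected one of {supported}") from None
    # Extra keyword arguments (e.g. index=True for CSV) are forwarded unchanged.
    writer(df, filepath, **kwargs)
```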

openplaces.io.save_parquet(gdf, parquet_path, simplified_geometry=None, combined=False)

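Save a DataFrame to Parquet. For a GeoDataFrame, geometries are written to a separate, joinable GeoParquet sidecar (_geo.parquet) linked by a _join_id column, unless combined is True.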

Parameters:
  • gdf (DataFrame or GeoDataFrame) – Data to save

  • parquet_path (str) – Filepath of Parquet file

  • simplified_geometry (GeoSeries or None) – When provided, a companion _geo_simplified.parquet sidecar is written alongside the standard _geo.parquet, containing only the join-id column and the simplified geometries. Intended for visualization use; readable via read_parquet(path, geom='simplified'). Ignored when combined is True.

  • combined (bool, default False) – If True and gdf is a GeoDataFrame, write a single geoparquet file that includes all attribute columns and the geometry column together, with no _geo sidecar and no _join_id. Use this when downstream consumers expect a standard geoparquet rather than the split two-file layout.
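
The split two-file layout can be illustrated without pandas: every row gets a shared _join_id, attributes go to the main file, and geometries go to the _geo sidecar. The sidecar naming below (stem plus _geo or _geo_simplified) and the use of a positional integer id are assumptions inferred from the parameter descriptions; sidecar_paths and split_records are hypothetical helpers, and openplaces may derive both differently.

```python
from pathlib import Path


def sidecar_paths(parquet_path) -> dict[str, Path]:
    """Hypothetical naming: parcels.parquet -> parcels_geo.parquet, etc."""
    p = Path(parquet_path)
    return {
        "attributes": p,
        "geo": p.with_name(f"{p.stem}_geo{p.suffix}"),
        "geo_simplified": p.with_name(f"{p.stem}_geo_simplified{p.suffix}"),
    }


def split_records(records: list[dict], geometry_col: str = "geometry"):
    """Split rows into an attribute table and a geometry table sharing _join_id."""
    attrs, geoms = [], []
    for join_id, row in enumerate(records):
        row = dict(row)  # copy so the caller's rows are not mutated
        geom = row.pop(geometry_col, None)
        attrs.append({"_join_id": join_id, **row})
        geoms.append({"_join_id": join_id, geometry_col: geom})
    return attrs, geoms
```

Keeping geometries in their own file lets attribute-only reads skip the (usually much larger) geometry data entirely.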

openplaces.io.read_parquet(parquet_path, geom=False, drop_join_id=True, filters=None, bbox: tuple[float, float, float, float] | None = None, **kwargs)

Read parquet file from filesystem (with optional geometries).

Parameters:
  • parquet_path (str) – Filepath of Parquet file

  • geom (bool or 'simplified', default False) – If True, join full geometries from the _geo sidecar. If 'simplified', join simplified geometries from the _geo_simplified sidecar written by save_parquet.

  • drop_join_id (bool, default True) – Drop column ‘_join_id’ if it exists.

  • filters (list of filters, optional) – Passed to pd.read_parquet for the attribute table. Also applied to the geo file as a join-id filter when bbox is not provided.

  • bbox (tuple of (minx, miny, maxx, maxy), optional) – Spatial bounding box filter in EPSG:4326. When provided and geom=True, the geo file is filtered by Parquet predicate pushdown on the covering bbox columns (written with write_covering_bbox=True) instead of by the join-id filter.

  • **kwargs – Additional keyword arguments passed to pd.read_parquet() (e.g. columns).
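
The bbox path reduces to a rectangle-intersection test on each row's covering box, followed by a join back to the attributes on _join_id. The sketch below uses plain dicts; the field names xmin/ymin/xmax/ymax follow the GeoParquet covering-bbox convention that geopandas writes, while bbox_intersects, join_geometries, and the inner-join behaviour (rows whose geometry was filtered out are dropped) are assumptions rather than openplaces' documented semantics.

```python
def bbox_intersects(row_bbox: dict[str, float], query: tuple[float, float, float, float]) -> bool:
    """True if a row's covering box overlaps the query (minx, miny, maxx, maxy)."""
    minx, miny, maxx, maxy = query
    return (row_bbox["xmax"] >= minx and row_bbox["xmin"] <= maxx
            and row_bbox["ymax"] >= miny and row_bbox["ymin"] <= maxy)


def join_geometries(attrs: list[dict], geoms: list[dict], drop_join_id: bool = True) -> list[dict]:
    """Hypothetical inner join of attribute rows to geometry rows on _join_id."""
    by_id = {g["_join_id"]: g["geometry"] for g in geoms}
    out = []
    for row in attrs:
        if row["_join_id"] not in by_id:
            continue  # geometry was filtered out (e.g. outside the bbox)
        merged = {**row, "geometry": by_id[row["_join_id"]]}
        if drop_join_id:
            merged.pop("_join_id")
        out.append(merged)
    return out
```

Because the covering columns are ordinary numeric fields, a reader can evaluate this test against Parquet row-group statistics and skip non-overlapping row groups without decoding any geometry.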

openplaces.io.compress(filepaths: str | pathlib.Path | list[str] | set[str], zip_filepath: str | None = None, delete_original: bool = False) → None

Compress one or more files.

Parameters:
  • filepaths (str, Path, list of str, or set of str) – Single filepath or collection of filepaths

  • zip_filepath (str, optional) – Output ZIP filepath. If None, derived from the first entry in filepaths.

  • delete_original (bool) – If True, deletes the original file(s) after compression.
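
A minimal sketch of the same behaviour with the standard library's zipfile, assuming the archive is derived by replacing the first file's extension with .zip (parcels.gpkg -> parcels.zip) and that entries are stored flat by basename; compress_sketch is a hypothetical name, and unlike compress it returns the archive path. Sets are sorted here so that "the first entry" is well defined, which is also an assumption.

```python
import zipfile
from pathlib import Path


def compress_sketch(filepaths, zip_filepath=None, delete_original=False) -> Path:
    """Hypothetical stand-in for compress(): zip files flat, by basename."""
    if isinstance(filepaths, (str, Path)):
        paths = [Path(filepaths)]
    else:
        # Sort sets so that "the first entry" is deterministic.
        items = sorted(filepaths) if isinstance(filepaths, set) else filepaths
        paths = [Path(p) for p in items]
    if not paths:
        raise ValueError("no files to compress")
    zip_path = Path(zip_filepath) if zip_filepath else paths[0].with_suffix(".zip")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for p in paths:
            zf.write(p, arcname=p.name)
    # Delete originals only after the archive has been written and closed.
    if delete_original:
        for p in paths:
            p.unlink()
    return zip_path
```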

openplaces.io.to_drive(filepath, directory, remote='budrive', verbose=True)

Copy file to Google Drive.

Uses rclone. The rclone remote named by remote (default ‘budrive’) must already be configured: https://rclone.org/drive/

Parameters:
  • filepath (str) – Path of file to copy

  • directory (str) – Drive folder to copy to

  • remote (str, default ‘budrive’) – Name of the rclone remote to copy to

  • verbose (bool, default True) – If True, print progress
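
A sketch of how such a copy could be issued, assuming to_drive shells out to `rclone copy SOURCE REMOTE:DIRECTORY` (rclone's standard form for copying into a remote folder) and maps verbose to rclone's --progress flag; both the subcommand choice and the flag mapping are assumptions, and rclone_copy_cmd and to_drive_sketch are hypothetical names. Only the command builder is pure, so it is the part worth checking.

```python
import subprocess
from pathlib import Path


def rclone_copy_cmd(filepath, directory: str, remote: str = "budrive", verbose: bool = True) -> list[str]:
    """Hypothetical: build an `rclone copy SRC REMOTE:DIR` invocation."""
    cmd = ["rclone", "copy", str(Path(filepath)), f"{remote}:{directory}"]
    if verbose:
        cmd.append("--progress")  # rclone flag that shows live transfer stats
    return cmd


def to_drive_sketch(filepath, directory, remote="budrive", verbose=True) -> None:
    # check=True raises CalledProcessError if rclone exits with a non-zero status.
    subprocess.run(rclone_copy_cmd(filepath, directory, remote, verbose), check=True)
```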

openplaces.io.share(df, filepath, drive_dir=None, delete_original=True, verbose=True)

Shortcut for saving, compressing, and uploading to Drive.

File format is deduced from the filepath extension.

The Drive folder is deduced from filepath, which is assumed to lie inside the share data directory (openplaces.config.cfg.share_dir).

Parameters:
  • df (DataFrame or GeoDataFrame) – Dataset to be saved

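  • filepath (pathlib.Path) – Filepath used for saving (and for the compressed ZIP file).

  • drive_dir (str, optional) – Drive folder to upload to. If None, it is deduced from filepath as described above.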

  • delete_original (bool, default True) – If True, deletes the uncompressed file after compression

  • verbose (bool, default True) – If True, prints progress messages (‘Saving’, ‘Compressing’, etc.)
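
The save, compress, upload chain can be sketched with the three steps injected as callables, so the control flow is visible and testable without pandas or rclone. The rule that the Drive folder mirrors the file's parent folder relative to share_dir, and the ZIP name derived by swapping the extension for .zip, are assumptions; share_sketch and drive_dir_from are hypothetical names.

```python
from pathlib import Path
from typing import Any, Callable


def drive_dir_from(filepath, share_dir) -> str:
    """Hypothetical rule: mirror the file's folder relative to share_dir on Drive."""
    return Path(filepath).parent.relative_to(share_dir).as_posix()


def share_sketch(df: Any, filepath, *, save: Callable, compress: Callable, upload: Callable,
                 share_dir, drive_dir=None, delete_original=True, verbose=True) -> None:
    """Hypothetical pipeline: save, then zip, then upload the ZIP."""
    filepath = Path(filepath)
    if drive_dir is None:
        drive_dir = drive_dir_from(filepath, share_dir)
    zip_path = filepath.with_suffix(".zip")  # same name as the saved file
    if verbose:
        print(f"Saving {filepath}")
    save(df, filepath)
    if verbose:
        print(f"Compressing to {zip_path}")
    compress(filepath, zip_path, delete_original=delete_original)
    if verbose:
        print(f"Uploading to {drive_dir}")
    upload(zip_path, drive_dir)
```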