io

Input/output utilities

Submodules

Functions

`download`(from_url, to_path[, chunk_size, timeout, ...])	Download file from URL with progress bar.
`unzip`(in_path[, out_dir, members, verbose])	Extract files from a zip archive.
`to_parquet`(→ None)	Save dataframe to Parquet format.
`to_csv`(→ None)	Save dataframe as CSV file.
`to_gpkg`(→ None)	Save geodataframe as GeoPackage.
`to_kmz`(→ None)	Save geodataframe as KMZ file (zipped KML).
`save`(→ None)	Save dataframe with format auto-detected from file extension.
`save_parquet`(gdf, parquet_path[, simplified_geometry, ...])	Save parquet file (with geometries in joinable geoparquet file)
`read_parquet`(parquet_path[, geom, drop_join_id, ...])	Read parquet file from filesystem (with optional geometries).
`delete_image_caches`(→ pandas.DataFrame)	Delete location-specific image caches from the external directory.
`compress`(→ None)	Compress one or more files.
`to_drive`(filepath, directory[, remote, verbose])	Copy file to Google Drive
`share`(df, filepath[, drive_dir, delete_original, verbose])	Shortcut for saving, compressing, and uploading to Drive

Package Contents

openplaces.io.download(from_url, to_path, chunk_size=8192, timeout=None, verify_ssl=True)

Download file from URL with progress bar.

Parameters:

from_url (str) – Source URL
to_path (str or Path) – Target file path or directory. If a directory is passed (.suffix == ‘’), the filename is inferred from the response headers or the URL.
chunk_size (int, default 8192) – Download chunk size in bytes
timeout (int, optional) – Request timeout in seconds (uses cfg.download_timeout if None)
verify_ssl (bool, default True) – Whether to verify the server's SSL/TLS certificate

Returns:

Path to downloaded file

Return type:

Path

Raises:

requests.RequestException – If download fails
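
The directory rule for to_path can be sketched with the standard library alone. This is an illustrative sketch, not the openplaces implementation: infer_filename and resolve_target are hypothetical helpers, the fallback name "download" is an assumption, and in real use the headers would come from the requests response.

```python
from __future__ import annotations

from email.message import Message
from pathlib import Path, PurePosixPath
from urllib.parse import unquote, urlparse


def infer_filename(url: str, headers: dict[str, str]) -> str:
    """Hypothetical helper: filename from Content-Disposition, else from the URL path."""
    # Note: requests' response.headers is case-insensitive; a plain dict is not
    disposition = headers.get("Content-Disposition")
    if disposition:
        msg = Message()
        msg["Content-Disposition"] = disposition
        name = msg.get_filename()  # unquotes filename="..." and decodes RFC 2231 values
        if name:
            return Path(name).name  # drop any directory components
    # Fall back to the last segment of the URL path (query string ignored)
    name = PurePosixPath(unquote(urlparse(url).path)).name
    return name or "download"  # assumed fallback when the URL has no filename


def resolve_target(to_path: str | Path, url: str, headers: dict[str, str]) -> Path:
    """Hypothetical helper: a suffix-less to_path is treated as a directory."""
    to_path = Path(to_path)
    if to_path.suffix == "":
        return to_path / infer_filename(url, headers)
    return to_path
```

Keying the decision on the suffix keeps the rule purely syntactic, so it works before the directory exists; the flip side is that a file target without an extension would be treated as a directory.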

openplaces.io.unzip(in_path, out_dir=None, members=None, verbose=True)

Extract files from a zip archive.

Supports standard (Deflate) and Deflate64 ZIP files. Deflate64 extraction requires the 7z executable to be installed (see ensure_7zip() in dev.py).

Parameters:

in_path (str or Path) – Path to input zip file
out_dir (str or Path, optional) – Output directory. If None, extracts to directory named after the zip file (without extension) in the same location. Example: ‘data.zip’ -> ‘data/’
members (list of str, optional) – Specific files to extract. If None, extracts all files. Note: ignored when falling back to 7z.
verbose (bool, default True) – If True, may print warnings, e.g. when falling back to 7z

Returns:

Path to output directory

Return type:

Path

Examples

>>> unzip('data/raw/parcels.zip')  # -> data/raw/parcels/
>>> unzip('data.zip', 'data/heap')  # -> data/heap/
>>> unzip('data.zip', members=['file1.txt', 'file2.csv'])
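
A minimal sketch of the extraction logic described above, using zipfile with a 7z fallback. It assumes Deflate64 entries are detected by their ZIP compression method id (9), which Python's zipfile cannot decompress; extract_zip is a hypothetical name, and the real unzip may detect the fallback differently (for example by catching the error zipfile raises).

```python
import shutil
import subprocess
import zipfile
from pathlib import Path

DEFLATE64 = 9  # ZIP compression method id for Deflate64; zipfile cannot decompress it


def extract_zip(in_path, out_dir=None, members=None):
    """Hypothetical stand-in for unzip(): zipfile first, 7z for Deflate64 archives."""
    in_path = Path(in_path)
    # Default output: sibling directory named after the archive, e.g. data.zip -> data/
    out_dir = Path(out_dir) if out_dir is not None else in_path.with_suffix("")
    with zipfile.ZipFile(in_path) as zf:
        if all(info.compress_type != DEFLATE64 for info in zf.infolist()):
            zf.extractall(out_dir, members=members)  # creates out_dir as needed
            return out_dir
    # Fallback: extract the whole archive with 7z (members is ignored, as documented)
    if shutil.which("7z") is None:
        raise RuntimeError("Deflate64 archive found but the 7z executable is not on PATH")
    subprocess.run(["7z", "x", str(in_path), f"-o{out_dir}", "-y"], check=True)
    return out_dir
```

Checking compression methods up front avoids a half-extracted directory when a Deflate64 entry appears late in the archive.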

openplaces.io.to_parquet(df: pandas.DataFrame | geopandas.GeoDataFrame, filepath: str | pathlib.Path, *, file_metadata: dict[str, str] | None = None, **kwargs) → None

Save dataframe to Parquet format.

Categorical columns are written with a string logical type, so GDAL/QGIS read stable string values rather than a per-file integer code->label mapping. Parquet still dictionary-encodes these columns physically, so files stay compact.

Parameters:

df (DataFrame or GeoDataFrame) – Data to save
filepath (str or Path) – Output parquet path (should end in .parquet)
file_metadata (dict of str to str, optional) – Key-value pairs written into the Parquet footer (file-level) metadata, merged with the metadata pandas/pyarrow already attaches. Read back via pyarrow.parquet.read_metadata() without scanning rows. Only supported for plain (non-geo) DataFrames; ignored for GeoDataFrames.
**kwargs – Additional arguments passed to to_parquet()
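
The reason for storing categories as strings shows up in a small standard-library simulation of dictionary encoding, with labels sorted alphabetically (pandas also sorts inferred categories when the values are sortable). dictionary_encode is a hypothetical helper for illustration only; it is not how pandas or pyarrow encode columns internally.

```python
def dictionary_encode(values):
    """Hypothetical helper: integer codes into a sorted list of unique labels."""
    categories = sorted(set(values))
    index = {label: code for code, label in enumerate(categories)}
    return [index[v] for v in values], categories


# Two files written from different subsets of the same land-use column
codes_a, cats_a = dictionary_encode(["residential", "residential"])
codes_b, cats_b = dictionary_encode(["commercial", "residential"])

print(codes_a, cats_a)  # [0, 0] ['residential']
print(codes_b, cats_b)  # [0, 1] ['commercial', 'residential']
```

Code 0 means 'residential' in one file and 'commercial' in the other, so a reader that sees only integer codes cannot compare them across files; a string logical type sidesteps the problem.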

openplaces.io.to_csv(df: pandas.DataFrame | geopandas.GeoDataFrame, filepath: str | pathlib.Path, index: bool = False, **kwargs) → None

Save dataframe as CSV file.

Automatically drops the ‘geometry’ column if present.

Parameters:

df (DataFrame or GeoDataFrame) – DataFrame to save
filepath (str or Path) – Output CSV path
index (bool, default False) – Whether to write row index
**kwargs – Additional arguments passed to df.to_csv()

openplaces.io.to_gpkg(gdf: geopandas.GeoDataFrame, filepath: str | pathlib.Path, layer: str = None, **kwargs) → None

Save geodataframe as GeoPackage.

Removes existing file before writing (GeoPackage format requirement).

Parameters:

gdf (GeoDataFrame) – Geodataframe to save
filepath (str or Path) – Output geopackage path
layer (str, optional) – Layer name within geopackage
**kwargs – Additional arguments passed to GeoDataFrame.to_file()

openplaces.io.to_kmz(gdf: geopandas.GeoDataFrame, filepath: str | pathlib.Path) → None

Save geodataframe as KMZ file (zipped KML).

Parameters:

gdf (GeoDataFrame) – Geodataframe to save
filepath (str or Path) – Output KMZ path
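
KMZ is a ZIP archive whose root KML document is, by convention, named doc.kml. The sketch below shows only the packaging step and assumes the KML text already exists (producing it from a GeoDataFrame is out of scope); write_kmz is a hypothetical helper, not the openplaces function.

```python
import zipfile
from pathlib import Path

# Minimal KML 2.2 document with one point (lon, lat, alt)
KML = """<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
  <Placemark><name>Example</name><Point><coordinates>-122.08,37.42,0</coordinates></Point></Placemark>
</kml>"""


def write_kmz(kml_text: str, filepath) -> Path:
    """Hypothetical helper: zip a KML document into a KMZ archive."""
    filepath = Path(filepath)
    with zipfile.ZipFile(filepath, "w", compression=zipfile.ZIP_DEFLATED) as kmz:
        kmz.writestr("doc.kml", kml_text)  # conventional name for the root document
    return filepath
```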

openplaces.io.save(df: pandas.DataFrame | geopandas.GeoDataFrame, filepath: str | pathlib.Path, **kwargs) → None

Save dataframe with format auto-detected from file extension.

Supported formats:

- .parquet: Parquet (or GeoParquet if a GeoDataFrame with geometry)
- .gpkg: GeoPackage (GeoDataFrame only)
- .csv: CSV (geometry dropped if present)
- .kmz: KMZ (GeoDataFrame only)

Parameters:

df (DataFrame or GeoDataFrame) – Data to save
filepath (str or Path) – Output path with extension
**kwargs – Additional arguments passed to format-specific save function

Examples

>>> save(gdf, 'data/core/parcels.parquet')
>>> save(df, 'data/out/results.csv', index=True)
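
Extension-based dispatch can be sketched as a lookup table. This sketch returns the name of the writer documented on this page instead of calling it; pick_writer is hypothetical, and the case-insensitive suffix match and the ValueError/TypeError choices are assumptions, not confirmed behavior of save.

```python
from pathlib import Path

# Extension -> (writer documented on this page, requires a GeoDataFrame?)
SUPPORTED = {
    ".parquet": ("to_parquet", False),
    ".gpkg": ("to_gpkg", True),
    ".csv": ("to_csv", False),
    ".kmz": ("to_kmz", True),
}


def pick_writer(filepath, is_geodataframe: bool) -> str:
    """Hypothetical helper: choose a writer from the file extension."""
    suffix = Path(filepath).suffix.lower()
    if suffix not in SUPPORTED:
        raise ValueError(f"Unsupported extension {suffix!r}; expected one of {sorted(SUPPORTED)}")
    writer, needs_geo = SUPPORTED[suffix]
    if needs_geo and not is_geodataframe:
        raise TypeError(f"{suffix} output requires a GeoDataFrame")
    return writer
```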

openplaces.io.save_parquet(gdf, parquet_path, simplified_geometry=None, combined=False, file_metadata=None)

Save a Parquet file, by default with geometries stored in a separate, joinable GeoParquet file.

Parameters:

gdf (DataFrame or GeoDataFrame) – Data to save
parquet_path (str) – Filepath of Parquet file
simplified_geometry (GeoSeries or None) – When provided, a companion _geo_simplified.parquet sidecar is written alongside the standard _geo.parquet, containing only the join-id column and the simplified geometries. Intended for visualization use; readable via read_parquet(path, geom='simplified'). Ignored when combined is True.
combined (bool) – If True and gdf is a GeoDataFrame, write a single geoparquet file that includes all attribute columns and the geometry column together, with no _geo sidecar and no _join_id. Use this when downstream consumers expect a standard geoparquet rather than the split two-file layout.
file_metadata (dict of str to str, optional) – Key-value pairs written into the attribute parquet’s footer (file-level) metadata. Only applied to plain (non-geo) frames; see to_parquet.
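
The default split layout can be pictured with plain Python records standing in for DataFrames. The sidecar naming (parcels.parquet -> parcels_geo.parquet), sequential integer _join_id values, and WKT strings as geometries are assumptions for illustration; all three helpers are hypothetical, and the real function operates on (Geo)DataFrames and writes Parquet files.

```python
from pathlib import Path


def sidecar_paths(parquet_path):
    """Hypothetical naming: parcels.parquet -> parcels_geo.parquet, parcels_geo_simplified.parquet."""
    p = Path(parquet_path)
    return (p.with_name(f"{p.stem}_geo.parquet"),
            p.with_name(f"{p.stem}_geo_simplified.parquet"))


def split_rows(rows, geometry_key="geometry"):
    """Split records into an attribute table and a geometry table linked by _join_id."""
    attrs, geoms = [], []
    for join_id, row in enumerate(rows):
        attr = {k: v for k, v in row.items() if k != geometry_key}
        attr["_join_id"] = join_id
        attrs.append(attr)
        geoms.append({"_join_id": join_id, geometry_key: row[geometry_key]})
    return attrs, geoms


def join_rows(attrs, geoms, geometry_key="geometry"):
    """Re-attach geometries to attribute rows by _join_id (a left join)."""
    by_id = {g["_join_id"]: g[geometry_key] for g in geoms}
    return [{**a, geometry_key: by_id.get(a["_join_id"])} for a in attrs]
```

Keeping geometry out of the attribute file lets tabular readers skip the (usually much larger) geometry bytes entirely and join them back only when needed.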

openplaces.io.read_parquet(parquet_path, geom=False, drop_join_id=True, filters=None, bbox: tuple[float, float, float, float] | None = None, **kwargs)

Read parquet file from filesystem (with optional geometries).

Parameters:

parquet_path (str) – Filepath of Parquet file
geom (bool or 'simplified') – If True, join full geometries from the _geo sidecar. If 'simplified', join simplified geometries from the _geo_simplified sidecar written by save_parquet. For a combined file (written with save_parquet(..., combined=True), which stores the geometry column in the same file), geom only controls whether that geometry column is kept or dropped; 'simplified' is not supported, since no simplified sidecar is written.
drop_join_id (bool) – Drop column ‘_join_id’ if it exists.
filters (list of filters, optional) – Passed to pd.read_parquet for the attribute table. Also applied to the geo file as a join-id filter when bbox is not provided.
bbox (tuple of (minx, miny, maxx, maxy), optional) – Spatial bounding-box filter in EPSG:4326. When provided and geom=True, uses the covering bbox columns written with write_covering_bbox=True for Parquet predicate pushdown on the geo file, bypassing the join-id filter.
**kwargs – Additional keyword arguments passed to pd.read_parquet() (e.g. columns).
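
The covering-bbox filter keeps a row when its bounding box intersects the query box. A sketch of that predicate, assuming each row carries a bbox struct with xmin, ymin, xmax, ymax fields (the GeoParquet covering convention); the helper names are hypothetical, and in the real reader the comparison is pushed down into the Parquet scan rather than evaluated row by row in Python.

```python
def bbox_intersects(row_bbox: dict, query: tuple) -> bool:
    """True if a row's covering box overlaps the query box (edges touching count)."""
    minx, miny, maxx, maxy = query
    return (row_bbox["xmin"] <= maxx and row_bbox["xmax"] >= minx
            and row_bbox["ymin"] <= maxy and row_bbox["ymax"] >= miny)


def filter_by_bbox(rows, query):
    """Hypothetical helper: keep rows whose covering box overlaps the query box."""
    return [r for r in rows if bbox_intersects(r["bbox"], query)]
```

Because the test works on boxes, it is a coarse filter: a geometry whose box overlaps the query box is kept even if the geometry itself lies outside it.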

openplaces.io.delete_image_caches(admin_ids: str | list | None = None, source: str | None = None, version: str | None = None, dry_run: bool = True) → pandas.DataFrame

Delete location-specific image caches from the external directory.

Parameters:

admin_ids (str, AdminId, list, or None) – Admin units whose caches to delete; a coarser unit (e.g. a county) matches all caches of its children. None matches all locations.
source (str or None) – Restrict to one image source (e.g. ‘googlesatellite’).
version (str or None) – Restrict to one recipe version (e.g. ‘z20’).
dry_run (bool) – If True (default), only report what would be deleted. If False, remove each matched cache directory, including images and the image metadata parquet.

Returns:

The matched caches: admin_id, source, version, n_files, size_mb, path.

Return type:

pd.DataFrame

openplaces.io.compress(filepaths: str | pathlib.Path | list[str] | set[str], zip_filepath: str | None = None, delete_original: bool = False) → None

Compress one or more files.

Parameters:

filepaths (str, Path, list of str, or set of str) – Single filepath or collection of filepaths
zip_filepath (str, optional) – Output ZIP filepath. If None, derived from the first entry in filepaths.
delete_original (bool) – If True, deletes the original file(s) after compression.
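
A minimal sketch with zipfile, assuming the default archive name swaps the first file's suffix for .zip and that entries are stored without their directory prefixes. compress_files is a hypothetical helper; unlike compress, it returns the archive path. Sets are sorted first, because a set has no stable "first entry".

```python
import zipfile
from pathlib import Path


def compress_files(filepaths, zip_filepath=None, delete_original=False) -> Path:
    """Hypothetical stand-in for compress(): zip one or more files."""
    # Normalize input: single path -> list; set -> sorted list for a deterministic "first"
    if isinstance(filepaths, (str, Path)):
        paths = [Path(filepaths)]
    elif isinstance(filepaths, set):
        paths = sorted(Path(p) for p in filepaths)
    else:
        paths = [Path(p) for p in filepaths]
    # Assumed default name: first file with its suffix replaced, e.g. a.csv -> a.zip
    zip_path = Path(zip_filepath) if zip_filepath else paths[0].with_suffix(".zip")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for p in paths:
            zf.write(p, arcname=p.name)  # store flat, without directory prefixes
    if delete_original:
        for p in paths:
            p.unlink()
    return zip_path
```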

openplaces.io.to_drive(filepath, directory, remote='budrive', verbose=True)

Copy file to Google Drive

Uses rclone. The rclone remote named by remote (default ‘budrive’) must be configured: https://rclone.org/drive/

Parameters:

filepath (str) – Path of file to copy
directory (str) – Drive folder to copy to
remote (str) – Name of the rclone remote to copy to
verbose (bool) – If True, print progress
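
The copy step roughly corresponds to a single rclone copy call. The sketch assumes the command shape rclone copy <file> <remote>:<directory> and that verbose maps to rclone's --progress flag; both helper names are hypothetical, and the exact flags openplaces passes are not documented here.

```python
import shutil
import subprocess


def rclone_copy_command(filepath: str, directory: str, remote: str = "budrive",
                        verbose: bool = True) -> list:
    """Hypothetical helper: build an rclone invocation copying one file into a Drive folder."""
    cmd = ["rclone", "copy", filepath, f"{remote}:{directory}"]
    if verbose:
        cmd.append("--progress")  # assumed mapping of verbose to rclone's progress display
    return cmd


def copy_to_drive(filepath, directory, remote="budrive", verbose=True):
    """Hypothetical stand-in for to_drive(): run rclone and fail loudly on errors."""
    if shutil.which("rclone") is None:
        raise RuntimeError("rclone is not installed or not on PATH")
    subprocess.run(rclone_copy_command(filepath, directory, remote, verbose), check=True)
```

Building the argument list separately from running it keeps the command easy to inspect and test without touching Drive.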

openplaces.io.share(df, filepath, drive_dir=None, delete_original=True, verbose=True)

Shortcut for saving, compressing, and uploading to Drive

File format is deduced from filepath extension.

Drive folder is deduced from filepath and assumed to be in the share data directory (openplaces.config.cfg.share_dir)

Parameters:

df (DataFrame or GeoDataFrame) – Dataset to be saved
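filepath (pathlib.Path) – Filepath used for saving (and for the compressed ZIP file).
drive_dir (str, optional) – Drive folder to upload to. If None, it is deduced from filepath (see above).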
delete_original (bool) – If True, deletes the unzipped file after compression
verbose (bool) – If True, prints statements (‘Saving’, ‘compressing’, etc.)
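
One plausible reading of the folder rule: the Drive folder mirrors the file's parent directory relative to the share directory. deduce_drive_dir is hypothetical, and the mapping is an assumption about how openplaces.config.cfg.share_dir is used, not the documented implementation.

```python
from pathlib import Path


def deduce_drive_dir(filepath, share_dir) -> str:
    """Hypothetical helper: map a file under share_dir to a Drive folder path."""
    # relative_to raises ValueError if filepath is not inside share_dir
    relative = Path(filepath).resolve().parent.relative_to(Path(share_dir).resolve())
    return "/".join(relative.parts)  # Drive paths use forward slashes; "" means top level
```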