io
Input/output utilities
Submodules
Functions
download – Download file from URL with progress bar.
unzip – Extract files from a zip archive.
to_parquet – Save dataframe to Parquet format.
to_csv – Save dataframe as CSV file.
to_gpkg – Save geodataframe as GeoPackage.
to_kmz – Save geodataframe as KMZ file (zipped KML).
save – Save dataframe with format auto-detected from file extension.
save_parquet – Save parquet file (with geometries in joinable geoparquet file).
read_parquet – Read parquet file from filesystem (with optional geometries).
compress – Compress one or more files.
to_drive – Copy file to Google Drive.
Shortcut for saving, compressing, and uploading to Drive.
Package Contents
- openplaces.io.download(from_url, to_path, chunk_size=8192, timeout=None, verify_ssl=True)
Download file from URL with progress bar.
- Parameters:
from_url (str) – Source URL
to_path (str or Path) – Target file path or directory. If a directory is passed (.suffix == ‘’), the filename is inferred from the response headers or the URL.
chunk_size (int, default 8192) – Download chunk size in bytes
timeout (int, optional) – Request timeout in seconds (uses cfg.download_timeout if None)
- Returns:
Path to downloaded file
- Return type:
Path
- Raises:
requests.RequestException – If download fails
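The directory rule for to_path can be illustrated with a small standalone sketch. The helper name resolve_target and the order of preference (Content-Disposition header first, then the URL path) are assumptions made for illustration, not the library's confirmed implementation.

```python
from email.message import Message
from pathlib import Path, PurePosixPath
from urllib.parse import unquote, urlparse


def resolve_target(to_path, url, headers):
    """Hypothetical helper: return the file path a download would be written to."""
    to_path = Path(to_path)
    if to_path.suffix != "":
        # A suffix is present, so to_path is treated as the target file itself.
        return to_path

    # No suffix: treat to_path as a directory and infer the filename.
    filename = None
    disposition = headers.get("Content-Disposition")
    if disposition:
        # Reuse the stdlib header parser to read the 'filename' parameter.
        msg = Message()
        msg["Content-Disposition"] = disposition
        filename = msg.get_filename()
    if not filename:
        # Fall back to the last segment of the URL path (query string ignored).
        filename = unquote(PurePosixPath(urlparse(url).path).name)
    # Keep only the final path component so a header cannot redirect the write.
    return to_path / Path(filename).name
```

With this sketch, resolve_target('data/raw', 'https://example.com/files/parcels.zip?v=2', {}) resolves to data/raw/parcels.zip, while a path that already carries a suffix is returned unchanged.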
- openplaces.io.unzip(in_path, out_dir=None, members=None, verbose=True)
Extract files from a zip archive.
Supports standard ZIP (deflate) and Deflate64 ZIP files. Deflate64 extraction requires 7z to be installed (see dev.py ensure_7zip()).
- Parameters:
in_path (str or Path) – Path to input zip file
out_dir (str or Path, optional) – Output directory. If None, extracts to directory named after the zip file (without extension) in the same location. Example: ‘data.zip’ -> ‘data/’
members (list of str, optional) – Specific files to extract. If None, extracts all files. Note: ignored when falling back to 7z.
verbose (bool, default True) – If True, print warnings, e.g. when falling back to 7z.
- Returns:
Path to output directory
- Return type:
Path
Examples
>>> unzip('data/raw/parcels.zip')  # -> data/raw/parcels/
>>> unzip('data.zip', 'data/heap')  # -> data/heap/
>>> unzip('data.zip', members=['file1.txt', 'file2.csv'])
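The default output directory and the members filter can be sketched with the standard library alone. The function name extract_zip is hypothetical, and the sketch covers only the standard (deflate) path; the 7z fallback for Deflate64 archives is not shown.

```python
import zipfile
from pathlib import Path


def extract_zip(in_path, out_dir=None, members=None):
    """Hypothetical stand-in for the standard-ZIP path of unzip()."""
    in_path = Path(in_path)
    if out_dir is None:
        # 'data/raw/parcels.zip' -> 'data/raw/parcels'
        out_dir = in_path.with_suffix("")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(in_path) as zf:
        # members=None extracts everything; otherwise only the named files.
        zf.extractall(out_dir, members=members)
    return out_dir
```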
- openplaces.io.to_parquet(df: pandas.DataFrame | geopandas.GeoDataFrame, filepath: str | pathlib.Path, **kwargs) → None
Save dataframe to Parquet format.
- Parameters:
df (DataFrame or GeoDataFrame) – Data to save
filepath (str or Path) – Output parquet path (should end in .parquet)
**kwargs – Additional arguments passed to to_parquet()
- openplaces.io.to_csv(df: pandas.DataFrame | geopandas.GeoDataFrame, filepath: str | pathlib.Path, index: bool = False, **kwargs) → None
Save dataframe as CSV file.
Automatically drops ‘geometry’ column if present.
- Parameters:
df (DataFrame or GeoDataFrame) – DataFrame to save
filepath (str or Path) – Output CSV path
index (bool, default False) – Whether to write row index
**kwargs – Additional arguments passed to df.to_csv()
- openplaces.io.to_gpkg(gdf: geopandas.GeoDataFrame, filepath: str | pathlib.Path, layer: str = None, **kwargs) → None
Save geodataframe as GeoPackage.
Removes existing file before writing (GeoPackage format requirement).
- Parameters:
gdf (GeoDataFrame) – Geodataframe to save
filepath (str or Path) – Output geopackage path
layer (str, optional) – Layer name within geopackage
**kwargs – Additional arguments passed to to_file()
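The "remove before writing" step can be sketched as follows. The helper name remove_existing is hypothetical, and the sketch shows only the cleanup, not the GeoPackage write itself.

```python
from pathlib import Path


def remove_existing(filepath):
    """Hypothetical helper: delete filepath if present; return whether it existed."""
    path = Path(filepath)
    existed = path.exists()
    # missing_ok=True (Python 3.8+) makes this a no-op when the file is absent.
    path.unlink(missing_ok=True)
    return existed
```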
- openplaces.io.to_kmz(gdf: geopandas.GeoDataFrame, filepath: str | pathlib.Path) → None
Save geodataframe as KMZ file (zipped KML).
- Parameters:
gdf (GeoDataFrame) – Geodataframe to save
filepath (str or Path) – Output KMZ path
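Since a KMZ file is a ZIP archive containing a KML document, the packaging step can be sketched with the standard library. The function name kml_to_kmz is hypothetical, the KML text is assumed to be produced elsewhere, and the entry name doc.kml follows the common KMZ convention rather than anything stated on this page.

```python
import zipfile
from pathlib import Path


def kml_to_kmz(kml_text, filepath):
    """Hypothetical helper: wrap a KML string in a KMZ (ZIP) archive."""
    filepath = Path(filepath)
    with zipfile.ZipFile(filepath, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # By convention the main document inside a KMZ is named 'doc.kml'.
        zf.writestr("doc.kml", kml_text)
    return filepath
```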
- openplaces.io.save(df: pandas.DataFrame | geopandas.GeoDataFrame, filepath: str | pathlib.Path, **kwargs) → None
Save dataframe with format auto-detected from file extension.
Supported formats:
- .parquet: Parquet (or GeoParquet if GeoDataFrame with geometry)
- .gpkg: GeoPackage (GeoDataFrame only)
- .csv: CSV (geometry dropped if present)
- .kmz: KMZ (GeoDataFrame only)
- Parameters:
df (DataFrame or GeoDataFrame) – Data to save
filepath (str or Path) – Output path with extension
**kwargs – Additional arguments passed to format-specific save function
Examples
>>> save(gdf, 'data/core/parcels.parquet')
>>> save(df, 'data/out/results.csv', index=True)
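Extension-based dispatch of this kind is often implemented as a lookup table. The sketch below is an assumption about the general shape, not the library's code: the mapping to writer names, the case-insensitive match, and the ValueError for unknown extensions are all illustrative choices.

```python
from pathlib import Path

# Assumed mapping from file extension to the name of the writer function.
WRITERS = {
    ".parquet": "to_parquet",
    ".gpkg": "to_gpkg",
    ".csv": "to_csv",
    ".kmz": "to_kmz",
}


def pick_writer(filepath):
    """Hypothetical helper: choose a writer name from the file extension."""
    suffix = Path(filepath).suffix.lower()
    try:
        return WRITERS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file extension: {suffix!r}") from None
```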
- openplaces.io.save_parquet(gdf, parquet_path, simplified_geometry=None, combined=False)
Save parquet file (with geometries in joinable geoparquet file)
- Parameters:
gdf (DataFrame or GeoDataFrame) – Data to save
parquet_path (str) – Filepath of Parquet file
simplified_geometry (GeoSeries or None) – When provided, a companion _geo_simplified.parquet sidecar is written alongside the standard _geo.parquet, containing only the join-id column and the simplified geometries. Intended for visualization use; readable via read_parquet(path, geom='simplified'). Ignored when combined is True.
combined (bool) – If True and gdf is a GeoDataFrame, write a single geoparquet file that includes all attribute columns and the geometry column together, with no _geo sidecar and no _join_id. Use this when downstream consumers expect a standard geoparquet rather than the split two-file layout.
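The split layout implies up to three files per dataset. The sketch below derives their paths under the assumption that the sidecar suffixes are appended to the stem of the attribute file; the documentation names the suffixes but not the exact concatenation rule, and the helper sidecar_paths is hypothetical.

```python
from pathlib import Path


def sidecar_paths(parquet_path):
    """Hypothetical helper: paths of the attribute table and its geometry sidecars."""
    path = Path(parquet_path)
    return {
        "attributes": path,
        # Assumed naming: '<stem>_geo.parquet' next to the attribute file.
        "geo": path.with_name(f"{path.stem}_geo.parquet"),
        # Written only when simplified_geometry is provided and combined is False.
        "geo_simplified": path.with_name(f"{path.stem}_geo_simplified.parquet"),
    }
```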
- openplaces.io.read_parquet(parquet_path, geom=False, drop_join_id=True, filters=None, bbox: tuple[float, float, float, float] | None = None, **kwargs)
Read parquet file from filesystem (with optional geometries).
- Parameters:
parquet_path (str) – Filepath of Parquet file
geom (bool or 'simplified') – If True, join full geometries from the _geo sidecar. If 'simplified', join simplified geometries from the _geo_simplified sidecar written by save_parquet.
drop_join_id (bool) – Drop column ‘_join_id’ if it exists.
filters (list of filters, optional) – Passed to pd.read_parquet for the attribute table. Also applied to the geo file as a join-id filter when bbox is not provided.
bbox (tuple of (minx, miny, maxx, maxy), optional) – Spatial bounding box filter in EPSG:4326. When provided and geom=True, the covering bbox columns (written with write_covering_bbox=True) are used for Parquet predicate pushdown on the geo file, and the join-id filter is bypassed.
**kwargs – Additional keyword arguments passed to pd.read_parquet() (e.g. columns).
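The spatial filter amounts to a rectangle-overlap test between each row's covering bbox and the query bbox. A plain-Python sketch of that predicate follows; the function name is hypothetical, and in practice the test is evaluated by the Parquet reader during pushdown rather than row by row in Python.

```python
def bbox_intersects(row_bbox, query_bbox):
    """Return True if two (minx, miny, maxx, maxy) boxes overlap or touch."""
    rminx, rminy, rmaxx, rmaxy = row_bbox
    qminx, qminy, qmaxx, qmaxy = query_bbox
    return (
        rminx <= qmaxx
        and rmaxx >= qminx
        and rminy <= qmaxy
        and rmaxy >= qminy
    )


def filter_rows(rows, query_bbox):
    """Keep only rows (dicts with a 'bbox' key) whose bbox meets the query."""
    return [row for row in rows if bbox_intersects(row["bbox"], query_bbox)]
```

Rows whose bbox fails the test cannot contain geometries inside the query area, which is why they can be skipped without reading their geometry.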
- openplaces.io.compress(filepaths: str | pathlib.Path | list[str] | set[str], zip_filepath: str | None = None, delete_original: bool = False) → None
Compress one or more files.
- Parameters:
filepaths (str, Path, list of str, or set of str) – Single filepath or collection of filepaths
zip_filepath (str, optional) – Output ZIP filepath. If None, derived from the first entry in filepaths.
delete_original (bool) – If True, deletes the original file(s) after compression.
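A minimal sketch of this behaviour using the standard library is shown below. The function name zip_files is hypothetical, and replacing the first file's extension with .zip is an assumption about how the default output path is derived. Unlike compress, the sketch returns the archive path for convenience.

```python
import zipfile
from pathlib import Path


def zip_files(filepaths, zip_filepath=None, delete_original=False):
    """Hypothetical stand-in for compress(); returns the archive path."""
    if isinstance(filepaths, (str, Path)):
        filepaths = [filepaths]
    paths = [Path(p) for p in filepaths]
    if zip_filepath is None:
        # Assumption: swap the first file's extension for '.zip'.
        zip_filepath = paths[0].with_suffix(".zip")
    zip_filepath = Path(zip_filepath)
    with zipfile.ZipFile(zip_filepath, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for p in paths:
            # Store each file under its bare name, without parent directories.
            zf.write(p, arcname=p.name)
    if delete_original:
        for p in paths:
            p.unlink()
    return zip_filepath
```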
- openplaces.io.to_drive(filepath, directory, remote='budrive', verbose=True)
Copy file to Google Drive
Uses rclone. The rclone remote named by the remote argument (default ‘budrive’) must already be configured: https://rclone.org/drive/
- Parameters:
filepath (str) – Path of file to copy
directory (str) – Drive folder to copy to
remote (str) – Name of the rclone remote to copy to
verbose (bool) – If True, print progress
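An upload of this kind usually reduces to a single rclone copy invocation. The sketch below only builds the argument list; the helper name build_rclone_copy and the mapping of verbose to rclone's --progress flag are assumptions, not the library's confirmed behaviour.

```python
def build_rclone_copy(filepath, directory, remote="budrive", verbose=True):
    """Hypothetical helper: argument list for copying a file to an rclone remote."""
    # rclone addresses remote paths as '<remote>:<path>'.
    cmd = ["rclone", "copy", str(filepath), f"{remote}:{directory}"]
    if verbose:
        # Assumption: verbose output corresponds to rclone's progress display.
        cmd.append("--progress")
    return cmd
```

The resulting list can be passed to subprocess.run(cmd, check=True) on a machine where rclone is installed and the remote is configured.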
Shortcut for saving, compressing, and uploading to Drive
File format is deduced from filepath extension.
The Drive folder is deduced from filepath and assumed to be in the shared data directory (openplaces.config.cfg.share_dir).
- Parameters:
df (DataFrame or GeoDataFrame) – Dataset to be saved
filepath (pathlib.Path) – Filepath used for saving (and for the compressed ZIP file).
delete_original (bool) – If True, deletes the unzipped file after compression
verbose (bool) – If True, prints statements (‘Saving’, ‘compressing’, etc.)