Configure
First-time setup
The first time you import from openplaces, you will be asked to configure your installation.
The first step is to define where your data will be stored, as well as who can read and edit it.
This notebook walks you through the configuration step and showcases functions for updating and removing it:
notebooks/01_setup/01_first_steps.ipynb
Open and run it after installing and activating your environment:
conda activate openplaces
cd notebooks
jupyter notebook
Directory structure
openplaces uses different directories for different stages of the analytical pipeline: input data (external downloads, own raw data), scratch directories for intermediate data, canonical, analysis-ready data, output data, shared data, fitted models, and reports / publications.
This separation simplifies data sharing across machines and cloud services, and makes it easier to set team-specific and user-specific permissions.
| Name | Default | Shared/User | Description |
|---|---|---|---|
| | | Shared | Root directory for data, models, reports. If |
| | data/core | User (multi-user) | Processed, analysis-ready data |
| | data/external | Shared | Downloaded data from third party sources |
| | data/raw | Shared | Raw data from own data collection efforts |
| | data/cache | User (multi-user) | Intermediate files generated during processing |
| | data/cache/_heap | User (multi-user) | Freshly unzipped data, not yet with standard prefixes |
| | data/cache/_logs | User (multi-user) | Logs from script runs with timing and metadata |
| | data/out | User (multi-user) | Output and results data |
| | data/share | Shared | Shared data between users |
| | models | User (multi-user) | Trained and serialized models, model predictions, or model summaries |
| | reports | User (multi-user) | Reports and figures |
Note
In single-user mode, all directories are at the same level (no user subfolders).
In multi-user mode, user-specific directories are in data/<username>/.
Credits to Cookiecutter Data Science (Carl Boettiger's lab @ Berkeley) for inspiring this directory structure.
Single vs. multi-user mode
The configuration script will ask you to choose between single-user and multi-user mode for your data directories.
Single-user mode
Best when you're the only user.
Directory structure:
data_root/
├── data/
│   ├── cache/       # Intermediate files (reproducible, deletable)
│   │   ├── _heap/   # Freshly unzipped data
│   │   └── _logs/   # Logs from runs with timing and arguments
│   ├── core/        # Processed, analysis-ready data
│   ├── external/    # External data (downloaded)
│   ├── out/         # Outputs and results data
│   ├── raw/         # Raw data
│   └── share/       # Shared data
├── models/          # Models, trained and serialized, predictions
└── reports/         # Reports and figures
No user subfolders created
Minimal config file created (commented template)
Uses project defaults from openplaces.yaml
Multi-user mode
Best for team projects where multiple people work on the same codebase.
Setup process:
Choose multi-user mode (option b)
Accept the default folder name (from your system username) or enter a custom name
User-specific folders are created for outputs
Directory structure:
data_root/
├── data/
│   ├── external/          # Shared
│   ├── raw/               # Shared
│   ├── share/             # Shared
│   └── YourUsername/      # Yours
│       ├── cache/         # Yours
│       │   ├── _heap/     # Yours
│       │   └── _logs/     # Yours
│       ├── core/          # Yours
│       └── out/           # Yours
├── models/                # Shared
│   └── YourUsername/      # Yours
└── reports/               # Shared
    └── YourUsername/      # Yours
User subfolders for: cache, heap, logs, core, out
Shared folders for: external, raw, share
User subfolders in: models, reports
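The two layouts above can be sketched in a few lines of Python. This only illustrates the path logic described in the table and the directory trees; it is not the actual openplaces implementation. The function name `resolve_dirs` and the dictionary keys are hypothetical.

```python
from pathlib import Path
from typing import Optional

# Directories that stay shared in both modes (relative to the data root).
SHARED = {"external": "data/external", "raw": "data/raw", "share": "data/share"}

# Directories that become user-specific in multi-user mode: (parent, leaf).
USER = {
    "core": ("data", "core"),
    "cache": ("data", "cache"),
    "heap": ("data", "cache/_heap"),
    "logs": ("data", "cache/_logs"),
    "out": ("data", "out"),
    "models": ("models", ""),
    "reports": ("reports", ""),
}


def resolve_dirs(data_root: str, username: Optional[str] = None) -> dict:
    """Hypothetical helper: map directory names to paths.

    With username=None (single-user mode) no user subfolders are used;
    otherwise user-specific directories sit under <parent>/<username>/.
    """
    root = Path(data_root)
    dirs = {name: root / rel for name, rel in SHARED.items()}
    for name, (parent, leaf) in USER.items():
        base = root / parent
        if username is not None:
            base = base / username  # insert the user subfolder
        dirs[name] = base / leaf if leaf else base
    return dirs
```

Under these assumptions, `resolve_dirs("data_root", "alice")["out"]` points to `data_root/data/alice/out`, while omitting the username gives `data_root/data/out`. Shared directories such as `data/raw` are the same in both modes.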
Location of configuration files
openplaces uses a hierarchical configuration system to customize data directories and settings.
Configuration files are used in priority order: user > project > defaults.
User configuration (highest priority)
A user configuration file is created interactively the first time a new user runs import openplaces.
Its location depends on your operating system:
Windows: %APPDATA%\openplaces\config.yaml
macOS: ~/Library/Application Support/openplaces/config.yaml
Linux: ~/.config/openplaces/config.yaml
It contains user-specific overrides to the project configuration and is not committed to version control (git).
Project configuration (default values)
Location: openplaces.yaml (in root directory of repository).
Project-wide defaults committed to version control.
Shared by all users of an installation.
Built-in defaults (fallback)
Hardcoded in config.py.
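The priority order user > project > defaults amounts to layering three dictionaries, with later layers overriding earlier ones. The sketch below shows one way to express that, together with a per-OS lookup of the user file location listed above. The function names and the example keys (`data_root`, `multi_user`) are assumptions made for illustration and are not taken from config.py; the merge is shallow, so nested settings would need a recursive variant.

```python
import os
import sys
from pathlib import Path


def user_config_path(platform: str = sys.platform) -> Path:
    """Return the user config location for the given platform (hypothetical helper)."""
    if platform.startswith("win"):
        # Assumption: fall back to the usual roaming profile folder if APPDATA is unset.
        appdata = os.environ.get("APPDATA", str(Path.home() / "AppData" / "Roaming"))
        return Path(appdata) / "openplaces" / "config.yaml"
    if platform == "darwin":
        return Path.home() / "Library" / "Application Support" / "openplaces" / "config.yaml"
    return Path.home() / ".config" / "openplaces" / "config.yaml"


def merge_configs(defaults: dict, project: dict, user: dict) -> dict:
    """Layer the three sources so that user > project > defaults (shallow merge)."""
    merged = dict(defaults)
    merged.update(project)  # project values override built-in defaults
    merged.update(user)     # user values override everything else
    return merged
```

For example, merging defaults `{"data_root": "~/data", "multi_user": False}` with a project file that sets `multi_user: True` and a user file that sets `data_root: /mnt/team` yields a configuration in which both overrides apply.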