Configure

First-time setup

The first time you import openplaces, you will be asked to configure your installation.

The first step is to define where your data will be stored, as well as who can read and edit it.

The following notebook walks you through the configuration step and showcases functions for updating and removing the configuration:

notebooks/01_setup/01_first_steps.ipynb

Open and run it after installing and activating your environment:

conda activate openplaces
cd notebooks
jupyter notebook

Directory structure

openplaces uses different directories for different stages of the analytical pipeline: input data (external downloads, own raw data), scratch directories for intermediate data, canonical, analysis-ready data, output data, shared data, fitted models, and reports / publications.

This simplifies data sharing across different machines and cloud services, and makes it easier to configure team- and user-specific permissions.

Name	Default	Shared/User	Description
`data_root`	`None`	🌍 Shared	Root directory for data, models, reports. If `None`, the project code directory is used as the data root.
`core`	`data/core`	👤 User (multi-user)	Processed, analysis-ready data
`external`	`data/external`	🌍 Shared	Downloaded data from third-party sources
`raw`	`data/raw`	🌍 Shared	Raw data from your own data collection efforts
`cache`	`data/cache`	👤 User (multi-user)	Intermediate files generated during processing
`heap`	`data/cache/_heap`	👤 User (multi-user)	Freshly unzipped data, not yet with standard prefixes
`logs`	`data/cache/_logs`	👤 User (multi-user)	Logs from script runs with timing and metadata
`out`	`data/out`	👤 User (multi-user)	Output and results data
`share`	`data/share`	🌍 Shared	Shared data between users
`models`	`models`	👤 User (multi-user)	Trained and serialized models, model predictions, or model summaries
`reports`	`reports`	👤 User (multi-user)	Reports and figures

Note

In single-user mode, all directories are at the same level (no user subfolders). In multi-user mode, user-specific data directories are in data/<username>/, and user-specific models and reports are in models/<username>/ and reports/<username>/.

Credit to Cookiecutter Data Science (DrivenData) for inspiring this directory structure.
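As a rough illustration of how the defaults in the table could map to paths, here is a minimal sketch. The names `DEFAULT_DIRS` and `resolve_dir` are hypothetical and not part of the openplaces API; the sketch only assumes the documented rule that a `data_root` of `None` falls back to the project code directory, and it covers the single-user layout only.

```python
from pathlib import Path

# Default relative locations, copied from the table above.
DEFAULT_DIRS = {
    "core": "data/core",
    "external": "data/external",
    "raw": "data/raw",
    "cache": "data/cache",
    "heap": "data/cache/_heap",
    "logs": "data/cache/_logs",
    "out": "data/out",
    "share": "data/share",
    "models": "models",
    "reports": "reports",
}


def resolve_dir(name, data_root=None, project_dir="."):
    """Return the single-user path of a named directory (hypothetical helper)."""
    # If data_root is None, the project code directory is used as the data root.
    root = Path(project_dir) if data_root is None else Path(data_root)
    return root / DEFAULT_DIRS[name]


# Without a data_root, paths live inside the project directory.
print(resolve_dir("heap", project_dir="/home/me/openplaces").as_posix())
# With a data_root, the same relative layout is placed under it.
print(resolve_dir("out", data_root="/mnt/shared").as_posix())
```

Keeping the relative layout fixed and varying only the root is what would let the same project code point at a local disk, a network drive, or a synced cloud folder.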

Single vs. multi-user mode

The configuration script will ask you to choose between single-user and multi-user mode for your data directories.

Single-user mode

Best when you’re the only user.

Directory structure:

data_root/
├── data/
│   ├── cache/         # Intermediate files (reproducible, deletable)
│   │   ├── _heap/     # Freshly unzipped data
│   │   └── _logs/     # Logs from runs with timing and arguments
│   ├── core/          # Processed, analysis-ready data
│   ├── external/      # External data (downloaded)
│   ├── out/           # Outputs and results data
│   ├── raw/           # Raw data
│   └── share/         # Shared data
├── models/            # Models, trained and serialized, predictions
└── reports/           # Reports and figures

- No user subfolders are created.
- A minimal config file is created (commented template).
- Project defaults from openplaces.yaml are used.
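To make the single-user layout concrete, the following sketch creates the tree shown above inside a temporary directory. It is only an illustration: `SINGLE_USER_TREE` and `create_tree` are hypothetical names, and the actual configuration step in openplaces may create the folders differently.

```python
import tempfile
from pathlib import Path

# Leaf directories of the single-user tree; parents are created implicitly.
SINGLE_USER_TREE = [
    "data/cache/_heap",
    "data/cache/_logs",
    "data/core",
    "data/external",
    "data/out",
    "data/raw",
    "data/share",
    "models",
    "reports",
]


def create_tree(data_root):
    """Create the single-user layout under data_root and list what exists."""
    root = Path(data_root)
    for rel in SINGLE_USER_TREE:
        # exist_ok=True makes the call safe to repeat on an existing tree.
        (root / rel).mkdir(parents=True, exist_ok=True)
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_dir()
    )


with tempfile.TemporaryDirectory() as tmp:
    for directory in create_tree(tmp):
        print(directory)
```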

Multi-user mode

Best for team projects where multiple people work on the same codebase.

Setup process:

1. Choose multi-user mode (option b).
2. Accept the default folder name (taken from your system username) or enter a custom name.
3. User-specific folders are created for your outputs.

Directory structure:

data_root/
├── data/
│   ├── external/              # 🌍 Shared
│   ├── raw/                   # 🌍 Shared
│   ├── share/                 # 🌍 Shared
│   └── YourUsername/          # 👤 Yours
│       ├── cache/             # 👤 Yours
│       │   ├── _heap/         # 👤 Yours
│       │   └── _logs/         # 👤 Yours
│       ├── core/              # 👤 Yours
│       └── out/               # 👤 Yours
├── models/                    # 🌍 Shared
│   └── YourUsername/          # 👤 Yours
└── reports/                   # 🌍 Shared
    └── YourUsername/          # 👤 Yours

- User subfolders for: cache, heap, logs, core, out
- Shared folders for: external, raw, share
- User subfolders inside: models, reports
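The placement rules above can be sketched as a small lookup. This is an assumption-laden illustration rather than the library's implementation: `multi_user_dir` and the three constants are hypothetical names, and using `getpass.getuser()` for the default folder name is only one plausible way to read "from your system username".

```python
import getpass
from pathlib import Path

# Directories that every user sees at the same location.
SHARED = {"external", "raw", "share"}
# User-specific data directories, relative to data/<username>/.
USER_DATA = {
    "cache": "cache",
    "heap": "cache/_heap",
    "logs": "cache/_logs",
    "core": "core",
    "out": "out",
}
# Top-level folders that receive a per-user subfolder.
USER_TOP = {"models", "reports"}


def multi_user_dir(name, data_root, username=None):
    """Return the multi-user path of a named directory (hypothetical helper)."""
    # Assumption: the default folder name is the system username.
    user = username or getpass.getuser()
    root = Path(data_root)
    if name in SHARED:
        return root / "data" / name
    if name in USER_DATA:
        return root / "data" / user / USER_DATA[name]
    if name in USER_TOP:
        return root / name / user
    raise KeyError(f"unknown directory name: {name}")


for name in ("raw", "heap", "models"):
    print(name, multi_user_dir(name, "/mnt/team", username="alice").as_posix())
```

With this layout, read and write permissions can be set per folder: shared inputs stay common to the team, while each user's intermediate files and outputs remain separate.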

Location of configuration files

openplaces uses a hierarchical configuration system to customize data directories and settings.

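Settings are resolved in priority order: the user configuration overrides the project configuration, which in turn overrides the built-in defaults (user > project > defaults).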

User configuration (highest priority)
- A user configuration file is created interactively the first time a new user runs `import openplaces`.
  
  Its location depends on your operating system:
  
  Windows
  
  %APPDATA%\openplaces\config.yaml
  
  macOS
  
  ~/Library/Application Support/openplaces/config.yaml
  
  Linux
  
  ~/.config/openplaces/config.yaml
- It contains user-specific overrides to the project configuration and is not committed to version control (git).
Project configuration (default values)
- Location: openplaces.yaml (in the root directory of the repository).
- Project-wide defaults committed to version control.
- Shared by all users of an installation.
Built-in defaults (fallback)
- Hardcoded in config.py.
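The lookup and layering described in this section can be sketched as follows. Both `user_config_path` and `merge_config` are hypothetical helpers written for illustration; in particular, the shallow key-by-key merge is an assumption, since this page does not say how openplaces combines nested settings.

```python
import os
import sys
from pathlib import Path


def user_config_path(platform=sys.platform, env=None, home=None):
    """Return the per-OS user config location listed above (hypothetical helper)."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    if platform.startswith("win"):
        # Windows: %APPDATA%\openplaces\config.yaml
        return Path(env["APPDATA"]) / "openplaces" / "config.yaml"
    if platform == "darwin":
        # macOS: ~/Library/Application Support/openplaces/config.yaml
        return home / "Library" / "Application Support" / "openplaces" / "config.yaml"
    # Linux: ~/.config/openplaces/config.yaml
    return home / ".config" / "openplaces" / "config.yaml"


def merge_config(defaults, project, user):
    """Layer settings so that user > project > defaults (assumed shallow merge)."""
    # Later dictionaries win when the same key appears more than once.
    return {**defaults, **project, **user}


print(user_config_path(platform="linux", home="/home/me").as_posix())

# Made-up values for illustration only.
defaults = {"data_root": None, "out": "data/out"}
project = {"out": "results"}
user = {"data_root": "/mnt/shared"}
print(merge_config(defaults, project, user))
# {'data_root': '/mnt/shared', 'out': 'results'}
```

In this example the user file only sets `data_root`, so the project-level `out` value survives, which matches the idea of the user file holding overrides rather than a full copy of the project settings.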