Configure

First-time setup

The first time you import openplaces, you will be asked to configure your installation.

The first step is to define where your data will be stored, as well as who can read and edit it.

This notebook walks you through the configuration step and showcases functions for updating and removing it:

notebooks/01_setup/01_first_steps.ipynb

Open and run it after installing and activating your environment:

conda activate openplaces
cd notebooks
jupyter notebook

Directory structure

openplaces uses separate directories for the different stages of the analytical pipeline: input data (external downloads and your own raw data); scratch directories for intermediate data; canonical, analysis-ready data; output data; shared data; fitted models; and reports or publications.

This separation simplifies sharing data across machines and cloud services, and makes it easier to set team-specific and user-specific permissions.

| Name      | Default          | Shared/User          | Description                                                                                               |
|-----------|------------------|----------------------|-----------------------------------------------------------------------------------------------------------|
| data_root | None             | 🌍 Shared            | Root directory for data, models, reports. If None, the project code directory is used as the data root.   |
| core      | data/core        | 👀 User (multi-user) | Processed, analysis-ready data                                                                            |
| external  | data/external    | 🌍 Shared            | Downloaded data from third party sources                                                                  |
| raw       | data/raw         | 🌍 Shared            | Raw data from own data collection efforts                                                                 |
| cache     | data/cache       | 👀 User (multi-user) | Intermediate files generated during processing                                                            |
| heap      | data/cache/_heap | 👀 User (multi-user) | Freshly unzipped data, not yet with standard prefixes                                                     |
| logs      | data/cache/_logs | 👀 User (multi-user) | Logs from script runs with timing and metadata                                                            |
| out       | data/out         | 👀 User (multi-user) | Output and results data                                                                                   |
| share     | data/share       | 🌍 Shared            | Shared data between users                                                                                 |
| models    | models           | 👀 User (multi-user) | Trained and serialized models, model predictions, or model summaries                                      |
| reports   | reports          | 👀 User (multi-user) | Reports and figures                                                                                       |

Note

In single-user mode, all directories are at the same level (no user subfolders). In multi-user mode, user-specific data directories are in data/<username>/, and user-specific models and reports are in models/<username>/ and reports/<username>/.
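As a rough illustration of how the defaults in the table map to concrete paths in each mode, here is a minimal sketch. The helper name resolve_dirs and the DEFAULT_DIRS mapping are hypothetical and are not part of openplaces; the real resolution logic may differ. The sketch also assumes an explicit data_root and does not model the fallback to the project code directory.

```python
from pathlib import Path

# Defaults copied from the table: name -> (relative path, user-specific in multi-user mode)
DEFAULT_DIRS = {
    "core": ("data/core", True),
    "external": ("data/external", False),
    "raw": ("data/raw", False),
    "cache": ("data/cache", True),
    "heap": ("data/cache/_heap", True),
    "logs": ("data/cache/_logs", True),
    "out": ("data/out", True),
    "share": ("data/share", False),
    "models": ("models", True),
    "reports": ("reports", True),
}


def resolve_dirs(data_root, username=None):
    """Hypothetical helper: return {name: Path} for single- or multi-user mode.

    With username=None (single-user mode) the defaults are used unchanged.
    With a username (multi-user mode) the name is inserted right after the
    top-level folder of every user-specific directory.
    """
    root = Path(data_root)
    resolved = {}
    for name, (relative, per_user) in DEFAULT_DIRS.items():
        parts = Path(relative).parts
        if username and per_user:
            # data/cache/_heap -> data/<username>/cache/_heap; models -> models/<username>
            parts = (parts[0], username) + parts[1:]
        resolved[name] = root.joinpath(*parts)
    return resolved


single = resolve_dirs("/srv/project")
multi = resolve_dirs("/srv/project", username="alice")
print(single["heap"])
print(multi["heap"])
```

Shared directories such as raw resolve to the same path in both modes, which is what allows several users to read the same inputs while writing to separate output folders.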

Credits to Cookiecutter Data Science (Carl Boettiger’s lab @ Berkeley) for inspiring this directory structure.

Single vs. multi-user mode

The configuration script will ask you to choose between single-user and multi-user mode for your data directories.

Single-user mode

Best when you’re the only user.

Directory structure:

data_root/
├── data/
│   ├── cache/         # Intermediate files (reproducible, deletable)
│   │   ├── _heap/     # Freshly unzipped data
│   │   └── _logs/     # Logs from runs with timing and arguments
│   ├── core/          # Processed, analysis-ready data
│   ├── external/      # External data (downloaded)
│   ├── out/           # Outputs and results data
│   ├── raw/           # Raw data
│   └── share/         # Shared data
├── models/            # Models, trained and serialized, predictions
└── reports/           # Reports and figures

  • No user subfolders created

  • Minimal config file created (commented template)

  • Uses project defaults from openplaces.yaml

Multi-user mode

Best for team projects where multiple people work on the same codebase.

Setup process:

  1. Choose multi-user mode (option b).

  2. Accept the default folder name (taken from your system username) or enter a custom name.

  3. User-specific folders are created for your outputs.

Directory structure:

data_root/
├── data/
│   ├── external/              # 🌍 Shared
│   ├── raw/                   # 🌍 Shared
│   ├── share/                 # 🌍 Shared
│   └── YourUsername/          # 👀 Yours
│       ├── cache/             # 👀 Yours
│       │   ├── _heap/         # 👀 Yours
│       │   └── _logs/         # 👀 Yours
│       ├── core/              # 👀 Yours
│       └── out/               # 👀 Yours
├── models/                    # 🌍 Shared
│   └── YourUsername/          # 👀 Yours
└── reports/                   # 🌍 Shared
    └── YourUsername/          # 👀 Yours

  • User subfolders for: cache, heap, logs, core, out

  • Shared folders for: external, raw, share

  • User subfolders in models, reports
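The setup steps and the resulting layout can be approximated with the sketch below. It is not the openplaces configuration script: the function create_multi_user_tree and the two folder lists are hypothetical names chosen for illustration, and using getpass.getuser() to read the system username is an assumption about how the default folder name might be obtained.

```python
import getpass
from pathlib import Path

# Folder lists taken from the directory structure above (names are illustrative)
SHARED_SUBDIRS = ["external", "raw", "share"]
USER_SUBDIRS = ["cache/_heap", "cache/_logs", "core", "out"]


def create_multi_user_tree(data_root, folder_name=None):
    """Hypothetical helper: create the multi-user layout under data_root.

    If folder_name is omitted, the system username is used as the default,
    mirroring step 2 of the setup process.
    """
    name = folder_name or getpass.getuser()
    root = Path(data_root)

    # Shared folders sit directly under data/
    paths = [root / "data" / sub for sub in SHARED_SUBDIRS]
    # User folders sit under data/<name>/, models/<name>/ and reports/<name>/
    paths += [root / "data" / name / sub for sub in USER_SUBDIRS]
    paths += [root / "models" / name, root / "reports" / name]

    for path in paths:
        # parents=True also creates cache/ when creating cache/_heap
        path.mkdir(parents=True, exist_ok=True)
    return name, paths
```

Calling create_multi_user_tree("/path/to/data_root") would use the system username, while create_multi_user_tree("/path/to/data_root", "alice") corresponds to entering a custom name. Because exist_ok=True is set, a second user can run the same call against an existing tree without disturbing the shared folders.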

Location of configuration files

openplaces uses a hierarchical configuration system to customize data directories and settings.

Configuration files are used in priority order: user > project > defaults.

  1. User configuration (highest priority)

    • A user configuration file is created interactively the first time a new user runs import openplaces.

      Its location depends on your operating system:

      Windows: %APPDATA%\openplaces\config.yaml

      macOS: ~/Library/Application Support/openplaces/config.yaml

      Linux: ~/.config/openplaces/config.yaml

    • It contains user-specific overrides to the project configuration and is not committed to version control (git).

  2. Project configuration (default values)

    • Location: openplaces.yaml (in root directory of repository).

    • Project-wide defaults committed to version control.

    • Shared by all users of an installation.

  3. Built-in defaults (fallback)

    • Hardcoded in config.py.
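The priority order and the per-platform locations listed above can be sketched as follows. This is an illustrative approximation rather than the code in config.py: the function names user_config_path and merge_config are hypothetical, the merge is shallow (nested settings are not combined), and the example setting values are made up.

```python
import os
import sys
from collections import ChainMap
from pathlib import Path


def user_config_path(platform=None, env=None):
    """Hypothetical helper: locate the user configuration file for a platform."""
    platform = sys.platform if platform is None else platform
    env = os.environ if env is None else env
    if platform.startswith("win"):
        # Windows: %APPDATA%\openplaces\config.yaml
        return Path(env["APPDATA"]) / "openplaces" / "config.yaml"
    if platform == "darwin":
        # macOS: ~/Library/Application Support/openplaces/config.yaml
        return (Path.home() / "Library" / "Application Support"
                / "openplaces" / "config.yaml")
    # Linux and other platforms: ~/.config/openplaces/config.yaml
    return Path.home() / ".config" / "openplaces" / "config.yaml"


def merge_config(defaults, project, user):
    """Hypothetical helper: combine settings with priority user > project > defaults.

    ChainMap returns the value from the first mapping that contains a key,
    so the highest-priority layer is listed first.
    """
    return dict(ChainMap(user, project, defaults))


# Made-up example values for the three layers
defaults = {"data_root": None, "core": "data/core"}
project = {"core": "data/processed"}
user = {"data_root": "/mnt/data"}

merged = merge_config(defaults, project, user)
print(merged["data_root"], merged["core"])
```

In this example the user layer supplies data_root, the project layer overrides core, and any key set in neither would fall back to the built-in default.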