Environment setup

This notebook is a utility to set up your conda environments for local development. This project uses four conda environments:

  1. schiphol-snakemake: Snakemake and papermill to execute notebooks as scripts in a pipeline
  2. schiphol-py: environment for executing Python notebooks with data science tools like pandas, sklearn, xgboost, etc.
  3. schiphol-r: environment with R packages for exploratory analyses and R time-series forecasting
  4. schiphol-tf: environment with TensorFlow and TensorFlow Probability, kept separate from other packages to avoid conflicts

When using the papermill CLI, we can pass a --kernel argument that specifies the kernel used to execute a target notebook. With conda, we can make each environment available as a kernel, but this requires some repetitive setup.
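As a minimal sketch of what such a call looks like, the helper below builds the papermill command line for a given kernel. The function names and the notebook paths are hypothetical and only illustrate the shape of the call; actually running it assumes papermill is on your PATH and the kernel has already been registered.

```python
import subprocess
from typing import List


def papermill_command(input_nb: str, output_nb: str, kernel: str) -> List[str]:
    """Build the argument list for a papermill CLI call with an explicit kernel."""
    return ["papermill", input_nb, output_nb, "--kernel", kernel]


def run_notebook(input_nb: str, output_nb: str, kernel: str) -> None:
    """Execute the notebook; needs papermill installed and the kernel registered."""
    subprocess.run(papermill_command(input_nb, output_nb, kernel), check=True)


# Hypothetical notebook paths, used only to show the resulting command.
cmd = papermill_command("notebooks/train.ipynb", "output/train.ipynb", "schiphol-py")
print(" ".join(cmd))
```

Keeping the command construction separate from its execution makes it easy to inspect or log the exact call before running it.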

This notebook describes how to install all conda environments and make them available as a kernel.

Conda environments

Each environment is created from a file under ./envs/.

We then register each environment as a kernel with Jupyter, so that papermill can run notebooks in the specified conda environment.

[17]:
from pathlib import Path
[16]:
env_file_dir = "./envs/"
conda_envs = ["schiphol-snakemake", "schiphol-py", "schiphol-r", "schiphol-tf"]

# Print the command that creates each environment from its file under ./envs/
for conda_env in conda_envs:
    print(f"conda env create -f {env_file_dir}{conda_env}.yml")

print()

# Print the commands that register each environment as a Jupyter kernel
for conda_env in conda_envs:
    print(
        "conda deactivate\n"
        f"conda activate {conda_env}\n"
        f'python -m ipykernel install --user --name {conda_env} --display-name "Python ({conda_env})"\n'
        "conda deactivate\n"
    )

# check if all 4 environments are added to the kernels
!jupyter kernelspec list

conda env create -f ./envs/schiphol-snakemake.yml
conda env create -f ./envs/schiphol-py.yml
conda env create -f ./envs/schiphol-r.yml
conda env create -f ./envs/schiphol-tf.yml

conda deactivate
conda activate schiphol-snakemake
python -m ipykernel install --user --name schiphol-snakemake --display-name "Python (schiphol-snakemake)"
conda deactivate

conda deactivate
conda activate schiphol-py
python -m ipykernel install --user --name schiphol-py --display-name "Python (schiphol-py)"
conda deactivate

conda deactivate
conda activate schiphol-r
python -m ipykernel install --user --name schiphol-r --display-name "Python (schiphol-r)"
conda deactivate

conda deactivate
conda activate schiphol-tf
python -m ipykernel install --user --name schiphol-tf --display-name "Python (schiphol-tf)"
conda deactivate
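The kernel check can also be done programmatically. The sketch below assumes the JSON output of `jupyter kernelspec list --json`, which contains a "kernelspecs" mapping keyed by kernel name; the helper names are hypothetical, and the sample JSON is made up for illustration.

```python
import json
import subprocess
from typing import Iterable, List

EXPECTED_KERNELS = ["schiphol-snakemake", "schiphol-py", "schiphol-r", "schiphol-tf"]


def missing_kernels(kernelspec_json: str,
                    expected: Iterable[str] = EXPECTED_KERNELS) -> List[str]:
    """Return the expected kernel names that are absent from the kernelspec listing."""
    installed = json.loads(kernelspec_json).get("kernelspecs", {})
    return [name for name in expected if name not in installed]


def list_kernelspecs_json() -> str:
    """Ask Jupyter for the registered kernels as JSON (requires jupyter on PATH)."""
    result = subprocess.run(
        ["jupyter", "kernelspec", "list", "--json"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Made-up sample listing in which only two of the four kernels are registered.
sample = json.dumps({"kernelspecs": {"schiphol-py": {}, "schiphol-tf": {}}})
print(missing_kernels(sample))
```

On a configured machine, `missing_kernels(list_kernelspecs_json())` should return an empty list once all four kernels are installed.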

Docker image

Build the image locally. Note that the build is time-consuming, because it installs 3 separate conda environments into the container.

docker build -t schiphol .

Running the container executes Snakemake, but this requires authentication for write access to the Google Cloud Storage location where the data is stored. Read access is already public.

Assuming that you have a folder named keys/ in the project root directory, you must mount it alongside the rest of the project when you run the container. Mounting the project root makes the service-account key with write access available inside the container, so Snakemake can then run with the required permissions.

docker run -v ${PWD}:/project/ schiphol
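A small pre-flight check can catch a missing keys/ folder before the container starts. The sketch below is one possible way to do this, not part of the project itself: the helper name `docker_run_command` is hypothetical, and it only builds the `docker run` argument list with an absolute project path after confirming that keys/ exists.

```python
from pathlib import Path
from typing import List


def docker_run_command(project_root: Path, image: str = "schiphol") -> List[str]:
    """Build the docker run call that mounts the project root at /project/.

    Raises FileNotFoundError if the project root has no keys/ folder.
    """
    root = project_root.resolve()  # bind mounts need an absolute host path
    keys_dir = root / "keys"
    if not keys_dir.is_dir():
        raise FileNotFoundError(f"expected a keys/ folder at {keys_dir}")
    return ["docker", "run", "-v", f"{root}:/project/", image]
```

The returned list can be passed to `subprocess.run(..., check=True)` from the project root, for example with `docker_run_command(Path.cwd())`.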