Docker Compose Usage Guide

This guide explains how to use Docker Compose to run Databricks runtime containers locally.

Prerequisites

  • Docker Engine 20.10 or later
  • Docker Compose 2.0 or later
  • For GPU containers: NVIDIA Docker runtime
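
You can verify each prerequisite from a shell before continuing:

docker --version        # should report 20.10 or later
docker compose version  # should report v2.0 or later
nvidia-smi              # GPU containers only: confirms the NVIDIA driver is visible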

Quick Start

1. Generate Dockerfiles

First, generate the Dockerfiles for all runtimes:

poetry run dbx-container build --output-dir data

2. Start a Container

Use Docker Compose profiles to start specific containers:

# Start latest Python runtime (17.3 LTS)
docker compose --profile python up -d python-latest

# Start Python 15.4 LTS runtime
docker compose --profile python-15 up -d python-15-4

# Start latest GPU runtime (requires NVIDIA GPU)
docker compose --profile gpu up -d gpu-latest

# Start ML variant
docker compose --profile ml up -d python-ml-latest

3. Access the Container

# Execute commands in the container
docker compose exec python-latest bash

# Run Python
docker compose exec python-latest /databricks/python3/bin/python

# Run a script
docker compose exec python-latest /databricks/python3/bin/python /databricks/notebooks/my_script.py

4. Stop the Container

# Stop and remove
docker compose --profile python down

# Stop all
docker compose down

Available Profiles

Profile     Description                Container(s)
----------  -------------------------  -------------------------------
minimal     Minimal Ubuntu with Java   minimal
python      All Python runtimes        python-latest, python-15-4
python-15   Python 15.4 LTS            python-15-4
python-16   Python 16.4 LTS            python-16-4
python-17   Python 17.3 LTS            python-17-3
python-ml   Python ML runtimes         python-ml-latest
gpu         GPU runtimes               gpu-latest
gpu-ml      GPU ML runtimes            gpu-ml-latest
ml          All ML runtimes            python-ml-latest, gpu-ml-latest
latest      Latest LTS runtimes        python-latest, gpu-latest
standard    SSH server support         standard
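
To see which services a given profile enables before starting anything, ask Compose to list them:

docker compose --profile python config --services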

Examples

Running Python Code

Create a notebook or script in the notebooks/ directory:

# notebooks/hello.py
print("Hello from Databricks runtime!")

import sys
print(f"Python version: {sys.version}")

Run it:

docker compose --profile python up -d python-latest
docker compose exec python-latest /databricks/python3/bin/python /databricks/notebooks/hello.py

Using PySpark

docker compose --profile python up -d python-latest
docker compose exec python-latest bash

# Inside the container
/databricks/python3/bin/python
>>> from pyspark.sql import SparkSession
>>> spark = SparkSession.builder.appName("test").getOrCreate()
>>> df = spark.range(10)
>>> df.show()
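
Because SPARK_HOME is set to /databricks/spark (see Environment Variables below), batch jobs can likely also be submitted with spark-submit; treat the exact binary path as an assumption:

# Assumes spark-submit lives under $SPARK_HOME/bin
docker compose exec python-latest /databricks/spark/bin/spark-submit /databricks/notebooks/my_script.py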

GPU-Enabled Containers

For GPU containers, ensure you have:

  1. NVIDIA GPU drivers installed
  2. NVIDIA Container Toolkit installed
  3. Docker configured to use the NVIDIA runtime

# Test GPU access
docker compose --profile gpu up -d gpu-latest
docker compose exec gpu-latest nvidia-smi
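
If you need to adjust how GPUs are requested, Compose exposes device reservations. The snippet below is a sketch of the relevant stanza: the service name gpu-latest comes from this guide, and the rest is standard Compose GPU syntax rather than a copy of the generated file:

services:
  gpu-latest:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]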

ML Workloads

ML containers include additional libraries:

docker compose --profile ml up -d python-ml-latest
docker compose exec python-ml-latest /databricks/python3/bin/python

# Inside Python
>>> import tensorflow as tf
>>> import torch
>>> import numpy as np
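
A quick way to confirm the libraries import cleanly and to record their versions:

>>> print(tf.__version__, torch.__version__, np.__version__)
>>> torch.cuda.is_available()  # expect False here; python-ml-latest is the CPU variant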

SSH Access

The standard container includes an SSH server:

docker compose --profile standard up -d standard

# SSH is available on port 2222
ssh user@localhost -p 2222
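
The port mapping lives in docker-compose.yml; assuming the server listens on the container's port 22, the stanza would look like this sketch:

services:
  standard:
    ports:
      - "2222:22"   # host port 2222 -> container SSH port 22 (assumed)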

Volume Mounts

By default, the following directories are mounted:

  • ./notebooks → /databricks/notebooks - your Python scripts and notebooks
  • ./data → /databricks/data - data files

You can modify these in docker-compose.yml or add additional mounts.
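
For example, an extra read-only mount could be added alongside the defaults; the ./models path here is purely illustrative:

services:
  python-latest:
    volumes:
      - ./notebooks:/databricks/notebooks
      - ./data:/databricks/data
      - ./models:/databricks/models:ro   # hypothetical extra mount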

Building Custom Images

Extend a Base Image

Create a custom Dockerfile:

# Dockerfile.custom
FROM dbx-runtime:python-17.3-lts-ubuntu2404-py312

# Install additional packages
RUN /databricks/python3/bin/pip install \
    mlflow \
    great_expectations \
    dbt-core

# Copy your application
COPY ./app /app
WORKDIR /app

Add to docker-compose.yml:

services:
  custom-python:
    build:
      context: .
      dockerfile: Dockerfile.custom
    image: dbx-runtime:python-custom
    volumes:
      - ./notebooks:/databricks/notebooks
    command: tail -f /dev/null
    profiles:
      - custom

Build and run:

docker compose --profile custom build custom-python
docker compose --profile custom up -d custom-python
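
To confirm the extra packages landed in the image:

docker compose exec custom-python /databricks/python3/bin/pip show mlflow great_expectations dbt-core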

Environment Variables

Common environment variables:

Variable                Description                     Default
----------------------  ------------------------------  -------------------------------
PYSPARK_PYTHON          Python interpreter for PySpark  /databricks/python3/bin/python3
NVIDIA_VISIBLE_DEVICES  GPU devices to expose           all
JAVA_HOME               Java installation path          /usr/lib/jvm/zulu17-ca-amd64

Set in docker-compose.yml:

services:
  python-latest:
    environment:
      - PYSPARK_PYTHON=/databricks/python3/bin/python3
      - SPARK_HOME=/databricks/spark
      - MY_CUSTOM_VAR=value
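
To double-check what a running container actually sees:

docker compose exec python-latest env | grep -E 'PYSPARK|SPARK_HOME|JAVA_HOME'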

Troubleshooting

Container won't start

Check logs:

docker compose --profile python logs python-latest

Permission issues

If files in a mounted directory end up owned by the wrong user, reset ownership from inside the container:

docker compose exec --user root python-latest chown -R $(id -u):$(id -g) /databricks/notebooks

Network issues

Check container networking:

docker compose exec python-latest ping -c 3 google.com

Build failures

Rebuild from scratch:

docker compose --profile python build --no-cache python-latest

Multiple Containers

Run multiple runtimes simultaneously:

# Start both Python and GPU containers
docker compose --profile python --profile gpu up -d

# List running containers
docker compose ps

# Access different containers
docker compose exec python-latest bash
docker compose exec gpu-latest bash

Cleanup

Remove containers and volumes:

# Stop and remove containers
docker compose down

# Remove volumes as well
docker compose down -v

# Remove all images
docker compose down --rmi all

Production Usage

For production deployments:

  1. Use pre-built images from a registry instead of building locally
  2. Configure resource limits
  3. Set up health checks
  4. Use secrets management
  5. Configure logging drivers

Example production configuration:

services:
  python-prod:
    image: ghcr.io/twsl/dbx-runtime:python-17.3-lts-ubuntu2404-py312
    deploy:
      resources:
        limits:
          cpus: "4"
          memory: 8G
        reservations:
          cpus: "2"
          memory: 4G
    healthcheck:
      test: ["CMD", "python", "--version"]
      interval: 30s
      timeout: 10s
      retries: 3
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "3"
    restart: unless-stopped

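With pre-built images, deployment becomes pull-then-start rather than a local build (the service name python-prod comes from the example above):

docker compose pull python-prod
docker compose up -d python-prod
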
See Also