Skip to content

Technical Implementation

This document provides technical details about how DBX-Patch works internally.

The Timing Problem

Databricks loads sys_path_init and WsfsImportHook during Python startup, before any notebook code runs. This makes manual patching ineffective.

Without sitecustomize.py - FAILS:

1. Python starts
2. sys_path_init runs (skips .pth files)
3. WsfsImportHook installs (blocks editable imports)
4. Notebook runs
5. patch_dbx() ❌ Too late - import system already initialized!

With sitecustomize.py - WORKS:

1. Python starts
2. sitecustomize.py runs
3. patch_dbx() ✅ Perfect timing!
4. sys_path_init runs (already patched)
5. WsfsImportHook installs (already patched)
6. Notebook runs (everything works)

Architecture Overview

┌─────────────────────────────────────────────────────────────┐
│                    DBX-Patch Architecture                    │
└─────────────────────────────────────────────────────────────┘

Python Startup (with sitecustomize.py)
     │
     ├─► sitecustomize.py
     │      └─► patch_dbx()
     │             ├─► Patch sys_path_init
     │             ├─► Process .pth files
     │             ├─► Patch WsfsImportHook
     │             ├─► Patch PythonPathHook
     │             └─► Patch AutoreloadDiscoverabilityHook
     │
     ├─► sys_path_init (Databricks) - PATCHED
     │      └─ Auto-processes .pth files on sys.path updates
     │
     ├─► WsfsImportHook (Databricks) - PATCHED
     │      └─ Allows imports from editable paths
     │
     ├─► PythonPathHook (Databricks) - PATCHED
     │      └─ Preserves editable paths during updates
     │
     └─► AutoreloadDiscoverabilityHook (Databricks) - PATCHED
            └─ Allowlist includes editable paths

User Notebook
     │
     ├─► %pip install -e /path/to/package
     │
     └─► import package  ✅ Works!

Core Components

1. install_sitecustomize.py

Purpose: Installs auto-patching mechanism during Python startup

Why This Matters: Python's sitecustomize.py runs during interpreter initialization, BEFORE any Databricks code. This is the ONLY way to patch the import system early enough.

Key Functions:

  • install_sitecustomize(verbose=True, force=False, restart_python=True) - Creates sitecustomize.py in site-packages
  • uninstall_sitecustomize(verbose=True) - Removes auto-patching
  • check_sitecustomize_status(verbose=True) - Verifies installation
  • get_site_packages_path() - Locates writable site-packages directory
  • get_sitecustomize_content() - Generates the sitecustomize.py file content

How it works:

def install_sitecustomize(verbose=True, force=False, restart_python=True):
    # 1. Locate writable site-packages directory
    site_packages = get_site_packages_path()

    # 2. Check if already exists
    sitecustomize_path = site_packages / "sitecustomize.py"
    if sitecustomize_path.exists() and not force:
        # Already installed
        return False

    # 3. Write auto-patch code
    content = get_sitecustomize_content()
    sitecustomize_path.write_text(content)

    # 4. Restart Python to activate (in Databricks)
    if restart_python:
        try:
            from dbutils import DBUtils
            dbutils = DBUtils()
            dbutils.library.restartPython()
        except:
            # Not in Databricks - manual restart needed
            pass

    return True

Generated sitecustomize.py:

"""Auto-apply dbx-patch on Python startup.

This file is automatically loaded by Python during interpreter initialization.
It applies all dbx-patch fixes BEFORE sys_path_init and import hooks are loaded.
"""

import sys

def _apply_dbx_patch():
    """Apply dbx-patch fixes silently during startup."""
    try:
        from dbx_patch import patch_dbx
        patch_dbx(force_refresh=False)
    except ImportError:
        pass  # dbx-patch not installed
    except Exception as e:
        print(f"Warning: dbx-patch auto-apply failed: {e}", file=sys.stderr)

# Apply patches immediately on import
_apply_dbx_patch()

2. patch_dbx.py

Purpose: Main entry point for applying all patches

Key Functions:

  • patch_dbx(force_refresh=False) - Applies all fixes
  • verify_editable_installs(verbose=True) - Verification
  • check_patch_status(verbose=True) - Status checking
  • remove_all_patches(verbose=True) - Cleanup/removal

Workflow:

def patch_dbx(force_refresh=False):
    from dbx_patch.patches.sys_path_init_patch import SysPathInitPatch
    from dbx_patch.patches.wsfs_import_hook_patch import WsfsImportHookPatch
    from dbx_patch.patches.python_path_hook_patch import PythonPathHookPatch
    from dbx_patch.patches.autoreload_hook_patch import AutoreloadHookPatch

    results = {}

    # Step 1: Patch sys_path_init to auto-process .pth files
    results['sys_path_init'] = SysPathInitPatch().patch()

    # Step 2: Process .pth files immediately
    results['pth_processing'] = process_all_pth_files(
        force=force_refresh,
        verbose=verbose
    )

    # Step 3: Patch WsfsImportHook to allow editable imports
    results['wsfs_hook'] = WsfsImportHookPatch().patch()

    # Step 4: Patch PythonPathHook to preserve paths
    results['path_hook'] = PythonPathHookPatch().patch()

    # Step 5: Patch AutoreloadDiscoverabilityHook to allow editable imports
    results['autoreload_hook'] = AutoreloadHookPatch().patch()

    return results

Why This Order Matters:

  1. sys_path_init first - Ensures future sys.path updates auto-process .pth files
  2. Process .pth files - Adds editable paths to sys.path immediately
  3. WsfsImportHook - Allows imports from those paths
  4. PythonPathHook - Prevents paths from being lost during notebook/directory changes
  5. AutoreloadDiscoverabilityHook - Allows imports at the builtins.import level

3. pth_processor.py

Purpose: Processes .pth files to extract editable install paths

Features:

  • Detects .egg-link files (legacy setuptools)
  • Detects __editable_*.pth files (PEP 660)
  • Processes standard .pth files with directory paths
  • Adds paths to sys.path

Key Functions:

  • process_all_pth_files(force=False, verbose=True) - Main processing function
  • get_editable_install_paths() - Get paths without modifying sys.path
  • get_site_packages_dirs() - Locate all site-packages directories
  • find_pth_files(site_packages_dir) - Find .pth files in a directory
  • process_pth_file(pth_file_path) - Extract paths from a single .pth file
  • find_egg_link_files(site_packages_dir) - Find legacy .egg-link files
  • process_egg_link_file(egg_link_path) - Extract path from .egg-link file

Detection Logic:

def get_editable_install_paths():
    paths = set()

    # Get all site-packages directories
    site_dirs = get_site_packages_dirs()

    # Method 1: Scan for .egg-link files (legacy)
    for site_dir in site_dirs:
        for egg_link in find_egg_link_files(site_dir):
            path = process_egg_link_file(egg_link)
            if path:
                paths.add(path)

    # Method 2: Scan for __editable_*.pth files (PEP 660)
    for site_dir in site_dirs:
        for pth_file in find_pth_files(site_dir):
            if '__editable_' in pth_file:
                # Parse .pth file for paths
                extracted_paths = process_pth_file(pth_file)
                paths.update(extracted_paths)

    # Method 3: Regular .pth files with directory paths
    for site_dir in site_dirs:
        for pth_file in find_pth_files(site_dir):
            if '__editable_' not in pth_file:
                extracted_paths = process_pth_file(pth_file)
                paths.update(extracted_paths)

    return sorted(paths)

PTH File Processing:

.pth files can contain:

  • Directory paths (one per line, added to sys.path)
  • Import statements (executed but not added to sys.path)
  • Comments (lines starting with #, ignored)
def process_pth_file(pth_file_path):
    paths = []
    with open(pth_file_path, 'r') as f:
        for line in f:
            line = line.strip()

            # Skip empty lines and comments
            if not line or line.startswith('#'):
                continue

            # Skip import statements
            if line.startswith('import '):
                continue

            # Check if it's a valid directory path
            if Path(line).is_absolute():
                abs_path = Path(line)
            else:
                # Relative to .pth file directory
                abs_path = (Path(pth_file_path).parent / line).resolve()

            if abs_path.exists() and abs_path.is_dir():
                paths.append(str(abs_path))

    return paths

4. sys_path_init_patch.py

Purpose: Monkey-patches sys_path_init.patch_sys_path_with_developer_paths()

Features:

  • Hooks into Databricks' sys.path initialization
  • Auto-processes .pth files during sys.path updates
  • More elegant than manual processing

Key Classes:

  • SysPathInitPatch - Singleton class for managing the sys_path_init patch

Key Methods:

  • SysPathInitPatch().patch() - Apply the patch (returns PatchResult)
  • SysPathInitPatch().remove() - Remove the patch (returns bool)
  • SysPathInitPatch().is_applied() - Check if patched (returns bool)

Patching Strategy:

from dbx_patch.patches.sys_path_init_patch import SysPathInitPatch

class SysPathInitPatch(BasePatch):
    """Patch sys_path_init to auto-process .pth files."""

    def patch(self) -> PatchResult:
        try:
            import sys_path_init
        except ImportError:
            # Not in Databricks environment
            return PatchResult(success=False, function_found=False)

        # Store original function
        self._original_target = sys_path_init.patch_sys_path_with_developer_paths

        # Create patched function
        def patched_patch_sys_path_with_developer_paths():
            # First call original
            self._original_target()

            # Then process .pth files
            try:
                from dbx_patch.pth_processor import process_all_pth_files
                process_all_pth_files(force=False, verbose=False)
            except Exception:
                pass  # Fail silently

        # Apply patch
        sys_path_init.patch_sys_path_with_developer_paths = patched_patch_sys_path_with_developer_paths
        self._is_applied = True

        return PatchResult(success=True)

# Usage
patch = SysPathInitPatch()
result = patch.patch()

Why This Patch Matters:

Without this patch, .pth files are only processed once at startup. With this patch, they're processed automatically whenever sys.path is updated, ensuring editable paths persist.

5. wsfs_import_hook_patch.py

Purpose: Monkey-patches WsfsImportHook.__is_user_import()

Features:

  • Allows imports from editable install directories
  • Maintains cached list of allowed editable paths
  • Gracefully handles missing dbruntime module
  • Provides refresh mechanism for new installs

Key Classes:

  • WsfsImportHookPatch - Singleton class for managing the workspace import hook patch

Key Methods:

  • WsfsImportHookPatch().patch() - Apply import hook patch (returns PatchResult)
  • WsfsImportHookPatch().remove() - Remove patch (returns bool)
  • WsfsImportHookPatch().refresh_paths() - Refresh cached paths after new installs (returns int)
  • WsfsImportHookPatch().is_applied() - Check if patched (returns bool)
  • WsfsImportHookPatch().get_editable_paths() - Get current editable paths (returns set)

Patching Strategy:

from dbx_patch.patches.wsfs_import_hook_patch import WsfsImportHookPatch

class WsfsImportHookPatch(BasePatch):
    """Patch workspace import machinery to allow editable imports."""

    def patch(self) -> PatchResult:
    try:
        from dbruntime.wsfs_import_hook import WsfsImportHook
    except ImportError:
        # Not in Databricks environment
        return {'success': False, 'reason': 'not_in_databricks'}

    # Store original method
    original_is_user_import = WsfsImportHook._WsfsImportHook__is_user_import

    # Get editable paths
    editable_paths = detect_editable_paths()

    # Create patched method
    def patched_is_user_import(self):
        try:
            f = inspect.currentframe()
            num_items_processed = 0

            while f is not None:
                # Prevent infinite loops
                if num_items_processed >= self._WsfsImportHook__max_recursion_depth:
                    return True

                filename = self.get_filename(f)

                # Allow whitelisted paths (existing behavior)
                allow_import = any(
                    whitelisted in filename
                    for whitelisted in self.SITE_PACKAGE_WHITE_LIST
                )
                if allow_import:
                    return True

                # NEW: Allow imports from editable install paths
                if editable_paths:
                    is_editable = any(
                        filename.startswith(editable_path)
                        for editable_path in editable_paths
                    )
                    if is_editable:
                        return True

                # Check if from site-packages (existing behavior)
                is_site_packages = any(
                    filename.startswith(package)
                    for package in self._WsfsImportHook__site_packages
                )
                if is_site_packages:
                    return False

                num_items_processed += 1
                f = f.f_back

            # None of the stack frames are from site-packages, probably from user
            return True

        except Exception as e:
            # Fail open - allow the import if we can't determine
            return False

    # Apply patch
    WsfsImportHook._WsfsImportHook__is_user_import = patched_is_user_import

    return {'success': True, 'editable_paths': editable_paths}

5. python_path_hook_patch.py

Purpose: Monkey-patches PythonPathHook._handle_sys_path_maybe_updated()

Features:

  • Preserves editable paths when sys.path is modified
  • Prevents path loss during notebook/directory changes
  • Provides refresh mechanism for new installs

Key Classes:

  • PythonPathHookPatch - Singleton class for managing the PythonPathHook patch

Key Methods:

  • PythonPathHookPatch().patch() - Apply path hook patch (returns PatchResult)
  • PythonPathHookPatch().remove() - Remove patch (returns bool)
  • PythonPathHookPatch().refresh_paths() - Refresh cached paths (returns int)
  • PythonPathHookPatch().is_applied() - Check if patched (returns bool)
  • PythonPathHookPatch().get_editable_paths() - Get current editable paths (returns set)

Implementation:

from dbx_patch.patches.python_path_hook_patch import PythonPathHookPatch

class PythonPathHookPatch(BasePatch):
    """Patch PythonPathHook to preserve editable paths."""

    def patch(self) -> PatchResult:
    try:
        from dbruntime.pythonPathHook import PythonPathHook
    except ImportError:
        return {'success': False, 'reason': 'not_in_databricks'}

    # Store original method
    original_handle_update = PythonPathHook._handle_sys_path_maybe_updated

    # Get editable paths
    editable_paths = detect_editable_paths()

    # Create patched method
    def patched_handle_sys_path_maybe_updated(self):
        # Call original first
        original_handle_update(self)

        # Restore any missing editable paths
        if editable_paths:
            paths_to_restore = []
            for editable_path in editable_paths:
                if editable_path not in sys.path:
                    paths_to_restore.append(editable_path)

            # Restore missing paths (append to end)
            for path in paths_to_restore:
                sys.path.append(path)

    # Apply patch
    PythonPathHook._handle_sys_path_maybe_updated = patched_handle_sys_path_maybe_updated

    return {'success': True, 'editable_paths': editable_paths}

Why This Patch Matters:

Databricks' PythonPathHook modifies sys.path when changing notebooks or directories. Without this patch, editable paths are lost during these transitions.

6. autoreload_hook_patch.py

Purpose: Patches AutoreloadDiscoverabilityHook to allow imports from editable install paths

The Problem:

The AutoreloadDiscoverabilityHook wraps builtins.__import__ and maintains an allowlist of paths allowed for imports. By default, only /Workspace paths are allowed. When Python tries to import an editable package from a non-allowlisted path, the import is blocked even if the path is in sys.path.

Error Example:

File /databricks/python_shell/lib/dbruntime/autoreload/discoverability/hook.py:71
ModuleNotFoundError: No module named 'my_editable_package'

This occurs even after applying other patches because the autoreload hook intercepts the import at the builtins.__import__ level.

Databricks Code:

In /databricks/python_shell/lib/dbruntime/autoreload/file_module_utils.py:

_AUTORELOAD_ALLOWLIST_CHECKS: list[Callable[[str], bool]] = [
    lambda fname: fname.startswith("/Workspace"),
]

def register_autoreload_allowlist_check(check: Callable[[str], bool]) -> None:
    _AUTORELOAD_ALLOWLIST_CHECKS.append(check)

Features:

  • Registers an allowlist check for editable install paths
  • Dynamically detects editable paths using existing pth_processor
  • Gracefully handles non-Databricks environments
  • Works with autoreload's import interception

Key Functions:

  • patch_autoreload_hook(verbose=True) - Register allowlist check
  • unpatch_autoreload_hook(verbose=True) - Deregister check
  • is_patched() - Check if patched

Implementation:

def _editable_path_check(fname: str) -> bool:
    """Check if a file path is within an editable install directory."""
    if not fname:
        return False

    from dbx_patch.pth_processor import get_editable_install_paths
    editable_paths = get_editable_install_paths()

    # Check if the file is under any editable install path
    return any(fname.startswith(editable_path) for editable_path in editable_paths)

def patch_autoreload_hook(verbose=True):
    try:
        from dbruntime.autoreload.file_module_utils import (
            register_autoreload_allowlist_check,
        )
    except ImportError:
        return {'success': False, 'reason': 'not_in_databricks'}

    # Register our check function
    register_autoreload_allowlist_check(_editable_path_check)

    return {'success': True}

How It Works:

  1. Detects all editable install paths (from .pth files, .egg-link, etc.)
  2. Registers a custom allowlist check with Databricks' autoreload system
  3. When Python tries to import a module:
  4. Autoreload hook intercepts at builtins.__import__
  5. Gets the module's file path
  6. Runs all registered allowlist checks (including ours)
  7. If our check returns True (file is in an editable path), allows the import
  8. Otherwise, continues with standard checks

Why This Patch Matters:

Without this patch, the autoreload hook blocks imports from editable paths even when all other patches are applied. This is the final layer of import interception that needs to be addressed for complete editable install support.


Problem Analysis

Databricks runtime has 5 critical issues preventing editable installs:

Issue #1: Timing - Databricks loads before notebook code

The Core Problem: Databricks initializes its import system during Python startup, before any user code can run.

Normal notebook workflow (FAILS):

1. Python starts
2. Databricks sys_path_init runs
3. Databricks WsfsImportHook installs
4. Notebook cell 1 runs
5. User runs: patch_dbx()  ❌ TOO LATE!

Why it fails: The import system is already initialized with broken behavior. Patching after the fact doesn't help.

The Solution: Use sitecustomize.py to patch during Python initialization:

1. Python starts
2. sitecustomize.py runs
3. patch_dbx() ✅ Perfect timing!
4. Databricks sys_path_init runs (already patched)
5. Databricks WsfsImportHook installs (already patched)
6. Notebook runs (everything works)

Issue #2: Missing .pth file processing

Location: sys_path_init.py, app.py

Problem: Databricks bypasses standard Python site.py initialization, which means .pth files are never processed and paths are not added to sys.path.

Normal Python startup:

Python → site.py → process .pth files → add paths to sys.path

Databricks startup:

Python → sys_path_init.py → skip .pth processing → paths missing ❌

Our Fix:

  1. Patch sys_path_init.patch_sys_path_with_developer_paths() to auto-process .pth files
  2. Manually call process_all_pth_files() during patch application

Issue #3: WsfsImportHook blocks non-workspace paths

Location: wsfs_import_hook.py

Problem: Only allows imports from site-packages and /Workspace. Even if paths are in sys.path, imports are blocked.

Import flow:

import statement → WsfsImportHook.__is_user_import() → check path → block if not whitelisted ❌

Our Fix: Monkey-patch __is_user_import() to also check against editable install paths:

# Original: Only allows site-packages and /Workspace
# Patched: Also allows editable install paths

if filename.startswith(editable_path):  # NEW
    return True

Issue #4: Path management drops editable paths

Location: pythonPathHook.py

Problem: Doesn't preserve paths during sys.path updates. Editable paths are lost when changing notebooks/directories.

Update flow:

Notebook change → _handle_sys_path_maybe_updated() → rebuild sys.path → editable paths lost ❌

Our Fix: Monkey-patch _handle_sys_path_maybe_updated() to restore editable paths after each update:

def patched_handle_sys_path_maybe_updated(self):
    original_handle_update(self)  # Call original

    # Restore any missing editable paths
    for editable_path in editable_paths:
        if editable_path not in sys.path:
            sys.path.append(editable_path)  # NEW

Issue #5: AutoreloadDiscoverabilityHook blocks editable imports

Location: autoreload/discoverability/hook.py, autoreload/file_module_utils.py

Problem: Wraps builtins.__import__ with an allowlist check. Only allows imports from approved paths (default: /Workspace). Even if all other patches are applied, this hook can still block editable imports.

Import flow:

import statement → builtins.__import__ (wrapped) → check allowlist → block if not allowed ❌

Symptoms:

ModuleNotFoundError: No module named 'my_editable_package'
# Even though:
# - Package path is in sys.path
# - WsfsImportHook is patched
# - PythonPathHook is patched

Our Fix: Register an allowlist check for editable paths:

from dbruntime.autoreload.file_module_utils import register_autoreload_allowlist_check

def _editable_path_check(fname: str) -> bool:
    editable_paths = get_editable_install_paths()
    return any(fname.startswith(path) for path in editable_paths)

register_autoreload_allowlist_check(_editable_path_check)  # NEW

This is the final layer of import interception that must be addressed for complete editable install support.


The Timing Solution

Why sitecustomize.py is Critical

The ONLY way to fix editable installs in Databricks is to patch the import system before it's initialized. This is impossible from notebook code.

The Python Import System Initialization Order:

1. Python interpreter starts
2. Built-in modules loaded
3. site.py runs
4. sitecustomize.py runs ← WE PATCH HERE
5. Databricks sys_path_init.py runs ← NOW PATCHED
6. Databricks WsfsImportHook installs ← NOW PATCHED
7. Databricks PythonPathHook installs ← NOW PATCHED
8. IPython kernel starts
9. Notebook cells execute ← TOO LATE TO PATCH

sitecustomize.py Execution Guarantee

Python's import system guarantees that sitecustomize.py runs during interpreter initialization, after site.py but before any other imports.

From Python documentation:

After these path manipulations, an attempt is made to import a module named sitecustomize, which can perform arbitrary site-specific customizations.

This makes it the only reliable place to patch Databricks' initialization code.

Without sitecustomize.py - Why Manual Patching Fails

# Cell 1: Install and try to patch
%pip install dbx-patch
from dbx_patch import patch_dbx
patch_dbx()  # ❌ Patches applied

# Cell 2: Try to import
%pip install -e /Workspace/Repos/my-repo/my-package
import my_package  # ❌ FAILS - WsfsImportHook already initialized!

Why it fails:

  • WsfsImportHook was initialized in step 6 (before notebook runs)
  • Patching it in step 9 (during notebook) is too late
  • The hook's whitelist is already set and won't be updated

With sitecustomize.py - Automatic Success

# Cluster init script or notebook cell (run once)
pip install dbx-patch
python3 -c "from dbx_patch import install_sitecustomize; install_sitecustomize()"
# Python restarts automatically in Databricks
# After restart - any notebook, any time
%pip install -e /Workspace/Repos/my-repo/my-package
import my_package  # ✅ WORKS - hook was patched at startup!

Why it works:

  • sitecustomize.py ran in step 4 (before WsfsImportHook)
  • Patches applied before import system initialized
  • Everything works automatically from then on

Verification

Check if sitecustomize.py is working:

from dbx_patch import check_sitecustomize_status, verify_editable_installs

# Check installation
check_sitecustomize_status()

# Verify patches are active
verify_editable_installs()

API Reference

Main Functions

install_sitecustomize(verbose=True, force=False, restart_python=True)

  • Installs auto-patching mechanism
  • Creates sitecustomize.py in site-packages
  • Optionally restarts Python automatically (in Databricks)
  • force: Overwrite existing sitecustomize.py if present
  • restart_python: Auto-restart using dbutils (Databricks only)
  • Returns: bool - True if installed, False if already exists

uninstall_sitecustomize(verbose=True)

  • Removes auto-patching mechanism
  • Deletes sitecustomize.py
  • Requires Python restart to take effect
  • Returns: bool - True if uninstalled successfully

check_sitecustomize_status(verbose=True)

  • Verifies sitecustomize.py installation
  • Shows file path and contents
  • Indicates if patches are active
  • Returns: SitecustomizeStatus dataclass

patch_dbx(force_refresh=False)

  • Applies all patches manually
  • Processes .pth files
  • Patches import and path hooks
  • force_refresh: Re-detect editable paths
  • Returns: ApplyPatchesResult dataclass

verify_editable_installs(verbose=True)

  • Verifies configuration
  • Shows detected editable installs
  • Checks patch status
  • Tests import capabilities
  • Returns: VerifyResult dataclass

check_patch_status(verbose=True)

  • Checks if patches are applied
  • Shows configuration without modifying
  • Lists detected editable paths
  • Returns: StatusResult dataclass

remove_all_patches(verbose=True)

  • Removes all patches
  • Restores original behavior
  • Does NOT remove sitecustomize.py (use uninstall_sitecustomize for that)
  • Returns: RemovePatchesResult dataclass

Module-Specific Functions

pth_processor.py:

  • process_all_pth_files(force=False, verbose=True) - Process all .pth files and add paths to sys.path
  • get_editable_install_paths() - Get editable paths without modifying sys.path
  • get_site_packages_dirs() - Get all site-packages directories
  • find_pth_files(site_packages_dir) - Find .pth files in a directory
  • find_egg_link_files(site_packages_dir) - Find .egg-link files (legacy)
  • process_pth_file(pth_file_path) - Extract paths from a single .pth file
  • process_egg_link_file(egg_link_path) - Extract path from .egg-link file
  • add_paths_to_sys_path(paths, prepend=False) - Add paths to sys.path

sys_path_init_patch.py:

  • patch_sys_path_init(verbose=True) - Apply sys_path_init patch
  • unpatch_sys_path_init(verbose=True) - Remove patch
  • is_patched() - Check if patched

wsfs_import_hook_patch.py:

  • patch_wsfs_import_hook(verbose=True) - Apply import hook patch
  • unpatch_wsfs_import_hook(verbose=True) - Remove patch
  • refresh_editable_paths() - Refresh cached paths after new installs
  • is_patched() - Check if patched
  • detect_editable_paths() - Get current editable paths

python_path_hook_patch.py:

  • patch_python_path_hook(verbose=True) - Apply path hook patch
  • unpatch_python_path_hook(verbose=True) - Remove patch
  • refresh_editable_paths() - Refresh cached paths
  • is_patched() - Check if patched
  • detect_editable_paths() - Get current editable paths

autoreload_hook_patch.py:

  • patch_autoreload_hook(verbose=True) - Register allowlist check for editable paths
  • unpatch_autoreload_hook(verbose=True) - Deregister allowlist check
  • is_patched() - Check if patched

install_sitecustomize.py:

  • get_site_packages_path() - Get writable site-packages directory
  • get_sitecustomize_content() - Generate sitecustomize.py content

Usage Patterns

Best for: Production clusters, team environments

#!/bin/bash
# install-dbx-patch.sh

pip install dbx-patch
python3 -c "from dbx_patch import install_sitecustomize; install_sitecustomize(verbose=False)"

Benefits:

  • Automatic on every cluster start
  • No manual setup in notebooks
  • Works for all users on the cluster
  • Survives cluster restarts

Alternative: Manual Installation (Development)

Best for: Testing, development, one-off usage

# Run once per cluster (in any notebook)
%pip install dbx-patch
from dbx_patch import install_sitecustomize
install_sitecustomize()
# Python will restart automatically
# After restart - verify it's working
from dbx_patch import check_sitecustomize_status, verify_editable_installs
check_sitecustomize_status()
verify_editable_installs()

When: sitecustomize.py installation not possible

# Every time Python restarts
from dbx_patch import patch_dbx
patch_dbx()

# Install editable packages
%pip install -e /Workspace/Repos/my-repo/my-package

# Refresh patches if needed
from dbx_patch.patches.wsfs_import_hook_patch import refresh_editable_paths
refresh_editable_paths()

Limitations:

  • Must run in every notebook
  • Must run after every Python restart
  • Timing issues may still occur
  • Not suitable for production

Compatibility

Supported Environments

  • Python: 3.8, 3.9, 3.10, 3.11, 3.12
  • Databricks Runtime: 11.0+ (may work on older versions, untested)
  • Cluster Types: Standard, High Concurrency, Single Node
  • Package Managers: pip, conda (with pip for editable installs)

Supported Install Methods

  • ✅ Legacy setuptools editable installs (.egg-link)
  • ✅ PEP 660 editable installs (__editable_*.pth)
  • ✅ Standard .pth files with directory paths
  • ✅ Multiple editable packages simultaneously
  • ✅ Mixed editable and regular packages

Package Types

  • ✅ Pure Python packages
  • ✅ Packages with compiled extensions (C/C++/Rust)
  • ✅ Namespace packages (PEP 420)
  • ✅ Packages with entry points
  • ✅ Packages with package data
  • ✅ src-layout and flat-layout projects

Testing

The package includes comprehensive tests covering:

Unit Tests:

  • ✅ PTH file detection and processing (.pth, __editable_*.pth)
  • ✅ .egg-link file handling (legacy installs)
  • ✅ Site-packages directory detection
  • ✅ sys.path manipulation and validation
  • ✅ Patch application/removal/status checking
  • ✅ Error handling and edge cases
  • ✅ sitecustomize.py generation and installation

Integration Tests:

  • ✅ End-to-end editable install workflows
  • ✅ Multiple package installations
  • ✅ Patch persistence across operations
  • ✅ Databricks runtime interaction (when available)

Test Coverage:

  • Core functionality: 95%+
  • Edge cases: Comprehensive error handling
  • Mock testing for Databricks-specific components

Running Tests:

# Run all tests
pytest

# Run with coverage
pytest --cov=dbx_patch --cov-report=html

# Run specific test file
pytest tests/unit/test_pth_processor.py -v

Safety and Limitations

Safe Operation

  • ✅ Gracefully handles missing dbruntime (works in non-Databricks environments)
  • ✅ Non-destructive patching (can be cleanly removed)
  • ✅ Fail-open error handling (errors don't break Python startup)
  • ✅ No permanent modifications to Databricks code files
  • ✅ No interference with non-editable packages
  • ✅ Thread-safe operation (global state properly managed)

Known Limitations

1. Session-Specific Patches

  • Patches are applied per Python session
  • Cluster restart requires sitecustomize.py or manual reapplication
  • Solution: Use cluster init script for automatic installation

2. Timing Dependency

  • Manual patching (without sitecustomize.py) has timing limitations
  • Must patch before first import attempt
  • Solution: Use install_sitecustomize() for proper timing

3. Site-Packages Write Access

  • sitecustomize.py requires write access to site-packages
  • Some cluster configurations may restrict this
  • Solution: Use cluster init script with appropriate permissions

4. Editable Installs Only

  • Does not fix issues with non-editable installs
  • Regular packages work normally through standard mechanisms
  • Solution: Use editable installs for development packages

5. Path Caching

  • Editable paths are cached at patch time
  • New installs may require refresh_editable_paths()
  • Solution: Call refresh after installing new packages, or use force_refresh=True

Error Handling Strategy

The package follows a "fail-open" philosophy:

# In sitecustomize.py
try:
    from dbx_patch import patch_dbx
    patch_dbx(force_refresh=False)
except ImportError:
    pass  # dbx-patch not installed - skip
except Exception as e:
    print(f"Warning: {e}", file=sys.stderr)
    # Continue anyway - don't break Python startup

This ensures that:

  • Python startup is never broken
  • Missing dependencies are handled gracefully
  • Errors are logged but non-fatal
  • Normal operation continues even if patches fail

Best Practices

For Production:

  1. ✅ Use cluster init scripts for automatic installation
  2. ✅ Install sitecustomize.py for proper timing
  3. ✅ Test thoroughly with your specific packages
  4. ✅ Monitor for any edge cases or conflicts
  5. ✅ Keep dbx-patch updated to latest version

For Development:

  1. ✅ Install sitecustomize.py once per cluster
  2. ✅ Use verify_editable_installs() to check status
  3. ✅ Call refresh_editable_paths() after new installs
  4. ✅ Check patch status with check_patch_status()

For Troubleshooting:

  1. ✅ Run check_sitecustomize_status() to verify installation
  2. ✅ Run verify_editable_installs() to check patches
  3. ✅ Use verbose=True for detailed diagnostics
  4. ✅ Check sys.path to ensure editable paths are present
  5. ✅ Test imports individually to isolate issues

Performance Considerations

Startup Time Impact

  • sitecustomize.py adds ~50-200ms to Python startup
  • Patch application is one-time cost per session
  • No runtime overhead for imports after patching
  • .pth file processing scales with number of packages

Memory Usage

  • Minimal: ~10-50KB for cached editable paths
  • No significant memory overhead
  • Original methods stored for unpatch capability

Import Performance

  • No measurable impact on import speed
  • Patched code uses same algorithms as original
  • Additional path checks are O(n) where n = number of editable paths (typically < 10)

Contributing

If you encounter issues or have improvements:

  1. Check existing documentation:
  2. Review The Timing Problem section for timing-related issues
  3. Check tests for expected behavior patterns
  4. Review implementation for similar edge cases

  5. Report issues:

  6. Include Databricks runtime version
  7. Provide minimal reproducible example
  8. Include output from check_sitecustomize_status() and verify_editable_installs()
  9. Share sys.path and package installation details

  10. Contribute improvements:

  11. Add tests for new functionality
  12. Update documentation (this file, README, docstrings)
  13. Follow existing code style and patterns
  14. Ensure compatibility across Python versions
  15. Submit pull request with clear description

  16. Testing guidelines:

  17. Run full test suite: pytest
  18. Check coverage: pytest --cov=dbx_patch
  19. Test on multiple Python versions if possible
  20. Test both with and without Databricks runtime mocks

Troubleshooting Guide

Problem: Editable imports still fail after installation

Diagnosis:

from dbx_patch import check_sitecustomize_status, verify_editable_installs
check_sitecustomize_status()
verify_editable_installs()

Solutions:

  1. Ensure sitecustomize.py is installed: install_sitecustomize()
  2. Restart Python after installation
  3. Verify patches are applied: check_patch_status()
  4. Check if paths are in sys.path: import sys; print(sys.path)

Problem: sitecustomize.py not found after installation

Possible causes:

  • Multiple Python installations/environments
  • Permission issues with site-packages
  • Using different Python than expected

Solutions:

from dbx_patch.install_sitecustomize import get_site_packages_path
site_pkg = get_site_packages_path()
print(f"Installing to: {site_pkg}")

Problem: New editable install not recognized

Solution:

# Refresh cached paths
from dbx_patch.patches.wsfs_import_hook_patch import refresh_editable_paths
from dbx_patch.patches.python_path_hook_patch import refresh_editable_paths as refresh_path_hook
refresh_editable_paths()
refresh_path_hook()

Or use force_refresh:

from dbx_patch import patch_dbx
patch_dbx(force_refresh=True)

Problem: Patches removed after cluster restart

Expected behavior - patches are session-specific.

Solution: Use cluster init script:

#!/bin/bash
pip install dbx-patch
python3 -c "from dbx_patch import install_sitecustomize; install_sitecustomize(verbose=False)"

Further Reading