Technical Implementation
This document provides technical details about how DBX-Patch works internally.
The Timing Problem
Databricks loads sys_path_init and WsfsImportHook during Python startup, before any notebook code runs. This makes manual patching ineffective.
Without sitecustomize.py - FAILS:
1. Python starts
2. sys_path_init runs (skips .pth files)
3. WsfsImportHook installs (blocks editable imports)
4. Notebook runs
5. patch_dbx() ❌ Too late - import system already initialized!
With sitecustomize.py - WORKS:
1. Python starts
2. sitecustomize.py runs
3. patch_dbx() ✅ Perfect timing!
4. sys_path_init runs (already patched)
5. WsfsImportHook installs (already patched)
6. Notebook runs (everything works)
Architecture Overview
┌─────────────────────────────────────────────────────────────┐
│ DBX-Patch Architecture │
└─────────────────────────────────────────────────────────────┘
Python Startup (with sitecustomize.py)
│
├─► sitecustomize.py
│ └─► patch_dbx()
│ ├─► Patch sys_path_init
│ ├─► Process .pth files
│ ├─► Patch WsfsImportHook
│ ├─► Patch PythonPathHook
│ └─► Patch AutoreloadDiscoverabilityHook
│
├─► sys_path_init (Databricks) - PATCHED
│ └─ Auto-processes .pth files on sys.path updates
│
├─► WsfsImportHook (Databricks) - PATCHED
│ └─ Allows imports from editable paths
│
├─► PythonPathHook (Databricks) - PATCHED
│ └─ Preserves editable paths during updates
│
└─► AutoreloadDiscoverabilityHook (Databricks) - PATCHED
└─ Allowlist includes editable paths
User Notebook
│
├─► %pip install -e /path/to/package
│
└─► import package ✅ Works!
Core Components
1. install_sitecustomize.py
Purpose: Installs auto-patching mechanism during Python startup
Why This Matters: Python's sitecustomize.py runs during interpreter initialization, BEFORE any Databricks code. This is the ONLY way to patch the import system early enough.
Key Functions:
install_sitecustomize(verbose=True, force=False, restart_python=True)- Creates sitecustomize.py in site-packagesuninstall_sitecustomize(verbose=True)- Removes auto-patchingcheck_sitecustomize_status(verbose=True)- Verifies installationget_site_packages_path()- Locates writable site-packages directoryget_sitecustomize_content()- Generates the sitecustomize.py file content
How it works:
def install_sitecustomize(verbose=True, force=False, restart_python=True):
# 1. Locate writable site-packages directory
site_packages = get_site_packages_path()
# 2. Check if already exists
sitecustomize_path = site_packages / "sitecustomize.py"
if sitecustomize_path.exists() and not force:
# Already installed
return False
# 3. Write auto-patch code
content = get_sitecustomize_content()
sitecustomize_path.write_text(content)
# 4. Restart Python to activate (in Databricks)
if restart_python:
try:
from dbutils import DBUtils
dbutils = DBUtils()
dbutils.library.restartPython()
except:
# Not in Databricks - manual restart needed
pass
return True
Generated sitecustomize.py:
"""Auto-apply dbx-patch on Python startup.
This file is automatically loaded by Python during interpreter initialization.
It applies all dbx-patch fixes BEFORE sys_path_init and import hooks are loaded.
"""
import sys
def _apply_dbx_patch():
"""Apply dbx-patch fixes silently during startup."""
try:
from dbx_patch import patch_dbx
patch_dbx(force_refresh=False)
except ImportError:
pass # dbx-patch not installed
except Exception as e:
print(f"Warning: dbx-patch auto-apply failed: {e}", file=sys.stderr)
# Apply patches immediately on import
_apply_dbx_patch()
2. patch_dbx.py
Purpose: Main entry point for applying all patches
Key Functions:
patch_dbx(force_refresh=False)- Applies all fixesverify_editable_installs(verbose=True)- Verificationcheck_patch_status(verbose=True)- Status checkingremove_all_patches(verbose=True)- Cleanup/removal
Workflow:
def patch_dbx(force_refresh=False):
from dbx_patch.patches.sys_path_init_patch import SysPathInitPatch
from dbx_patch.patches.wsfs_import_hook_patch import WsfsImportHookPatch
from dbx_patch.patches.python_path_hook_patch import PythonPathHookPatch
from dbx_patch.patches.autoreload_hook_patch import AutoreloadHookPatch
results = {}
# Step 1: Patch sys_path_init to auto-process .pth files
results['sys_path_init'] = SysPathInitPatch().patch()
# Step 2: Process .pth files immediately
results['pth_processing'] = process_all_pth_files(
force=force_refresh,
verbose=verbose
)
# Step 3: Patch WsfsImportHook to allow editable imports
results['wsfs_hook'] = WsfsImportHookPatch().patch()
# Step 4: Patch PythonPathHook to preserve paths
results['path_hook'] = PythonPathHookPatch().patch()
# Step 5: Patch AutoreloadDiscoverabilityHook to allow editable imports
results['autoreload_hook'] = AutoreloadHookPatch().patch()
return results
Why This Order Matters:
- sys_path_init first - Ensures future sys.path updates auto-process .pth files
- Process .pth files - Adds editable paths to sys.path immediately
- WsfsImportHook - Allows imports from those paths
- PythonPathHook - Prevents paths from being lost during notebook/directory changes
- AutoreloadDiscoverabilityHook - Allows imports at the builtins.import level
3. pth_processor.py
Purpose: Processes .pth files to extract editable install paths
Features:
- Detects
.egg-linkfiles (legacy setuptools) - Detects
__editable_*.pthfiles (PEP 660) - Processes standard
.pthfiles with directory paths - Adds paths to
sys.path
Key Functions:
process_all_pth_files(force=False, verbose=True)- Main processing functionget_editable_install_paths()- Get paths without modifying sys.pathget_site_packages_dirs()- Locate all site-packages directoriesfind_pth_files(site_packages_dir)- Find .pth files in a directoryprocess_pth_file(pth_file_path)- Extract paths from a single .pth filefind_egg_link_files(site_packages_dir)- Find legacy .egg-link filesprocess_egg_link_file(egg_link_path)- Extract path from .egg-link file
Detection Logic:
def get_editable_install_paths():
paths = set()
# Get all site-packages directories
site_dirs = get_site_packages_dirs()
# Method 1: Scan for .egg-link files (legacy)
for site_dir in site_dirs:
for egg_link in find_egg_link_files(site_dir):
path = process_egg_link_file(egg_link)
if path:
paths.add(path)
# Method 2: Scan for __editable_*.pth files (PEP 660)
for site_dir in site_dirs:
for pth_file in find_pth_files(site_dir):
if '__editable_' in pth_file:
# Parse .pth file for paths
extracted_paths = process_pth_file(pth_file)
paths.update(extracted_paths)
# Method 3: Regular .pth files with directory paths
for site_dir in site_dirs:
for pth_file in find_pth_files(site_dir):
if '__editable_' not in pth_file:
extracted_paths = process_pth_file(pth_file)
paths.update(extracted_paths)
return sorted(paths)
PTH File Processing:
.pth files can contain:
- Directory paths (one per line, added to sys.path)
- Import statements (executed but not added to sys.path)
- Comments (lines starting with #, ignored)
def process_pth_file(pth_file_path):
paths = []
with open(pth_file_path, 'r') as f:
for line in f:
line = line.strip()
# Skip empty lines and comments
if not line or line.startswith('#'):
continue
# Skip import statements
if line.startswith('import '):
continue
# Check if it's a valid directory path
if Path(line).is_absolute():
abs_path = Path(line)
else:
# Relative to .pth file directory
abs_path = (Path(pth_file_path).parent / line).resolve()
if abs_path.exists() and abs_path.is_dir():
paths.append(str(abs_path))
return paths
4. sys_path_init_patch.py
Purpose: Monkey-patches sys_path_init.patch_sys_path_with_developer_paths()
Features:
- Hooks into Databricks' sys.path initialization
- Auto-processes .pth files during sys.path updates
- More elegant than manual processing
Key Classes:
SysPathInitPatch- Singleton class for managing the sys_path_init patch
Key Methods:
SysPathInitPatch().patch()- Apply the patch (returns PatchResult)SysPathInitPatch().remove()- Remove the patch (returns bool)SysPathInitPatch().is_applied()- Check if patched (returns bool)
Patching Strategy:
from dbx_patch.patches.sys_path_init_patch import SysPathInitPatch
class SysPathInitPatch(BasePatch):
"""Patch sys_path_init to auto-process .pth files."""
def patch(self) -> PatchResult:
try:
import sys_path_init
except ImportError:
# Not in Databricks environment
return PatchResult(success=False, function_found=False)
# Store original function
self._original_target = sys_path_init.patch_sys_path_with_developer_paths
# Create patched function
def patched_patch_sys_path_with_developer_paths():
# First call original
self._original_target()
# Then process .pth files
try:
from dbx_patch.pth_processor import process_all_pth_files
process_all_pth_files(force=False, verbose=False)
except Exception:
pass # Fail silently
# Apply patch
sys_path_init.patch_sys_path_with_developer_paths = patched_patch_sys_path_with_developer_paths
self._is_applied = True
return PatchResult(success=True)
# Usage
patch = SysPathInitPatch()
result = patch.patch()
Why This Patch Matters:
Without this patch, .pth files are only processed once at startup. With this patch, they're processed automatically whenever sys.path is updated, ensuring editable paths persist.
5. wsfs_import_hook_patch.py
Purpose: Monkey-patches WsfsImportHook.__is_user_import()
Features:
- Allows imports from editable install directories
- Maintains cached list of allowed editable paths
- Gracefully handles missing dbruntime module
- Provides refresh mechanism for new installs
Key Classes:
WsfsImportHookPatch- Singleton class for managing the workspace import hook patch
Key Methods:
WsfsImportHookPatch().patch()- Apply import hook patch (returns PatchResult)WsfsImportHookPatch().remove()- Remove patch (returns bool)WsfsImportHookPatch().refresh_paths()- Refresh cached paths after new installs (returns int)WsfsImportHookPatch().is_applied()- Check if patched (returns bool)WsfsImportHookPatch().get_editable_paths()- Get current editable paths (returns set)
Patching Strategy:
from dbx_patch.patches.wsfs_import_hook_patch import WsfsImportHookPatch
class WsfsImportHookPatch(BasePatch):
"""Patch workspace import machinery to allow editable imports."""
def patch(self) -> PatchResult:
try:
from dbruntime.wsfs_import_hook import WsfsImportHook
except ImportError:
# Not in Databricks environment
return {'success': False, 'reason': 'not_in_databricks'}
# Store original method
original_is_user_import = WsfsImportHook._WsfsImportHook__is_user_import
# Get editable paths
editable_paths = detect_editable_paths()
# Create patched method
def patched_is_user_import(self):
try:
f = inspect.currentframe()
num_items_processed = 0
while f is not None:
# Prevent infinite loops
if num_items_processed >= self._WsfsImportHook__max_recursion_depth:
return True
filename = self.get_filename(f)
# Allow whitelisted paths (existing behavior)
allow_import = any(
whitelisted in filename
for whitelisted in self.SITE_PACKAGE_WHITE_LIST
)
if allow_import:
return True
# NEW: Allow imports from editable install paths
if editable_paths:
is_editable = any(
filename.startswith(editable_path)
for editable_path in editable_paths
)
if is_editable:
return True
# Check if from site-packages (existing behavior)
is_site_packages = any(
filename.startswith(package)
for package in self._WsfsImportHook__site_packages
)
if is_site_packages:
return False
num_items_processed += 1
f = f.f_back
# None of the stack frames are from site-packages, probably from user
return True
except Exception as e:
# Fail open - allow the import if we can't determine
return False
# Apply patch
WsfsImportHook._WsfsImportHook__is_user_import = patched_is_user_import
return {'success': True, 'editable_paths': editable_paths}
5. python_path_hook_patch.py
Purpose: Monkey-patches PythonPathHook._handle_sys_path_maybe_updated()
Features:
- Preserves editable paths when sys.path is modified
- Prevents path loss during notebook/directory changes
- Provides refresh mechanism for new installs
Key Classes:
PythonPathHookPatch- Singleton class for managing the PythonPathHook patch
Key Methods:
PythonPathHookPatch().patch()- Apply path hook patch (returns PatchResult)PythonPathHookPatch().remove()- Remove patch (returns bool)PythonPathHookPatch().refresh_paths()- Refresh cached paths (returns int)PythonPathHookPatch().is_applied()- Check if patched (returns bool)PythonPathHookPatch().get_editable_paths()- Get current editable paths (returns set)
Implementation:
from dbx_patch.patches.python_path_hook_patch import PythonPathHookPatch
class PythonPathHookPatch(BasePatch):
"""Patch PythonPathHook to preserve editable paths."""
def patch(self) -> PatchResult:
try:
from dbruntime.pythonPathHook import PythonPathHook
except ImportError:
return {'success': False, 'reason': 'not_in_databricks'}
# Store original method
original_handle_update = PythonPathHook._handle_sys_path_maybe_updated
# Get editable paths
editable_paths = detect_editable_paths()
# Create patched method
def patched_handle_sys_path_maybe_updated(self):
# Call original first
original_handle_update(self)
# Restore any missing editable paths
if editable_paths:
paths_to_restore = []
for editable_path in editable_paths:
if editable_path not in sys.path:
paths_to_restore.append(editable_path)
# Restore missing paths (append to end)
for path in paths_to_restore:
sys.path.append(path)
# Apply patch
PythonPathHook._handle_sys_path_maybe_updated = patched_handle_sys_path_maybe_updated
return {'success': True, 'editable_paths': editable_paths}
Why This Patch Matters:
Databricks' PythonPathHook modifies sys.path when changing notebooks or directories. Without this patch, editable paths are lost during these transitions.
6. autoreload_hook_patch.py
Purpose: Patches AutoreloadDiscoverabilityHook to allow imports from editable install paths
The Problem:
The AutoreloadDiscoverabilityHook wraps builtins.__import__ and maintains an allowlist of paths allowed for imports. By default, only /Workspace paths are allowed. When Python tries to import an editable package from a non-allowlisted path, the import is blocked even if the path is in sys.path.
Error Example:
File /databricks/python_shell/lib/dbruntime/autoreload/discoverability/hook.py:71
ModuleNotFoundError: No module named 'my_editable_package'
This occurs even after applying other patches because the autoreload hook intercepts the import at the builtins.__import__ level.
Databricks Code:
In /databricks/python_shell/lib/dbruntime/autoreload/file_module_utils.py:
_AUTORELOAD_ALLOWLIST_CHECKS: list[Callable[[str], bool]] = [
lambda fname: fname.startswith("/Workspace"),
]
def register_autoreload_allowlist_check(check: Callable[[str], bool]) -> None:
_AUTORELOAD_ALLOWLIST_CHECKS.append(check)
Features:
- Registers an allowlist check for editable install paths
- Dynamically detects editable paths using existing pth_processor
- Gracefully handles non-Databricks environments
- Works with autoreload's import interception
Key Functions:
patch_autoreload_hook(verbose=True)- Register allowlist checkunpatch_autoreload_hook(verbose=True)- Deregister checkis_patched()- Check if patched
Implementation:
def _editable_path_check(fname: str) -> bool:
"""Check if a file path is within an editable install directory."""
if not fname:
return False
from dbx_patch.pth_processor import get_editable_install_paths
editable_paths = get_editable_install_paths()
# Check if the file is under any editable install path
return any(fname.startswith(editable_path) for editable_path in editable_paths)
def patch_autoreload_hook(verbose=True):
try:
from dbruntime.autoreload.file_module_utils import (
register_autoreload_allowlist_check,
)
except ImportError:
return {'success': False, 'reason': 'not_in_databricks'}
# Register our check function
register_autoreload_allowlist_check(_editable_path_check)
return {'success': True}
How It Works:
- Detects all editable install paths (from .pth files, .egg-link, etc.)
- Registers a custom allowlist check with Databricks' autoreload system
- When Python tries to import a module:
- Autoreload hook intercepts at
builtins.__import__ - Gets the module's file path
- Runs all registered allowlist checks (including ours)
- If our check returns True (file is in an editable path), allows the import
- Otherwise, continues with standard checks
Why This Patch Matters:
Without this patch, the autoreload hook blocks imports from editable paths even when all other patches are applied. This is the final layer of import interception that needs to be addressed for complete editable install support.
Problem Analysis
Databricks runtime has 5 critical issues preventing editable installs:
Issue #1: Timing - Databricks loads before notebook code
The Core Problem: Databricks initializes its import system during Python startup, before any user code can run.
Normal notebook workflow (FAILS):
1. Python starts
2. Databricks sys_path_init runs
3. Databricks WsfsImportHook installs
4. Notebook cell 1 runs
5. User runs: patch_dbx() ❌ TOO LATE!
Why it fails: The import system is already initialized with broken behavior. Patching after the fact doesn't help.
The Solution: Use sitecustomize.py to patch during Python initialization:
1. Python starts
2. sitecustomize.py runs
3. patch_dbx() ✅ Perfect timing!
4. Databricks sys_path_init runs (already patched)
5. Databricks WsfsImportHook installs (already patched)
6. Notebook runs (everything works)
Issue #2: Missing .pth file processing
Location: sys_path_init.py, app.py
Problem: Databricks bypasses standard Python site.py initialization, which means .pth files are never processed and paths are not added to sys.path.
Normal Python startup:
Python → site.py → process .pth files → add paths to sys.path
Databricks startup:
Python → sys_path_init.py → skip .pth processing → paths missing ❌
Our Fix:
- Patch
sys_path_init.patch_sys_path_with_developer_paths()to auto-process .pth files - Manually call
process_all_pth_files()during patch application
Issue #3: WsfsImportHook blocks non-workspace paths
Location: wsfs_import_hook.py
Problem: Only allows imports from site-packages and /Workspace. Even if paths are in sys.path, imports are blocked.
Import flow:
import statement → WsfsImportHook.__is_user_import() → check path → block if not whitelisted ❌
Our Fix: Monkey-patch __is_user_import() to also check against editable install paths:
# Original: Only allows site-packages and /Workspace
# Patched: Also allows editable install paths
if filename.startswith(editable_path): # NEW
return True
Issue #4: Path management drops editable paths
Location: pythonPathHook.py
Problem: Doesn't preserve paths during sys.path updates. Editable paths are lost when changing notebooks/directories.
Update flow:
Notebook change → _handle_sys_path_maybe_updated() → rebuild sys.path → editable paths lost ❌
Our Fix: Monkey-patch _handle_sys_path_maybe_updated() to restore editable paths after each update:
def patched_handle_sys_path_maybe_updated(self):
original_handle_update(self) # Call original
# Restore any missing editable paths
for editable_path in editable_paths:
if editable_path not in sys.path:
sys.path.append(editable_path) # NEW
Issue #5: AutoreloadDiscoverabilityHook blocks editable imports
Location: autoreload/discoverability/hook.py, autoreload/file_module_utils.py
Problem: Wraps builtins.__import__ with an allowlist check. Only allows imports from approved paths (default: /Workspace). Even if all other patches are applied, this hook can still block editable imports.
Import flow:
import statement → builtins.__import__ (wrapped) → check allowlist → block if not allowed ❌
Symptoms:
ModuleNotFoundError: No module named 'my_editable_package'
# Even though:
# - Package path is in sys.path
# - WsfsImportHook is patched
# - PythonPathHook is patched
Our Fix: Register an allowlist check for editable paths:
from dbruntime.autoreload.file_module_utils import register_autoreload_allowlist_check
def _editable_path_check(fname: str) -> bool:
editable_paths = get_editable_install_paths()
return any(fname.startswith(path) for path in editable_paths)
register_autoreload_allowlist_check(_editable_path_check) # NEW
This is the final layer of import interception that must be addressed for complete editable install support.
The Timing Solution
Why sitecustomize.py is Critical
The ONLY way to fix editable installs in Databricks is to patch the import system before it's initialized. This is impossible from notebook code.
The Python Import System Initialization Order:
1. Python interpreter starts
2. Built-in modules loaded
3. site.py runs
4. sitecustomize.py runs ← WE PATCH HERE
5. Databricks sys_path_init.py runs ← NOW PATCHED
6. Databricks WsfsImportHook installs ← NOW PATCHED
7. Databricks PythonPathHook installs ← NOW PATCHED
8. IPython kernel starts
9. Notebook cells execute ← TOO LATE TO PATCH
sitecustomize.py Execution Guarantee
Python's import system guarantees that sitecustomize.py runs during interpreter initialization, after site.py but before any other imports.
From Python documentation:
After these path manipulations, an attempt is made to import a module named
sitecustomize, which can perform arbitrary site-specific customizations.
This makes it the only reliable place to patch Databricks' initialization code.
Without sitecustomize.py - Why Manual Patching Fails
# Cell 1: Install and try to patch
%pip install dbx-patch
from dbx_patch import patch_dbx
patch_dbx() # ❌ Patches applied
# Cell 2: Try to import
%pip install -e /Workspace/Repos/my-repo/my-package
import my_package # ❌ FAILS - WsfsImportHook already initialized!
Why it fails:
- WsfsImportHook was initialized in step 6 (before notebook runs)
- Patching it in step 9 (during notebook) is too late
- The hook's whitelist is already set and won't be updated
With sitecustomize.py - Automatic Success
# Cluster init script or notebook cell (run once)
pip install dbx-patch
python3 -c "from dbx_patch import install_sitecustomize; install_sitecustomize()"
# Python restarts automatically in Databricks
# After restart - any notebook, any time
%pip install -e /Workspace/Repos/my-repo/my-package
import my_package # ✅ WORKS - hook was patched at startup!
Why it works:
- sitecustomize.py ran in step 4 (before WsfsImportHook)
- Patches applied before import system initialized
- Everything works automatically from then on
Verification
Check if sitecustomize.py is working:
from dbx_patch import check_sitecustomize_status, verify_editable_installs
# Check installation
check_sitecustomize_status()
# Verify patches are active
verify_editable_installs()
API Reference
Main Functions
install_sitecustomize(verbose=True, force=False, restart_python=True)
- Installs auto-patching mechanism
- Creates sitecustomize.py in site-packages
- Optionally restarts Python automatically (in Databricks)
force: Overwrite existing sitecustomize.py if presentrestart_python: Auto-restart using dbutils (Databricks only)- Returns:
bool- True if installed, False if already exists
uninstall_sitecustomize(verbose=True)
- Removes auto-patching mechanism
- Deletes sitecustomize.py
- Requires Python restart to take effect
- Returns:
bool- True if uninstalled successfully
check_sitecustomize_status(verbose=True)
- Verifies sitecustomize.py installation
- Shows file path and contents
- Indicates if patches are active
- Returns:
SitecustomizeStatusdataclass
patch_dbx(force_refresh=False)
- Applies all patches manually
- Processes .pth files
- Patches import and path hooks
force_refresh: Re-detect editable paths- Returns:
ApplyPatchesResultdataclass
verify_editable_installs(verbose=True)
- Verifies configuration
- Shows detected editable installs
- Checks patch status
- Tests import capabilities
- Returns:
VerifyResultdataclass
check_patch_status(verbose=True)
- Checks if patches are applied
- Shows configuration without modifying
- Lists detected editable paths
- Returns:
StatusResultdataclass
remove_all_patches(verbose=True)
- Removes all patches
- Restores original behavior
- Does NOT remove sitecustomize.py (use uninstall_sitecustomize for that)
- Returns:
RemovePatchesResultdataclass
Module-Specific Functions
pth_processor.py:
process_all_pth_files(force=False, verbose=True)- Process all .pth files and add paths to sys.pathget_editable_install_paths()- Get editable paths without modifying sys.pathget_site_packages_dirs()- Get all site-packages directoriesfind_pth_files(site_packages_dir)- Find .pth files in a directoryfind_egg_link_files(site_packages_dir)- Find .egg-link files (legacy)process_pth_file(pth_file_path)- Extract paths from a single .pth fileprocess_egg_link_file(egg_link_path)- Extract path from .egg-link fileadd_paths_to_sys_path(paths, prepend=False)- Add paths to sys.path
sys_path_init_patch.py:
patch_sys_path_init(verbose=True)- Apply sys_path_init patchunpatch_sys_path_init(verbose=True)- Remove patchis_patched()- Check if patched
wsfs_import_hook_patch.py:
patch_wsfs_import_hook(verbose=True)- Apply import hook patchunpatch_wsfs_import_hook(verbose=True)- Remove patchrefresh_editable_paths()- Refresh cached paths after new installsis_patched()- Check if patcheddetect_editable_paths()- Get current editable paths
python_path_hook_patch.py:
patch_python_path_hook(verbose=True)- Apply path hook patchunpatch_python_path_hook(verbose=True)- Remove patchrefresh_editable_paths()- Refresh cached pathsis_patched()- Check if patcheddetect_editable_paths()- Get current editable paths
autoreload_hook_patch.py:
patch_autoreload_hook(verbose=True)- Register allowlist check for editable pathsunpatch_autoreload_hook(verbose=True)- Deregister allowlist checkis_patched()- Check if patched
install_sitecustomize.py:
get_site_packages_path()- Get writable site-packages directoryget_sitecustomize_content()- Generate sitecustomize.py content
Usage Patterns
Recommended: Cluster Init Script (Production)
Best for: Production clusters, team environments
#!/bin/bash
# install-dbx-patch.sh
pip install dbx-patch
python3 -c "from dbx_patch import install_sitecustomize; install_sitecustomize(verbose=False)"
Benefits:
- Automatic on every cluster start
- No manual setup in notebooks
- Works for all users on the cluster
- Survives cluster restarts
Alternative: Manual Installation (Development)
Best for: Testing, development, one-off usage
# Run once per cluster (in any notebook)
%pip install dbx-patch
from dbx_patch import install_sitecustomize
install_sitecustomize()
# Python will restart automatically
# After restart - verify it's working
from dbx_patch import check_sitecustomize_status, verify_editable_installs
check_sitecustomize_status()
verify_editable_installs()
Edge Case: Manual Patching (Not Recommended)
When: sitecustomize.py installation not possible
# Every time Python restarts
from dbx_patch import patch_dbx
patch_dbx()
# Install editable packages
%pip install -e /Workspace/Repos/my-repo/my-package
# Refresh patches if needed
from dbx_patch.patches.wsfs_import_hook_patch import refresh_editable_paths
refresh_editable_paths()
Limitations:
- Must run in every notebook
- Must run after every Python restart
- Timing issues may still occur
- Not suitable for production
Compatibility
Supported Environments
- Python: 3.8, 3.9, 3.10, 3.11, 3.12
- Databricks Runtime: 11.0+ (may work on older versions, untested)
- Cluster Types: Standard, High Concurrency, Single Node
- Package Managers: pip, conda (with pip for editable installs)
Supported Install Methods
- ✅ Legacy setuptools editable installs (
.egg-link) - ✅ PEP 660 editable installs (
__editable_*.pth) - ✅ Standard .pth files with directory paths
- ✅ Multiple editable packages simultaneously
- ✅ Mixed editable and regular packages
Package Types
- ✅ Pure Python packages
- ✅ Packages with compiled extensions (C/C++/Rust)
- ✅ Namespace packages (PEP 420)
- ✅ Packages with entry points
- ✅ Packages with package data
- ✅ src-layout and flat-layout projects
Testing
The package includes comprehensive tests covering:
Unit Tests:
- ✅ PTH file detection and processing (
.pth,__editable_*.pth) - ✅ .egg-link file handling (legacy installs)
- ✅ Site-packages directory detection
- ✅ sys.path manipulation and validation
- ✅ Patch application/removal/status checking
- ✅ Error handling and edge cases
- ✅ sitecustomize.py generation and installation
Integration Tests:
- ✅ End-to-end editable install workflows
- ✅ Multiple package installations
- ✅ Patch persistence across operations
- ✅ Databricks runtime interaction (when available)
Test Coverage:
- Core functionality: 95%+
- Edge cases: Comprehensive error handling
- Mock testing for Databricks-specific components
Running Tests:
# Run all tests
pytest
# Run with coverage
pytest --cov=dbx_patch --cov-report=html
# Run specific test file
pytest tests/unit/test_pth_processor.py -v
Safety and Limitations
Safe Operation
- ✅ Gracefully handles missing dbruntime (works in non-Databricks environments)
- ✅ Non-destructive patching (can be cleanly removed)
- ✅ Fail-open error handling (errors don't break Python startup)
- ✅ No permanent modifications to Databricks code files
- ✅ No interference with non-editable packages
- ✅ Thread-safe operation (global state properly managed)
Known Limitations
1. Session-Specific Patches
- Patches are applied per Python session
- Cluster restart requires sitecustomize.py or manual reapplication
- Solution: Use cluster init script for automatic installation
2. Timing Dependency
- Manual patching (without sitecustomize.py) has timing limitations
- Must patch before first import attempt
- Solution: Use install_sitecustomize() for proper timing
3. Site-Packages Write Access
- sitecustomize.py requires write access to site-packages
- Some cluster configurations may restrict this
- Solution: Use cluster init script with appropriate permissions
4. Editable Installs Only
- Does not fix issues with non-editable installs
- Regular packages work normally through standard mechanisms
- Solution: Use editable installs for development packages
5. Path Caching
- Editable paths are cached at patch time
- New installs may require refresh_editable_paths()
- Solution: Call refresh after installing new packages, or use force_refresh=True
Error Handling Strategy
The package follows a "fail-open" philosophy:
# In sitecustomize.py
try:
from dbx_patch import patch_dbx
patch_dbx(force_refresh=False)
except ImportError:
pass # dbx-patch not installed - skip
except Exception as e:
print(f"Warning: {e}", file=sys.stderr)
# Continue anyway - don't break Python startup
This ensures that:
- Python startup is never broken
- Missing dependencies are handled gracefully
- Errors are logged but non-fatal
- Normal operation continues even if patches fail
Best Practices
For Production:
- ✅ Use cluster init scripts for automatic installation
- ✅ Install sitecustomize.py for proper timing
- ✅ Test thoroughly with your specific packages
- ✅ Monitor for any edge cases or conflicts
- ✅ Keep dbx-patch updated to latest version
For Development:
- ✅ Install sitecustomize.py once per cluster
- ✅ Use verify_editable_installs() to check status
- ✅ Call refresh_editable_paths() after new installs
- ✅ Check patch status with check_patch_status()
For Troubleshooting:
- ✅ Run check_sitecustomize_status() to verify installation
- ✅ Run verify_editable_installs() to check patches
- ✅ Use verbose=True for detailed diagnostics
- ✅ Check sys.path to ensure editable paths are present
- ✅ Test imports individually to isolate issues
Performance Considerations
Startup Time Impact
- sitecustomize.py adds ~50-200ms to Python startup
- Patch application is one-time cost per session
- No runtime overhead for imports after patching
- .pth file processing scales with number of packages
Memory Usage
- Minimal: ~10-50KB for cached editable paths
- No significant memory overhead
- Original methods stored for unpatch capability
Import Performance
- No measurable impact on import speed
- Patched code uses same algorithms as original
- Additional path checks are O(n) where n = number of editable paths (typically < 10)
Contributing
If you encounter issues or have improvements:
- Check existing documentation:
- Review The Timing Problem section for timing-related issues
- Check tests for expected behavior patterns
-
Review implementation for similar edge cases
-
Report issues:
- Include Databricks runtime version
- Provide minimal reproducible example
- Include output from check_sitecustomize_status() and verify_editable_installs()
-
Share sys.path and package installation details
-
Contribute improvements:
- Add tests for new functionality
- Update documentation (this file, README, docstrings)
- Follow existing code style and patterns
- Ensure compatibility across Python versions
-
Submit pull request with clear description
-
Testing guidelines:
- Run full test suite:
pytest - Check coverage:
pytest --cov=dbx_patch - Test on multiple Python versions if possible
- Test both with and without Databricks runtime mocks
Troubleshooting Guide
Problem: Editable imports still fail after installation
Diagnosis:
from dbx_patch import check_sitecustomize_status, verify_editable_installs
check_sitecustomize_status()
verify_editable_installs()
Solutions:
- Ensure sitecustomize.py is installed:
install_sitecustomize() - Restart Python after installation
- Verify patches are applied:
check_patch_status() - Check if paths are in sys.path:
import sys; print(sys.path)
Problem: sitecustomize.py not found after installation
Possible causes:
- Multiple Python installations/environments
- Permission issues with site-packages
- Using different Python than expected
Solutions:
from dbx_patch.install_sitecustomize import get_site_packages_path
site_pkg = get_site_packages_path()
print(f"Installing to: {site_pkg}")
Problem: New editable install not recognized
Solution:
# Refresh cached paths
from dbx_patch.patches.wsfs_import_hook_patch import refresh_editable_paths
from dbx_patch.patches.python_path_hook_patch import refresh_editable_paths as refresh_path_hook
refresh_editable_paths()
refresh_path_hook()
Or use force_refresh:
from dbx_patch import patch_dbx
patch_dbx(force_refresh=True)
Problem: Patches removed after cluster restart
Expected behavior - patches are session-specific.
Solution: Use cluster init script:
#!/bin/bash
pip install dbx-patch
python3 -c "from dbx_patch import install_sitecustomize; install_sitecustomize(verbose=False)"