RC: W7 D3 — `ModuleNotFoundError` shenanigans

March 27, 2024

After implementing the Bloom filter yesterday, Laurent suggested running a small script to compute a bunch of statistics and build some intuition about how Bloom filters behave. So he wrote a script that imports the file containing our implementation, put it in a scripts folder, and we executed it. And… instead of getting any statistics, we got the (common!) ModuleNotFoundError 😅.

Usually, when I get this error, I just add an __init__.py file to the directory I am importing from (src in this case) to tell Python “Hey! This is a package” and I don’t think about it any further. But this time it did not work and, instead of hunting for workarounds, we decided to really understand what Python does.

So we created a minimal repository setup with the following structure:

my_project/
├───src/
│   ├───__init__.py
│   └───a.py
└───scripts/
    └───run.py

The src/a.py file contains:

print("in file", __file__)

And the scripts/run.py file contains:

import sys

print("sys.path", sys.path)

import src.a

print("Finished executing the script!")

With this, when we execute python scripts/run.py, we get a ModuleNotFoundError. Printing sys.path shows a bunch of paths corresponding to the Python installation and, more interestingly, one corresponding to the scripts folder ('/Users/maud/my_project/scripts'). That is exactly why the src package cannot be found: it lives in '/Users/maud/my_project/', and this path is not in sys.path. What happens precisely is that Python adds the directory containing the file being executed to sys.path, not the current working directory (as mentioned in the docs). So moving the script to the root would solve the problem, but no one really wants to do this because files get completely disorganized.
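To see this behavior in isolation, here is a minimal sketch that builds a throwaway project in a temporary directory and asks a child interpreter what its sys.path looks like. The file name show_path.py and the temporary layout are made up for the illustration; they are not part of our repo.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path

# Throwaway project with a scripts/ folder (hypothetical names).
root = Path(tempfile.mkdtemp()).resolve()
scripts = root / "scripts"
scripts.mkdir()
(scripts / "show_path.py").write_text(
    "import json, sys\nprint(json.dumps(sys.path))\n"
)

# Equivalent to running `python scripts/show_path.py` from the project root.
result = subprocess.run(
    [sys.executable, "scripts/show_path.py"],
    cwd=root,
    capture_output=True,
    text=True,
    check=True,
)
child_sys_path = json.loads(result.stdout)

# The script's own folder is on sys.path, the project root is not.
scripts_on_path = str(scripts) in child_sys_path
root_on_path = str(root) in child_sys_path
print("scripts/ on sys.path:    ", scripts_on_path)
print("project root on sys.path:", root_on_path)
```

With a default interpreter configuration, the first check should come out true and the second false, even though the command was launched from the project root.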

There are several workarounds to avoid this:

- Run the script as a module by executing python -m scripts.run from the root directory. As mentioned in the docs, the current directory gets added to sys.path in that case. That works, but is a bit cumbersome because there is no autocompletion in the terminal for scripts.run by default.
- Manually append the path to the root directory at the top of each script file, before the imports: sys.path.append("/path/to/root/"). This also works, but it is tedious to repeat in every single script file.
- Add the current directory to sys.path from the CLI before executing the script (export PYTHONPATH=. ; python scripts/run.py). This can be automated on one’s own machine, but anyone else who clones the repo and follows the instructions to execute a script will run into the same trouble.

Laurent found out that Python uses “path configuration files”, which are located in the site-packages folder and have a .pth extension. As mentioned in the docs, each line of those files that names an existing directory is appended to sys.path. So if we put the path to the root directory in such a file, it’s all solved.
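To get a feel for how .pth files are processed without touching the real site-packages folder, one can use site.addsitedir, which handles the .pth files of any directory the way the interpreter handles site-packages at startup. The sketch below uses two temporary directories standing in for site-packages and the project root; both, and the my_project.pth name, are placeholders for the demo.

```python
import os
import site
import sys
import tempfile

# Two throwaway directories: one plays the role of site-packages,
# the other the role of the project root (both hypothetical).
fake_site_packages = tempfile.mkdtemp()
fake_project_root = tempfile.mkdtemp()
missing_directory = os.path.join(fake_project_root, "does_not_exist")

# A .pth file lists one path per line. Lines starting with '#' are skipped,
# and so are paths that do not exist on disk.
with open(os.path.join(fake_site_packages, "my_project.pth"), "w") as f:
    f.write("# added for the demo\n")
    f.write(fake_project_root + "\n")
    f.write(missing_directory + "\n")

# Process the .pth files found in the fake site-packages directory.
site.addsitedir(fake_site_packages)

root_added = os.path.abspath(fake_project_root) in sys.path
missing_added = os.path.abspath(missing_directory) in sys.path
print("project root added to sys.path:     ", root_added)
print("missing directory added to sys.path:", missing_added)
```

Note that this only affects the running interpreter; the real mechanism kicks in at startup for .pth files that actually live in site-packages.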

Creating such a .pth file for our project is easy with a small setup script placed at the root of the project:

import os, site, pathlib

# Define path to pth file
pth_filename = f"{pathlib.Path(__file__).stem}.pth"
site_packages_directory = site.getsitepackages()[0]
pth_file_path = os.path.join(site_packages_directory, pth_filename)

# Add root directory to pth file
root_directory = os.path.abspath(os.path.dirname(__file__))
with open(pth_file_path, 'w') as f:
    f.write(root_directory + '\n')

print(f"Added '{root_directory}' to '{pth_file_path}'.")

After executing this setup script once, the .pth file is configured correctly, and the trouble with importing modules is gone for good in that Python environment.
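If you want to double-check what ended up in site-packages, a small helper along these lines can list the .pth files and their entries. This is only a sketch: read_pth_entries is a name I made up, and it simply ignores blank lines and comments rather than reproducing everything the site module does.

```python
import site
from pathlib import Path


def read_pth_entries(directory):
    """Map each .pth file in `directory` to its non-blank, non-comment lines."""
    entries = {}
    for pth_file in sorted(Path(directory).glob("*.pth")):
        lines = pth_file.read_text(errors="replace").splitlines()
        entries[pth_file.name] = [
            line.strip()
            for line in lines
            if line.strip() and not line.startswith("#")
        ]
    return entries


# Show the .pth files of the current interpreter's site-packages folders.
for directory in site.getsitepackages():
    for name, paths in read_pth_entries(directory).items():
        print(f"{directory}: {name} -> {paths}")
```

After running the setup script, the root directory of the project should show up under the .pth file it created.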

I created a small repo with the correct minimal setup here.