RC: W7 D3 — `ModuleNotFoundError` shenanigans
March 27, 2024After implementing the bloom filter yesterday, Laurent thought about running a small script to
compute a bunch of statistics and build some intuition about the behavior of bloom filters.
So he wrote a script that imports the file containing our implementation, we put it in a scripts
folder and
executed it.
And… instead of getting any statistics, we got the (common!) ModuleNotFoundError
😅.
Usually, when I get this error, I just add an __init__.py
file in the module I am importing (src
in this case) to
tell Python “Hey! This is a module” and I don’t think anymore about it.
But this time it did not work and, instead of trying to find some workarounds, we decided we would really understand
what Python does.
So we created a minimal repository setup with the following structure:
my_project/
├───src/
│ ├───__init__.py
│ └───a.py
└───scripts/
└───run.py
The src/a.py
file contains:
print("in file", __file__)
And the scripts/run.py
file contains:
import sys
print("sys.path", sys.path)
import src.a
print("Finished executing the script!")
With this, when we execute python scripts/run.py
, we get a ModuleNotFoundError
.
By printing the content of sys.path
, we get to see that there are a bunch of paths corresponding to the python
installation and, more interestingly, one corresponding to the scripts
folder ('/Users/maud/my_project/scripts'
).
That’s interesting because that is exactly why the src
module cannot be found: it is located
in '/Users/maud/my_project/
and this path has not been added to sys.path
.
What happens precisely is that Python adds the path to the directory containing the file that is being executed
in sys.path
(as mentioned in the docs).
So moving the script to the root would solve the problem, but no one really wants to do this because files get
completely disorganized.
There are several workarounds to avoid this:
- Run the script as a module by executing
python -m scripts.run
. As mentioned in the docs, the current directory gets added tosys.path
. That works, but is a bit cumbersome because there is no autocompletion in the terminal forscripts.run
by default. - Manually append the path to the root directory at the beginning of each script file
sys.path.append(/path/to/root/)
. This also works, but it is very tedious to add this in every single script file. - Add the current directory to
sys.path
from the CLI before executing the script (export PYTHONPATH=: ; python scripts/run.py
). This can be automated to avoid having to add it every time on one’s particular machine, but when sharing the repo and giving instructions to execute a script, users will get into the same trouble.
Laurent figured that Python uses “path configuration files” that are located in the site-packages
folder and having
a .pth
extension.
As mentioned in the docs, the content of those files is added
to sys.path
.
So if we add the path to the root directory in it, it’s all solved.
We can do this easily by adding a small script that does this:
import os, site, pathlib
# Define path to pth file
pth_filename = f"{pathlib.Path(__file__).stem}.pth"
site_packages_directory = site.getsitepackages()[0]
pth_file_path = os.path.join(site_packages_directory, pth_filename)
# Add root directory to pth file
root_directory = os.path.abspath(os.path.dirname(__file__))
with open(pth_file_path, 'w') as f:
f.write(root_directory + '\n')
print(f"Added '{root_directory}' to '{pth_file_path}'.")
After executing this setup script, the .pth
is configured correctly, and the trouble with importing modules is gone
forever.
I created a small repo with the correct minimal setup here.