
Python packaging with pyproject.toml and setuptools

Published on 2023-11-04 (tags: code, python)

Python packaging has been in a bad state for ages. I recently read a post by Gregory Szorc that resonated with me a lot. Still, I personally do not have any issues in practice. So in this post I am going to explain how I do package management without losing my mind.

Be aware of the different kinds of tools in package management

Package management in Python is highly modular, and each part of the process can have multiple implementations. In this article I will use setuptools as the build backend (the part that actually builds the package), build as the build frontend (the part that creates a build environment and invokes the backend), and pip to install packages. But I could also use poetry to cover all of those roles with a single tool.

There are standards that allow all of these tools to work together. PEP 427 defines the "wheel" package format. PEP 517 defines the interface between build backends and frontends.
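The connection between the two is declared in the [build-system] table of pyproject.toml, which tells any frontend how to obtain and invoke the backend. For setuptools it looks like this:

[build-system]
requires = ["setuptools >= 64"]
build-backend = "setuptools.build_meta"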

Note that the separation is not always razor sharp. For example, a build frontend may also have to install packages into the build environment. And an installer might have to act as a build frontend if the package is not available as a wheel.

For an excellent overview of the different tools and options for package management, see this post by Anna-Lena Popkes.

Don't create a package if you want an environment

Many modern package managers like npm, cargo, or Poetry automatically create lockfiles and recommend committing them to version control. I really don't understand why they do this. A package is supposed to be installed alongside other packages, so it should be compatible with as many versions of its dependencies as possible. I do understand that you sometimes want a reproducible environment. But those are two separate things.

If you want to create a reproducible environment, you can use a simple requirements.txt file and install it with python -m pip install -r requirements.txt. The file could look like this:

# allow a range of versions
foo >= 1.1, < 2.0

# select optional features
bar[feature]

# pin a specific version and specify a hash for supply chain integrity
baz == 1.2.3 --hash=sha256:f22fa1e554c9ddfd16e6e41ac79759e17be9e492b3587efa038054674760e72d

There are some tools that can help you generate these files, e.g. pip freeze or pip-tools. Still, I think you should not have too much automation in this area. The goal is that you have control over the environment, not the other way around.
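As a concrete sketch, assuming pip-tools is installed, the two approaches look like this:

# snapshot the exact versions currently installed in the environment
python -m pip freeze > requirements.txt

# or: compile a hand-written requirements.in into a pinned requirements.txt
pip-compile --generate-hashes requirements.in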

Use venv instead of virtualenv

You usually want to install the requirements for each project into a separate environment. That approach was pioneered by the virtualenv package. The functionality was so useful that it was integrated into the standard library as venv in Python 3.3 (2012). I still see references to virtualenv more than ten years later, but you really don't need it. Just run python -m venv instead.
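A typical workflow looks like this (the .venv directory name is just a common convention):

# create the environment and activate it
python -m venv .venv
. .venv/bin/activate          # on Windows: .venv\Scripts\activate

# install the project requirements into it
python -m pip install -r requirements.txt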

Use pyproject.toml to specify package metadata

I stuck with setup.py and setup.cfg for a long time. The most important reason was that editable installs were not supported when using pyproject.toml. Fortunately, that was not a problem, because setuptools can still read metadata from those files.

But the future is pyproject.toml, and both setuptools (>= 64) and pip (>= 21.3) now support editable installs with it. Note that Ubuntu 22.04 still ships setuptools 59. I have started porting some projects to the new system, but for more critical projects I will probably wait until the new versions are widely available.

The setuptools documentation on pyproject.toml is solid and porting an existing setup.py or setup.cfg to the new syntax should be simple enough. You can also use ini2toml to automatically do the conversion.
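As a rough sketch, a minimal pyproject.toml for a pure Python package could look like this (the package name, dependencies, and entry point are placeholders):

[build-system]
requires = ["setuptools >= 64"]
build-backend = "setuptools.build_meta"

[project]
name = "mypackage"
version = "1.0.0"
description = "An example package"
requires-python = ">= 3.8"
dependencies = [
    "foo >= 1.1, < 2.0",
]

[project.scripts]
# hypothetical console entry point
mypackage = "mypackage.cli:main"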

Once you have created that file you can build your package either by using python -m build (which I see recommended in most places) or python -m pip wheel . (which doesn't require an additional tool). To upload your package to PyPI you can use twine.
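The whole release flow then boils down to a few commands:

# the build frontend and the uploader are separate tools
python -m pip install build twine

# writes an sdist and a wheel to dist/
python -m build

# upload everything in dist/ to PyPI
python -m twine upload dist/*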

Include data files

setuptools will automatically include Python files in the package. If you need to include other files, e.g. templates or translations, you traditionally had to list them in a separate MANIFEST.in file. That still works, but the configuration can now also live in pyproject.toml directly:

[tool.setuptools.package-data]
mypackage = [
    "**/*.html",
    "**/*.csv",
]

Use backend-specific configuration for more complex packages

So far we have discussed pure Python packages. Packages that contain C code or similar are much more complicated, for two reasons: they need an additional compile step, and separate binary packages have to be built for different architectures.

For setuptools, you still configure that in setup.py. (setup.py is not deprecated; it is just no longer necessary for simple packages.) However, there are also other, more specialized build backends like scikit-build-core or meson-python.
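A sketch of such a hybrid setup with setuptools: the metadata stays in pyproject.toml, and setup.py only declares the extension module (the module and source names here are placeholders):

# setup.py -- only the parts that pyproject.toml cannot express
from setuptools import Extension, setup

setup(
    ext_modules=[
        Extension("mypackage._speedups", sources=["src/mypackage/_speedups.c"]),
    ],
)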

Configure other tools

Most tools can be configured using pyproject.toml, e.g. pytest, coverage, or isort. A prominent exception is flake8, but you can replace most of it with ruff.
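For example, the following sections can all live in the same pyproject.toml (the specific options are just illustrations):

[tool.pytest.ini_options]
addopts = "--strict-markers"

[tool.coverage.run]
branch = true

[tool.ruff]
line-length = 88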

Conclusion

Python's packaging infrastructure is certainly not its best feature, but it is still usable. The transition to pyproject.toml took far too long and was far too messy, but I am confident that we will finally be done with it in just a few years.

In this article I stuck with setuptools, because that is the build backend I know best. However, setuptools has accumulated a lot of legacy code over the years. flit is another backend that has "not being setuptools" as its main feature.

I wouldn't say that Python packaging is good now. But at least it has stabilized to a degree that I feel like we could actually, finally reap the benefits of blowing up everything.