Omnibus as a Solution for Python Dependency Hell

Python has always been an exemplar of good coding style for me. It’s impressive how concise unit tests can be, how flexible mocks are. Really, it’s a great pleasure to write Python code. Up until the point when you need to deploy it. From then on you embark on a rollercoaster ride through a variety of tools and approaches. This article is meant to help you get through Python dependency hell, stay alive and still enjoy writing in this awesome language.

Yet Another Packaging Tool?

There is a plethora of tools according to the official packaging guide. How will yet another one make life easier? When an application is more complex than a one-file script, things tend to be grouped in modules and then further in packages. The packages can be uploaded to PyPI and from there either reused in other applications or installed and used on servers. So far it sounds straightforward, but it turns out the Python version that comes with most distributions is too old. Python 2.7 is still the default on CentOS 7 (well, it will always be). Good news: the latest Ubuntu distributions ship Python 3, to the great joy of Python 3 apologists.

But even if we settle on a Python version, a bigger problem comes up. It turns out packages A and B may depend on non-intersecting version sets of a package C – a classic dependency hell problem. Classic because it has been known for a while and is not specific to Python. What to do in that case? There were attempts to compile dependencies in, or to use virtual environments. However, those solutions are either incomplete (you still need to distribute a compiled binary – how?) or an operational nightmare (PATH, PYTHONPATH, etc.).
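To make "non-intersecting version sets" concrete, here is a minimal sketch. It assumes simple dotted numeric versions and half-open ranges; the ranges for A and B are made up, and real resolvers such as pip handle far more cases than this.

```python
def parse(version):
    """Turn "1.4" into (1, 4) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def intersect(range_a, range_b):
    """Return the overlap of two half-open ranges [low, high), or None."""
    low = max(parse(range_a[0]), parse(range_b[0]))
    high = min(parse(range_a[1]), parse(range_b[1]))
    return (low, high) if low < high else None


# Hypothetical requirements: A needs C>=1.0,<2.0 and B needs C>=2.0,<3.0.
a_needs_c = ("1.0", "2.0")
b_needs_c = ("2.0", "3.0")

if intersect(a_needs_c, b_needs_c) is None:
    print("no version of C satisfies both A and B")
```

No single installed copy of C can serve both A and B, which is exactly why each application needs its own isolated set of dependencies.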

A more or less well-accepted solution is to put Python, the dependencies and the app into a userland and run it in a container. It gives freedom, flexibility and repeatable environments – one of the reasons why Docker and Kubernetes are so popular.

There are still use cases where containers aren’t that convenient or even possible. Stateful services like databases are not very container friendly. Usually they run either on bare metal or, in the cloud, on compute instances. That is use case number one.

Another example is unknown environments. We write TwinDB Backup in Python and who knows where it runs. We must distribute it in something everyone knows, understands and is familiar with. Datadog and Sensu also have that problem – their agents must run in unknown environments, must install smoothly and must not break anything. In fact, it’s Datadog who inspired the way TwinDB Backup is distributed today.

How Omnibus Approaches the Problem

So I have pitched a problem and, according to the article title, Omnibus is supposed to offer a solution. It does.

Omnibus creates a package (.rpm or .deb, depending on what OS you build it on) with no dependencies. It packages the app, its dependencies and even Python itself into a single .rpm. It’s very similar to virtual environments, except it relieves you of the burden of paths and environment variables. You control what Python version it packages, as well as the dependencies. An .rpm built by Omnibus neither conflicts with nor breaks anything on the existing system when it is installed.

Behind the scenes Omnibus runs pip to install the application and package it into the rpm. Check out a build script:

How to Deal with Dependencies

Now that we have figured out that Omnibus builds a kind of virtual environment (in the Python sense of the word), it also means you work with dependencies as you would in one. If you need to pin a version, you can do it. If you need a specific dependency version, you can do that, too! Whether you pin versions or not depends on whether your code is going to be reused. If the application is a command line tool that will never be imported in another Python application, you should pin the versions. If the code is a library, the versions should not be pinned, but it should define a range of tested versions. More on that further down.

Meanwhile, to pin dependency versions, pip-tools is the tool of choice. Basically, you define whatever your app depends on in requirements.in.

pip-tools in turn generates requirements.txt:

When I need to generate requirements.txt, the pip-compile tool does the job.
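The difference between the two files is easy to check mechanically. Below is a minimal sketch, assuming a simplified requirements format (one requirement per line, `#` comments, option lines starting with `-`). The file contents are hypothetical examples, not the real TwinDB Backup requirements, and `unpinned` is a name made up for illustration.

```python
def unpinned(requirements_text):
    """Return the requirement lines that are not pinned with '=='."""
    loose = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if "==" not in line:
            loose.append(line)
    return loose


# Hypothetical requirements.in: what the app needs, as a range.
requirements_in = "click>=6.5,<7.0\n"

# Hypothetical requirements.txt: what a pip-compile run could resolve it to.
requirements_txt = "# generated from requirements.in\nclick==6.7\n"

print(unpinned(requirements_in))   # ['click>=6.5,<7.0']
print(unpinned(requirements_txt))  # []
```

A check like this could run in CI to make sure nobody edits requirements.txt by hand and leaves a loose version in it.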

Packaging reusable code

Obviously, you want to reuse the code. If I were to write MySQL in Python, I would have the MySQL server as one Omnibus package, the MySQL client as another Omnibus package, and both of them would depend on a Python package, the MySQL library. We can do that, right? Because Omnibus installs any required dependency in the rpm’s virtual environment.
Again, we package two things:

  • Omnibus packages for command line tools exposed to OS and users
  • Python packages to be reused.

Specifying Dependency Versions

Now, how do you specify dependencies? Lately I’m convinced that both the minimum and the maximum versions should be capped.

The minimal version is needed because who knows how the app works on outdated versions. If 6.5 is today’s version of Click, use it as the minimal version and let others upgrade.
The upper limit is needed, too. If you are familiar with Semantic Versioning, you know that a major version increment means a backwards-incompatible change. In other words, whatever depends on Click today may break when Click 7 is released. Actually, Click 7 is released, and any sub-command with an underscore, sub_command, became sub-command by default. You don’t want sudden breaks, do you? Then cap the upper version: Click>=6.5,<7.0.

It also gives you an opportunity to freely develop new backward-incompatible features and to gradually migrate all dependent projects to the new version. Say you released version 2.0 of a package with an incompatible change: the packages having <2.0 in requirements.txt will still build just fine.
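The rule "minimum is today’s version, maximum is the next major" is simple enough to write down. This is a minimal sketch assuming the dependency follows Semantic Versioning; `capped_requirement` is a helper name I made up for illustration.

```python
def capped_requirement(name, current_version):
    """Build a 'name>=current,<next_major.0' requirement specifier."""
    major = int(current_version.split(".")[0])
    return f"{name}>={current_version},<{major + 1}.0"


# The resulting list is what a library would pass as install_requires
# in its setup.py.
install_requires = [capped_requirement("Click", "6.5")]
print(install_requires)  # ['Click>=6.5,<7.0']
```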

Hosting Dependencies

The reusable code should be hosted either on the public pypi.org or in a private PyPI repo. Artifactory as well as PackageCloud support the latter. The public repo is more convenient because pip has access to it by default. In the case of a private repo you need to configure an extra-index-url in ~/.pip/pip.conf. The pip.conf should be available in all environments that install dependencies – development laptops, servers, build machines, etc. This is a one-time procedure, so usually not a big deal.
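Since pip.conf has to land on many machines, it helps to generate it. Here is a minimal sketch that renders the file with the standard configparser module. The repo URL is a placeholder, and writing the result to disk is left to your provisioning tool.

```python
import configparser
import io


def render_pip_conf(extra_index_url):
    """Return pip.conf contents with extra-index-url set in [global]."""
    # interpolation=None keeps any '%' in the URL from being interpreted.
    config = configparser.ConfigParser(interpolation=None)
    config["global"] = {"extra-index-url": extra_index_url}
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


# Placeholder URL: substitute the address of your private repo.
print(render_pip_conf("https://pypi.example.com/simple"))
```

With that file in place, pip searches the private repo in addition to pypi.org, so public and private dependencies can be mixed in the same requirements file.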

Using Omnibus Packages in CI/CD

CI/CD itself is a big topic, which I will just briefly cover here.

After the code has passed all required tests, Jenkins (or an equivalent) builds the Omnibus package. With the same config, Omnibus builds packages for the system it runs on. For example, if you build a package on CentOS 7, it will build an rpm for CentOS 7. If you run Omnibus on Ubuntu Bionic, it will build a deb package for Ubuntu Bionic. That is a tremendous convenience if your organization upgrades OS versions.

The outcome of the Omnibus build process is an rpm or deb file that needs to be uploaded to a yum/apt repo. And since it’s a package format widely understood by all Linux tools, you can base your CD part on it. For example, in Chef you can instruct it to install the latest version:
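For a CI job that needs to know in advance which artifact to expect, the mapping can be sketched as below. This only illustrates the idea and is not how Omnibus itself detects the platform. The `os_id` argument is assumed to be the ID field from /etc/os-release, and the family sets are deliberately incomplete.

```python
RPM_FAMILY = {"centos", "rhel", "fedora"}
DEB_FAMILY = {"ubuntu", "debian"}


def package_format(os_id):
    """Map an os-release ID to the package type built on that system."""
    os_id = os_id.lower()
    if os_id in RPM_FAMILY:
        return "rpm"
    if os_id in DEB_FAMILY:
        return "deb"
    raise ValueError(f"unknown distribution: {os_id}")


print(package_format("centos"))  # rpm
print(package_format("ubuntu"))  # deb
```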

P.S.

Slides from a BayPIGgies meetup.