by Robert Wall
on 16th Sep 2016
This may be the first Python-related post you see on the Winton Technology blog, but it is unlikely to be the last. Here at Winton we use Python extensively across the full life cycle of our business, including research, development and our live trading systems. If you would like to know more about Winton’s use of Python across its Technology department, I’d encourage you to look at the report of our presence at EuroPython 2016, where Winton was a sponsor.
Python has worked well for us at Winton, but packaging has not always been Python's strongest point. This post discusses the conda packaging format and how we use it.
To understand the present it is useful to look briefly at the past. When Python was first introduced at Winton, and for some time afterwards, we relied on Python downloaded from python.org. This was a fine choice to begin with: while the number of people using Python was relatively small and the code base wasn’t overly large, it was workable. Packages were installed ad hoc by developers, and dependencies were managed and communicated through changelogs and wiki pages. At this point, I’d like to thank Christoph Gohlke, who has for a long time made available Windows builds of many of the more complex packages in the scientific Python stack, which we used at the time.
As the use of Python increased and extended across the business from technology towards researchers, there was a move to use Enthought Python Distribution (aka EPD and now called Enthought Canopy). Enthought had their own package management system, enstaller. Enthought make a fine Python distribution and EPD undoubtedly helped us on our journey towards more consistent environments and research reproducibility.
Our most recent move was to Continuum's Anaconda Python distribution.
Conda is a package manager that is used by Continuum's Anaconda Python distribution (Anaconda itself should not be confused with the installation program used by Fedora, Red Hat Linux and some other distributions).
Conda is a generalised package manager: it can package other languages, as well as arbitrary code and data, for installation into a conda-managed environment. This allows Continuum to split binary dependencies such as openssl and MKL from the Python code and to update them independently. It is not hard to imagine that we may split some parts of our core code similarly, in particular if we begin to use conda for other languages.
A single conda package can contain binaries specific to a single platform; other supported platforms are built as separate packages.
As an investment manager driven by science and technology, we treat the scientific computing stack (in particular numpy, scipy, matplotlib and pandas) as key packages. All of these popular packages have compiled elements. Installing them with pip would require the user to build the C source code. This is a big undertaking for less technical users, and the task can be time consuming and distracting even for developers. Conda packages free us from the compilation step, which makes life easier for users, helps keep environments consistent and simplifies deployments.
Conda comes bundled with the Anaconda distribution, but conda itself is open source. You can even run “pip install conda” to install conda into another Python distribution, although this is not something we have used.
Today we are using the Anaconda distribution, for the most part on Windows but also on Linux. We moved to Anaconda a few years ago and have been happy with it so far. Anaconda comes with an extensive library of packages, including the scientific computing stack, which has been linked with Intel MKL for additional performance. The key reason for our move was the simple package management provided by conda. For us, the steps in a conda package life cycle are declaring dependencies, writing a recipe, building, testing and publishing:
Conda packages come with strong dependency management. Developers can be very specific about which package versions are and are not acceptable. Dependencies are included in the recipe with a simple syntax for matching specific package versions. Runtime dependencies are separated from test dependencies, and selectors can be used to discriminate further, for example between platforms or specific Python versions.
Internally, conda uses a SAT solver to ensure that the combination of packages installed into an environment remains consistent with the requirements declared by each installed package, so the mix of packages present at any moment should be compatible.
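To make the idea of a consistent set of packages concrete, here is a toy sketch in Python. It is not conda's algorithm: it replaces the SAT solver with a brute-force search, and the package index, names and versions are all made up for illustration. Every package involved, including dependencies, must appear in the wanted mapping.

```python
from itertools import product

# Hypothetical package index: name -> {version: {dependency: allowed versions}}.
# All names and versions are invented for this illustration.
INDEX = {
    "app": {"1.0": {"numlib": {"1.9", "1.10"}}, "2.0": {"numlib": {"1.11"}}},
    "plots": {"0.5": {"numlib": {"1.10", "1.11"}}},
    "numlib": {"1.9": {}, "1.10": {}, "1.11": {}},
}


def is_consistent(choice):
    """Return True if every chosen package's dependencies are satisfied."""
    for name, version in choice.items():
        for dep, allowed in INDEX[name][version].items():
            if choice.get(dep) not in allowed:
                return False
    return True


def solve(wanted):
    """Brute-force search for one consistent version per wanted package.

    wanted maps package name -> set of acceptable versions. Returns the
    first consistent combination found, or None if there is none.
    """
    names = sorted(wanted)
    for versions in product(*(sorted(wanted[n]) for n in names)):
        choice = dict(zip(names, versions))
        if is_consistent(choice):
            return choice
    return None
```

Asking for app 2.0 alongside plots 0.5 forces numlib 1.11, the only version both accept, while pinning numlib to 1.9 at the same time leaves no consistent combination. A real solver also has to prefer newer versions and cope with far larger search spaces, which is why conda encodes the problem for a SAT solver rather than enumerating combinations.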
The package recipe is an encapsulation of the package content. Source location, dependencies and test modules are all declared here and form the basis for building the package.
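As a rough illustration of what a recipe contains, the sketch below writes a minimal meta.yaml from Python. The package name, version and dependency pins are hypothetical and are not taken from our own recipes; the section names (package, source, build, requirements, test) and the `# [win]` platform selector follow conda-build's recipe format.

```python
from pathlib import Path

# A minimal meta.yaml template. The name, version and pins are hypothetical;
# the section names are the ones conda-build recipes use.
RECIPE_TEMPLATE = """\
package:
  name: {name}
  version: "{version}"

source:
  path: ..

build:
  script: python setup.py install

requirements:
  build:
    - python
    - setuptools
  run:
    - python
    - numpy >=1.10,<1.12
    - pywin32  # [win]

test:
  requires:
    - pytest
  imports:
    - {name}
"""


def write_recipe(recipe_dir, name, version):
    """Render the template and write it to <recipe_dir>/meta.yaml."""
    recipe_dir = Path(recipe_dir)
    recipe_dir.mkdir(parents=True, exist_ok=True)
    meta = recipe_dir / "meta.yaml"
    meta.write_text(RECIPE_TEMPLATE.format(name=name, version=version))
    return meta
```

Running "conda build" on the resulting folder would then build the package. The run section lists runtime dependencies, the test section lists what is needed only for testing, and the selector comment restricts pywin32 to Windows builds.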
We write our own recipes for our proprietary packages. For open source projects, the community-led conda-forge project now collects recipes for many popular packages, which are built, tested and published to anaconda.org. The conda skeleton command can generate recipes for conda packages from PyPI packages, so it is easy to package additional third-party packages in conda if you so wish.
Packages are built according to their recipe but the core of the build process falls back on calling setup.py and the usual distutils/setuptools libraries that have underpinned Python builds for a long time. Conda then takes care of moving the output of these builds into a single installable package.
Testing is an integral part of any software development workflow and we take it seriously. Our conda tests are run independently of the build steps. Our recipes contain details of our tests, and when the conda package is run in test mode, a clean environment with nothing but the Python executable is created. The package is installed there along with its required runtime and test dependencies. Tests are then run in this isolated environment, so the full package installation, dependency resolution and dependency installation are tested as well as our code. This gives increased confidence that the result will be the same when the package is deployed to production or to a researcher's VM, and minimises the chances of failure.
Having built a package containing our code, we must now make it available to others and for deployment to the trading pipeline. The delivery mechanisms are known as 'channels' in conda; publishing is simply a matter of copying the package to its destination and running conda index on the folder. Channels can be simple directories or an HTTP server, or can use the full Anaconda repository product sold by Continuum. To date we have been happy with simple HTTP channels. In the future we may re-evaluate Continuum’s Anaconda repository for distributing packages.
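The publishing step can be sketched in a few lines of Python. This is an assumption-laden illustration rather than our deployment tooling: the function name, file name and folder layout are hypothetical, with packages grouped into one subfolder per platform (for example linux-64 or win-64). Which folder conda index should be pointed at depends on the conda-build version, so treat the command below as a starting point.

```python
import shutil
import subprocess
from pathlib import Path


def publish(package_file, channel_root, platform, run_index=False):
    """Copy a built package into <channel_root>/<platform> and index it.

    Returns the indexing command so that it can be inspected or logged.
    """
    target_dir = Path(channel_root) / platform
    target_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy2(str(package_file), str(target_dir))

    # `conda index` regenerates the repository metadata for a package folder.
    command = ["conda", "index", str(target_dir)]
    if run_index:
        subprocess.check_call(command)  # needs conda-build available on PATH
    return command
```

Serving the channel folder from a plain HTTP server is then enough for clients to install from it with "conda install -c" followed by the channel URL.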
Depending on whether a developer is collaborating with a researcher or working on a library used in the trading pipeline, the set of packages required can vary hugely. We make heavy use of conda environments to ensure that only a minimal set of packages is installed for each operating environment. Conda environments are very similar to Python virtual environments. In both cases, the Python executable and all dependencies are copied into independent folders, and when the environment is "activated" the shell PATH is adjusted to ensure Python is loaded from the correct environment. Conda environments are compatible with pip installs too, so if you find a package that has no recipe you can still pip install it.
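The effect of activation on PATH can be illustrated with a small sketch. It is a simplification under stated assumptions, not conda's activation script: the function name and example paths are hypothetical, and real activation on Windows adds further folders beyond the two shown.

```python
def activated_path(env_prefix, current_path, windows=False):
    """Return a PATH value with the environment's executable folders first."""
    if windows:
        # On Windows, python.exe sits in the environment root and
        # entry-point scripts live in the Scripts subfolder.
        env_dirs = [env_prefix, env_prefix + "\\Scripts"]
        sep = ";"
    else:
        # On Linux, executables live in the environment's bin folder.
        env_dirs = [env_prefix + "/bin"]
        sep = ":"
    parts = env_dirs + ([current_path] if current_path else [])
    return sep.join(parts)
```

Because the environment's folders come first, the shell finds that environment's python before any other installation on the machine.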
If we do want to reproduce an environment across machines (for example, UAT and live) or if two researchers are collaborating and we want to ensure that both have the same packages installed, conda makes it easy to export the full set of packages installed on a machine and to recreate the environment on the target machine.
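One way to script this is with the "conda env export" and "conda env create" commands. The sketch below assumes conda is on PATH; the helper names, environment name and file name are hypothetical.

```python
import subprocess


def export_command(env_name):
    """Command that prints the environment's full package list as YAML."""
    return ["conda", "env", "export", "--name", env_name]


def create_command(spec_file):
    """Command that recreates an environment from an exported file."""
    return ["conda", "env", "create", "--file", spec_file]


def export_environment(env_name, spec_file):
    """Write the exported specification of env_name to spec_file."""
    output = subprocess.check_output(export_command(env_name))
    with open(spec_file, "wb") as handle:
        handle.write(output)
```

On the source machine you would call export_environment("research", "environment.yml"), copy the file across, and run the command returned by create_command("environment.yml") on the target machine.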
The Anaconda distribution comes in two flavours: the root environment (the default environment) comes with either Python 2 or Python 3. However, with either distribution, conda environments can be created with either a Python 2 or a Python 3 executable. This is a very useful feature in our transition from 2 to 3. Users can continue to have a single Python distribution installed but switch easily and naturally between environments.
There is one minor caveat though. Although users can get by with a single distribution, if you want to build packages for both Python 2 and Python 3 then you must have both distributions installed. This is because conda-build, the package for building packages, must be installed in the top-level site-packages folder, also known as the root environment in Anaconda.
Python use within Winton continues to grow and expand. We have been very happy with the Anaconda distribution. In particular, conda packaging has brought order and consistency to packaging and distribution. Robust Python environments lead to fewer support calls to developers and are essential for the investment management business, where lost time really does equal money in a very tangible way.
Of course, the Python ecosystem does not stay still and neither does Winton. As we migrate increasingly towards a Python 3 codebase we will continue to assess new developments and question old assumptions. The pip/wheel ecosystem is developing and improving dramatically and is seeing increasing praise. As it stands though, Anaconda Python, along with the conda packaging system, continues to impress and perform well, and we don’t currently anticipate moving away.