by Morgan Parry
17 December, 2018 - 10 minute read

There are many ways to show how popular a language Python is, but to take two examples: GitHub, in its State of the Octoverse, has ranked it as the third most commonly used language in each of the past four years, while StackOverflow statistics illustrate the 250% growth it has seen over a similar period, and predict that this will continue unabated until at least 2020.

Furthermore, despite its existing popularity, there is other evidence to suggest that this rise will continue: GitHub also reports that Python remains the eighth fastest-growing language, undoubtedly in no small part due to its prominent role in machine learning; and StackOverflow surveys for two consecutive years have found that it remains the language that most developers wish they were using.

A new momentum

For much of this decade, it has been the more traditional Python 2 variant that has remained most prevalent. However, as Python 3 now celebrates its 10th anniversary, that balance has at last shifted.

In its formative years, there was simply insufficient motivation for most existing code to make the switch from Python 2 to 3: the cost of its source-code incompatibility outweighed the improvements, which, though numerous, were often individually small or uncompelling. The balance in that trade-off has now flipped - to the point that Python 3 has finally achieved critical mass.

As the end of 2018 approaches, both carrot and stick are driving the adoption of Python 3. In the former category, we have two main aspects: a fully supportive ecosystem, with more than 99% of the most popular packages on PyPI available for Python 3, and a new abundance of desirable or substantial features such as type annotations, asyncio, data classes, and f-strings. Meanwhile, at the opposite end of the motivational spectrum, many major packages have frozen features in their Python 2 packages, or soon will, and more generally there are firm plans to bid Python 2 as a whole a fond farewell in 2020.
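As a small taste of two of those features, the sketch below combines a data class (Python 3.7+) with an f-string (3.6+); the class and field names are purely illustrative:

```python
from dataclasses import dataclass  # new in Python 3.7


@dataclass
class Package:
    """A hypothetical record of a package on an index."""
    name: str
    downloads: int

    def summary(self) -> str:
        # f-strings interpolate arbitrary expressions, with format
        # specs such as the thousands separator used here
        return f"{self.name}: {self.downloads:,} downloads"


pkg = Package(name="numpy", downloads=1_000_000)
print(pkg.summary())  # numpy: 1,000,000 downloads
```

The `@dataclass` decorator generates `__init__`, `__repr__` and `__eq__` from the annotated fields, which were themselves only made possible by the annotation syntax introduced in Python 3.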

Publicly available data suggests that this shift towards Python 3 has been steadily gathering momentum and is perhaps now decisive. At the end of last year, JetBrains used results from their regular surveys to illustrate five years of this trend, declaring that among PyCharm users Python 3 was now significantly ahead.

We can also examine the freely accessible PyPI download statistics in Google BigQuery and see a similar pattern continuing. For example, the graph below shows the past 3 years of history of numpy downloads from PyPI and, in particular, we can see that Python 3 overtook its predecessor earlier in the year by that measure:

(The jump in mid-2018 is due to improvements in their data collection processes; see this FAQ.)

Of course, these are only downloads using pip and it is likely that there are at least as many users obtaining packages via other distribution methods such as conda, which indeed we use internally at Winton. And while numpy underpins most areas of scientific Python development, there are many other classes of Python usage.

However, all of these trends very much appear to be in the same direction and Python 3 usage has caught up with or, in at least some areas, overtaken Python 2. Numerous major companies that - like us - heavily use Python have indicated that they are migrating or have already completed the process. Finally, Python 2 is truly the legacy variant, as the official Python Wiki has long claimed.

Python at Winton

Here at Winton, we have always been keen Python users and over time the language has only grown more ubiquitous and central to our endeavours. We have been incrementally transitioning to Python 3 for a little more than two years - a journey that is now nearing completion. In an enterprise environment like ours, there are two possible approaches:

- migrate a package wholesale, dropping Python 2 support at a stroke; or
- add Python 3 support alongside Python 2, so that a package works with both during the transition.

In practice, a large proportion of our packages were in the latter category, but these strategies are not mutually exclusive: where code has no dependencies, we would naturally choose the former, easier option.

We use conda for almost all of our packaging and distribution needs, as we have described in the past. We maintain multiple internal package repositories, in addition to mirrors of major external ones, and support substantial graphs of package dependencies.

Since we could not expect the leaf nodes of those graphs - i.e. the client code - to immediately switch, or to do so in sync, we chose to gradually expand our Python 3 support, mainly working from the root modules of these dependency graphs outwards to the leaves.

We have found the versioning and dependency management in conda, and the accessible but powerful packaging abilities of conda-build, invaluable in these efforts. As modules have been updated to add Python 3 support, we have been able to simply introduce additional build variants into our Jenkins pipeline builds and publish them. As soon as downstream packages have also been updated, and all of their dependencies satisfied, they can immediately be installed into a fresh Python 3 conda environment and used. (The ability to run multiple environments in parallel can also be a huge help in this process.)

In a sizeable proportion of our packages, we produce many build permutations: up to three versions of Python (2.7, 3.6, 3.7) multiplied by up to three platforms (Windows, Linux, macOS). Again, conda-build absorbs much of the difficulty of cross-platform development for us, but we have also further leveraged our build pipelines’ continuous integration and testing - in combination with good test coverage! - to ensure that all of this hard work to introduce compatibility is not undone in future changes. With so many variations, it is of course impractical for developers to manually test everything.
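As a sketch of how such permutations can be declared, conda-build reads build variants from a conda_build_config.yaml placed alongside the recipe (the exact versions listed here mirror those mentioned above, but the file itself is illustrative):

```yaml
# conda_build_config.yaml (sketch) - conda-build expands the recipe
# into one build per Python version listed here, on each platform
# the pipeline runs on
python:
  - 2.7
  - 3.6
  - 3.7
```

A single `conda build` invocation against the recipe then produces one package per variant, which is what allows the Jenkins pipelines described above to publish Python 2 and Python 3 builds side by side.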

Caveat programmator

In principle, the transition to Python 3 can be quick, and automated tools can assist: one-way migration can be performed with 2to3, while concurrent support can be bootstrapped with futurize.

Unfortunately, though these are always at least useful starting points, for non-trivial code bases they will often only tackle the tip of the porting iceberg, and there are a number of areas in which things can be much more complicated or subtle. For concurrent support, the thorniest of these will generally be at the edges, where code must account for external factors beyond its control.

Some areas can be simplified by taking advantage of the future or six libraries, or both. Chiefly, the former provides back- and forward-ports of functionality, while the latter provides utility functions for mitigating differences between the two languages. Appropriate usage of these libraries helps to minimise conditional code and, thanks to the back-ports, permits more Python 3 facilities to be utilised.

While the dynamic nature of Python is obviously one of its strengths, and one reason why it is so popular, it can be a hindrance and an additional source of bugs in work of this sort. Fortunately, there are various static analysis tools available that can mitigate this and in the later stages of our Python 3 migration we began to adopt these for our internal packages. There are two main aspects to this:

Explicit typing

Python 3.6 and later feature optional type annotations, allowing the types of function parameters, function return values and variables to be described. These decorations are ignored at runtime, but may be verified using external tools such as mypy (technically still classified as ‘experimental’, but in practice quite ready for use) or PyCharm.

We chose to focus on mypy; the main reason for doing so was that it can easily be applied both in an IDE during development and in test steps. The latter can be especially beneficial when dealing with concerns such as preserving an existing exposed API: where expected types are stated, mypy will ensure that these promises are upheld even as changes are made towards Python 3 support. In our conda builds this was introduced via just a couple of extra lines in the test section of meta.yaml:

test:
  requires:
    - mypy
  commands:
    - mypy --ignore-missing-imports python/src python/tests

Moreover, annotations are advantageous during development as a whole since, given sufficient type information, many other errors beyond backwards compatibility can be identified and it is also possible for IDEs to offer higher quality auto-completion suggestions and tooltips.

There is a low cost for electing to adopt annotations because this can be done incrementally. Validation will be performed where mypy is informed of the intended types and elsewhere it will assume that typing is still dynamic for the time being and relax its analysis. Furthermore, for existing code it is often possible to automate much of the annotation process via tools such as MonkeyType or PyAnnotate.

Where the intention is to retain Python 2 compatibility for a module for the time being, it is necessary to use a different syntax for the annotations. Under Python 3.6 or later you can write this, for example:

from typing import List

things: List[int] = []

def foo(bar: str, baz: int) -> None:
    pass

(It is probably quite a common reaction that the above is visually rather noisy in comparison to unadorned Python but that can be mitigated with a good IDE syntax colouring scheme. Additionally, mypy will infer most variable types, once functions are annotated, and it is usually only container variables that are initially empty that must be annotated.)

Python 2’s syntax does not allow such constructions and the above must instead be expressed as follows:

from typing import List  # available on Python 2 via the 'typing' back-port package

things = []  # type: List[int]

def foo(bar, baz):  # type: (str, int) -> None
    pass

One issue to consider with this latter approach is that there currently appears to be no complete tool support for automatically converting comment-based annotations to the Python 3.6 style later on.

However, it is unlikely that mypy will drop support for type comments, since they are also used for other purposes such as temporarily disabling checks and are equally necessary for earlier versions of Python 3, so there should be no particular requirement to update them.

Static code analysis

We use the venerable pylint both for general code analysis and, more specifically, for detecting cross-version compatibility issues. Typically, the more exhaustive and opinionated the analyser, the greater the tension between minimising false-positive noise and checking as thoroughly as possible.

Yet while this can certainly be true of pylint, it becomes quite manageable after an initial pass of configuration and clean-up. That initial cost may be too large in some codebases, but it is at least worth considering pylint for new modules, and the overhead may not be that great at all if you already intend to follow conventions such as PEP 8. Alternatively, there are other good static analysers available too, such as flake8, which errs towards the conservative end of the spectrum and may offer a better trade-off in some cases.
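For the cross-version checks specifically, the pylint of this era ships a dedicated porting mode; as a sketch (the package path is illustrative), it is invoked as:

```shell
# Run only pylint's Python 3 porting checker ("py3k" mode);
# the general checkers are disabled in this mode, so run it
# alongside a normal pylint pass rather than instead of one.
pylint --py3k mypackage/
```

Because this mode emits nothing but porting diagnostics, it can be added to a test step much like the mypy command shown earlier without first undertaking the broader configuration clean-up.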

f.close()

Winton’s migration to Python 3 is nearing completion. Of our internal packages and client code, the vast majority has either switched wholesale to Python 3 or supports both Python 2 and 3. Now that we have fully embraced the future of the language, we are excited about the new opportunities that Python 3’s second decade will bring.