There are many ways to show how popular Python is, but to take two examples: GitHub, in its State of the Octoverse, has ranked it as the third most commonly used language for each of the past four years, while StackOverflow statistics show the 250% growth it has seen over a similar period and project that this will continue unabated until at least 2020.
Furthermore, despite its existing popularity, there is other evidence to suggest that this rise will continue: GitHub also reports that Python remains the eighth fastest-growing language, undoubtedly in no small part due to its prominent role in machine learning; and StackOverflow surveys for two consecutive years have found that it remains the language that most developers wish they were using.
A new momentum
For much of this decade, it has been the more traditional Python 2 variant that has remained most prevalent. However, as Python 3 now celebrates its 10th anniversary, that balance has at last shifted.
In its formative years, there was simply insufficient motivation to make the switch from Python 2 to 3 for most existing code: the cost of its source code incompatibility outweighed the improvements which, though numerous, were often individually small or uncompelling. The balance in that trade-off has now flipped - to the point that Python 3 has finally achieved critical mass.
As the end of 2018 approaches, both carrot and stick are driving the adoption of Python 3. In the former category, we have two main aspects: a fully supportive ecosystem, with more than 99% of the most popular packages on PyPI available for Python 3, and a new abundance of desirable or substantial features such as type annotations, asyncio, data classes, and f-strings. Meanwhile, at the opposite end of the motivational spectrum, many major packages have frozen features in their Python 2 packages, or soon will, and more generally there are firm plans to bid Python 2 as a whole a fond farewell in 2020.
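As a small illustration of two of the features mentioned above, the sketch below combines a data class (Python 3.7+, via the standard dataclasses module), which generates __init__, __repr__ and __eq__ from annotated fields, with an f-string (Python 3.6+), which interpolates expressions inline. The Package class and its fields are hypothetical, chosen purely for the example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Package:
    # Hypothetical record type: the decorator generates __init__, __repr__
    # and __eq__ from these annotated fields.
    name: str
    python_versions: Tuple[str, ...] = ("3.7",)

    def label(self) -> str:
        # f-strings evaluate the expressions inside the braces at runtime.
        return f"{self.name} (Python {', '.join(self.python_versions)})"


pkg = Package("mypackage", ("2.7", "3.7"))
print(pkg.label())  # mypackage (Python 2.7, 3.7)
print(pkg)          # Package(name='mypackage', python_versions=('2.7', '3.7'))
```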
Publicly available data suggests that this shift towards Python 3 has been steadily gathering momentum and is perhaps now decisive. At the end of last year, JetBrains used results from their regular surveys to illustrate five years of this trend, declaring that among PyCharm users Python 3 was now significantly ahead.
We can also examine the freely accessible PyPI download statistics in Google BigQuery and see a similar pattern continuing. For example, the graph below shows the past 3 years of history of numpy downloads from PyPI and, in particular, we can see that Python 3 overtook its predecessor earlier in the year by that measure:
(The jump in mid-2018 is due to improvements in their data collection processes; see this FAQ.)
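Once download counts broken down by Python version have been extracted from BigQuery (the query itself is not shown here, and the (month, version, downloads) row layout below is an assumption rather than the dataset's actual schema), deciding when Python 3 overtook Python 2 reduces to a simple aggregation. A minimal sketch:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# Assumed row layout: (month as "YYYY-MM", Python version string or None, downloads).
Row = Tuple[str, Optional[str], int]


def python3_share(rows: Iterable[Row]) -> Dict[str, float]:
    """Fraction of Python 2 + 3 downloads that came from Python 3, per month."""
    totals: Dict[str, List[int]] = defaultdict(lambda: [0, 0])  # month -> [py2, py3]
    for month, version, downloads in rows:
        major = (version or "").split(".")[0]
        if major == "2":
            totals[month][0] += downloads
        elif major == "3":
            totals[month][1] += downloads
        # Missing or unrecognised versions are ignored.
    return {
        month: py3 / (py2 + py3)
        for month, (py2, py3) in sorted(totals.items())
        if py2 + py3 > 0
    }


def crossover_month(shares: Dict[str, float]) -> Optional[str]:
    """First month in which Python 3 accounts for more than half the downloads."""
    for month in sorted(shares):
        if shares[month] > 0.5:
            return month
    return None
```

Ignoring rows with no recorded version is itself a choice that could shift the apparent crossover point.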
Of course, these are only downloads using pip, and it is likely that there are at least as many users obtaining packages via other distribution methods such as conda, which indeed we use internally at Winton. And while numpy underpins most areas of scientific Python development, there are many other classes of Python usage.
However, all of these trends point in the same direction: Python 3 usage has caught up with or, in some areas at least, overtaken Python 2. Numerous major companies that, like us, heavily use Python have indicated that they are migrating or have already completed the process. Finally, Python 2 is truly the legacy variant, as the official Python Wiki has long claimed.
Python at Winton
Here at Winton, we have always been keen Python users and over time the language has only grown more ubiquitous and central to our endeavours. We have been incrementally transitioning to Python 3 for a little more than two years - a journey that is now nearing completion. In an enterprise environment like ours, there are two possible approaches:
Hard switch: for the simple scenario of a collection of standalone pieces of code, a one-way conversion will typically be preferable; it will almost certainly be more straightforward. It also permits immediate adoption of features that are unique to Python 3.
Concurrent support: in more complex ecosystems, where packages have external dependencies, support for both Python versions will usually be a necessity, at least for an initial transition period. This can be more difficult and time-consuming, and also means that it is largely only possible to use the common subset of features of Python 2 and 3.
In practice, a large proportion of our packages were in the latter category, but these strategies are not mutually exclusive: where code has no dependencies, we would naturally choose the former, easier option.
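To give a flavour of what writing to the common subset involves, the sketch below assumes Python 2.7 as the oldest supported version and uses __future__ imports to opt in to Python 3 semantics for division, print and string literals, so that the same module behaves identically under both interpreters. The function names are illustrative only.

```python
from __future__ import absolute_import, division, print_function, unicode_literals

# The __future__ imports above must come first in the module. On Python 3 they
# are no-ops; on Python 2.7 they switch on the Python 3 behaviour.


def mean(values):
    # "division": / is true division on both versions, so mean([1, 2]) is 1.5
    # rather than Python 2's integer result of 1.
    return sum(values) / len(values)


def report(values):
    # "unicode_literals": this literal is text (unicode) on 2.7, as on 3.x.
    message = "mean: {}".format(mean(values))
    # "print_function": print is a function on both versions.
    print(message)
    return message
```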
We use conda for almost all of our packaging and distribution needs, as we have described in the past. We maintain multiple internal package repositories, in addition to mirrors of major external ones, and support substantial graphs of package dependencies.
Since we could not expect the leaf nodes of those graphs (i.e. the client code) to immediately switch, or to do so in sync, we chose to gradually expand our Python 3 support, mainly working from the root modules of these dependency graphs outwards to the leaves.
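Working from the roots outwards amounts to visiting packages in a topological order of the dependency graph, so that a package is only tackled once everything it depends on already supports Python 3. A minimal sketch using Kahn's algorithm follows; the dictionary input format and the package names are hypothetical, and in practice such a graph could be derived from the packaging metadata.

```python
from collections import deque
from typing import Dict, List, Set


def porting_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Order packages so that each appears after all of its dependencies.

    deps maps each package to the set of packages it depends on.
    """
    # Include packages that only ever appear as dependencies.
    all_pkgs = set(deps)
    for ds in deps.values():
        all_pkgs |= ds
    remaining = {p: set(deps.get(p, ())) for p in all_pkgs}
    # Reverse edges: dependency -> packages that use it.
    dependents: Dict[str, Set[str]] = {p: set() for p in all_pkgs}
    for pkg, ds in remaining.items():
        for d in ds:
            dependents[d].add(pkg)
    # Roots of the graph: packages with no dependencies of their own.
    ready = deque(sorted(p for p, ds in remaining.items() if not ds))
    order: List[str] = []
    while ready:
        pkg = ready.popleft()
        order.append(pkg)
        for user in sorted(dependents[pkg]):
            remaining[user].discard(pkg)
            if not remaining[user]:  # all of its dependencies are now done
                ready.append(user)
    if len(order) != len(all_pkgs):
        raise ValueError("dependency cycle detected")
    return order


example = {
    "client_app": {"analytics"},
    "analytics": {"core_utils"},
    "reporting": {"core_utils"},
    "core_utils": set(),
}
print(porting_order(example))  # ['core_utils', 'analytics', 'reporting', 'client_app']
```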
We have found the versioning and dependency management in conda, and the accessible but powerful packaging abilities of conda-build, invaluable in these efforts. As modules have been updated to add Python 3 support, we have been able to simply introduce additional build variants into our Jenkins pipeline builds and publish them. As soon as downstream packages have also been updated, and all of their dependencies satisfied, they can immediately be installed into a fresh Python 3 conda environment and used. (The ability to run multiple environments in parallel can also be a huge help in this process.)
In a sizeable proportion of our packages, we produce many build permutations: up to three versions of Python (2.7, 3.6, 3.7) multiplied by up to three platforms (Windows, Linux, macOS). Again, conda-build absorbs much of the difficulty of cross-platform development for us, but we have also further leveraged our build pipelines’ continuous integration and testing - in combination with good test coverage! - to ensure that all of this hard work to introduce compatibility is not undone in future changes. With so many variations, it is of course impractical for developers to manually test everything.
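The arithmetic behind those permutations is a Cartesian product: three Python versions by three platforms gives up to nine builds per package. conda-build handles the real variants through its own configuration; the sketch below only illustrates how the matrix grows and how a hypothetical per-package filter (here, for a package that has dropped Python 2) trims it.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple

PYTHONS = ("2.7", "3.6", "3.7")
PLATFORMS = ("Windows", "Linux", "macOS")


def build_matrix(
    pythons: Sequence[str],
    platforms: Sequence[str],
    supported: Callable[[str, str], bool] = lambda py, plat: True,
) -> List[Tuple[str, str]]:
    """Every (python, platform) combination that the package opts in to."""
    return [(py, plat) for py, plat in product(pythons, platforms) if supported(py, plat)]


full = build_matrix(PYTHONS, PLATFORMS)
print(len(full))  # 9

# Hypothetical package that has already dropped Python 2 support:
py3_only = build_matrix(PYTHONS, PLATFORMS, lambda py, plat: not py.startswith("2."))
print(len(py3_only))  # 6
```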
Unfortunately, though these are always at least useful starting points, for non-trivial code bases they will often only tackle the tip of the porting iceberg and there are a number of areas in which things can be much more complicated or subtle. For concurrent support, the thorniest of these will generally be at the edges where code must account for external factors such as:
Client code that would be surprised by, for example, return values that have quietly mutated from lists to iterators.
IO interaction with files and network sockets, typically due to difficulties with managing string representations and ensuring that everything is getting the kind that it wants. (Strings are often a further complication for client code too, where a Python 2 package will likely still want to work with byte strings, but a Python 3 client will be speaking Unicode.)
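Both kinds of edge case are easy to reproduce in a few lines. The sketch below, which runs under Python 3, shows a map() result being silently exhausted, dict keys being materialised as a list for callers that expect one, and bytes being decoded to text once at an IO boundary; io.BytesIO stands in for a real file or socket, and the helper names are illustrative.

```python
import io
from typing import BinaryIO, Dict, List

# map() returns a one-shot iterator on Python 3 (a list on Python 2).
squares = map(lambda x: x * x, [1, 2, 3])
first_pass = list(squares)   # [1, 4, 9]
second_pass = list(squares)  # [] because the iterator is already exhausted


def key_list(mapping: Dict[str, int]) -> List[str]:
    # dict.keys() is a view on Python 3, so indexing it (keys[0]) fails;
    # returning a real list preserves what Python 2 callers expected.
    return list(mapping.keys())


def read_text(stream: BinaryIO, encoding: str = "utf-8") -> str:
    # Binary files and sockets yield bytes on Python 3: decode once, at the
    # boundary, so the rest of the code only ever handles text.
    return stream.read().decode(encoding)


def write_text(stream: BinaryIO, text: str, encoding: str = "utf-8") -> None:
    # ...and encode explicitly on the way back out.
    stream.write(text.encode(encoding))


buffer = io.BytesIO()  # stands in for a file opened in binary mode or a socket
write_text(buffer, "café")
buffer.seek(0)
roundtrip = read_text(buffer)  # "café"
```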
Some areas can be simplified by taking advantage of the future or six libraries, or both. Chiefly, the former provides back- and forward-ports of functionality, while the latter provides utility functions for mitigating differences between the two languages. Appropriate usage of these libraries helps to minimise conditional code and, thanks to the back-ports, permits more Python 3 facilities to be utilised.
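As an example of the kind of utility six provides, recent releases include six.ensure_text alongside constants such as six.PY2 and six.text_type. The simplified stand-in below reimplements the idea so that it runs without six installed; it is not six's actual implementation, and in real code the library function would be preferable.

```python
import sys

PY2 = sys.version_info[0] == 2
# Only the branch for the running interpreter is evaluated, so the Python 2
# name "unicode" is never looked up under Python 3.
text_type = unicode if PY2 else str  # noqa: F821
binary_type = str if PY2 else bytes


def ensure_text(value, encoding="utf-8", errors="strict"):
    """Return value as text, decoding bytes if necessary.

    A simplified stand-in for six.ensure_text.
    """
    if isinstance(value, binary_type):
        return value.decode(encoding, errors)
    if isinstance(value, text_type):
        return value
    raise TypeError("not expecting type %r" % type(value))
```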
While the dynamic nature of Python is obviously one of its strengths, and one reason why it is so popular, it can be a hindrance and an additional source of bugs in work of this sort. Fortunately, there are various static analysis tools available that can mitigate this and in the later stages of our Python 3 migration we began to adopt these for our internal packages. There are two main aspects to this:
Python 3 supports optional type annotations: function annotations together with the typing module (PEP 484, Python 3.5) describe the types of function parameters and return values, and Python 3.6 extended this to variables (PEP 526). These annotations are not enforced at runtime, but may be verified using external tools such as mypy (technically still classified as ‘experimental’, but in practice quite ready for use) or PyCharm.
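A quick way to see that annotations are informational rather than enforced is to call an annotated function with the wrong type. The double function below is a made-up example: Python records the annotations and then runs the call regardless, whereas a checker such as mypy would flag the str argument as incompatible with int.

```python
def double(x: int) -> int:
    return x * 2


# The annotations are stored on the function object...
hints = double.__annotations__  # {'x': <class 'int'>, 'return': <class 'int'>}

# ...but the interpreter does not check them: this call runs and returns "abab".
# A static checker such as mypy would instead report an incompatible argument type.
result = double("ab")
```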
We chose to focus on mypy, mainly because it can easily be applied both in an IDE during development and in test steps. The latter can be especially beneficial when preserving an existing public API: where expected types are stated, mypy will ensure that these promises are upheld even as changes are made towards Python 3 support. In our conda builds this was introduced via just a couple of extra lines in the test section of each recipe's meta.yaml:
    test:
      requires:
        - mypy
      commands:
        - mypy --ignore-missing-imports python/src python/tests
Moreover, annotations are advantageous during development as a whole since, given sufficient type information, many other errors beyond backwards compatibility can be identified and it is also possible for IDEs to offer higher quality auto-completion suggestions and tooltips.
Adopting annotations carries a low cost because it can be done incrementally. Validation will be performed where mypy is informed of the intended types; elsewhere it will assume that typing is still dynamic for the time being and relax its analysis. Furthermore, for existing code it is often possible to automate much of the annotation process via tools such as MonkeyType or PyAnnotate.
Where the intention is to retain Python 2 compatibility for a module for the time being, it is necessary to use a different syntax for the annotations. Under Python 3.6 or later you can write this, for example:
    from typing import List

    things: List[int] = []

    def foo(bar: str, baz: int) -> None:
        pass
(It is probably quite a common reaction that the above is visually rather noisy in comparison to unadorned Python, but this can be mitigated with a good IDE syntax colouring scheme. Additionally, mypy will infer most variable types once functions are annotated, and it is usually only container variables that are initially empty that must be annotated.)
Python 2’s syntax does not allow such constructions and the above must instead be expressed as follows:
    from typing import List  # on Python 2, this needs the typing backport from PyPI

    things = []  # type: List[int]

    def foo(bar, baz):
        # type: (str, int) -> None
        pass
One issue to consider with this latter approach is that there does not currently seem to be complete tool support for automatically converting comment-based annotations to the Python 3.6 style later on.
However, it is unlikely that mypy will drop support for type comments, since they are also used for other purposes, such as temporarily disabling checks with # type: ignore, and are equally necessary for earlier versions of Python 3. There should therefore be no particular requirement to update them.
Static code analysis
We use the venerable pylint for both general code analysis and more specifically for detecting cross-version compatibility issues. Typically, the more exhaustive and opinionated the analyser, the greater the tension that exists between minimising false positive noise and conducting checking as thoroughly as possible.
Yet while this can certainly be true of pylint, it can become quite manageable after an initial pass of configuration and clean-up. That initial cost may be too large in some codebases, but it is at least worth considering pylint for new modules, and the overhead may not be that great at all if you already intend to follow conventions such as PEP 8. Alternatively, there are other good static analysers available, such as flake8, which errs towards the conservative end of the spectrum and may offer a better trade-off in some cases.
Winton’s migration to Python 3 is nearing completion. Of our internal packages and client code, the vast majority has either switched wholesale to Python 3 or supports both Python 2 and 3. Now that we have fully embraced the future of the language, we are excited about the new opportunities that Python 3’s second decade will bring.