by John Funge
on 2nd Feb 2017
We described in a previous post how DNS records can reveal patterns and trends in technology adoption. But DNS records are just one way of discovering information about the technology choices companies make. When you go to a website in your browser, the page you see is only part of the information returned. The data also typically contains HTTP header fields that you can view if you use the developer tools in the Chrome web browser (other browsers have similar functionality). HTTP headers often contain information such as the web server the site uses. For example, here is a screenshot from Chrome showing part of the HTTP header for microsoft.com:
From this header, we can tell that Microsoft (unsurprisingly) uses its own Internet Information Services (IIS) web server. For security or privacy reasons, some sites don’t report which web server they use, and many others leave out detailed version information. That said, most web servers report their identity by default, since their vendors want to advertise usage. Many sites use third-party content delivery networks (CDNs) to serve static assets such as images and video but, for the most part, information about the web server at the origin can be obtained easily. A few sites go so far as to deliberately obfuscate this information; for example, those that report using Microsoft’s Personal Web Server from Windows 98 tend to do so as a humorous way to conceal the web server they actually use.
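To make the idea concrete, here is a minimal Python sketch of pulling the web server name (and version, when reported) out of a block of raw response headers. The function name and the sample header values are ours and purely illustrative; they are not taken from the screenshot or from any particular site.

```python
import re
from typing import Optional, Tuple


def parse_server_header(raw_headers: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (product, version) from the Server field, or None if it is absent."""
    for line in raw_headers.splitlines():
        name, sep, value = line.partition(":")
        # Header field names are case-insensitive.
        if sep and name.strip().lower() == "server":
            # Typical form is "product/version (comment)"; the version is optional.
            match = re.match(r"\s*([^/\s]+)(?:/(\S+))?", value)
            if match:
                return match.group(1), match.group(2)
    return None


# Illustrative header block (made-up values, not a real capture).
sample = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nServer: Microsoft-IIS/8.5\r\n"
print(parse_server_header(sample))
```

A site that hides its version would simply yield a product name with `None` for the version, and a site that omits the field altogether yields `None`.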
For companies within the S&P 1500, the three most popular web servers are 1) Microsoft’s IIS; 2) Apache, a long-time open source alternative; and 3) Nginx, the latest open source web server, popular with technology startups for its performance under high load. Using historical data from builtwith.com, we can track the relative popularity of these three web servers over time. As an example, if we compare IIS and Nginx for the technology and finance sectors, we see that IIS usage has dropped about 20% in both sectors, but the rise in popularity of Nginx within the technology sector started earlier and has been more pronounced. The graphs are annotated with some of the companies in each sector that switched technology – is this a sign of active technical leadership?
Beyond comparing specific technologies, we’d like a more general measure of technology leadership. Preferably, we’d do this by benchmarking each sector’s distribution of technologies against some ideal technology choice that represents up-to-date versions of the “best” technology. However, version information is typically not available, and deciding which technology is best can be subjective: is the latest release of a software package always the best choice given the potential for new security holes?
To address this issue, we created a proxy benchmark by looking at the distribution of technologies used by the top 100,000 most popular websites as ranked by Quantcast. For each GICS sector in the S&P 1500 (excluding the recently added real estate sector), we constructed a similar distribution and then used the KL divergence to measure the distance between the distributions. This gave us a measure of how close each sector in the S&P 1500 is to the benchmark. We can see from the graph below left that the technology sector was, as expected, quite close to the benchmark, whereas the materials sector was the clear laggard.
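As a rough illustration of this comparison, the following Python sketch turns two lists of observed technologies into probability distributions and computes the KL divergence between them. The direction of the divergence (sector relative to benchmark), the additive smoothing used to avoid zero probabilities, and the toy data are all our assumptions; the post does not specify these details.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def distribution(technologies: Iterable[str], vocabulary: Iterable[str],
                 alpha: float = 1.0) -> Dict[str, float]:
    """Smoothed share of each technology (additive smoothing is an assumption)."""
    counts = Counter(technologies)
    vocab = sorted(set(vocabulary))
    total = sum(counts[t] for t in vocab) + alpha * len(vocab)
    return {t: (counts[t] + alpha) / total for t in vocab}


def kl_divergence(p: Dict[str, float], q: Dict[str, float]) -> float:
    """KL(p || q) in nats; p and q must share the same keys, with q > 0 everywhere."""
    return sum(p[t] * math.log(p[t] / q[t]) for t in p if p[t] > 0)


# Toy data: one entry per website observed (hypothetical, for illustration only).
vocab = ["iis", "apache", "nginx"]
benchmark = distribution(["nginx", "nginx", "apache", "iis"], vocab)
sector = distribution(["iis", "iis", "iis", "apache"], vocab)
print(kl_divergence(sector, benchmark))
```

Note that the KL divergence is not symmetric, so swapping the arguments gives a different number; whichever direction is chosen should be used consistently across sectors.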
We also looked at how the distributions changed over the past year and used cosine similarity to measure which S&P 1500 sectors are moving toward or away from the benchmark. From the graph above right, we can see that the utilities and telecommunications sectors have moved rapidly toward the benchmark, but materials companies have headed in the opposite direction! The number of companies in each sector is shown in parentheses underneath its label, and less confidence should be placed in the data when this number is small.
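One plausible way to turn this into a number is sketched below in Python: take the vector describing how a sector's distribution changed over the year and compare it, using cosine similarity, with the vector pointing from the sector's old distribution to the benchmark. This construction is our assumption rather than a description of the exact calculation behind the graph, and the function names and figures are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def movement_toward_benchmark(old: Sequence[float], new: Sequence[float],
                              benchmark: Sequence[float]) -> float:
    """Positive when the change over the period points toward the benchmark."""
    moved = [n - o for o, n in zip(old, new)]          # how the sector changed
    target = [b - o for o, b in zip(old, benchmark)]   # direction of the benchmark
    return cosine_similarity(moved, target)


# Hypothetical shares of (IIS, Apache, Nginx) for one sector and the benchmark.
old, new, benchmark = [0.6, 0.3, 0.1], [0.5, 0.3, 0.2], [0.3, 0.3, 0.4]
print(movement_toward_benchmark(old, new, benchmark))
```

Under this reading, a value near 1 means the sector moved directly toward the benchmark, a value near -1 means it moved directly away, and a value near 0 means the change was unrelated to the benchmark's direction.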
Google Analytics is used to track website usage, and it has such wide adoption that we can use the unique ID assigned to each user to spot websites that presumably have the same owner. That is, if two websites embed the same Google Analytics ID, then we can assume that they are owned by the same entity (although sometimes people accidentally copy and paste HTML containing someone else’s ID into their website without realizing!).
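The matching step can be sketched in a few lines of Python. The example assumes tracking IDs of the form UA-XXXXX-Y, where the middle number identifies the account and the final number identifies a property within it; grouping on the account part, the function names and the sample pages are all our own illustrative choices.

```python
import re
from collections import defaultdict
from typing import Dict, Set

# Matches IDs such as UA-1234567-1 and captures the account and property numbers.
GA_ID_PATTERN = re.compile(r"\bUA-(\d{4,10})-(\d{1,4})\b")


def analytics_accounts(html: str) -> Set[str]:
    """Return the Google Analytics account IDs embedded in a page's HTML."""
    return {f"UA-{account}" for account, _ in GA_ID_PATTERN.findall(html)}


def group_sites_by_account(pages: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each account ID to the domains embedding it, keeping only shared IDs."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for domain, html in pages.items():
        for account in analytics_accounts(html):
            groups[account].add(domain)
    return {acct: doms for acct, doms in groups.items() if len(doms) > 1}


# Hypothetical pages, for illustration only.
pages = {
    "example.com": "<script>ga('create', 'UA-1234567-1', 'auto');</script>",
    "example.fr": "<script>ga('create', 'UA-1234567-2', 'auto');</script>",
    "other.org": "<script>ga('create', 'UA-7654321-1', 'auto');</script>",
}
print(group_sites_by_account(pages))
```

As the caveat about copied HTML suggests, a shared ID is evidence of common ownership rather than proof, so matches found this way are best treated as candidates to check.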
If we do this for all S&P 1500 companies, we can see which top-level domain (TLD) names they have registered. Of course, there might be many other domain names that companies register beyond those that they are actively tracking; for example, many companies register variants of their domain name for their key brands to avoid nefarious use of the domain. While the list of additional websites registered doesn’t necessarily say a lot about technical leadership, it does reveal some interesting anecdotes and perhaps some indication about a company’s interest in global markets.
One example is Colgate-Palmolive, which has registered the colgatepalmolive.com domain name, but also the French sanex.fr domain name. In fact, Colgate has registered domain names under all of the following top-level domains: AT AU BE CA CZ DE DK ES EU FI FR GR HR HU IL IT LV NL NO PL PT RU SE UA UK US ZA. The winner in the S&P 1500, however, is Lockheed Martin, which appears to have registered a staggering 113 different domain names. The graph above shows all the companies in the S&P 1500, grouped by sector, where each line in the “skyscraper” represents one company and the “window” is on if the company has registered a name under a specific top-level domain (such as .com, .net, etc.).
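The on/off “windows” in the skyscraper graph amount to a boolean table of companies against top-level domains. A minimal Python sketch of building such a table follows; it takes the last label of each hostname as the TLD, and the function names and the second company are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def top_level_domain(hostname: str) -> str:
    """Return the last label of a hostname in upper case, e.g. 'sanex.fr' -> 'FR'."""
    return hostname.rstrip(".").rsplit(".", 1)[-1].upper()


def tld_matrix(company_domains: Dict[str, Sequence[str]]
               ) -> Tuple[List[str], Dict[str, List[bool]]]:
    """Return the sorted TLD columns and one row of on/off flags per company."""
    columns = sorted({top_level_domain(d)
                      for domains in company_domains.values() for d in domains})
    rows = {}
    for company, domains in company_domains.items():
        owned = {top_level_domain(d) for d in domains}
        rows[company] = [tld in owned for tld in columns]  # True = "window on"
    return columns, rows


# Two domains from the example above plus a hypothetical second company.
columns, rows = tld_matrix({
    "Colgate-Palmolive": ["colgatepalmolive.com", "sanex.fr"],
    "Hypothetical Co": ["example.com"],
})
print(columns, rows)
```

Taking only the last label means that a name such as example.co.uk is counted under UK, which matches the country-level view used in the list above.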
The information about companies described above and in our previous post provides an interesting window into technical leadership at different companies. Of course, this information is not hugely useful in isolation, but combined with other more conventional measures, it enables us to develop new ways to assess a company’s financial performance and is part of Winton’s commitment to developing innovative data sources that support our research.
Tags: dns domains http-headers web-servers