17 Feb Analysis of 100’000 Top WordPress Sites
Analysis of the top WordPress sites provides us with insight into the technologies and security posture of these high traffic properties. With the popularity of WordPress well understood it is interesting to dig into the statistics and its usage within high traffic sites.
Poor security patching is a problem across all aspects of information technology. WordPress administrators are not the only ones struggling to keep things patched: in May 2019 the Baltimore city council servers were taken out in a ransomware attack. Even the phone in your pocket needs to be patched, with reports estimating that over a billion Android phones are missing security patches.
Automattic and WordPress have worked hard to make keeping things updated a smooth and easy process. When your software runs on 30% of the world's websites, patch management is important.
Keep in mind that for many WordPress sites, there is no full time IT administrator. Almost anyone can get a WordPress site running; it's the ongoing patching and management that many struggle with. This has led to growth in Managed WordPress hosting and services.
CMS Detection Methodology
The methodology used to determine the underlying technology of web sites is to search for specific strings within the HTML or the HTTP Headers provided by the web server. For WordPress our process was a simple matter of downloading the headers and page source from all sites in the Alexa top 1 million. The resulting content was then searched for /wp-json/, /wp-includes/ or /wp-content/, any of which indicates a WordPress powered site.
No guarantee is made as to the accuracy of this data; the accuracy comes down to what we found in the source.
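The string search described above can be sketched in a few lines of Python. This is a minimal illustration, not the tooling used for the analysis: the function name and the idea of passing headers as a dictionary are our own assumptions, and only the three marker strings come from the methodology itself.

```python
# The three marker strings come from the methodology above; everything else
# (function name, dict-of-headers input) is an illustrative assumption.
WORDPRESS_MARKERS = ("/wp-json/", "/wp-includes/", "/wp-content/")


def looks_like_wordpress(headers: dict, body: str) -> bool:
    """Return True if any WordPress marker appears in the headers or the HTML."""
    header_text = "\n".join(f"{name}: {value}" for name, value in headers.items())
    haystack = header_text + "\n" + body
    return any(marker in haystack for marker in WORDPRESS_MARKERS)


# A REST API discovery header is enough to flag the site.
example_headers = {"Link": '<https://example.com/wp-json/>; rel="https://api.w.org/"'}
print(looks_like_wordpress(example_headers, "<html></html>"))  # True
```

A plain substring search like this is cheap enough to run over a million responses, at the cost of missing sites that have rewritten or hidden these paths.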
CMS Usage in the Top 1 Million Sites
Here we compare WordPress against its rival content management systems. It is clear that WordPress is well out in front in 2019.
The popularity of WordPress gets quoted in everything from marketing materials to security incident reports. It is nice to see that the often-quoted 30% figure is close even when counting the world's highest traffic sites.
Web Servers of the Top 100K WordPress Sites
These statistics are based on the front end web server that is delivering the WordPress site to the browser. The results are based on the initial HTTP header (Server:). In the following chart the total number for the web server technology is the focus.
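Counting web servers from the Server: header could look something like the sketch below. The normalisation rule (lowercase, drop everything after the product name) and both function names are assumptions made for illustration; the original analysis may have grouped values differently.

```python
from collections import Counter


def server_family(server_header: str) -> str:
    """Reduce a raw Server header to a lowercase product name (assumed rule)."""
    value = server_header.strip().lower()
    if not value:
        return "unknown"
    # "Apache/2.4.29 (Ubuntu)" -> "apache"; "nginx" -> "nginx"
    return value.split("/", 1)[0].split(" ", 1)[0]


def tally_servers(server_headers: list) -> Counter:
    """Count how many sites report each server product."""
    return Counter(server_family(header) for header in server_headers)


sample = ["nginx/1.14.0", "nginx", "Apache/2.4.29 (Ubuntu)", "cloudflare", ""]
print(tally_servers(sample).most_common())
```

Note that a value such as cloudflare or openresty is counted under its own name here; folding those into an Nginx total is a separate judgment call.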
Keep in mind that the front end servers powering Cloudflare are Nginx based and the growing OpenResty is also built on Nginx. This puts Nginx well out in front as the technology of choice for serving the page to the browser, and is no doubt one of the reasons it was recently acquired by F5 Networks.
More than a handful of sites are running on Microsoft IIS servers (1275). Included in this number are WordPress powered Microsoft Corporation properties such as Visual Studio.
A closer look at the Cloudflare statistics
Cloudflare continues to be very popular among WordPress administrators, with 21.6% of the world's top 100K WordPress sites being served by Cloudflare on the front end.
In this breakdown of the WordPress sites being served by Cloudflare, we can see that Cloudflare has grown by a couple of percent since our last analysis in 2017.
Don’t forget your PHP Upgrades
The latest update to WordPress Core checks the PHP version, and will fail if the minimum PHP 5.6.20 is not running. This is interesting when we look at the PHP versions in use in the top WordPress sites.
In the HTTP Header responses we found the PHP version leaking in 28729 sites (28.7%) of the top 100’000. This was found in the X-Powered-By header or in the extended Apache Server Header. The end of life chart shows the percentage of sites within the 28.7% where the version was leaked.
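As an illustration of how a leaked version can be pulled out of those two headers, here is a small sketch. The regular expression and function names are assumptions; the 7.1 cut-off in is_end_of_life reflects the support status described in this article as of June 2019 and would need updating for any later analysis.

```python
import re
from typing import Optional

# Matches "PHP/7.2.19" in X-Powered-By or in an extended Apache Server header.
PHP_VERSION_RE = re.compile(r"PHP/(\d+\.\d+(?:\.\d+)?)", re.IGNORECASE)


def leaked_php_version(headers: dict) -> Optional[str]:
    """Return the PHP version exposed in the response headers, if any."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    for name in ("x-powered-by", "server"):
        match = PHP_VERSION_RE.search(lowered.get(name, ""))
        if match:
            return match.group(1)
    return None


def is_end_of_life(version: str) -> bool:
    """True for anything before PHP 7.1 (assumed cut-off, as of June 2019)."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) < (7, 1)


print(leaked_php_version({"X-Powered-By": "PHP/5.6.40"}))  # 5.6.40
```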
Keep in mind that anything before PHP/7.1 is End of Life and not supported at all by the PHP project, even for critical security patches.
Analysis of installed WordPress Core Version
Looking into the WordPress version goes hand in hand with understanding the security posture of a site. Since the release of WordPress 3.7 automatic updates have been available for WordPress installations.
WordPress Security recommends always running the latest version of WordPress core to ensure security fixes are applied.
There are different ways to determine the version of a WordPress installation (see our guide on Attacking WordPress Sites). For simplicity only sites with the default Meta Generator banner are included in this breakdown of versions found. The default generator tag was found on 60009 of the top 100K WordPress sites.
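Reading the version from the default generator tag can be sketched as follows. The regular expression assumes the stock tag format, for example a meta element named generator with content "WordPress 5.2.1"; sites that reorder attributes or remove the tag will not match. The function names are our own.

```python
import re
from typing import Optional

# Assumes the default tag layout: name attribute first, then content.
GENERATOR_RE = re.compile(
    r'<meta\s+name=["\']generator["\']\s+content=["\']WordPress\s+([\d.]+)["\']',
    re.IGNORECASE,
)


def wordpress_version(html: str) -> Optional[str]:
    """Return the version from the default meta generator tag, or None."""
    match = GENERATOR_RE.search(html)
    return match.group(1) if match else None


def major_branch(version: str) -> str:
    """Group versions the way the text does: '2.9.2' -> '2.x'."""
    return version.split(".")[0] + ".x"


page = '<head><meta name="generator" content="WordPress 5.2.1" /></head>'
print(wordpress_version(page))  # 5.2.1
```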
Quite a spread of versions can be seen! Those WordPress 2.x sites really do exist (WordPress 3.0 was released June 2010). There are currently 527 sites running 2.x and 616 sites running WordPress 3.x. This is about 15% less than 2017, so thankfully there are no new 2.x or 3.x installations!
Just over a third of all the sites are running the latest version, 5.2.1 (this was the latest version at the time of analysis, 3rd June 2019). Version 5.2.1 had been out for 2 weeks at this time.
Only 37.2% of these high traffic sites are running the latest version (2 weeks after release).
This indicates a lack of standard maintenance procedures on the majority of sites. Administrators still need to improve adoption of best practice security maintenance processes.
WordPress Hosting Providers
Crunching the numbers for the hosting of the WordPress sites, we simply resolved the IP address of the site. From the IP address the network block owner was determined by running a simple ASN lookup. The results show the owner of the hosting net block, which is often the hosting provider. Note that some hosting companies may not own the IP block; in these cases the figures for large networks such as Amazon (AWS) and Google (GCP) will include smaller hosting companies.
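To show the idea behind an ASN lookup without depending on any particular service, the sketch below does a longest-prefix match against a small in-memory table. Everything in the table is hypothetical: the prefixes are documentation ranges, the AS numbers are from the private-use range, and the owner names are invented. A real lookup would use a full routing table or an ASN lookup service, and the site's IP address would first be resolved with something like socket.gethostbyname.

```python
import ipaddress
from typing import Optional, Tuple

# HYPOTHETICAL data: documentation prefixes, private-use ASNs, made-up owners.
PREFIX_TABLE = [
    ("192.0.2.0/24", 64512, "EXAMPLE-HOSTING-A"),
    ("198.51.100.0/24", 64513, "EXAMPLE-CLOUD-B"),
    ("198.51.100.128/25", 64514, "EXAMPLE-RESELLER-C"),
]


def lookup_owner(ip: str, table=PREFIX_TABLE) -> Optional[Tuple[int, str]]:
    """Return (asn, owner) for the most specific prefix containing ip."""
    address = ipaddress.ip_address(ip)
    best = None  # (prefix length, asn, owner)
    for prefix, asn, owner in table:
        network = ipaddress.ip_network(prefix)
        if address in network and (best is None or network.prefixlen > best[0]):
            best = (network.prefixlen, asn, owner)
    return None if best is None else (best[1], best[2])


print(lookup_owner("198.51.100.200"))  # (64514, 'EXAMPLE-RESELLER-C')
```

The overlapping entries illustrate the caveat in the text: a smaller provider announcing its own block shows up by name, while one renting addresses inside a larger block is counted under the larger network.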
Managed WordPress Hosting
While the ASNs listed above show the locations of the sites within network blocks, there are also managed WordPress hosting providers whose services sit within some of these ASNs.
For example, the statistics for the Google ASN include the managed hosting provider Kinsta, which uses Google Cloud for its services.
The data for these managed hosting providers has been pulled from HTTP headers, where clues exist in the server header or other custom headers.
Everyone loves a good map. Utilizing the MaxMind GeoLite data, the IP address locations were plotted against the list of 100’000 top WordPress sites. As you can see, there are either a few sites running on submarines in the Indian Ocean or the IP Geolocation data is not 100% accurate. The general distribution of sites around the world is interesting, with expected clusters in the data centres within the USA and Europe.
Using passive scan data from Internet-wide scanning data sets, we can correlate with our list of WordPress sites and determine common network services.
Interesting to see that nearly 10% of the top sites are running SSH on port 2222 or 22222.
It seems server owners do not like SSH password bots smashing away all day and night and filling their log files.
Are 36% of the top 100000 WordPress sites updating files using the unencrypted FTP protocol? Let’s hope not. It is of course possible to use FTP over TLS/SSL and this can be configured to work over port 21, so let’s just hope all those high value sites are using the encrypted communication.
IPv6 Adoption in the Top WordPress Sites
The rollout of IPv6 continues to crawl at a slow pace in most parts of the world. This is evident from the fact that only 23.6% of the world's highest traffic WordPress installations have IPv6 enabled on the server.
Google has statistics indicating they are seeing 29% of traffic being IPv6 globally. Maybe it's time that web site owners jumped on the IPv6 wagon.
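One simple way to test a site for IPv6 is to ask the resolver for IPv6 addresses only. The sketch below takes that approach; treating "has an AAAA record" as "IPv6 enabled" is our assumption and not necessarily how the figure above was measured. The resolver parameter is there only so the function can be exercised without network access.

```python
import socket


def has_ipv6_address(hostname: str, resolver=socket.getaddrinfo) -> bool:
    """Return True if the hostname resolves to at least one IPv6 address."""
    try:
        # Restricting the family to AF_INET6 asks only for IPv6 results.
        results = resolver(hostname, 80, family=socket.AF_INET6)
    except socket.gaierror:
        # Raised when the name has no address of the requested family.
        return False
    return len(results) > 0
```

Calling has_ipv6_address("example.com") performs a live DNS lookup, and the answer can depend on the resolver configuration of the machine running it, so results from a host without IPv6 support should be treated with care.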
WordPress Plugin and Theme Analysis
Analysis of WordPress plugins is limited to those that are detectable through passive analysis. In this instance passive analysis is through examination of a regular web request and parsing the HTML and HTTP headers. More aggressive plugin detection can be achieved through brute forcing plugin paths (see our guide on Attacking WordPress Sites); however, this generates thousands of web requests and is only used by malicious actors and vulnerability scanning tools.
When it comes to improving the SEO of a WordPress site, there are two plugins that come to mind: 1. WordPress SEO by Yoast and 2. All in One SEO. The nice thing about these plugins is they put a comment in the HTML source, allowing them to be identified. Recently a new contender has entered the scene: SEO Framework. According to the stats it has plenty of ground to cover to catch up.
Compared to 2017, Yoast SEO has really hit the accelerator, now with 82% of the install base (of sites running an SEO plugin).
We can see that of the 37205 sites running Yoast, 5958 of these are running the Yoast Premium Plugin. That’s 6% of the top 100K WordPress sites on Yoast Premium. Well done guys.
Identification was performed by checking for the plugin's default comment. Of course it is possible that some sites have removed the comment.
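Checking for a default comment can be sketched as below. The signature strings are assumptions based on the comment text these plugins are commonly seen to emit; they are not taken from this analysis and may differ between plugin versions.

```python
import re
from typing import Optional

# ASSUMED signature strings; verify against real page source before relying on them.
SEO_SIGNATURES = {
    "Yoast SEO Premium": "Yoast SEO Premium plugin",
    "Yoast SEO": "Yoast SEO plugin",
    "All in One SEO": "All in One SEO Pack",
    "The SEO Framework": "The SEO Framework",
}

HTML_COMMENT_RE = re.compile(r"<!--(.*?)-->", re.DOTALL)


def detect_seo_plugin(html: str) -> Optional[str]:
    """Return the first SEO plugin whose signature appears inside an HTML comment."""
    comments = HTML_COMMENT_RE.findall(html)
    for plugin, signature in SEO_SIGNATURES.items():
        if any(signature in comment for comment in comments):
            return plugin
    return None


page = "<!-- This site is optimized with the Yoast SEO plugin v11.3 -->"
print(detect_seo_plugin(page))  # Yoast SEO
```

Only text inside HTML comments is searched, so a blog post that merely mentions a plugin by name is not counted as running it.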
WordPress Caching Plugin Showdown
Fast sites make users happy and also make Google happy after the update to the search algorithm that takes site speed into account. Understandably these factors make WordPress Caching Plugins a popular choice for most serious sites.
The most popular caching plugins include comments in the HTML (by default) identifying the plugin in use. By searching for these comments it was possible to gather numbers for the most popular caching plugins.
Top 25 WordPress Plugins
The numbers become a bit rougher when determining the plugins in use. Unless the plugin has a default comment in the code such as the SEO plugins and caching plugins, it gets a bit harder to determine plugins in use.
Many plugins load resources (CSS or JS) from the plugin folder, and this is the best way to passively identify the plugins in use.
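A rough sketch of this path-based identification is shown below. The regular expression, the function names and the choice to count each plugin once per site are our own assumptions for illustration.

```python
import re
from collections import Counter

# Captures the folder name directly under /wp-content/plugins/ or /wp-content/themes/.
ASSET_RE = re.compile(r"/wp-content/(plugins|themes)/([A-Za-z0-9._-]+)/")


def slugs_in_page(html: str, kind: str = "plugins") -> set:
    """Return the distinct plugin (or theme) folder names referenced in a page."""
    return {slug for found_kind, slug in ASSET_RE.findall(html) if found_kind == kind}


def count_across_sites(pages: list, kind: str = "plugins") -> Counter:
    """Count how many pages reference each slug, one count per page."""
    counts = Counter()
    for html in pages:
        # A set is used so several assets from one plugin count once per site.
        counts.update(slugs_in_page(html, kind))
    return counts


pages = [
    '<link href="/wp-content/plugins/contact-form-7/includes/css/styles.css">'
    '<script src="/wp-content/plugins/contact-form-7/includes/js/scripts.js"></script>',
    '<script src="https://example.com/wp-content/plugins/contact-form-7/x.js"></script>',
]
print(count_across_sites(pages).most_common(25))  # [('contact-form-7', 2)]
```

Passing kind="themes" applies the same counting to theme folders. Plugins that load no front end assets are invisible to this approach, which is why the numbers are rougher than for the comment-based checks.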
Top 25 WordPress Themes
Using a similar methodology to the plugin identification above, we were able to identify the WordPress theme in use by searching for the path /wp-content/themes/$theme/ in the HTML and counting the most common occurrences. Many sites use custom themes or have changed the path; however, identification of the most common themes should be fairly accurate given the large sample size.
It is interesting to note that even the default themes (twentysixteen, twentyseventeen) that ship with WordPress make an appearance in the list. This shows that a flashy theme does not make the site; content matters.
Article first published in 2012. Most recently updated June 2019.