Log File Analysis – What is it, Do I Need One, and Why Should I Care?
Posted by Luci Wood on February 7, 2019
Almost all websites have log files recorded and stored somewhere on their server’s backend. However, oftentimes this potential gold mine of information can go ignored, or certainly not be utilised enough.
A log file analysis can explore this data, and its potential SEO implications, to garner a deeper understanding of how different user agents act on a site.
In this article I’m going to talk about log file analysis in more detail, including what it is, why you might want to do one, the trends it can identify, and how we go about conducting one at Blue Array.
What is a log file analysis?
A log file analysis is an investigation of a website’s server logs, with the aim of identifying trends in user agent activity – predominantly search engine bots.
Completing a log file analysis will require your website’s raw access logs for a specified period of time. The period needed for a substantial analysis could range from a week to a couple of months, and is usually dependent on the size of the website you’re looking at, as well as the focus of the log file analysis.
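To make "raw access logs" a little more concrete, here is a minimal Python sketch of reading a single log line. It assumes the common Apache/Nginx "combined" log format, which your server may not use, and the sample line is made up purely for illustration.

```python
import re
from datetime import datetime

# Pattern for the Apache/Nginx "combined" log format.
# This format is an assumption; check your own server configuration.
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = COMBINED.match(line)
    if match is None:
        return None
    hit = match.groupdict()
    hit["status"] = int(hit["status"])
    hit["time"] = datetime.strptime(hit["time"], "%d/%b/%Y:%H:%M:%S %z")
    return hit

# Made-up sample line, for illustration only.
SAMPLE = (
    '66.249.66.1 - - [07/Feb/2019:10:15:32 +0000] '
    '"GET /blog/log-file-analysis/ HTTP/1.1" 200 5120 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
)

hit = parse_line(SAMPLE)
print(hit["url"], hit["status"], hit["agent"])
```

Each parsed line gives you the requesting IP, the time, the URL, the response code and the user agent, which are the fields the rest of an analysis is built on.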
Why do a log file analysis?
The primary focus of a log file analysis is the collation of data to identify trends in bot activity.
Log file analysis vs. Google Analytics
“But I can see where my traffic is going in Google Analytics”, you may be saying, “why would I need a log file analysis?”
From an SEO standpoint, a log file analysis focuses mainly on bot traffic, not users. While Google Analytics is fantastic for giving an overview of user traffic, it’s not so great for reviewing bot activity.
The tool we use, Screaming Frog’s Log File Analyser, assimilates log files and displays useful trends (such as response codes over time) visually, in the form of graphs, as well as listing the URLs in easily extractable tables. We typically use it to extract select bot activity from the access logs, a process during which the Log File Analyser can identify and remove spoofed bots from the reports, leaving only real crawl data.
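For a sense of what removing spoofed bots can involve, below is a rough Python sketch of the reverse-then-forward DNS check that Google documents for verifying Googlebot. This is not necessarily how the Log File Analyser does it internally. The function name and the injectable lookup parameters are my own, added so the logic can be tried without network access, and the default lookups only handle IPv4.

```python
import socket

# Hostname suffixes Google documents for Googlebot reverse DNS results.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")

def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Reverse-then-forward DNS check for an IP claiming to be Googlebot.

    reverse_lookup and forward_lookup are hypothetical hooks for testing;
    by default they use the standard library's DNS functions (IPv4 only).
    """
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]
    try:
        host = reverse_lookup(ip)
        # Step 1: the reverse DNS name must belong to Google.
        if not host.endswith(GOOGLE_SUFFIXES):
            return False
        # Step 2: that name must resolve back to the original IP.
        return ip in forward_lookup(host)
    except OSError:
        # Covers failed lookups (socket.herror and socket.gaierror).
        return False
```

A request whose user agent says Googlebot but whose IP fails this check is most likely a spoofed bot, and can be left out of the crawl data.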
The Log File Analyser also orders the data by user agent, including browser traffic should you wish to include it, and enables filtering by specific user agents.
The in-depth log file analysis we do using this tool also allows us to perform micro analysis on areas of data where necessary, as well as overall trends.
What trends can be identified by conducting a log file analysis?
A log file analysis can help to identify a correlation between any noticeable changes in traffic and bot activity. For example, a large drop in Googlebot activity identified in Search Console could coincide with an increased percentage of 4xx or 5xx responses to Googlebot’s attempts to crawl pages. This sort of insight can be fairly easily gleaned through the analysis.
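As a simple illustration of that kind of check, the Python sketch below works out the share of 4xx and 5xx responses per day. It assumes you already have the bot's hits as (day, status code) pairs; the function name and the sample data are made up.

```python
from collections import defaultdict

def error_share_by_day(hits):
    """Map each day to the fraction of hits that returned a 4xx or 5xx status.

    hits is an iterable of (day, status) pairs, e.g. ("2019-02-01", 404).
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for day, status in hits:
        totals[day] += 1
        if 400 <= status <= 599:
            errors[day] += 1
    return {day: errors[day] / totals[day] for day in sorted(totals)}

# Made-up sample data, for illustration only.
sample_hits = [
    ("2019-02-01", 200), ("2019-02-01", 200),
    ("2019-02-02", 200), ("2019-02-02", 404),
    ("2019-02-02", 500), ("2019-02-02", 301),
]
print(error_share_by_day(sample_hits))
```

A day where the error share jumps at the same time as crawl volume falls is a good place to start digging.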
The Log File Analyser has separate filters for Googlebot Desktop and Smartphone, which can then be used to compare the crawl totals of each. This is particularly useful when trying to identify if a site has moved towards Mobile First indexing – i.e. a Googlebot Smartphone crawl total higher than Googlebot Desktop’s indicates a shift towards a Mobile First Index.
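If you want to reproduce this comparison outside the tool, a rough Python sketch could look like the one below. It relies on a simple heuristic that is my assumption rather than an official rule: a user agent containing "Googlebot/" counts as Googlebot, and is treated as Smartphone if it also contains "Mobile". The sample user agent strings are examples only, and spoofed bots are not filtered out here.

```python
from collections import Counter

def googlebot_type(agent):
    """Classify a user agent string as 'desktop', 'smartphone' or None.

    Heuristic only: other Google crawlers (e.g. Googlebot-Image) are ignored.
    """
    if "Googlebot/" not in agent:
        return None
    return "smartphone" if "Mobile" in agent else "desktop"

def crawl_totals(agents):
    """Count Googlebot Desktop and Smartphone hits in a list of user agents."""
    kinds = (googlebot_type(agent) for agent in agents)
    return Counter(kind for kind in kinds if kind is not None)

# Example user agent strings, for illustration only.
DESKTOP = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
SMARTPHONE = (
    "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 "
    "Mobile Safari/537.36 (compatible; Googlebot/2.1; "
    "+http://www.google.com/bot.html)"
)
BROWSER = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/72.0 Safari/537.36"

totals = crawl_totals([DESKTOP, SMARTPHONE, SMARTPHONE, BROWSER])
print(totals["desktop"], totals["smartphone"])
```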
Overall Googlebot activity can also be compared to other search engine crawlers, such as Bing, Baidu or Yandex to help identify algorithm changes – for example, a change in Googlebot’s crawl frequency over the time period where other search engine bots remain unaffected may imply an update to Google’s algorithms.
Additionally, low crawl totals for Googlebot alongside high crawl totals for other search engine bots could imply your SEO tactics aren’t ticking all of Googlebot’s boxes in areas that other bot algorithms don’t check. Comparing theories on bot algorithms may help reveal the difference.
Log file analysis can also be used to highlight URLs with a lower crawl frequency than others on the site. This can be useful in identifying possible issues with discoverability/accessibility of content. For example, if a specific section is not being crawled by Google, it may not be easily accessible to bots (e.g. it may be sat too deep in the architecture of the website and/or not well linked to from other pages).
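One way to spot under-crawled sections is to group crawled URLs by their first path segment and count the hits. The Python sketch below does that. Treating the first path segment as the "section" is an assumption that will not suit every site structure, and the function names and sample URLs are made up.

```python
from collections import Counter
from urllib.parse import urlsplit

def section_of(url):
    """Return the first path segment of a URL, or '(root)' for the homepage."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    return segments[0] if segments else "(root)"

def crawls_by_section(urls):
    """Count crawled URLs per top-level section."""
    return Counter(section_of(url) for url in urls)

# Made-up URLs crawled by a bot, for illustration only.
crawled = [
    "/blog/post-1/", "/blog/post-2/", "/blog/post-1/",
    "/products/widget?page=2", "/",
]
counts = crawls_by_section(crawled)

# List sections from least to most crawled.
for section, hits in sorted(counts.items(), key=lambda item: item[1]):
    print(section, hits)
```

Sections that sit at the top of that list despite being important to the business are the ones worth checking for depth and internal linking.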
Another possible reason for a log file analysis is identifying where and how crawl budget is spent. According to Gary Illyes in 2017, the majority of sites likely do not need to be concerned with crawl budget, but using a log file analysis on a large website, such as one operating within ecommerce, can definitely help identify if particular areas of the site are frequently being missed by particular user agents.
How do we conduct a log file analysis?
If we believe a client requires a log file analysis, we will request the raw access logs for the period of time we wish to investigate (as mentioned above, this time frame may differ depending on the client and the purpose of the analysis). We generally ask for over two weeks’ worth of data, in order to get a decent sample to work with.
Once we have the files we need, we use the Log File Analyser to open the files, filtering by appropriate User Agents. For a generic log file analysis these would be Googlebot Desktop, Googlebot Smartphone, Bing, Baidu and Yandex.
We use screenshots of the data visualisation in Log File Analyser for an overview of data, and then export Googlebot Desktop and Smartphone URLs and Response Codes for further data manipulation.
From the exported data we’ll investigate the most crawled and least crawled URLs, as well as client errors by subdirectory, for both Googlebot Desktop and Smartphone. We also extract all URLs from header menus and compare them against the URLs crawled by Googlebot Desktop and Smartphone. This helps to identify potential problems with header menu visibility to Googlebot, something which is particularly important in the Mobile First Indexing era.
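The header menu comparison boils down to a set difference. Below is a minimal Python sketch, assuming you have one list of menu URLs and one list of URLs crawled by the bot in question. The normalisation step (keeping only the path and ignoring a trailing slash) is my assumption and may be too loose for sites where those differences matter; the function names and sample URLs are made up.

```python
from urllib.parse import urlsplit

def normalise(url):
    """Reduce a URL to its path without a trailing slash ('/' for the root)."""
    path = urlsplit(url).path
    return path.rstrip("/") or "/"

def uncrawled_menu_urls(menu_urls, crawled_urls):
    """Return menu paths that never appear among the crawled URLs."""
    crawled = {normalise(url) for url in crawled_urls}
    menu = {normalise(url) for url in menu_urls}
    return sorted(menu - crawled)

# Made-up data, for illustration only.
menu = ["https://example.com/services/", "/blog", "/contact/"]
crawled_by_bot = ["/services", "/blog/", "/blog/post-1/"]

print(uncrawled_menu_urls(menu, crawled_by_bot))
```

Running this once with the Desktop URLs and once with the Smartphone URLs shows whether either crawler is missing menu links the other one finds.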
With the charts we create from the investigation, we build out a detailed slide deck, with descriptions and explanations for data, identifying if and when something is important (for example, a collection of page resources only crawled once during the time period is expected and generally not a cause for concern while, in contrast, important conversion-focused pages being crawled just once during the period would be something to investigate further).
If we are looking to identify a specific problem with a site, we will do this alongside an overall health check. A recent example of this is a client who had experienced a significant drop in traffic. We first performed a general website health check, then focused the log file analysis on the areas that work had flagged, to establish whether they could be the cause of the traffic drop.
As we expected from the health check work, the log file analysis confirmed that there were issues with the relationship between the mobile and desktop versions of the site, which were causing problems with the indexing of certain content and, in turn, the traffic losses the client was experiencing. We also discovered very low crawl totals for paginated URLs, implying discoverability issues here too.
Is a log file analysis for me?
If you have log files which are too large to scrutinise without data manipulation (we find this to be any site with 10+ pages) an in-depth Log File Analysis is likely worth your time.
The purpose behind a log file analysis tends to be different for each client, ranging from a general health check to pinpointing suspected issues that may be causing a traffic drop. However, even seemingly healthy sites will have areas for improvement, and a log file analysis can help you identify your weakest areas.
Log files are an incredibly valuable resource, and an analysis of them can provide a comprehensive insight into how different user agents act on your site and what can be inferred from this – helping greatly to form strategies to improve overall crawlability.