Log file analysis is underrated.
I believe log file analysis should be part of every SEO’s routine website health checks. When I was first learning how to carry it out, I struggled to understand where to begin: every other blog told me how to set up the software and import the files, but not what to look for or why it matters.
This article is written as though you’ve already imported some log files into your tool of choice (in my case – Screaming Frog Log File Analyser). You’ve got your data, and you need to know what to do with it.
What is log file analysis?
Log file analysis enables you to observe the precise interactions between your website and Googlebot (as well as other web crawlers such as Bingbot). By examining log files, you gain valuable insights that can shape your SEO strategy and address issues related to the crawling and indexing of your web pages.
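To make that concrete, here is a minimal Python sketch that parses one request line in the common Apache/NGINX “combined” log format (the exact field layout depends on your server configuration, so treat this as an illustration rather than a universal parser). Each line records who made the request, what they asked for, and what response they got:

```python
import re

# One request line in the common Apache/NGINX "combined" log format.
# A real log file contains one of these per request, often millions.
line = (
    '66.249.66.1 - - [10/Mar/2024:09:15:32 +0000] '
    '"GET /products/red-shoes HTTP/1.1" 200 5120 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
)

# Pull out the fields SEOs care about: who asked, for what, and the result.
pattern = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

match = pattern.match(line)
if match:
    print(match.group("ip"), match.group("url"), match.group("status"))
    print("Claims to be Googlebot?", "Googlebot" in match.group("agent"))
```

Tools like the Screaming Frog Log File Analyser do this parsing for you at scale; the point is simply that every insight below comes from these few fields.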
Why is log file analysis important?
Log file analysis allows you to see exactly what all bots are doing, across all of your content. First-party tools such as Google Search Console or Bing Webmaster Tools show only a sample of what their own bots discover and crawl; log files tell the whole story.
Since crawl budget is limited, it’s important that search engine spiders spend as little time as possible on URLs that have no organic value. You want spiders to concentrate on the pages you want crawled, indexed and served to potential customers.
Cross-checking with a crawler
You can use a crawler tool such as Screaming Frog SEO Spider to build a clean list of the URLs you want indexed. Importing that list into the project file alongside the log files, or scripting the comparison yourself as sketched after this list, helps you see:
- Which URLs are being crawled that should be crawled
- Which URLs are not being crawled that should be
- Which URLs are being crawled that definitely should not be
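All three buckets are simple set differences, so if you prefer scripting over the tool’s UI, a few lines of Python cover it. The URL lists below are inlined for illustration; in practice they would come from your crawler export and your parsed logs:

```python
# In practice these would come from exports: your crawler's URL list
# and the URLs extracted from your logs. Inlined here for illustration.
wanted = {"/", "/products/", "/products/red-shoes", "/about/"}
crawled = {"/", "/products/", "/search?q=xyz", "/old-page"}

print("Crawled as intended:  ", sorted(wanted & crawled))
print("Wanted, never crawled:", sorted(wanted - crawled))
print("Crawled, not wanted:  ", sorted(crawled - wanted))
```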
So, what shouldn’t be crawled?
This really varies from site to site. A common culprit is an indexable internal search function. Search result pages should generally be blocked in robots.txt: there’s rarely any search volume for the random values users enter, and you don’t want those result pages indexed.
Another is catalogue filter parameters, like ?colour=red&size=small. These need evaluating against search demand: if nobody is searching for the filtered combination, block the parameters to save your crawl budget.
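As an illustration, a hypothetical robots.txt fragment covering both cases might look like this. The paths and parameter names are assumptions based on the examples above, and Googlebot supports the * wildcard in Disallow patterns; adjust everything to your own URL structure and test carefully before deploying:

```
User-agent: *
# Block internal site search results
Disallow: /search
# Block catalogue filter parameters with no search demand
Disallow: /*colour=
Disallow: /*size=
```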
Interesting issues I’ve encountered
Googlebot 403 Errors
During a JavaScript site migration, Googlebot began being served 403 Forbidden responses. The site looked fine to users, but it was invisible to Google. The culprit was a misconfigured full-page cache. Once fixed, Googlebot received 200 OK responses again and pages began indexing properly.
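Problems like this jump out of the logs if you tally response codes per user agent. Here’s a rough Python sketch; the log file name and combined format are assumptions, and matching on the user-agent string alone doesn’t verify the bot is genuine (see the next section for that):

```python
import re
from collections import Counter

# Tally response codes served to requests claiming to be Googlebot,
# so a sudden wall of 403s stands out immediately.
status_by_code = Counter()
pattern = re.compile(r'" (\d{3}) ')  # status code after the request field

with open("access.log") as handle:  # hypothetical log file name
    for line in handle:
        if "Googlebot" in line:
            match = pattern.search(line)
            if match:
                status_by_code[match.group(1)] += 1

for code, count in status_by_code.most_common():
    print(code, count)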
Proxy server masking IPs
On another site, bot traffic appeared unverified: the web application firewall (WAF) was replacing Googlebot’s real IP address with an internal one, so the log data couldn’t prove which requests were genuinely from Google. We fixed it by whitelisting Google’s IP ranges and routing verified bot traffic around the WAF.
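Verification is worth automating. Google documents a reverse-then-forward DNS check for confirming that an IP really belongs to Googlebot, and a minimal Python sketch of it follows. On the site above, every request failed this check because the WAF had overwritten the source IPs:

```python
import socket

def is_verified_googlebot(ip: str) -> bool:
    """Google's documented check: reverse DNS must resolve to a
    googlebot.com/google.com host, and forward DNS on that host
    must return the original IP."""
    try:
        host = socket.gethostbyaddr(ip)[0]
    except OSError:
        return False
    if not host.endswith((".googlebot.com", ".google.com")):
        return False
    try:
        return ip in socket.gethostbyname_ex(host)[2]
    except OSError:
        return False

# 66.249.66.1 falls in a range Google has published for crawling
# (results depend on live DNS); an internal WAF address like
# 10.0.0.5 fails the reverse lookup immediately.
print(is_verified_googlebot("66.249.66.1"))
print(is_verified_googlebot("10.0.0.5"))
```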
FAQs
Does Screaming Frog SEO Spider include Log File Analysis?
No, the Screaming Frog SEO Spider does not include log file analysis within the main tool. However, there is a separate Log File Analyser which can be used together with the Spider to cross-reference raw server logs.
How do I analyse log files with JetOctopus?
JetOctopus is cloud-based, so it handles enterprise-level log volumes without the hardware limitations of desktop tools. Connect your logs via its Cloudflare, AWS, or NGINX integrations and it processes them in the cloud, giving you close to real-time crawl monitoring rather than one-off manual imports.