Index Coverage Status report – How to use it to understand and improve your website’s indexation
Posted by Luci Wood on October 4, 2017
Google, Google Search Console, SEO, SEO Strategy, Technical SEO
Google have been making big changes to Search Console over the last couple of months, including the introduction of a completely revamped Index Coverage Status report.
Webmasters have always been able to access information about the number of webpages indexed by Google through Search Console’s Index Status report. However, the new feature (which is currently only offered as a beta to a small set of users) aims to provide a far more detailed dive into the health of your website’s indexation.
More importantly for SEOs and webmasters alike, the Index Coverage Status report provides information about pages which are submitted in your sitemaps yet are not included in Google’s index, as well as providing tips for fixing these issues.
In this article we take a look at the new feature and how it can help you minimise any indexation issues affecting your website.
Firstly, why exactly is monitoring indexation important?
Indexation is one of the two core components, alongside crawling, for getting your website and its pages listed within the search engine. If your pages aren’t being indexed, they won’t appear in Google’s organic (unpaid) search results. This means you could be missing out on thousands, if not millions, of visitors and the subsequent revenue.
Monitoring the status of your indexed pages should therefore be a vital SEO task you carry out on a frequent basis. If you haven’t thought about doing this before, the introduction of this new tool means there’s no better time to start.
What to expect when logging in to the Index Coverage Report
The updated Index Coverage Report is far more visual than the previous offering. When logging into the report, you’re instantly presented with a chart showing how the number of indexed pages has changed over the last 90 days.
It’s important to note at this stage that the number of indexed pages shown in the updated tool will differ from that shown in the classic Search Console, if you’re moving from one to the other. In general, the new report will show much lower numbers of indexed pages due to refinements in Google’s data.
The new look Index Coverage Report:
How the Index Status report looked previously:
Each bar in the chart (representing the status on each tracked day) is coloured as follows:
- Valid pages (in green)
- Pages with warnings (in yellow)
- Pages with errors (in red)
Further details on each status are then provided below. Google recommends tackling the red ‘Error’ issues first and foremost, as these represent the most significant barriers to getting your pages indexed.
List of indexed/not indexed pages and number of URLs affected:
Other handy features included within the tool:
1) The number of impressions that the website receives on any particular day can be highlighted by ticking the ‘Impressions’ check box within the chart, allowing you to track whether an increase in indexed pages has led to greater organic visibility (or vice versa).
The indexation status chart including impressions recorded on each day:
2) Each of the index statuses can be clicked on and off, allowing you to focus on single or multiple areas. This is particularly handy if you want to check for spikes in errors around any particular date.
The indexation status chart with only warnings and errors highlighted:
The indexation status chart with only errors highlighted:
3) You can filter the data by sitemaps submitted within Google Search Console. You have the option to view all URLs submitted within your sitemaps, or break this down even further to only view pages submitted within individual sitemaps. This can help you determine if certain sitemaps are being particularly problematic.
The indexation status chart filtered to show only URLs submitted in sitemaps:
The indexation status chart filtered to show URLs submitted in a specific sitemap:
4) Clicking on a specific error row takes you to a list of URLs affected by that particular issue. Combining this with selecting a particular sitemap allows you to see if the issue has developed over time, and provides you with a list of pages to focus on fixing, or removing from the sitemap altogether (e.g. if a URL has intentionally been ‘noindexed’).
Highlighting a specific issue in an individually submitted sitemap:
Actions you can take to improve your indexation rate using the tool:
1) Check the red ‘errors’ before anything else.
‘Errors’ include URLs that Google has discovered, either through your submitted sitemaps or by other means (e.g. external links), but which are not currently indexed. It’s important to work through these first in order to clear up the most significant problems.
Common ‘errors’ highlighted (and suggestions to fix them) include:
URLs marked as noindex but submitted in a sitemap – assess whether these pages should carry the noindex directive. If you don’t want the URLs to be indexed, they should not be in the sitemap; if you do want them indexed, removing the directive should resolve the issue.
Redirect error – this is where the submitted URL triggers a long redirect chain or loop, causing the request to time out. Ensure that your sitemap only includes URLs that return a 200 OK HTTP status code.
Submitted URL blocked by robots.txt – this is where you have submitted a URL that has been disallowed within the site’s robots.txt. Assess whether you want it included in Google’s index and act accordingly (either remove the disallow rule from the robots.txt file or remove the URL from the relevant sitemap).
Submitted URL not found (404) – this is where a URL that does not exist has been included in the sitemap. Such URLs should be removed from the sitemap, or updated to point to the correct, current address. A quick way to catch these issues across a whole sitemap is sketched below.
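You don’t have to wait for Google’s next crawl to catch most of these problems. As a rough illustration (not a Search Console feature), the sketch below pulls the URLs out of a standard XML sitemap and flags anything that doesn’t return a direct 200 OK, surfacing the redirects and 404s described above. The sitemap address is a placeholder, and the script assumes the standard sitemap.org format and the third-party requests library.

```python
# Rough sketch, not a Search Console feature: crawl the URLs listed in an
# XML sitemap and flag anything that does not return a direct 200 OK.

import xml.etree.ElementTree as ET
import requests

SITEMAP_URL = "https://www.example.com/sitemap.xml"  # hypothetical sitemap
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def sitemap_urls(sitemap_url):
    """Return the <loc> values listed in a standard XML sitemap."""
    root = ET.fromstring(requests.get(sitemap_url, timeout=10).content)
    return [loc.text.strip() for loc in root.findall(".//sm:loc", NS)]

def audit(urls):
    """Print every URL that does not respond with a direct 200 OK."""
    for url in urls:
        try:
            # allow_redirects=False so the first hop of a redirect chain
            # (301/302) is reported rather than its final destination.
            response = requests.head(url, allow_redirects=False, timeout=10)
        except requests.RequestException as exc:
            print(f"ERROR     {url} -> request failed ({exc})")
            continue
        status = response.status_code
        if status in (301, 302, 307, 308):
            print(f"REDIRECT  {url} -> {response.headers.get('Location')}")
        elif status == 404:
            print(f"NOT FOUND {url}")
        elif status != 200:
            print(f"CHECK     {url} -> HTTP {status}")

if __name__ == "__main__":
    audit(sitemap_urls(SITEMAP_URL))
```

Running something like this before submitting a sitemap keeps the ‘Error’ section of the report focused on problems Google has genuinely found, rather than ones you could have caught yourself.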
2) Yellow ‘warnings’ are the next priority
The warnings marked in yellow cover many of the same issues as those found in ‘errors’, though they are deemed less severe. They often relate to previously indexed URLs whose status has recently changed. This could include:
Indexed, though blocked by robots.txt – this is where the URL has been included in Google’s index despite being disallowed in the site’s robots.txt. Check whether you really want these URLs blocked; if your aim is to keep them out of the index, the noindex tag is Google’s recommended method of ensuring de-indexation. (A quick way to audit a batch of URLs for robots.txt blocks and noindex directives is sketched after this list.)
Indexed, now blocked by robots.txt – the URL had previously been indexed, but during Google’s most recent crawl it was found to be blocked by robots.txt, indicating it is likely to be dropped from the index. Review these to ensure removal from the index was the intention (again, these may be better served by the noindex tag).
Indexed URL now marked ‘noindex’ – these are pages which were previously indexed without issue, but on Google’s last crawl were found to include a noindex tag, resulting in them being dropped. Assess whether this should be the case, and remove the tag if you want the URL to be included in the search engine’s index again.
Indexed URL now not found (404) – the URL was previously indexed, but was found to no longer exist on Google’s last crawl of the website. This may cause a poor experience for users clicking through from the SERPs. Using the Remove URLs tool can help to speed up the process of getting it removed from the search results altogether.
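When you need to check a batch of warned URLs quickly, a script along the lines of the sketch below can help. It makes a couple of assumptions: the example URLs are hypothetical stand-ins for the ones exported from the report, and it relies on the third-party requests library alongside Python’s built-in robots.txt parser. For each URL it reports whether Googlebot is disallowed by robots.txt and whether a noindex directive appears in either the X-Robots-Tag header or a robots meta tag, which makes it easier to see whether a warning reflects an intentional block or a leftover directive.

```python
# Sketch only: check a list of URLs for robots.txt blocks and noindex
# directives (header or meta tag). The URLs below are hypothetical.

import re
import requests
from urllib import robotparser
from urllib.parse import urlsplit

URLS = [
    "https://www.example.com/old-landing-page/",        # hypothetical
    "https://www.example.com/internal-search?q=widgets",  # hypothetical
]

# Crude check for a robots meta tag whose content includes "noindex".
META_NOINDEX = re.compile(
    r'<meta[^>]+name=["\']robots["\'][^>]+content=["\'][^"\']*noindex',
    re.IGNORECASE,
)

_robots_cache = {}

def robots_for(url):
    """Fetch and parse robots.txt for the URL's host (cached per host)."""
    host = "{0.scheme}://{0.netloc}".format(urlsplit(url))
    if host not in _robots_cache:
        parser = robotparser.RobotFileParser()
        parser.set_url(host + "/robots.txt")
        parser.read()
        _robots_cache[host] = parser
    return _robots_cache[host]

for url in URLS:
    blocked = not robots_for(url).can_fetch("Googlebot", url)
    response = requests.get(url, timeout=10)
    header_noindex = "noindex" in response.headers.get("X-Robots-Tag", "").lower()
    meta_noindex = bool(META_NOINDEX.search(response.text))
    print(url)
    print(f"  blocked by robots.txt: {blocked}")
    print(f"  noindex header / meta: {header_noindex} / {meta_noindex}")
```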
3) Being marked green (indexed) doesn’t necessarily mean all is good
Whilst the green ‘valid pages’ are URLs on the website which are currently indexed, that doesn’t mean this part of the report should be ignored.
There are a number of potential issues which should be investigated further within this section of the report, including:
Indexed, not submitted in sitemap – Google strongly recommends including all important URLs in an XML sitemap. Assess these URLs to determine whether they should be added to a sitemap (a quick way to compare your sitemap against the report’s export is sketched after this list). Alternatively, if you do not want the URL to be indexed, add a noindex tag to it.
Indexed, consider marking as canonical – the URL has been indexed but appears to be a duplicate of other pages found on the website. For this reason, marking the URL as the canonical will tell Google that it’s the primary version for indexation purposes.
Indexed, low interest – whilst the URL has been indexed, Google considers it an unimportant page due to how infrequently it surfaces in its search results. It’s worth assessing these URLs for the value they add to the website. If they are truly unimportant, noindexing them can help focus Google’s efforts on the more valuable pages.
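To work through ‘Indexed, not submitted in sitemap’ pages in bulk, one option is to diff your sitemap against the list of valid URLs downloaded from the report. The sketch below does exactly that under a few assumptions: a placeholder sitemap address, a hypothetical CSV export with one URL per row, and the third-party requests library.

```python
# Minimal sketch for catching "Indexed, not submitted in sitemap" pages in
# bulk by diffing the XML sitemap against a CSV export from the report.
# The sitemap address, file name and CSV layout are assumptions.

import csv
import xml.etree.ElementTree as ET
import requests

SITEMAP_URL = "https://www.example.com/sitemap.xml"  # hypothetical
INDEXED_EXPORT = "indexed-valid-urls.csv"            # hypothetical export file
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def sitemap_urls(sitemap_url):
    """Return the set of <loc> values listed in a standard XML sitemap."""
    root = ET.fromstring(requests.get(sitemap_url, timeout=10).content)
    return {loc.text.strip() for loc in root.findall(".//sm:loc", NS)}

def exported_urls(path):
    """Return the set of URLs from the report export (first CSV column)."""
    with open(path, newline="") as handle:
        return {row[0].strip() for row in csv.reader(handle) if row}

submitted = sitemap_urls(SITEMAP_URL)
indexed = exported_urls(INDEXED_EXPORT)

# Indexed but missing from the sitemap: candidates to add (or to noindex).
for url in sorted(indexed - submitted):
    print("Not in sitemap:", url)

# Submitted but not in the export: worth a separate investigation.
for url in sorted(submitted - indexed):
    print("Submitted, not in export:", url)
```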
4) Check out the Informational/Excluded list
URLs marked as ‘Informational’ or ‘Excluded’ (coloured grey) are not given such prominence in the report, as Google believes they are pages that are intentionally not being indexed (i.e. the webmaster has specified that they should not be).
However, this part of the report is worth checking over to ensure there aren’t important pages on your website that should be indexed but aren’t. Examples of this could include:
- Pages blocked through the use of the noindex tag
- Pages blocked through the use of a URL removal request
- URLs submitted in the sitemap that Google has not selected as the canonical version (a quick canonical check is sketched after this list)
- Pages that Google has discovered but not yet added to its index
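For the ‘not selected as canonical’ cases in particular, it helps to confirm what each submitted URL actually declares. The sketch below is one way to do that; it assumes the third-party requests and beautifulsoup4 libraries, and the URLs shown are hypothetical. For each page it reports whether the declared canonical points back to the page itself or somewhere else.

```python
# Sketch: check whether each submitted URL declares itself as the canonical.
# A rel="canonical" pointing elsewhere explains the "not selected as
# canonical" exclusion. The URLs below are hypothetical examples.

import requests
from bs4 import BeautifulSoup

URLS = [
    "https://www.example.com/product/blue-widget/",            # hypothetical
    "https://www.example.com/product/blue-widget/?ref=email",  # hypothetical
]

def canonical_of(html):
    """Return the href of the first rel="canonical" link tag, if any."""
    soup = BeautifulSoup(html, "html.parser")
    for link in soup.find_all("link"):
        rel = [value.lower() for value in (link.get("rel") or [])]
        if "canonical" in rel:
            return link.get("href")
    return None

for url in URLS:
    canonical = canonical_of(requests.get(url, timeout=10).text)
    if canonical is None:
        print(f"NO CANONICAL TAG  {url}")
    elif canonical.rstrip("/") == url.rstrip("/"):
        print(f"SELF-CANONICAL    {url}")
    else:
        print(f"POINTS ELSEWHERE  {url} -> {canonical}")
```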
A full guide on how to use the Index Coverage report can be found on Google’s Webmaster website, at https://support.google.com/webmasters/answer/7440203.
Can this help beyond Google?
The error reports are also useful for fixing indexation issues in Bing, and possibly other search engines too. Duane Forrester, who was formerly a Senior Product Manager with Bing, said: “Your Sitemaps need to be clean. We have a 1% allowance for dirt in a sitemap. Examples of dirt are if we click on a URL and we see a redirect, a 404 or a 500 code. If we see more than a 1% level of dirt, we begin losing trust in the Sitemap”.
Conversely, however, Google have said they don’t have any issue with sitemap ‘dirt’:
https://twitter.com/JohnMu/status/817097274631327744
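Wherever you land in that debate, it’s easy to gauge where your own sitemap sits against the 1% threshold: it’s simply the share of submitted URLs that don’t return a direct 200 OK. A small sketch, using made-up crawl results purely to illustrate the calculation:

```python
# Back-of-the-envelope sketch: a sitemap's "dirt" level is the share of
# submitted URLs that do not return a direct 200 OK. The status codes
# below are invented purely to illustrate the calculation.

def dirt_percentage(status_codes):
    """Percentage of submitted URLs that count as 'dirt' (anything but 200)."""
    dirty = sum(1 for code in status_codes if code != 200)
    return 100.0 * dirty / len(status_codes)

# A hypothetical 1,000-URL sitemap: 985 clean pages, 10 redirects,
# 4 not-found errors and 1 server error.
codes = [200] * 985 + [301] * 10 + [404] * 4 + [500]
print(f"Sitemap dirt: {dirt_percentage(codes):.1f}%")  # 1.5%, over Bing's 1% allowance
```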
What’s next for the Index Coverage Status Report?
Overall, the new Index Coverage Status Report seems like a great addition to the SEO and webmaster’s arsenal of tools.
Whilst it will remain in beta for the foreseeable future, we can expect the feature to be made widely available within Search Console once Google has gathered feedback.
If you’re lucky enough to have been invited to take part in the beta test, make sure you try to take full advantage of the features, and submit your feedback with any suggestions directly to Google.
Comments

Thanks for the post. I recently got the “Indexed, though blocked by robots.txt” text and I have no clue why. I’m trying to figure out how to fix it.
Thanks for your comment Brant. I’m presuming that the URL isn’t blocked by your robots.txt file. Is it a URL that is important for your website?
I am getting “Submitted URL seems to be a Soft 404” for 4 pages of my sitemap. I tried “Fetch as Google” but the error hasn’t been resolved yet. How can I get those pages indexed?