Google has deindexed a page it thinks is set to 'noindex', but is in fact still set to 'index'
-
A page on our WordPress powered website has had an error message thrown up in GSC to say it is included in the sitemap but set to 'noindex'. The page has also been removed from Google's search results.
Page is https://www.onlinemortgageadvisor.co.uk/bad-credit-mortgages/how-to-get-a-mortgage-with-bad-credit/
Looking at the page code, plus using Screaming Frog and Ahrefs crawlers, the page is very clearly still set to 'index'. The SEO plugin we use has not been changed to 'noindex' the page.
I have asked for it to be reindexed via GSC but I'm concerned why Google thinks this page was asked to be noindexed.
Can anyone help with this one? Has anyone seen this before, been hit with this recently, got any advice...?
-
@effectdigital and @jasongmcmahon did you ever get to the bottom of this and if so what caused it and what was the long term fix, as GSC and Google seem to behaving in a peculiar way?
We had a similar issue with this page: https://www.simplyadverse.co.uk/bad-credit-mortgage, but after several cache clears and re-indexing/fix requests it indexed fine.
We now have a page on another similar site that is stubbornly refusing to index. Its a new site and other than the a simple domain homepage, all pages when under development had "noindex " on them.
Several pages on the site on launch behaved like this with GSC saying the page was marked as "noindex" but submitted in the sitemap, but when you check to see if indexing was possible GSC says its fine (we'd removed noindex and setup the sitemap) . All crawling tools say its fine, but this page wont index despite repeated attempts over a couple of weeks, all other pages are now fine, but this page won't index: https://simplysl.co.uk/buy-to-let/
Other than they're all mortgage related sites/pages, I can't fathom why one page would be troublesome and all others index OK despite having the same setup and indexing process, any ideas?
-
Thanks, I'll take a look
-
Thanks for going into so much detail, much appreciated.
We've asked Google to reindex it and 'validate the fix', even though we can't find anything to fix!
-
Hi there, check that caching isn; the issues at server & CMS levels. Other than that reindex the page via GSC
-
This is really weird. Really really weird!
As you say, your site's source code seems to confirm that it is set to index. If we look here, we can plainly see that the coding syntax for a no-index directive is "noindex" (all one word).
Let's look at your source code:
Yep, everything seems fine there! But what if a script is modifying your source code and including the directive - and Google's picking up on that?
If we look at the modified source code which I rendered and saved to a file here:
... we can see, there are no problems here either:
Wow - that's really unhelpful!
Let's see what happens if we specifically search Google's live index for the URL:
Interestingly, when we search Google's index for this page, we get this page returned instead.
It makes sense that Google would return that URL if it couldn't return the main URL, as one is nested inside of the other. If everything was healthy, we'd see Google listing both URLs instead of just one of them. Even if you edit my index query to remove the trailing slash, you still only get the nested URL (not the one you want to be showing, which is at a slightly higher-up level)
Another thought I had was, hmm maybe this is a canonical tag gone rogue. That bore no fruit either, as this page (which you want to index, yet won't) canonicals to this page - and both of those URLs are exactly the same. As such, it's obvious that we can't blame the canonical tag either! I even viewed the modified source to see if it got altered, no dice (the canonical tag is just fine)
Maybe the XML file is telling Google not to index the URL?
Nope - that's fine too! No problems there...
Could the robots.txt file be interfering?
No! Darn it, that's not the problem
I know that a no-index or blocking directive can also be sent through the HTTP header (usually via X-robots). Let's check the response header of your URL out:
Nothing there that really raises my eyebrow. This is enabled and set to block, but to be honest that shouldn't affect Google's crawling at all. Anyone correct me if I am wrong, but defending your site against cross-site scripting (XSS) attacks doesn't impede crawling right?
Fudge it. Let's fling it through Google's Page-Speed Insights tool. Usually that will tell you if something is being blocked and why...
Nothing useful still!
Google's mobile friendly tool gives us some, semi-interesting information:
But it doesn't say the page can't be loaded. It only says some resources which the page pulls in can't be loaded! And guess what? They're all external things on other websites (other than a few theme related bits, but nothing IMO that should stop the whole page loading).
Let's try DeepCrawl's indexability checker (they make amazing software by the way... expensive though):
Sir... there is NO GOOD REASON why your URL shouldn't be indexed. I am 99.9% certain you have encountered a legit Google bug. Post about it here. Only Google can help you at this juncture
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Purpose of static index.html pages?
Hi All, I am fairly new to the technical side of SEO and was hoping y'all could help me better understand the purpose of dynamic rendering with index.html pages and any implications they might hold for SEO. I work to support an eComm site that includes a subdomain for its product pages: products.examplesite.com. I recently learned from one of our developers that there are actually two sets of product pages - a set of pages that he terms "reactive," that are present on our site, that only display content when a user clicks through to them and are not retrievable by search engines. And then a second set of static pages that were created just for search engines and end in .index.html. So, for example: https://products.examplesite.com/product-1/ AND https://products.examplesite.com/product-1/index.html I am confused as to what specifically the index.html pages are doing to support indexation, as they do not show up in Google Site searches, but the regular pages do. Is there something obvious I am missing here?
Technical SEO | | Lauren_Brick0 -
Best way to handle indexed pages you don't want indexed
We've had a lot of pages indexed by google which we didn't want indexed. They relate to a ajax category filter module that works ok for front end customers but under the bonnet google has been following all of the links. I've put a rule in the robots.txt file to stop google from following any dynamic pages (with a ?) and also any ajax pages but the pages are still indexed on google. At the moment there is over 5000 pages which have been indexed which I don't want on there and I'm worried is causing issues with my rankings. Would a redirect rule work or could someone offer any advice? https://www.google.co.uk/search?q=site:outdoormegastore.co.uk+inurl:default&num=100&hl=en&safe=off&prmd=imvnsl&filter=0&biw=1600&bih=809#hl=en&safe=off&sclient=psy-ab&q=site:outdoormegastore.co.uk+inurl%3Aajax&oq=site:outdoormegastore.co.uk+inurl%3Aajax&gs_l=serp.3...194108.194626.0.194891.4.4.0.0.0.0.100.305.3j1.4.0.les%3B..0.0...1c.1.SDhuslImrLY&pbx=1&bav=on.2,or.r_gc.r_pw.r_qf.&fp=ff301ef4d48490c5&biw=1920&bih=860
Technical SEO | | gavinhoman0 -
We have set up 301 redirects for pages from an old domain, but they aren't working and we are having duplicate content problems - Can you help?
We have several old domains. One is http://www.ccisound.com - Our "real" site is http://www.ccisolutions.com The 301 redirect from the old domain to the new domain works. However, the 301-redirects for interior pages, like: http://www.ccisolund.com/StoreFront/category/cd-duplicators do not work. This URL should redirect to http://www.ccisolutions.com/StoreFront/category/cd-duplicators but as you can see it does not. Our IT director supplied me with this code from the HT Access file in hopes that someone can help point us in the right direction and suggest how we might fix the problem: RewriteCond%{HTTP_HOST} ccisound.com$ [NC] RewriteRule^(.*)$ http://www.ccisolutions.com/$1 [R=301,L] Any ideas on why the 301 redirect isn't happening? Thanks all!
Technical SEO | | danatanseo0 -
According to 1 of my PRO campaigns - I have 250+ pages with Duplicate Content - Could my empty 'tag' pages be to blame?
Like I said, my one of my moz reports is showing 250+ pages with duplicate content. should I just delete the tag pages? Is that worth my time? how do I alert SEOmoz that the changes have been made, so that they show up in my next report?
Technical SEO | | TylerAbernethy0 -
/index.php/ page
I was wondering if my system creates this page www my domain com/index.php/ is it better to block with robot.txt or just canonize?
Technical SEO | | ciznerguy0 -
Getting more pages indexed by yahoo and bing
Anyone has a reliable way to get more pages indexed in yahoo and bing. Please dont say to get more inner page quality links.
Technical SEO | | mickey110 -
I have a site that has both http:// and https:// versions indexed, e.g. https://www.homepage.com/ and http://www.homepage.com/. How do I de-index the https// versions without losing the link juice that is going to the https://homepage.com/ pages?
I can't 301 https// to http:// since there are some form pages that need to be https:// The site has 20,000 + pages so individually 301ing each page would be a nightmare. Any suggestions would be greatly appreciated.
Technical SEO | | fthead90 -
2000 pages indexed in Yahoo, 0 in Google. NO PR, What is wrong?
Hello Everyone, I have a friend with a blog site that has over 2000 pages indexed in Yahoo but none in Google and no page rank. The web site is http://www.livingorganicnews.com/ I know it is not the best site but I am guessing something is wrong and I don't see it. Can you spot it? Does he have some settings wrong? What should he do? Thank you.
Technical SEO | | QuietProgress0