Pages getting into Google Index, blocked by Robots.txt??
-
Hi all,
So yesterday we set up to Remove URL's that got into the Google index that were not supposed to be there, due to faceted navigation... We searched for the URL's by using this in Google Search.
site:www.sekretza.com inurl:price=
site:www.sekretza.com inurl:artists=So it brings up a list of "duplicate" pages, and they have the usual: "A description for this result is not available because of this site's robots.txt – learn more."
So we removed them all, and google removed them all, every single one.
This morning I do a check, and I find that more are creeping in - If i take one of the suspecting dupes to the Robots.txt tester, Google tells me it's Blocked. - and yet it's appearing in their index??
I'm confused as to why a path that is blocked is able to get into the index?? I'm thinking of lifting the Robots block so that Google can see that these pages also have a Meta NOINDEX,FOLLOW tag on - but surely that will waste my crawl budget on unnecessary pages?
Any ideas?
thanks.
-
Oh, ok. If that's the case, pls don't worry about those in the index. You can get them removed using remove URL feature in webmaster tools account.
-
It doesn't show any result for the "blocked page" when I do that in Google.
-
Hi,
Please try this and let us know the results:
Suppose this is one of the pages in discussion:
http://www.yourdomain.com/blocked-page.html
Go to Google, type the following along with double quotes. Replace with the actual page:
"yourdomain.com/blocked-page.html" -site:yourdomain.com
-
Hi!
From what I could tell, it wasn't that many pages already in the index, so it could be worth trying to lift the block, at least for a short while, to see if it will have an impact.
In addition - how about configuring how GoogleBot should threat your URLs via the URL parameter tool in Google Webmaster Tools. Here's what Google has to say about this. https://support.google.com/webmasters/answer/1235687
Best regards,Anders
-
Hi Devanur.
What I'm guessing is the problem here, is that as of now, GoogleBot is restricted from accessing the pages (because of robots.txt), leading to it never going into the page and updateing its index regarding the "noindex, follow" declaration in the that seems to be in place.
One other thing that could be considered, is to add "rel=nofollow" to all the faceted navigation links on the left.
Fully agreeing with you on the "crawl budget" part
Anders
-
Hi guys,
Appreciate your replies, but as far as I checked last time, if the URL is blocked by a Robots.txt file, it cannot read the Meta Noindex, Follow tag within the page.
There are no external references to these URL's, so Google is finding them within the site itself.
In essence, what you are recommending is that I lift the robots block and let google crawl these pages (which could be infinite as it is faceted navigation).
This will waste my crawl budget.
Any other ideas?
-
Anderss has pointed out to the right article. With robots.txt blocking, Google bot will not do the crawl (link discovery) from within the website but what if references to these blocked pages are found else where on third-party websites? This is the case you have been into. So to fully block Google from doing the link discovery and indexing these blocked pages, you should go in for the page-level meta robots tag to block these pages. Once this is in place, this issue will fade away.
This issue has been addressed many times here on Moz.
Coming to your concern about the crawl budget. There is nothing to worry about this as Google will not crawl those blocked pages while its on your website as these are already been blocked using robots.txt file.
Hope it helps my friend.
Best regards,
Devanur Rafi
-
Hi!
It could be that that pages has already been indexed before you added the directives to robots.txt.
I see that you have added the rel=canonical for the pages and that you now have noindex,follow. Is that recently added? If so, it could be wise to actually let GoogleBot access and crawl the pages again - and then they'll go away after a while. Then you could add the directive again later. See https://support.google.com/webmasters/answer/93710?hl=en&ref_topic=4598466 for more about this.
Hope this helps!
Anders -
For example:
http://www.sekretza.com/eng/best-sellers-sekretza-products.html?price=1%2C1000Is blocked by using:
Disallow: /*price=.... ?
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
22 Pages 7 Indexed
So I submitted my sitemap to Google twice this week the first time everything was just peachy, but when I went back to do it again Google only indexed 7 out of 22. The website is www.theinboundspot.com. My MOZ Campaign shows no issues and Google Webmaster shows none. Should I just resubmit it?
Intermediate & Advanced SEO | | theinboundspot1 -
Cache and index page of Mobile site
Hi, I want to check cache and index page of mobile site. I am checking it on mobile phone but it is showing the cache version of desktop. So anybody can tell me the way(tool, online tool etc.) to check mobile site index and cache page.
Intermediate & Advanced SEO | | vivekrathore0 -
Drop in Indexed pages
Hope everyone is having an Awesome December! I first noticed a drop in my index in the beginnings of November. My site drop in indexed pages from 1400 to 600 in the past 3-4 weeks. I don't know the cause of it, and would like the community to help me figure out why my indexing has dropped. Thank you for taking time out of your schedule to read this.
Intermediate & Advanced SEO | | BSC0 -
Robots.txt - blocking JavaScript and CSS, best practice for Magento
Hi Mozzers, I'm looking for some feedback regarding best practices for setting up Robots.txt file in Magento. I'm concerned we are blocking bots from crawling essential information for page rank. My main concern comes with blocking JavaScript and CSS, are you supposed to block JavaScript and CSS or not? You can view our robots.txt file here Thanks, Blake
Intermediate & Advanced SEO | | LeapOfBelief0 -
Google stripping down Page Titles
When viewing pages indexed by Google, I've noticed the Page Titles have be down stripped as follows: Actual Page Title: CITY Keyword - STATE keyword
Intermediate & Advanced SEO | | SpadaMan
Google Indexed Page Title: 1 - Domain.com None of the keywords in the actual PAGE TITLE are present; all words have been replaced with a random digit -domain.com. We launched a new version of the site several months back.. Any Idea on what can be causing this?0 -
Google+ Pages on Google SERP
Do you think that a Google+ Page (not profile) could appear on the Google SERP as a Rich Snippet Author? Thanks
Intermediate & Advanced SEO | | overalia0 -
Sudden Change In Indexed Pages
Every week I check the number of pages indexed by google using the "site:" function. I have set up a permanent redirect from all the non-www pages to www pages. When I used to run the function for the: non-www pages (i.e site:mysite.com), would have 12K results www pages (i.e site:www.mysite.com) would have about 36K The past few days, this has reversed! I get 12K for www pages, and 36K for non-www pages. Things I have changed: I have added canonical URL links in the header, all have www in the URL. My questions: Is this cause for concern? Can anyone explain this to me?
Intermediate & Advanced SEO | | inhouseseo0 -
NOINDEX listing pages: Page 2, Page 3... etc?
Would it be beneficial to NOINDEX category listing pages except for the first page. For example on this site: http://flyawaysimulation.com/downloads/101/fsx-missions/ Has lots of pages such as Page 2, Page 3, Page 4... etc: http://www.google.com/search?q=site%3Aflyawaysimulation.com+fsx+missions Would there be any SEO benefit of NOINDEX on these pages? Of course, FOLLOW is default, so links would still be followed and juice applied. Your thoughts and suggestions are much appreciated.
Intermediate & Advanced SEO | | Peter2640