Moz Q&A is closed.
After more than 13 years and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we're not completely removing the content (many posts will still be possible to view), we have locked both new posts and new replies.
Disallowed Pages Still Showing Up in Google Index. What do we do?
-
We recently disallowed a wide variety of pages on www.udemy.com that we do not want Google to index (e.g., /tags or /lectures). Basically, we don't want to spread our link juice around to all these pages that are never going to rank; we want to keep it focused on our core pages, which are for our courses.
We've added them as disallows in robots.txt, but after 2-3 weeks Google is still showing them in its index. When we look up "site:udemy.com", for example, Google currently shows ~650,000 pages indexed, when really it should only be showing ~5,000.
As another example, if you search for "site:udemy.com/tag", Google shows 129,000 results. We've definitely added "/tag" to our robots.txt properly, so this should not be happening; Google should be showing 0 results.
Any ideas on how we get Google to pay attention and re-index our site properly?
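For anyone checking which URLs a rule like this actually covers, here is a minimal sketch using Python's standard-library urllib.robotparser. The rules are a hypothetical reconstruction from the question (the real udemy.com file may differ), and the course URL is made up. The sketch only answers "may this crawler fetch this URL?": robots.txt governs crawling, and nothing in it tells Google to drop URLs it has already indexed. Also note that robotparser applies the first matching rule and ignores wildcards, whereas Google uses the most specific (longest) matching rule and supports `*` and `$`, so treat this as an approximation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules modelled on the question; not udemy.com's actual file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /tag
Disallow: /lectures
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse() also marks the rules as loaded

urls = [
    "https://www.udemy.com/tag/python/",
    "https://www.udemy.com/tags/python/",  # "/tag" is a prefix match, so /tags is covered too
    "https://www.udemy.com/lectures/123",
    "https://www.udemy.com/course/example-course/",  # hypothetical course URL
]

for url in urls:
    verdict = "crawlable" if parser.can_fetch("Googlebot", url) else "blocked"
    print(f"{verdict:9} {url}")
```

"Blocked" here means Googlebot should stop fetching the page; it does not mean the URL leaves the index, which is what the site: counts in the question are showing.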
-
The last time I used the URL removal tool, excluding a page via robots.txt was also sufficient for URL removal.
Recently, Google updated its documentation to strongly encourage you to use URL removal only for things like exposed confidential information, and not to clean up old pages or errors in your GWT (Google Webmaster Tools) account (see http://support.google.com/webmasters/bin/answer.py?hl=en&answer=1269119). I know many people still use the tool for that type of thing, but I wanted to point out that change.
-
Thank you, Keri.
Yes, good idea, but whatever you request, that page or directory must respond with a 404; otherwise, it will be ignored.
That is why I couldn't do that with the send-to-a-friend URLs
(it would have been a nice thing to do).
I guess I could have cheated and made them return a 404 only to Google, just to dump them all out of the index.
The 15,000 I did request to be removed were individual pages that returned a 404 response code, which is why I did them one at a time. I could have waited, but if you wait, Google keeps trying to fetch those missing pages and keeps reporting them in your GWT account.
That is a good reason to request the removals.
I actually gave up when the number of deletions got to 1.5 million. I figured it was just too hard to do.
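The poster's rule of thumb (only URLs that already respond with a 404 are accepted for removal) can be applied before submitting anything. Below is a minimal Python sketch, assuming you already have a list of candidate URLs. `http_status` and `removal_candidates` are hypothetical helper names, and the accepted status set follows the poster's description rather than any official Google specification.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, List, Tuple


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for a HEAD request to url (redirects are followed)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx responses; the error carries the status code.
        return err.code


def removal_candidates(
    urls: Iterable[str],
    status_of: Callable[[str], int] = http_status,
    accepted: Tuple[int, ...] = (404,),
) -> List[str]:
    """Keep only URLs whose status is in `accepted` (404, per the thread)."""
    return [url for url in urls if status_of(url) in accepted]
```

Passing a different `status_of` (for example, a lookup into a crawl export) avoids hitting the live site. URLs that still return 200, like the send-to-a-friend pages mentioned above, are filtered out.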
-
The last time I looked, you can request removal of an entire directory as well, which should work for the OP.
-
I would have said the same thing, except that a few weeks ago I removed a rule from the robots file and changed the affected pages to have a noindex,nofollow, and the next day tens of thousands of those pages appeared in the index and overpowered the content pages.
So my advice is: don't trust noindex,nofollow. Just stop the robot from going down that tree (as you are doing) and find another way to get those pages out of the index.
You can use the URL removal request tool.
It only seems to allow you to remove 1000 per day.
I have done this before by automating the removal using a macro program.
I think I removed about 15,000 over the space of a month, doing that.
They are fairly fast at removing URLs these days, 24 hours or less.
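The arithmetic behind the 15,000-URL figure (and why 1.5 million was impractical) can be sketched by splitting a URL list into daily batches. This assumes the roughly 1,000-per-day limit the poster observed, which is not a documented quota; `daily_batches` and `days_needed` are hypothetical helpers, not part of any Google tool.

```python
import math
from typing import Iterator, List, Sequence

# Per-day limit as observed by the poster; an assumption, not an official quota.
DAILY_LIMIT = 1000


def daily_batches(urls: Sequence[str], limit: int = DAILY_LIMIT) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `limit` URLs, one slice per day."""
    for start in range(0, len(urls), limit):
        yield list(urls[start:start + limit])


def days_needed(count: int, limit: int = DAILY_LIMIT) -> int:
    """Number of days required to submit `count` URLs at `limit` per day."""
    return math.ceil(count / limit)


print(days_needed(15_000))     # 15
print(days_needed(1_500_000))  # 1500
```

At that rate, 15,000 URLs fit comfortably within a month, while 1.5 million would take about 1,500 days, or roughly four years.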
-
Disallowing pages in your robots.txt stops bots from crawling them going forward, but Google may keep returning them in search results. This post has great explanations of ways to remove pages from indices: http://www.seomoz.org/blog/robot-access-indexation-restriction-techniques-avoiding-conflicts
The surefire way to get them out of the index is to remove the disallow from your robots.txt and add a meta noindex tag to all the pages you want removed. Once they're recrawled and reindexed by Google, they'll no longer appear in SERPs.
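The conflict that post describes (a page blocked in robots.txt is never fetched, so a noindex tag on it is never seen) can be checked with a short sketch. This is Python using only the standard library; `noindex_is_visible` and `RobotsMetaFinder` are hypothetical names. It looks only at `<meta name="robots">` and `<meta name="googlebot">` tags (not the X-Robots-Tag HTTP header) and inherits robotparser's simpler rule matching.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaFinder(HTMLParser):
    """Collects directives from <meta name="robots"> / <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for directive in attr.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def noindex_is_visible(robots_txt: str, url: str, html: str, agent: str = "Googlebot") -> bool:
    """True only if the crawler may fetch `url` AND the page carries a noindex directive."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    if not rules.can_fetch(agent, url):
        return False  # blocked: the crawler never downloads the page, so never sees the tag
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives
```

With a `Disallow: /tag` rule in place, a noindex tag on a /tag page returns False; remove the disallow and the same page returns True, which is the order of operations the answer recommends.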
Related Questions
-
Staging website got indexed by Google
Our staging website got indexed by Google, and now Moz is showing all inbound links from the staging site. How should I remove those links and make it noindex? Note: we already added a meta NOINDEX in the head tag.
Intermediate & Advanced SEO | | Asmi-Ta0 -
Home page suddenly dropped from index!!
A client's home page, which has always done very well, has just dropped out of Google's index overnight!
Webmaster Tools does not show any problem. The page doesn't even show up if we Google the company name. The robots.txt contains the default Flywheel robots file:
User-agent: *
Disallow: /calendar/action:posterboard/
Disallow: /events/action~posterboard/
The only unusual thing I'm aware of is some A/B testing of the page done with Optimizely. It redirects visitors to a test page, but it's not a 'real' redirect, in that redirect checker tools still see the page as a 200. Also, other pages that are being tested this way are not having the same problem. Other recent activity over the last few weeks/months includes linking to the page from some of our blog posts using the page topic as anchor text. Any thoughts would be appreciated.
Intermediate & Advanced SEO | | Caro-O -
How do you check the Google cache for hashbang pages?
So we use http://webcache.googleusercontent.com/search?q=cache:x.com/#!/hashbangpage to check what Googlebot has cached, but when we try to use this method for hashbang pages, we get x.com's cache, not x.com/#!/hashbangpage. That actually makes sense, because the hashbang is part of the homepage in that case, so I get why the cache returns the homepage. My question is: how can you actually look up the cache for a hashbang page?
Intermediate & Advanced SEO | | navidash0 -
Is there a way to get a list of Total Indexed pages from Google Webmaster Tools?
I'm doing a detailed analysis of how Google sees and indexes our website, and we have found that there are 240,256 pages in the index, which is way too many. It's an e-commerce site that needs some tidying up. I'm working with an SEO specialist to set up URL parameters and put information into the robots.txt file so the excess pages aren't indexed (we shouldn't have any more than around 3,000-4,000 pages), but we're struggling to find a way to get a list of these 240,256 pages. It would be helpful information in deciding what to put in the robots.txt file and which URLs we should ask Google to remove. Is there a way to get a list of the indexed URLs? We can't find it in Google Webmaster Tools.
Intermediate & Advanced SEO | | sparrowdog0 -
Our login pages are being indexed by Google - How do you remove them?
Each of our login pages shows up under a different subdomain of our website. Currently these are accessible by Google, which is a huge competitive advantage for our competitors looking for our client list. We've done a few things to try to rectify the problem:
- Added noindex/noarchive to each login page
- Added a robots.txt to all subdomains to block search engines
- Gone into Webmaster Tools, added the subdomain of one of our bigger clients, then requested to remove it from Google (this would be great to do for every subdomain, but we have a LOT of clients and it would require tons of backend work to make this happen)
Other than the last option, is there something we can do to remove subdomains from search engines? We know the robots.txt files are working, since the message in search results says: "A description for this result is not available because of this site's robots.txt – learn more." But we'd like the whole link to disappear. Any suggestions?
Intermediate & Advanced SEO | | desmond.liang1 -
Are pages with a canonical tag indexed?
Hello, here are my questions related to the canonical tag:
1. If I put a new webpage online with a canonical tag pointing to a different page, will this new page be indexed by Google, and will I be able to find it in the index?
2. If instead I apply the canonical tag to a page already in the index, will this page be removed from the index?
Thank you in advance for any insights!
Fabrizio
Intermediate & Advanced SEO | | fablau0 -
Why does my home page show up in search results instead of my target page for a specific keyword?
I am using WordPress and am targeting a specific keyword (and am using Yoast SEO, if that question comes up), and I am at 100% as far as what they recommend for on-page optimization. The target HTML page is a "Post" and not a "Page", using WordPress definitions. Also, I am using this Pinterest-style theme: http://pinclone.net/demo/, which makes the post a sort of pop-up. However, I started with a different theme and the results below were always the case, so I don't know if that is a factor or not. (I promise this is not a clever spammy attempt to promote their theme; in fact, parts of it don't even work for me yet, so I would not recommend it just yet.) I DO show up on the first page for my keyword. However, instead of showing the page www.mywebsite.com/this-is-my-targeted-keyword-page.htm, Google shows www.mywebsite.com in the results. The problem is that if the traffic goes only to my home page, visitors will be less likely to stay if they don't find what they want immediately and have to search for it. Any suggestions would be appreciated!
Intermediate & Advanced SEO | | chunkyvittles0 -
Should pages of old news articles be indexed?
My website publishes about 3 news articles a day and is set up so that old news articles can be accessed through a "back" button, with articles moving to page 2, then page 3, then page 4, etc., as new articles push them down. The pages include a link to each article and a short snippet. I was thinking I would want Google to index the first 3 pages of articles, but after that the pages are not worthwhile. Could these pages harm me, and should they be noindexed and/or given a canonical URL pointing to the main news page? Or is leaving them as is fine, because they are so deep in the site that Google won't see them, and I also won't be penalized for having weak content? Thanks for the help!
Intermediate & Advanced SEO | | theLotter0