Disallowed Pages Still Showing Up in Google Index. What do we do?
-
We recently disallowed a wide variety of pages on www.udemy.com that we do not want Google indexing (e.g., /tags or /lectures). Basically, we don't want to spread our link juice across pages that are never going to rank. We want to keep it focused on our core pages, which are our course pages.
We've added them as disallows in robots.txt, but after 2-3 weeks Google is still showing them in its index. When we look up "site:udemy.com", for example, Google currently shows ~650,000 pages indexed... when really it should only be showing ~5,000 pages indexed.
As another example, if you search for "site:udemy.com/tag", Google shows 129,000 results. We've definitely added "/tag" to our robots.txt properly, so this should not be happening... Google should be showing 0 results.
Any ideas re: how we get Google to pay attention and re-index our site properly?
-
The last time I used the tool, excluding via robots.txt was also sufficient for URL removal.
Google recently updated its documentation to strongly encourage using URL removal only for things like exposed confidential information, not for cleaning up old pages or errors in your GWT account (see http://support.google.com/webmasters/bin/answer.py?hl=en&answer=1269119). I know many people still use the tool for that kind of cleanup, but I wanted to point out the change.
-
Thank you Keri.
Yes, good idea, but whatever page or directory you request must respond with a 404; otherwise, the request will be ignored.
That is why I couldn't do that with the send-to-a-friend URLs
(it would have been a nice thing to do).
I guess I could have cheated and made them return a 404 if the visitor was Google, just to dump them all out of the index.
The 15,000 I did request to be removed were individual pages that returned a 404 response code, so that's why I did them one at a time. I could have waited, but if you wait, Google keeps trying to fetch those missing pages and keeps reporting them in your GWT.
That is a good reason to request the removals.
I actually gave up when the number of deletions got to 1.5 million. I figured it was just too hard to do.
-
The last time I looked, you can request removal of an entire directory as well, which should work for the OP.
-
I would have said the same thing, except that a few weeks ago I removed a rule from the robots file and changed the affected pages to carry noindex,nofollow. The next day, tens of thousands of those pages appeared in the index and overpowered the content pages.
So my advice is: don't trust noindex,nofollow. Just stop the robot going down that tree (as you are doing) and find another way to get those pages out of the index.
You can use the URL removal request tool.
It only seems to allow you to remove 1000 per day.
I have done this before by automating the removal using a macro program.
I think I removed about 15,000 over the space of a month, doing that.
They are fairly fast at removing URLs these days, 24 hours or less.
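If you do go the one-URL-at-a-time route, it helps to plan the work in daily batches first. Below is a minimal Python sketch that only splits a list of URLs into per-day chunks; it does not talk to Google at all. The 1000-per-day figure is the cap observed above, not a documented limit, and the example.com URLs and the function name are made up for illustration.

```python
from typing import Iterator, List

# Assumption: the per-day cap observed in this thread, not an official number.
DAILY_LIMIT = 1000


def daily_batches(urls: List[str], limit: int = DAILY_LIMIT) -> Iterator[List[str]]:
    """Yield consecutive slices of urls, each holding at most limit entries."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    for start in range(0, len(urls), limit):
        yield urls[start:start + limit]


if __name__ == "__main__":
    # Hypothetical URLs standing in for pages that already return a 404.
    dead_urls = [f"https://example.com/old-page/{i}" for i in range(15000)]
    for day, batch in enumerate(daily_batches(dead_urls), start=1):
        print(f"day {day}: {len(batch)} URLs")
```

At that cap, 15,000 URLs works out to 15 days of requests at a minimum.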
-
Disallowing in your robots.txt keeps the bots from crawling your pages going forward, but Google may keep returning the already-indexed URLs in search results. This post has great explanations of the ways to remove pages from indices: http://www.seomoz.org/blog/robot-access-indexation-restriction-techniques-avoiding-conflicts
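To check what a disallow rule actually covers, here is a minimal Python sketch using the standard library's `urllib.robotparser`. The rules and URLs are assumptions modeled on the paths mentioned in the question, not Udemy's real robots.txt. Note that it only answers "may this crawler fetch this URL?"; it says nothing about whether the URL is already in the index.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules modeled on the paths named in the question.
ROBOTS_TXT = """\
User-agent: *
Disallow: /tag
Disallow: /lectures
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Illustrative URLs only; the course path is made up.
    for url in (
        "https://www.udemy.com/tag/python",
        "https://www.udemy.com/lectures/intro",
        "https://www.udemy.com/courses/python-basics",
    ):
        print(url, may_crawl(ROBOTS_TXT, "Googlebot", url))
```

Because the match is a simple path prefix, `Disallow: /tag` also covers `/tags/...`.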
The surefire way to get them out of the index is to remove the disallow from your robots.txt and add a meta noindex tag to all the pages you want removed. Removing the disallow matters because Google has to crawl a page to see its noindex tag. Once Google recrawls the pages, they'll no longer appear in SERPs.
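Before waiting on a recrawl, it is worth confirming that the pages really serve the tag. The following is a rough Python sketch, standard library only, that looks for a `noindex` directive in a page's robots meta tags. The class and function names and the sample HTML are made up for illustration, and it only inspects HTML you pass in; it does not check the `X-Robots-Tag` HTTP header.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> / <meta name="googlebot"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None for bare attributes, so normalize them.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() in ("robots", "googlebot"):
            for part in attr.get("content", "").split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the HTML carries a robots meta tag containing noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


if __name__ == "__main__":
    # Hypothetical page markup.
    sample = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
    print(has_noindex(sample))
```

You would feed it the HTML of a sample of the pages you want dropped, fetched however you normally fetch pages.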