Old pages still in index
-
Hi Guys,
I've been working on a E-commerce site for a while now. Let me sum it up :
- February new site is launched
- Due to lack of resources we started 301's of old url's in March
- Added rel=canonical end of May because of huge index numbers (developers forgot!!)
- Added noindex and robots.txt on at least 1000 urls.
- Index numbers went down from 105.000 tot 55.000 for now, see screenshot (actual number in sitemap is 13.000)
Now when i do site:domain.com there are still old url's in the index while there is a 301 on the url since March!
I know this can take a while but I wonder how I can speed this up or am doing something wrong. Hope anyone can help because I simply don't know how the old url's can still be in the index.
-
Hi Dan,
Thanks for the answer!
Indexation is already back to 42.000 so slowly going back to normal
And thanks for the last tip, that's totally right. I just discovered that several pages had duplicate url's generated so by continually monitoring we'll fix it !
-
Hi There
To noindex pages there are a few methods;
-
use a meta noindex without robots.txt - I think that is why some may not be removed. The robots.txt block crawling so they can not see the noindex.
-
use a 301 redirect - this will eventually kill off the old pages, but it can definitely take a while.
-
canonical it to another page. and as Chris says, don't block the page or add extra directives. If you canonical the page (correctly), I find it usually drops out of the index fairly quickly after being crawled.
-
use the URL removal tool in webmaster tools + robots.txt or 404. So if you 404 a page or block it with robots.txt you can then go into webmaster tools and do a URL removal. This is NOT recommended though in most normal cases, as Google prefers this be for "emergencies".
The only method that removes pages within a day or two guaranteed is the URL removal tool.
I would also examine your site since it is new, for something that is causing additional pages to be generated and indexed. I see this a lot with ecommerce sites where they have lots of pagination, facets, sorting, etc and those can generate lots of other pages which get indexed.
Again, as Chris says, you want to be careful to not mix signals. Hope this all helps!
-Dan
-
-
Hi Chris,
Thanks for your answer.
I'm either using a 301 or noindex, not both of course.
Still have to check the server logs, thanks for that!
Another weird thing. While the old url is still in the index, when i check the cache date it's a week old. That's what i don't get. Cache date is a week old but Google still has the old url in the index.
-
It can take months for pages to fall out of Google's index have you looked at your log files to verify that googlebot is crawling those pages?. Things to keep in mind:
- If you 301 a page, the rel=canonical on that page will not be seen by the bot (no biggie in your case)
- If you 301 a page, a meta noindex will not be seen by the bot
- It is suggested not to use the robots.txt to no index a page that is being 301 redirected--as the redirect may not be seen by Google.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Two high ranking pages instantly dropped from index - no manual penalty notification
We are facing an issue where two of our major rankings pages have just completely disappeared from search results. This has happened in the last 24 - 48 hours and there has been no changes made to the site. From what we can tell, it's only impacted two pages (but two very important category pages). I have double and triple checked all standard indexing protocols - Search Console URL inspection says the pages are fine to crawl and index. URLs have been requested to re-index but nothing has worked. This would have me to believe it could be a manual action yet there are no notifications in Search Console and we are listed as 'No issues detected' in all versions of our web property. Can anyone else think what could be the reason?
Intermediate & Advanced SEO | | Vuly0 -
Why does my old brand name still show up on organic search but as my new brand name and domain?
Hello mozers! I have quite the conundrum. My client used to have the unfortunate brand name "Meetoo" - which by the way they had before the movement happened! So naturally, they rebranded to the name Vevox in March 2019 to avoid confusion to users. However, when you search for their old brand name "Meetoo" the first organic link that pops up is their domain www.vevox.com. Now, this wouldn't normally be a problem, however it is when any #MeToo news appears in the media and we get a sudden influx or wrong traffic. I've searched the HTML and content for the term "Meetoo" but can only find one trace of this name through a widget. Not enough to hold an organic spot. My only other thinking is that www.vevox.com is redirected from www.meetoo.com. So I'm assuming this is why Vevox appear under the search term "Meetoo". How can I remove the homepage www.vevox.com from appearing for the search term "meetoo"? Can anyone help? AvGGYBc
Intermediate & Advanced SEO | | Virginia-Girtz3 -
SEO: How to change page content + shift its original content to other page at the same time?
Hello, I want to replace the content of one page of our website (already indexeed) and shift its original content to another page. How can I do this without problems like penalizations etc? Current situation: Page A
Intermediate & Advanced SEO | | daimpa
URL: example.com/formula-1
Content: ContentPageA Desired situation: Page A
URL: example.com/formula-1
Content: NEW CONTENT! Page B
URL: example.com/formula-1-news
Content: ContentPageA (The content that was in Page A!) Content of the two pages will be about the same argument (& same keyword) but non-duplicate. The new content in page A is more optimized for search engines. How long will it take for the page to rank better?0 -
Pagination on a product page with reviews spread out on multiple pages
Our current product pages markup only have the canonical URL on the first page (each page loads more user reviews). Since we don't want to increase load times, we don't currently have a canonical view all product page. Do we need to mark up each subsequent page with its own canonical URL? My understanding was that canonical and rel next prev tags are independent of each other. So that if we mark up the middle pages with a paginated URL, e.g: Product page #1http://www.example.co.uk/Product.aspx?p=2692"/>http://www.example.co.uk/Product.aspx?p=2692&pageid=2" />**Product page #2 **http://www.example.co.uk/Product.aspx?p=2692&pageid=2"/>http://www.example.co.uk/Product.aspx?p=2692" />http://www.example.co.uk/Product.aspx?p=2692&pageid=3" />Would mean that each canonical page would suggest to google another piece of unique content, which this obviously isn't. Is the PREV NEXT able to "override" the canonical and explain to Googlebot that its part of a series? Wouldn't the canonical then be redundant?Thanks
Intermediate & Advanced SEO | | Don340 -
De-indexing product "quick view" pages
Hi there, The e-commerce website I am working on seems to index all of the "quick view" pages (which normally occur as iframes on the category page) as their own unique pages, creating thousands of duplicate pages / overly-dynamic URLs. Each indexed "quick view" page has the following URL structure: www.mydomain.com/catalog/includes/inc_productquickview.jsp?prodId=89514&catgId=cat140142&KeepThis=true&TB_iframe=true&height=475&width=700 where the only thing that changes is the product ID and category number. Would using "disallow" in Robots.txt be the best way to de-indexing all of these URLs? If so, could someone help me identify how to best structure this disallow statement? Would it be: Disallow: /catalog/includes/inc_productquickview.jsp?prodID=* Thanks for your help.
Intermediate & Advanced SEO | | FPD_NYC0 -
Are pages with a canonical tag indexed?
Hello here, here are my questions for you related to the canonical tag: 1. If I put online a new webpage with a canonical tag pointing to a different page, will this new page be indexed by Google and will I be able to find it in the index? 2. If instead I apply the canonical tag to a page already in the index, will this page be removed from the index? Thank you in advance for any insights! Fabrizio
Intermediate & Advanced SEO | | fablau0 -
Cleaning up /index.html on home page
All, What is the best way to deal with a home page that has the /index.html at the end of it? 301 redirect to the .com home page? Just want to make sure I'm not missing something. Thanks in advance.
Intermediate & Advanced SEO | | JSOC0 -
Why my own page is not indexed for that keyword?
hi, I recently recreated the page www.zenucchi.it /ITA/poltrona-frau-brescia.html on the third level domain poltronafraubrescia.zenucchi.it by putting it on the home page. The first page is still indexed for the keyword poltrona frau brescia . But the new page is no indexed for that keyword and i don't know why ( even if the page is indexed in google ) .. I state that the new domain has the same autorithy and that i put a 301 redirect to pass his authority to the new one that has many more incoming links that did not have previous .. i hope you'll help me thanks a lot
Intermediate & Advanced SEO | | guidoboem0