Old pages still in index
-
Hi Guys,
I've been working on an e-commerce site for a while now. Let me sum it up:
- February: the new site is launched
- Due to a lack of resources, we started 301-redirecting the old URLs in March
- Added rel=canonical at the end of May because of the huge index numbers (the developers had forgotten it!!)
- Added noindex and robots.txt blocks to at least 1000 URLs.
- Index numbers have gone down from 105.000 to 55.000 so far, see screenshot (the actual number of URLs in the sitemap is 13.000)
Now when I do a site:domain.com search, old URLs are still in the index, even though they have had a 301 since March!
I know this can take a while, but I wonder how I can speed it up, or whether I'm doing something wrong. I hope someone can help, because I simply don't understand how the old URLs can still be in the index.
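With 13.000 URLs in the sitemap but 55.000 indexed, one rough way to find the stray URLs is to diff a list of URLs you have seen (for example copied from site: results or exported from a crawler; that input is an assumption) against the sitemap. A minimal Python sketch, assuming a standard sitemaps.org `<urlset>` file; the example.com URLs are hypothetical:

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemaps.org <urlset> files
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def sitemap_urls(xml_text):
    """Return the set of <loc> URLs listed in a sitemap document."""
    root = ET.fromstring(xml_text)
    return {loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)}

def urls_not_in_sitemap(seen_urls, xml_text):
    """URLs that were seen (indexed or crawled) but are not listed in the sitemap."""
    return sorted(set(seen_urls) - sitemap_urls(xml_text))

# Hypothetical sitemap and list of URLs seen in the index
sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/shoes/</loc></url>
  <url><loc>https://www.example.com/bags/</loc></url>
</urlset>"""

seen = [
    "https://www.example.com/shoes/",
    "https://www.example.com/old/shoes.html",
    "https://www.example.com/shoes/?sort=price",
]

for url in urls_not_in_sitemap(seen, sitemap):
    print(url)
```

Each URL it prints is either an old URL still waiting to be dropped or a generated variant worth investigating.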
-
Hi Dan,
Thanks for the answer!
Indexation is already down to 42.000, so it's slowly going back to normal.
And thanks for the last tip, that's totally right. I just discovered that several pages were generating duplicate URLs, so we'll keep monitoring and fix it!
-
Hi There
To get pages out of the index, there are a few methods:
-
Use a meta noindex without a robots.txt block. I think that is why some of your pages may not be removed: robots.txt blocks crawling, so Google cannot see the noindex.
-
Use a 301 redirect. This will eventually kill off the old pages, but it can definitely take a while.
-
Canonical it to another page, and, as Chris says, don't block the page or add extra directives. If you canonical the page correctly, I find it usually drops out of the index fairly quickly after being crawled.
-
Use the URL removal tool in Webmaster Tools, together with robots.txt or a 404. If you 404 a page or block it with robots.txt, you can then go into Webmaster Tools and request a URL removal. This is NOT recommended in most normal cases, though, as Google prefers the tool to be used for "emergencies".
The only method guaranteed to remove pages within a day or two is the URL removal tool.
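To check whether a robots.txt rule is hiding a noindex (the first point above), a minimal Python sketch using the standard library's `urllib.robotparser`. The robots.txt rules and URLs are hypothetical, and note that Python's parser uses simple prefix matching, so it only approximates Google's own rules (Google also supports `*` and `$` wildcards):

```python
from urllib.robotparser import RobotFileParser

def crawler_can_see_page(robots_lines, url, user_agent="Googlebot"):
    """Return True if robots.txt lets `user_agent` fetch `url`.

    If this returns False, the page is never fetched, so a meta noindex
    (or a 301) on that URL cannot be seen by the crawler.
    """
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse lines directly instead of fetching over HTTP
    return parser.can_fetch(user_agent, url)

# Hypothetical robots.txt that blocks the old category URLs
robots_txt = [
    "User-agent: *",
    "Disallow: /old-category/",
]

# False: a noindex on this page would never be seen
print(crawler_can_see_page(robots_txt, "https://www.example.com/old-category/shoes"))
# True: this page can be crawled, so its noindex can take effect
print(crawler_can_see_page(robots_txt, "https://www.example.com/new-category/shoes"))
```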
Since your site is new, I would also examine it for anything that is causing additional pages to be generated and indexed. I see this a lot with ecommerce sites that have lots of pagination, facets, sorting, etc., which can generate many other pages that get indexed.
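One way to spot pagination, facet and sorting variants in a crawl export is to strip those query parameters and group URLs that collapse to the same address. A minimal Python sketch; the parameter names in `NOISE_PARAMS` and the example URLs are hypothetical and would need to match whatever your shop platform actually generates:

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical parameter names for sorting, facets and pagination
NOISE_PARAMS = {"sort", "order", "page", "color", "size", "view"}

def normalize(url):
    """Strip sorting/facet/pagination parameters so variants collapse to one URL."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k.lower() not in NOISE_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

def duplicate_groups(urls):
    """Group URLs by normalized form; keep only groups with more than one variant."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize(url)].append(url)
    return {base: variants for base, variants in groups.items() if len(variants) > 1}

crawled = [
    "https://www.example.com/shoes/",
    "https://www.example.com/shoes/?sort=price",
    "https://www.example.com/shoes/?color=red&page=2",
    "https://www.example.com/bags/",
]

for base, variants in duplicate_groups(crawled).items():
    print(base, len(variants))
```

Groups with many variants are candidates for a rel=canonical to the base URL, keeping in mind the advice above not to mix that with other directives.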
Again, as Chris says, you want to be careful to not mix signals. Hope this all helps!
-Dan
-
Hi Chris,
Thanks for your answer.
I'm using either a 301 or a noindex, not both, of course.
I still have to check the server logs, thanks for that!
Another weird thing: the old URL is still in the index, yet when I check the cache date, it's only a week old. That's what I don't get. The cache date is a week old, but Google still has the old URL in the index.
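To check the server logs as Chris suggests, a minimal Python sketch that counts Googlebot requests to old URLs and the status codes they received. It assumes the Apache/Nginx "combined" log format and an old URL prefix of `/old/`; both are assumptions, as are the sample lines. Matching on the user-agent string alone can be fooled by spoofed bots, so Google recommends a reverse DNS lookup for proper verification:

```python
import re
from collections import Counter

# Apache/Nginx "combined" log format:
# host ident user [time] "METHOD path protocol" status bytes "referer" "user-agent"
COMBINED_LOG = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

def googlebot_hits(log_lines, old_prefixes):
    """Count (path, status) pairs for Googlebot requests to URLs under old_prefixes."""
    hits = Counter()
    for line in log_lines:
        match = COMBINED_LOG.match(line)
        if match is None or "Googlebot" not in match.group("agent"):
            continue
        path = match.group("path")
        if path.startswith(tuple(old_prefixes)):
            hits[(path, int(match.group("status")))] += 1
    return hits

# Hypothetical log lines: one Googlebot hit on an old URL, one browser hit, one new URL
sample_log = [
    '66.249.66.1 - - [10/Jun/2013:10:00:00 +0000] "GET /old/shoes.html HTTP/1.1" 301 0 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [10/Jun/2013:10:05:00 +0000] "GET /old/shoes.html HTTP/1.1" 301 0 "-" '
    '"Mozilla/5.0 (Windows NT 10.0)"',
    '66.249.66.1 - - [10/Jun/2013:11:00:00 +0000] "GET /shoes/ HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]

for (path, status), count in googlebot_hits(sample_log, ["/old/"]).items():
    print(path, status, count)
```

If old URLs never show up with a 301 for Googlebot, the redirects simply haven't been recrawled yet.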
-
It can take months for pages to fall out of Google's index. Have you looked at your log files to verify that Googlebot is crawling those pages? Things to keep in mind:
- If you 301 a page, the rel=canonical on that page will not be seen by the bot (no biggie in your case)
- If you 301 a page, a meta noindex will not be seen by the bot
- It is suggested not to use robots.txt to block a page that is being 301 redirected, as Google may not see the redirect.
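The points above (and Dan's advice not to mix signals) can be turned into a simple checklist. A minimal Python sketch, assuming you already know each URL's status code, robots.txt status, meta noindex and canonical (for example from a crawler export); the `PageSignals` structure and the warning texts are hypothetical:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class PageSignals:
    """What a crawler would observe for one URL (values supplied by you)."""
    url: str
    status: int                      # HTTP status code, e.g. 200 or 301
    blocked_by_robots: bool          # True if robots.txt disallows the URL
    meta_noindex: bool = False       # meta robots noindex present
    canonical: Optional[str] = None  # href of rel=canonical, if any

def mixed_signals(page: PageSignals) -> List[str]:
    """Return warnings for combinations that hide or contradict each other."""
    warnings = []
    redirect = 300 <= page.status < 400
    canonical_elsewhere = page.canonical is not None and page.canonical != page.url
    if page.blocked_by_robots:
        if redirect:
            warnings.append("robots.txt block: the redirect may never be seen")
        if page.meta_noindex:
            warnings.append("robots.txt block: the noindex can never be seen")
        if canonical_elsewhere:
            warnings.append("robots.txt block: the canonical can never be seen")
    if redirect and (page.meta_noindex or page.canonical):
        warnings.append("redirect: noindex/canonical on this URL will not be seen")
    if canonical_elsewhere and page.meta_noindex:
        warnings.append("canonical to another URL plus noindex: conflicting directives")
    return warnings

old_page = PageSignals("https://www.example.com/old/shoes.html", 301,
                       blocked_by_robots=True, meta_noindex=True)
for warning in mixed_signals(old_page):
    print(warning)
```

A URL with only a 301 and no robots.txt block returns no warnings, which is the clean setup Chris describes.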
Related Questions
-
301 Externally Linked, But Non-Producing Pages, To Productive Pages Needing Links?
I'm working on a site that has some non-productive pages without much upside potential, but that are linked to externally. The site also has some productive pages, light on external links, on a somewhat related topic. What do you think of 301ing the non-productive pages with links to the productive pages without links, in order to give them more external link love? Would it make much of a difference? Thanks... Darcy
Intermediate & Advanced SEO | 94501
-
HTTPS pages - To meta no-index or not to meta no-index?
I am working on a client's site at the moment, and I noticed that both HTTP and HTTPS versions of certain pages are indexed by Google, and both show in the SERPs when you search for the content of these pages. I just wanted to get various opinions on whether the HTTPS pages should have a meta no-index tag through an htaccess rule or whether they should be left as is.
Intermediate & Advanced SEO | Jamie.Stevens
-
Consistent Ranking Jumps Page 1 to Page 5 for months - help needed
Hi guys and gals, I have a really tricky client with whom I just can't seem to gain consistency in their SERP results. The keywords are competitive, but the main issue I have is the big page jumps that happen pretty much on a weekly basis. We go up and down 40 positions, and this behaviour has been going on for nearly 6 months.
I felt it would resolve itself in time, but it has not. The website is a large ecommerce website. Their link profile is OK: several high-quality newspaper publication links, mostly brand-related anchor texts, and the link building we have engaged in has all been very good, i.e. content-relevant, high-quality places. Below are some potential causes I think could be the reason:
- The on-page SEO is good; however, the way their ecommerce website is set up has produced a substantial number of duplicate title tags. In my opinion, this is a potential cause.
- The previous web developer set up 301 redirects to the home page for all 404 errors. I know best practice is to redirect to the most relevant pages, but could this be a potential issue?
- We had some server connectivity issues show up in Webmaster Tools, but that was for 1 day about 4 months ago. Since then, no issues.
- They have quite a few 'blocked URLs' in their robots.txt file, e.g. Disallow: /login, Disallow: /checkout/, but to me these seem normal and not a big issue.
- Over the last 12 months, Webmaster Tools has shown a decrease in 'total indexed web pages' from 5000 to 2000, which is quite an odd statistic.
Summary: all in all, I am a tad stumped. We have some duplicate content issues in title tags and are perhaps not following best practice in the 301 redirects, but other than that I don't see any major on-page issues, unless I am missing something in the seriousness of what I have listed.
Finally, we have also done a bit of a cull of poor-quality links, requesting links to be removed and also submitting a 'disavow' of some really bad links. We do not have a manual penalty, though. Thoughts, feedback or comments VERY welcome.
Intermediate & Advanced SEO | Jon_bangonline
-
Indexed non-existent pages; problem appeared after we 301'd the url/index to the url.
I recently read that if a site has 2 pages that are live, such as http://www.url.com/index and http://www.url.com/, they will come up as duplicates. I read that it's best to 301 redirect http://www.url.com/index to http://www.url.com/, as this helps avoid duplicate content and keeps all the link juice on one page. We did the 301 for one of our clients and got about 20,000 errors for pages that did not exist. The errors are for pages that are indexed but do not exist on the server. We are assuming that these indexed (non-existent) pages are somehow linked to http://www.url.com/index. The links are showing 200 OK. We took off the 301 redirect from the http://www.url.com/index page; however, we now still have 2 exact pages, www.url.com/index and http://www.url.com/. What is the best way to solve this issue?
Intermediate & Advanced SEO | Bryan_Loconto
-
Wrong Page Indexing in SERPS - Suggestions?
Hey Moz'ers! I have a quick question. Our company (Savvy Panda) is working on ranking for the keyword "Milwaukee SEO". On our website, we have a page for "Milwaukee SEO" in our services section that's optimized for the keyword, and we've been doing link building to it. However, when you search for "Milwaukee SEO", a different page is displayed in the SERPs: a category view of our blog listing the articles tagged "Milwaukee SEO". Is there a way to alert Google that the page showing up in the SERPs is not the most relevant and request that a different URL be indexed for that spot? I saw a webinar a while back that showed something like that using the Google Webmaster Tools sitelinks demote tool, but I would hate to demote that URL and then lose any kind of indexing for the keyword.
Ideas, suggestions?
Intermediate & Advanced SEO | SavvyPanda
-
Not sure why Home page is outranked by less optimized internal pages.
We launched our website just three weeks ago, and one of our primary keyword phrases is "e-business consultants". Here's what I don't get. Our home page is the page most optimized around this search phrase; using the SEOmoz On-Page Optimization tool, the home page scores an "A". And yet it doesn't rank in the top 50 on Google Canada, although two other INTERNAL pages, www.ebusinessconsultants.ca/about/consulting-team/ and www.ebusinessconsultants.ca/about/consulting-approach/, rank 5 and 6 on Google Canada, even though they only score a grade "C" for on-page optimization for this keyword phrase. I've always understood that the home page is the most powerful page. Why are these others outranking it? I checked the crawl and Google Webmaster Tools, and there is no obvious problem on the home page. Is this because the site is so new? It goes against all the previous experience I've had in similar situations. Any guidance or insight would be highly appreciated!!
Intermediate & Advanced SEO | axelk
-
Old Redirecting Website Still Showing In SERPs
I have a client, a plumber, who bought another plumbing company (and that company's domain) at one point. This other company was very old and has a lot of name recognition so they created a dedicated page to this other company within their main website, and redirected the other company's old domain to that page. This has worked fine, in that this page on the main site is now #1 when you search for the other old company's name. But for some reason the old domain comes up #2 (despite the fact that it's redirecting). Now, I could understand if the redirect had only been set up recently, but I'm reasonably sure this happened about a year ago. Could it be due to the fact that there are many sites out there still linking to that old domain? Thanks in advance!
Intermediate & Advanced SEO | VTDesignWorks
-
Should I prevent Google from indexing blog tag and category pages?
I am working on a website that has a regularly updated WordPress blog and am unsure whether or not the category and tag pages should be indexable. The blog posts are often outranked by the tag and category pages, and they are ultimately leaving me with a duplicate content issue. With this in mind, I assumed that the best thing to do would be to remove the tag and category pages from the index, but after speaking to someone else about the issue, I am no longer sure. I have tried researching online, but nothing I found provided any further information. Can anyone with experience of dealing with issues like this, or with any knowledge of the topic, please help me resolve this annoying issue? Any input will be greatly appreciated. Thanks, Paul
Intermediate & Advanced SEO | PaulRogers