Old pages STILL indexed...
-
Our new website has been live for around 3 months and the URL structure has completely changed. We weren't able to dynamically create 301 redirects for over 5,000 of our products because of how different the URL's were so we've been redirecting them as and when.
3 months on and we're still getting hundreds of 404 errors daily in our Webmaster Tools account. I've checked the server logs and it looks like Bing Bot still seems to want to crawl our old /product/ URL's. Also, if I perform a "site:example.co.uk/product" on Google or Bing - lots of results are still returned, indicating the both still haven't dropped them from their index.
Should I ignore the 404 errors and continue to wait for them to drop off or should I just block /product/ in my robots.txt? After 3 months I'd have thought they'd have naturally dropped off by now!
I'm half-debating this:
User-agent: *
Disallow: /some-directory-for-all/*User-agent: Bingbot
User-agent: MSNBot
Disallow: /product/Sitemap: http://www.example.co.uk/sitemap.xml
-
Yea. If you cannot do it dynamically, it gets to be a real PIA, and also, depending on how you setup the 301s, you may get an overstuffed .htaccess file that could cause problems.
If these pages were so young and did not have any link equity or rank to start with, they are probably not worth 301ing.
One tool you may want to consider is URLprofiler http://urlprofiler.com/ You could take all the old URLs and have URL profiler pull in GA data (from when they were live on your site) and then also pull in OSE data from Moz. You can then filter them and see what pages got traffic and links. Take those select "top pages" and make sure they 301 to the correct page on the new URL structure and then go from there. URL profiler has a free 15 day trial that you could use for this project and get done at no charge. But after using the product, you will see it is pretty handy and may buy anyway.
Ideally, if you could have dynamically 301ed the old pages to the new, that would have been the simplest method, but with your situation, I think you are ok. Google is just trying to help to make sure you did not "mess up" and 404 those old pages on accident. It wants to give you the benefit of the doubt. It is crazy sometimes how they keep things in the index.
I am monitoring a site that scraped one of my sites. They shut the entire site down after we threatened legal action. The site has been down for weeks and showing 404s, but I can still do a site: search and see them in the index. Meh.
-
Forgot to add this - just some free advice. You have your CSS inlined in your HTML. Ideally, you want to have that in an external CSS file. That way, once the user loads that external file, they do not have to download it multiple times so the experience is faster on subsequent pages.
If you were testing your page with Google site speed and they mentioned render blocking CSS issues and that is why you inlined your CSS, the solution is not to inline all your CSS, but to just inline what is above the fold and put the rest in an external file.
Hope that makes sense.
-
I suppose that's the problem. We've spent hours redirecting hundreds of 404 pages to new/relevant locations - but these pages don't receive organic traffic. It's mostly just BingBot, MSNBot and GoogleBot crawling them because they're still indexed.
I think I'm going to leave them as 404 rather than trying to keep on top of 301 redirecting them and I'll leave it in Google's hands to eventually drop them off!
Thanks!
Liam
-
General rule of thumb, if a page 404s and it is supposed to 404 dont worry about it. The Search Console 404 report does not mean that you are being penalized although it can be diagnostic. If you block the 404 pages in robots.txt yea, it will take the 404 errors out of the Search Console report, but then Google never "deals" with those 404s. It can take 3 months (maybe longer) to get things out of Search Console, I have noticed it taking longer here lately, but what you need to do first is ask the following questions
-
Do I still link internally to any of these /product/ URLs? If you do, Google may assume that you are 404ing those pages by mistake and leave them in the report longer as if you are still linking internally to them they must be a viable page.
-
Do any of these old URLs have value? Do they have links to them from external sites? Did they used to rank for a KW? You should probably 301 them to a semantically relevant page then vs 404ing and getting some use out of them.
If you have either of the above, Google may continue to remind you of the 404 as it thinks the page might be valuable and want to "help" you out.
You mention 5,000 URLs that were indexed and then you 404 them. You cannot assume that Search Console works in real time or that Google checks all 5,000 of these URLs at the same time. Google has a given crawl budget for your site on how often it will crawl a given page. Some pages they crawl more often (home page) some pages they crawl less often. They then have to process those crawls once they get the data back. What you will see in a situation like this is that if you 404 several thousand pages, you will first see several hundred show up in your Search Console report, then the next day some more, then some more, etc. Over time, the total will build and then may peak and then gradually start to fall off. Google has to find the 404s, process them and then show them in the report. You may see 500 of your 404 pages today, but then 3 months later, there may be 500 other 404 pages that show up in the report and those original 500 are now gone. This is why you might be seeing 404 errors after 3 months in addition to the examples I gave above.
It would be great if the process were faster and the data was cleaner. The report has a checkbox for "this is fixed" and that is great if you fixed something, but they need a checkbox for "this is supposed to 404" to help clear things out. If I have learned anything about Search Console, it is helpful, but the data in many cases is not real time.
Good luck!
-
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
One Page Design / Single Product Page
I have been working in a project. Create a framework for multi pages that I have So here is the case
Intermediate & Advanced SEO | | Roman-Delcarmen
Most of them are single page product / one page design wich means that I dont have many pages to optimize. All this sites/ pages follow the rules of a landing page optimization because my main goals is convert as many users as I can. At this point I need to optimize the SEO, the basic stuff such as header, descriptions, tittles ect. But most of my traffic is generated by affiliates, which is good beacuse I dont have to worrie to generate traffic but if the affiliate network banned my product, then I lose all my traffic. Put all my eggs in the same basket is not a good idea. Im not an seo guru so that is the reason Im asking whic strategies and tactics can give me results. All kind of ideas are welcome1 -
How long to re-index a page after being blocked
Morning all! I am doing some research at the moment and am trying to find out, just roughly, how long you have ever had to wait to have a page re-indexed by Google. For this purpose, say you had blocked a page via meta noindex or disallowed access by robots.txt, and then opened it back up. No right or wrong answers, just after a few numbers 🙂 Cheers, -Andy
Intermediate & Advanced SEO | | Andy.Drinkwater0 -
980 links from 75 domains and Graded "A" on Moz Page Grader-- still not ranking for our term. Thoughts?
A few additional interesting details: A blog post we wrote with the same keyword ranks 8, but this page does not crack the top 20. Crazy competitive term-- top SERP are from HBR, Entrepreneur and Inc. We use Instapage as landing page builder-- could this effect our rankings? URL is not a subdomain Pretty stumped over here. Thanks y'all!
Intermediate & Advanced SEO | | lbernes220 -
Redirect old "not found" url (at http) to new corresponding page (now at https)
My least favorite part of SEO 😉 I'm trying to redirect an old url that no longer exists to our new website that is built with https. The old url: http://www.thinworks.com/palm-beach-gardens-team/ New url: https://www.thinworks.com/palm-beach-gardens/ This isn't working with my standard process of the quick redirection plugin in WP or through htaccess because the old site url is at http and not https. Any help would be much appreciated! How do I accomplish this, where do I do it and what's the code I'd use? Thank you Moz community! Ricky
Intermediate & Advanced SEO | | SUCCESSagency0 -
Is there a way to get a list of Total Indexed pages from Google Webmaster Tools?
I'm doing a detailed analysis of how Google sees and indexes our website and we have found that there are 240,256 pages in the index which is way too many. It's an e-commerce site that needs some tidying up. I'm working with an SEO specialist to set up URL parameters and put information in to the robots.txt file so the excess pages aren't indexed (we shouldn't have any more than around 3,00 - 4,000 pages) but we're struggling to find a way to get a list of these 240,256 pages as it would be helpful information in deciding what to put in the robots.txt file and which URL's we should ask Google to remove. Is there a way to get a list of the URL's indexed? We can't find it in the Google Webmaster Tools.
Intermediate & Advanced SEO | | sparrowdog0 -
Should We Add the W3.org Language Tag To Every Page Or Just The Home Page?
Greetings, We have five international sites around the world, two of which are in difference languages. Currently we have the following line of html code on the home page of each of the sites: Clearly, we need to change the "en" portion for the sites that aren't in English, but, should we include that meta tag in each of the site's pages, or will the home page suffice. Thanks!
Intermediate & Advanced SEO | | CSawatzky0 -
Alternative HTML Structure for indexation of JavaScript Single Page Content
Hi there, we are currently setting up a pure html version for Bots on our site amazine.com so the content as well as navigation will be fully indexed by google. We will show google exactly the same content the user sees (except for the fancy JS effects). So all bots get pure html and real users see the JS based version. My questions are first, if everyone agrees that this is the way to go or if there are alternatives to this to get the content indexed. Are there best practices? All JS-based websites must have this problem, so I am hoping someone can share their experience. The second question regards the optimal number of content pieces ('Stories') displayed per page and the best method to paginate. Should we display e.g. 10 stories and use ?offset in the URL or display 100 stories to google per page and maybe use rel=”next”/"pref" instead. Generally, I would really appreciate any pointers and experiences from you guys as we haven't done this sort of thing before! Cheers, Frank
Intermediate & Advanced SEO | | FranktheTank-474970 -
Google replacing subpages in index with home page?
Hi! I run a backlink building company. Recently, we had a customer who had us build targeted backlinks to certain subpages on his site. Then something really bizarre happened...all of a sudden, their subpages that were indexed in Google (the ones we were building links to) disappeared from the index, to be replaced with their home page. They haven't lost their rank, per se--it's just now their home page instead of their subpages. At this point, we are tracking literally thousands of keywords for our link building customers, and we've never run into this issue before. Have you ever run into it? If so, what's the best way to handle it from an SEO company perspective? They have a sitemap.xml and their GWT account reports no crawl errors, so it doesn't seem to be a site issue.
Intermediate & Advanced SEO | | ownlocal0