How to Stop Google from Indexing Old Pages
-
We moved from a .php site to a Java site on April 10th. Almost two months later, Google is still crawling old pages that no longer exist (225,430 Not Found errors, to be exact).
These pages no longer exist on the site, and there are no internal or external links pointing to them.
Google has crawled the site since go-live but continues to try to crawl these old pages.
What are my next steps?
-
All my clients are impatient with Google's crawl. I think the speed of life on the web has spoiled them. Assuming your site isn't a huge e-commerce or subject-matter site, you will get crawled, just not right away. Smaller, newer sites take time.
Take any concern and put it towards link building to the new site so Google's crawlers find it faster (via their seed list). Get it listed on DMOZ, get that Twitter account going, post videos to YouTube, etc. Some juicy high-PR inbound links could help speed up the indexing. Good luck!
-
Like Mike said above, there still isn't enough info here for us to give you a very clear response, but I think he is right to point out that you shouldn't really care about the extinct pages in Google's index. They should, at some point, expire.
You can specify particular URLs to remove in GWT or block them in your robots.txt file, but that doesn't seem like the best option for you. My recommendation is to prepare the new site in its new location, upload a good clean sitemap.xml to GWT, and let Google adjust. If much of the content is the same, Google can tell from the page creation dates which site is the newer and more appropriate one. I hate to say "trust the engines," but in this case, you should.
You may also consider a rel="author" tag on your new site to help Google prioritize it. But really, the best approach is the new site on its new domain, a clean sitemap.xml, and patience.
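(For completeness, the robots.txt route mentioned above would just be a wildcard pattern like the one below. Googlebot honours the * and $ wildcards, but remember that blocking crawling doesn't remove already-indexed URLs, which is another reason it probably isn't the right tool here.)

    User-agent: *
    Disallow: /*.php$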
-
To further clear things up...
I can 301 every page from the old .php site to our new homepage (however, I'm concerned about Google's impression of our overall user experience).
Or
I can 410 every page from the old .php site (wouldn't this tell Google to stop trying to crawl these pages? These pages technically still exist, they just have a different URL and directory structure; there are too many to set up individual 301s, though).
Or
I can do nothing and wait for these pages to drop off Google's radar.
What is the best option?
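For reference, here is roughly what the first two options would look like in an Apache .htaccess file. This is just a sketch: it assumes the old .php URLs still hit a server we control, that mod_rewrite is enabled, and www.example.com stands in for our real homepage.

    RewriteEngine On

    # Option 1: blanket 301 - every old .php URL goes to the new homepage
    RewriteRule \.php$ http://www.example.com/ [R=301,L,NC]

    # Option 2: blanket 410 - tell crawlers the old .php URLs are permanently gone
    # (use instead of Option 1, not alongside it)
    # RewriteRule \.php$ - [G,L,NC]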
-
After reading the further responses here I'm wondering something...
You switched to a new site, can't 301 the old pages, and have no control over the old domain... So why are you worried about pages 404ing on an unused site you don't control anymore?
Maybe I'm missing something here or not reading it right. Who does control the old domain then? Is the old domain just completely gone? Because if so, why would it matter that Google is crawling non-existent pages on a dead site and returning 404s and 500s? Why would that necessarily affect the new site?
Or is it the same site but you switched to Java from PHP? If so, wouldn't your CMS have a way of redirecting the old pages that are technically still part of your site to the newer relevant pages on the site?
I feel like I'm missing pertinent info that might make this easier to digest and offer up help.
-
Sean,
Many thanks for your response. We have submitted a new, fresh sitemap to Google, but it seems to be taking them forever to digest the changes.
We've been keeping track of rankings, and they've been going down, but there are so many changes happening at once with the new site that it's hard to tell what the primary factor in the decline is.
Is there a way to send Google all of the pages that don't exist and tell them to stop looking for them?
Thanks again for your help!
-
You would need access to the domain to set up the 301s. If you can no longer edit files on the old domain, then your best bet is to update Webmaster Tools with the new site info and a sitemap.xml and wait for Google's caches to expire and update.
Somebody can correct me if I'm wrong, but getting so many 404s and 500s has probably already impacted your rankings significantly enough that you may be best served by approaching the whole effort as a new site. Again, without more data, I'm left making educated guesses here. And if you aren't tracking your rankings (you asked how much this is impacting you; you should be able to see that in your tracking), then I would let go of the old site completely and build search traffic fresh on the new domain. You'd probably get better results in the long term by jettisoning a defunct site with so many errors.
I confess, without being able to dig into the site analytics and traffic data, I can't give direct tactical advice. However, the above is what I would certainly do: resubmit a fresh sitemap.xml to GWT and delete all the info for the old site in there. I defer to anyone with better advice. What a tough position you are in!
-
Thanks all for the feedback.
We no longer have access to the old domain. How do we set up a 301 if we can no longer access the pages?
We have over 200,000 pages throwing 404s and over 70,000 pages throwing 500 errors.
This probably doesn't look good to Google. How much is this impacting our rankings?
-
Like others have said, a 301 redirect and updating Webmaster Tools should cover most of what you need to do. You didn't say whether you still have access to the old domain (where the pages are still being crawled) or whether you get a 404, 503, or some other error when navigating to those pages. What are you seeing, and can you provide a sample URL? That may help eliminate some possibilities.
-
You should implement 301 redirects from your old pages to their new locations. It sounds like you have a fairly large site, which means Google has tons of your old pages in its index and will continue to crawl them for some time. It's probably not going to impact you negatively, but if you want to get rid of the errors sooner, I would add some 301s.
With the 301s you'll also capture any link value the old pages may be getting from external links (I know you said there are none, but with 200K+ pages it's likely that at least one of them is linked from somewhere).
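If the old and new URL structures map onto each other in a predictable way, a handful of pattern rules can cover a huge number of pages without writing 200,000 individual redirects. A rough Apache .htaccess sketch, assuming mod_rewrite; the URL patterns here are made up for illustration and yours will differ:

    RewriteEngine On

    # e.g. /product.php?id=123  ->  /products/123
    RewriteCond %{QUERY_STRING} (?:^|&)id=([0-9]+) [NC]
    RewriteRule ^product\.php$ /products/%1? [R=301,L]

    # e.g. /news/article.php?slug=spring-sale  ->  /news/spring-sale
    RewriteCond %{QUERY_STRING} (?:^|&)slug=([^&]+) [NC]
    RewriteRule ^news/article\.php$ /news/%1? [R=301,L]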
-
Have you submitted a new sitemap to Webmaster Tools? You could also consider 301 redirecting the old pages to relevant new pages to capture any link equity or ranking power they had. Otherwise, Google should eventually stop crawling them because they return 404s. I've had a touch of success getting them to stop crawling quicker (or at least it seems quicker) by changing some 404s to 410s.
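If you want to try the 410 route, here is a minimal sketch for the Apache server config. The /old-catalog/ path is made up for illustration, and this only helps if you can still serve rules for the old URLs:

    # Answer 410 Gone (instead of 404) for the retired PHP section,
    # so crawlers learn the pages are permanently removed
    RedirectMatch gone ^/old-catalog/.*\.php$

    # mod_rewrite alternative: flag every remaining .php URL as gone
    # RewriteEngine On
    # RewriteRule \.php$ - [G]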