How to Stop Google from Indexing Old Pages
-
We moved from a .php site to a Java site on April 10th. It's almost two months later and Google continues to crawl old pages that no longer exist (225,430 Not Found errors, to be exact).
These pages no longer exist on the site, and there are no internal or external links pointing to them.
Google has crawled the site since go-live, but it keeps trying to crawl these old pages.
What are my next steps?
-
All my clients are impatient with Google's crawl. I think the speed of life on the web has spoiled them. Assuming your site isn't a huge e-commerce or subject-matter site, you will get crawled, just not right away. Smaller, newer sites take time.
Take that concern and put it towards link building to the new site so Google's crawlers find it faster (via their seed list). Get it up on DMOZ, get that Twitter account going, post videos to YouTube, etc. Some juicy high-PR inbound links could help speed up the indexing. Good luck!
-
Like Mike said above, there still isn't enough info provided for us to give you a very clear response, but I think he is right to point out that you shouldn't really care about the extinct pages in Google's index. They should, at some point, expire.
You can specify particular URLs to remove in GWT or block them in your robots.txt file, but that doesn't seem like the best option for you. My recommendation is to just prepare the new site in its new location, upload a good clean sitemap.xml to GWT, and let Google adjust. If much of the content is the same, Google can tell from the page creation dates which is the newer and more appropriate site. I hate to say "trust the engines", but in this case you should.
You may also consider a rel="author" tag on your new site to help Google prioritize it. But really, the best thing is a new site on a new domain, a nice sitemap.xml, and patience.
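To illustrate the sitemap point: the sitemap.xml should list only live URLs on the new site, with none of the retired .php pages in it. A minimal sketch (the URLs below are placeholders, not your actual pages):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- List only live URLs on the new site; omit every retired .php page -->
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2013-06-01</lastmod> <!-- lastmod is optional -->
  </url>
  <url>
    <loc>http://www.example.com/products/widget</loc>
  </url>
</urlset>
```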
-
To further clear things up...
I can 301 every page from the old .php site to our new homepage (However, I'm concerned about Google's impression of our overall user experience).
Or
I can 410 every page from the old .php site (Wouldn't this tell Google to stop trying to crawl these pages? Although these pages technically still exist, they just have a different URL and directory structure. Too many to set up individual 301s, though).
Or
I can do nothing and wait for these pages to drop off of Google's radar
What is the best option?
-
After reading the further responses here I'm wondering something...
You switched to a new site, can't 301 the old pages, and have no control over the old domain... So why are you worried about pages 404ing on an unused site you don't control anymore?
Maybe I'm missing something here or not reading it right. Who does control the old domain then? Is the old domain just completely gone? Because if so, why would it matter that Google is crawling non-existent pages on a dead site and returning 404s and 500s? Why would that necessarily affect the new site?
Or is it the same site but you switched to Java from PHP? If so, wouldn't your CMS have a way of redirecting the old pages that are technically still part of your site to the newer relevant pages on the site?
I feel like I'm missing pertinent info that might make this easier to digest and offer up help.
-
Sean,
Many thanks for your response. We have submitted a new, fresh site map to Google, but it seems like it's taking them forever to digest the changes.
We've been keeping track of rankings, and they've been going down, but there are so many changes going on at once with the new site, it's hard to tell what is the primary factor for the decline.
Is there a way to send Google all of the pages that don't exist and tell them to stop looking for them?
Thanks again for your help!
-
You would need access to the domain to set up the 301. If you no longer can edit files on the old domain, then your best bet is to update Webmaster Tools with the new site info and a sitemap.xml and wait for their caches to expire and update.
Somebody can correct me if I'm wrong, but getting so many 404s and 500s has probably already impacted your rankings significantly enough that you may be best served by approaching the whole effort as a new site. Again, without more data, I'm left making educated guesses here. And if you aren't tracking your rankings (you asked how much this is impacting them, so you should be able to see that), then I would let go of the old site completely and build search traffic fresh on the new domain. You'd probably generate better results in the long term by jettisoning a defunct site with so many errors.
I confess that without being able to dig into the site analytics and traffic data, I can't give direct tactical advice. However, the above is what I would certainly do. Resubmitting a fresh sitemap.xml to GWT and deleting all the info for the old site in there is probably your best option. I defer to anyone with better advice. What a tough position you are in!
-
Thanks all for the feedback.
We no longer have access to the old domain. How do we institute a 301 if we can no longer access the page?
We have over 200,000 pages throwing 404s and over 70,000 pages throwing 500 errors.
This probably doesn't look good to Google. How much is this impacting our rankings?
-
Like others have said, a 301 redirect and updating Webmaster Tools should be most of what you need to do. You didn't say if you still have access to the old domain (where the pages are still being crawled) or if you get a 404, 503, or some other error when navigating to those pages. What are you seeing or can you provide a sample URL? That may help eliminate some possibilities.
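One quick way to check is to request a few of the old URLs and look at the raw status code without following redirects, so you see exactly what Googlebot sees. A rough sketch in Java (the URLs below are placeholders, not your actual pages):

```java
import java.net.HttpURLConnection;
import java.net.URL;

public class OldUrlStatusCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder URLs: substitute a sample from the crawl-error report.
        String[] oldUrls = {
                "http://www.example.com/products/widget.php",
                "http://www.example.com/about.php"
        };

        for (String url : oldUrls) {
            HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
            conn.setRequestMethod("HEAD");          // headers only, no body needed
            conn.setInstanceFollowRedirects(false); // show the raw status, not the redirect target
            System.out.println(conn.getResponseCode() + "  " + url);
            conn.disconnect();
        }
    }
}
```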
-
You should implement 301 redirects from your old pages to their new locations. It sounds like you have a fairly large site, which means Google has tons of your old pages in its index that it is going to continue to crawl for some time. It's probably not going to impact you negatively, but if you want to get rid of the errors sooner I would throw in some 301s.
With the 301s you'll also get any link value that the old pages may be getting from external links (I know you said there are none, but with 200K+ pages it's likely that at least one of the pages is being linked to from somewhere).
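If the old .php URLs still resolve to the same domain that your new Java application serves, the 301s don't have to be set up one by one: a single servlet filter can handle the whole pattern. A rough sketch, assuming a Servlet 3.0+ container, a hypothetical class name, and a made-up mapping rule (new URL = old path minus the .php extension) that you would replace with your real URL structure:

```java
import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.annotation.WebFilter;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Hypothetical filter: 301-redirects every legacy .php URL to its new equivalent.
@WebFilter("*.php")
public class LegacyPhpRedirectFilter implements Filter {

    @Override
    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest request = (HttpServletRequest) req;
        HttpServletResponse response = (HttpServletResponse) res;

        // Assumed mapping rule for illustration: /products/widget.php -> /products/widget
        String newPath = request.getRequestURI().replaceAll("\\.php$", "");

        // A 301 (not sendRedirect's 302) tells Google the move is permanent.
        response.setStatus(HttpServletResponse.SC_MOVED_PERMANENTLY);
        response.setHeader("Location", newPath);
        // No chain.doFilter(): the request ends here with the redirect.
    }

    @Override
    public void init(FilterConfig config) { }

    @Override
    public void destroy() { }
}
```

The key detail is setting the 301 status explicitly; sendRedirect() would issue a 302, which doesn't carry the permanent-move signal or pass link equity the same way.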
-
Have you submitted a new sitemap to Webmaster Tools? Also, you could consider 301 redirecting the pages to relevant new pages to capitalize on any link equity or ranking power they may have had before. Otherwise Google should eventually stop crawling them because they are 404. I've had a touch of success getting them to stop crawling quicker (or at least it seems quicker) by changing some 404s to 410s.
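If the retired .php URLs are served by the same Java application, switching the whole legacy pattern from 404 to 410 is a small change. A rough sketch along the same lines as the redirect filter above (hypothetical class name, Servlet 3.0+ container assumed):

```java
import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.annotation.WebFilter;
import javax.servlet.http.HttpServletResponse;

// Hypothetical filter: answers every retired .php URL with 410 Gone instead of 404,
// telling crawlers the removal is permanent and intentional.
@WebFilter("*.php")
public class LegacyPhpGoneFilter implements Filter {

    @Override
    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        ((HttpServletResponse) res).sendError(HttpServletResponse.SC_GONE,
                "This page has been permanently removed.");
    }

    @Override
    public void init(FilterConfig config) { }

    @Override
    public void destroy() { }
}
```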