Panda Recovery - What is the best way to shrink your index and make Google aware?
-
We have been hit significantly by Panda and assume that our large index, with some pages holding thin/duplicate content, is the reason.
We have reduced our index size by 95% and have done significant content development on the remaining 5% pages.
For the old, removed pages, we have set up 410 responses (page no longer exists) and made sure that they are removed from the sitemap submitted to Google. However, after over a month we still see Google's spider returning to the same pages, and Webmaster Tools shows no indication that Google is shrinking our index size.
Are there more effective and automated ways to make Google aware of a smaller index size in hope of Panda recovery? Potentially using the robots.txt file, GWT URL removal tool etc?
Thanks /sp80
-
Hi. I would be curious to know if anyone else has experienced something similar and recovered from Panda. How long did it take you? Did you manually remove the pages, set up 410s or 404s, or create 301s?
I've been working on a site for some time now which has lost a great deal of traffic since July 2013. Over the past 2 months, we have been manually removing URLs from the index. The index has been cut in half, but it is still not back to its pre-penalty size. There are about 20,000 more pages to review for removal before it reaches the level it was at before the massive traffic drop.
Any recovery or insight would be helpful.
-
Hi Sp80 (and group),
It's been about six months since you posted your Panda recovery question. I'm curious if you implemented Kerry22's suggestions, and what results you've seen. I hope it's worked out for you.
We're also dealing with removing thousands of pages of thin content (through 410s, keeping links up and sitemaps, as per Kerry's suggestion). This was a very helpful discussion to read.
Thanks,
Tom
-
Hi kerry,
Your post gives me some hope. I was hit by Panda in Feb. 2011 and lost 85% of my Google traffic. I made many changes to my site: page deletions, redirects, added content, etc. I got a bump of 25% in September 2011 but lost that and more afterward.
We have an e-commerce gift site with 6000 pages. Is your site an e-commerce site?
I have not found a recovery story from any sites like mine that were hit with that large a drop.
I hope your recovery would relate to my situation.
-
Did Google process the 301s? In other words, are the old pages still in the index or not? If they processed the 301s eventually, you generally should be ok. If the old URLs seem stranded, then you might be best setting up the XML sitemap with those old URLs to just kick Google a little. I don't think I'd switch signals and move from a 301 to 404, unless the old pages are low quality, had bad links, etc.
Unfortunately, these things are very situational, so it can be hard to speak in generalities.
-
Hi Dr. Pete,
I know this is a late entry into this thread, but.. what if we did all our content cutting in the wrong ways over the past year – is there something we could/should do now to correct for this? Our site was hit by panda back in March 2012, and since then we've cut content several times. But we didn’t use this good process you advocate – here’s what we did when we cut pages:
1. We set up permanent 301 redirects for all of them immediately
2. Simultaneously, we always removed all links pointing to cut pages (we wanted to make sure users didn’t get redirected all the time)
This is a far cry from what you recommend and what Kerry22 did to recover successfully. If you have some advice on the following questions, I’d definitely appreciate it:
- Is it possible Google still thinks we have this content on our site or intend to bring it back, and as a result we continue to suffer?
- If that is a possibility, then what can we do now (if anything) to correct the damage we did?
We're thinking about removing all of those 301s now, letting all cut content return 404s and making a separate sitemap of cut content to submit it to Google. Do you think it's too late or otherwise inadvisable for us to do this kind of thing?
Thanks in advance,
Eric
-
It might be worth exploring NOINDEX'ing the useful pages and 410'ing the non-useful ones, if only because sometimes a mix of signals is more palatable to Google. Any time you remove a swatch of content with one method, it can trigger alarm bells. I'll be honest, though - these situations are almost always tricky and you almost always have to measure and adjust. I've never found a method that's right for all situations.
-
Thanks Pete,
I appreciate your input. Next to the additional sitemap with the known Google-indexed URLs we want deindexed, we also have reopened some crawl paths to these pages to see if there is a speed up.
This is an undertaking carried out across 30 international properties so we will be able to experiment with measures for certain domains and see how it affects de-indexing speed as we are tracking the numbers reported by Google daily.
I agree about the bad user experience of 410s as a dead end. We are mostly de-indexing as a means of recovery from Panda, but the content pages that we are trying to deindex are actually still useful to users, just thin and partially duplicative in content. We have decided to still display the content when such a page is reached but return a status code of 410. Alternatively, it seems we could just set the robots meta tag to noindex, but my feeling is that the 410 approach will lead to faster deindexing - would you agree?
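For readers comparing the two options above, here is a minimal sketch of what each could look like on the server side. It assumes a Python WSGI application; `RETIRED_PATHS`, `render_page` and the `use_noindex` switch are hypothetical names made up for illustration, not anything from our actual setup. The noindex variant uses the `X-Robots-Tag` response header, which is the header equivalent of the robots meta tag.

```python
# Hypothetical set of thin pages that should drop out of the index.
RETIRED_PATHS = {"/old/thin-page-1", "/old/thin-page-2"}


def render_page(path):
    """Placeholder for the real template rendering (assumption)."""
    return f"<html><body>Content for {path}</body></html>".encode("utf-8")


def app(environ, start_response, use_noindex=False):
    """WSGI app that keeps showing retired pages but tells crawlers to drop them."""
    path = environ.get("PATH_INFO", "/")
    body = render_page(path)
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    status = "200 OK"
    if path in RETIRED_PATHS:
        if use_noindex:
            # Option B: the page stays 200 but asks crawlers not to index it.
            headers.append(("X-Robots-Tag", "noindex"))
        else:
            # Option A: users still see the content, crawlers see "Gone".
            status = "410 Gone"
    start_response(status, headers)
    return [body]
```

Either way the crawler has to be able to fetch the URL to see the signal, so neither option works if the same paths are blocked in robots.txt.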
Also, if you have any expertise to share on how to compile a more comprehensive list of URLs indexed by Google for a particular domain, other than scraping the web interface using the site:domain.com query approach (which only returns a small subset compared to the stated total number of indexed pages), please let me know.
Thanks again /Thomas
-
If you want to completely remove these pages, I think Kerry22 is spot on. A 410 is about the fastest method we know of, and her points about leaving the crawl paths open are very important. I completely agree with leaving them in a stand-alone sitemap - that's good advice.
Saw your other answer, so I assume you don't want to 301 or canonical these pages. The only caveat I'd add is user value. Even if the pages have no links, make sure people aren't trying to visit them.
This can take time, especially at large scale, and a massive removal can look odd to Google. This doesn't generally result in a penalty or major problems, but it can cause short-term issues as Google re-evaluates the site.
The only option to speed it up is, if the pages have a consistent URL parameter or folder structure, you may be able to do a mass removal in Google Webmaster Tools. This can be faster, but it's constrained to similar-looking URLs. In other words, there has to be a pattern. The benefit is that you can make the GWT request on top of the 410s, so that can sometimes help. Any massive change takes time, though, and often requires some course correction, I find.
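To check whether the removed URLs share a pattern at all, one rough approach is to count them by leading folder. The sketch below is only an illustration: `folder_counts` is a made-up helper, and it assumes you already have the removed URLs as a plain list.

```python
from collections import Counter
from urllib.parse import urlsplit


def folder_counts(urls, depth=1):
    """Count URLs by their leading path folders (first `depth` directories)."""
    counts = Counter()
    for url in urls:
        segments = [s for s in urlsplit(url).path.split("/") if s]
        # Treat the last segment as the page itself and keep only directories.
        prefix = "/" + "/".join(segments[:-1][:depth])
        if prefix != "/":
            prefix += "/"
        counts[prefix] += 1
    return counts


if __name__ == "__main__":
    removed = [
        "http://example.com/archive/2011/a.html",
        "http://example.com/archive/2012/b.html",
        "http://example.com/tags/x.html",
    ]
    for prefix, n in folder_counts(removed).most_common():
        print(prefix, n)
```

Folders that hold only removed pages are the candidates for a pattern-based request; a folder that mixes removed and live pages is not.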
-
Think second sitemap will be fine. Wouldn't add a page with just links as that is the type of page Panda doesn't like.
Regarding sets of pages - we started by going into the search results - found a lot of content that shouldn't have been indexed.
We then looked manually at the content on subsets of pages and found pages that were thin and very similar to others (at the product level) and either made them more unique or removed them. Tools like this also help identify similar pages across products/categories http://www.copyscape.com/compare.php
It's only been 2 weeks, so it looks like we have pretty much 80% recovered and still improving - still looking at numbers and over Christmas and NY obviously traffic is quiet. I think 100% recovery is dependent on too many variables, like whether you continue link building during your time fixing the site, losing links by removing pages, adding more pages, competitors gaining authority/rankings etc
-
Hey Kerry,
Additional pages were added in April, which is also when our sites started seeing a decrease in rankings - so the timing adds up.
The drops starting in June have no clear root cause for us - we started our de-indexation process at the start of December.
We are thinking of speeding up deindexation exclusively through a second Google sitemap, as anything else would need to be a very artificial landing page with a high number of links at this point. Would you be concerned about relying exclusively on a sitemap rather than keeping the unwanted pages linked from your linking structure?
Further, I am interested in how you determined which pages were part of the Google index and needed to be delisted. It appears the best way to do so is to scrape the Google search results of pages returned for a domain and build up a list this way.
Did you recover completely to pre-Panda levels?
Best /Thomas
-
Hi
No problem, I am happy to help!
Yes, the graph declined sloooowly, but only once we started removing pages. This is half the problem - you have to wait for Google to find the changes. The waiting is frustrating as you don't know if what you have done is right, but the stuff I listed will help speed it up. We literally had to wait until none of the pages could be found in the index.
I see a big increase in your indexation from April to May 2012. When did you get hit and what happened over that month - did you add a lot of new pages/products? Are those drops in indexation from June to Dec 2012 you removing pages or did the drop just start to 'happen' and then you got hit?
-
Kerry,
Thank you for your amazing response to the deindexing question I had. It was incredibly well written and very easy to follow. Very happy to hear you were able to recover.
You make a really good point about allowing Google to still be able to reach the pages. When we started reviewing our site structure we also changed our linking structure, so while all pages we no longer want in the index return a 410, they certainly aren't all discoverable. Our assumption was that Google would revisit them sooner or later given that they are part of the index, but I can definitely imagine that things would get sped up by compiling a dedicated sitemap.
A big question I would have for you is how did the index status graph adjust for you in GWT over time? We started our restructuring start of January and we can't see a difference yet: http://imgur.com/eKBJ0
Did your graph decline step by step?
Thanks again
-
Hi
We just recovered from Panda - it took us 6 months. The best way to do this is to 410 or 404 your pages but leave the links to them in place. If you remove the links to those pages then Google won't be able to find those pages and know that you have removed them.
Here are the steps you need to follow to get the changes indexed:
1. Remove the pages but leave the links to them on your site (we left these discreetly at the bottom of the pages they were on, so users wouldn't find them easily, but Google would). You will see Google slowly start to pick up the number of 404s/410s in Webmaster Tools - don't worry about so many 410s being picked up - it won't hurt you. Don't nofollow links, remove links, or block pages with robots.txt. You want Google to find your changes.
2. Revise your sitemaps - take the 410 pages out of the original sitemap and add them to a new separate sitemap and submit this in Webmaster Tools. Then you can see the true indexation rates of your current pages (gives you a good idea of how many are indexed vs not and if you still have issues). You can then also track the deindexation of your 410s separately - see how fast they are being deindexed - be patient, it takes time. We only recovered once they were all deindexed.
Our decision to use sitemaps as well as internal links was due to the fact that some deep pages are only crawled periodically and we wanted Google to find the changes quickly. This is useful: http://www.seomoz.org/blog/logic-meet-google-crawling-to-deindex
4. Then wait. If all your pages are removed and you are still affected by Panda, start looking for more duplicate content, and look with an objective view at your pages that still exist. You may be surprised by what you find. The process took us 6 months because we had to wait for Google to pick up our changes, and then revise, tweak, look for more to do etc.
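The sitemap split in step 2 can be sketched roughly as below. This is just an illustration using Python's standard library; `build_sitemap` and `split_sitemaps` are made-up helper names, and it assumes you already hold the full URL list and the list of removed (410) URLs.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return sitemap XML (as bytes) listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


def split_sitemaps(all_urls, removed_urls):
    """Build one sitemap for live pages and a separate one for removed pages."""
    removed = set(removed_urls)
    live = [u for u in all_urls if u not in removed]
    return build_sitemap(live), build_sitemap(removed_urls)


if __name__ == "__main__":
    live_xml, gone_xml = split_sitemaps(
        ["http://example.com/", "http://example.com/old"],
        ["http://example.com/old"],
    )
    # File names are arbitrary; submit each file separately in Webmaster Tools.
    with open("sitemap-live.xml", "wb") as f:
        f.write(live_xml)
    with open("sitemap-removed.xml", "wb") as f:
        f.write(gone_xml)
```

Keeping the removed URLs in their own file is what lets you watch their indexed count fall separately from the pages you want to keep.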
I will write a case study soon, but in the meantime hope this helps you! I know how frustrating it is.
PS. If you are losing link value from 410s, 410 first, recover from Panda, and then 301 the select pages that have links to get the link juice back. It will be faster that way.
-
Google has already been recrawling those pages over the last months, but it keeps returning to the pages that return 410. We have very explicit logging configured.
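For anyone wanting to measure the same thing, a rough sketch of counting Googlebot requests to 410 pages from an access log is below. It assumes the common "combined" log format used by Apache and nginx; the regex and the helper name `googlebot_hits_on_gone` are illustrative, not our actual logging setup.

```python
import re
from collections import Counter

# Matches the "combined" access log format (an assumption about the server).
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] "(?:GET|HEAD) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits_on_gone(lines):
    """Count requests per path where Googlebot received a 410 response."""
    hits = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        # The user-agent string can be spoofed; this check is only a first filter.
        if m and m["status"] == "410" and "Googlebot" in m["agent"]:
            hits[m["path"]] += 1
    return hits
```

Paths that keep getting hit week after week are at least being seen by the crawler; paths that never appear in this count are the ones that may need a sitemap entry or a crawl path.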
Google URL removal tool is not an option due to the manual character of the submission.
-
I think you need to wait for Google to recrawl these pages. However, you can use the Google URL removal tool in Webmaster Tools.
-
Thanks,
To be clear - my question is not looking for recovery proposals but for implementation advice around shrinking the Google index size. We are talking about a scale of tens of thousands of pages. /Thomas
-
What about this approach? I am assuming that you know the exact date when the rankings fell.
You need to compare the traffic from Google for each page. Find the pages that suffered the most. Either remove them (exactly what you are doing) or completely rewrite them, adding nice images, videos etc.; in short, make them more interactive.
Now locate pages that were not affected that much. You need to make slight changes to them. Do not remove these pages.
Now locate those pages that have not been affected at all. If those pages are content heavy, you need to produce some more pages with well-written content.
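As a rough sketch of the comparison described above: the function below buckets pages by their relative drop in Google visits between two equal-length periods. The 50% and 10% thresholds and the name `classify_pages` are assumptions picked for illustration, not tested cut-offs.

```python
def classify_pages(before, after, heavy=0.5, light=0.1):
    """Bucket pages by relative drop in Google visits.

    `before` and `after` map URL -> visits for two equal-length periods.
    The thresholds are illustrative assumptions.
    """
    buckets = {"heavy": [], "light": [], "unaffected": []}
    for url, old in before.items():
        if old <= 0:
            continue  # no baseline traffic, so a drop cannot be computed
        drop = (old - after.get(url, 0)) / old
        if drop >= heavy:
            buckets["heavy"].append(url)
        elif drop >= light:
            buckets["light"].append(url)
        else:
            buckets["unaffected"].append(url)
    return buckets
```

Pages in the "heavy" bucket would be the candidates for removal or a rewrite, the "light" bucket for smaller changes.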
Hope that helps.
-
Correct, it is intentional. The removed pages have no link juice. The hope, though, is that an explicit 410 is a clearer signal for Google to remove the pages from the index.
I have been reading warnings around implementing a significant volume of 301s as it could be considered unnatural.
-
Just curious, is there any reason you did a 410 instead of a 301? I think most webmasters would set up 301 redirects to the most relevant remaining page for each of the pages that you removed. With a 410, you're effectively dropping backlinks that might have existed to any of the pages that you had.