Removing Content 301 vs 410 question
-
Hello,
I was hoping to get the SEOmoz community’s advice on how to remove content most effectively from a large website.
I just read a very thought-provoking thread in which Dr. Pete and Kerry22 answered a question about how to cut content in order to recover from Panda. (http://www.seomoz.org/q/panda-recovery-what-is-the-best-way-to-shrink-your-index-and-make-google-aware).
Kerry22 mentioned a process in which 410s would be totally visible to Googlebot so that it would easily recognize the removal of content. The conversation implied that it is not just important to remove the content, but also to give Google the ability to recrawl that content to confirm it was indeed removed (as opposed to just recrawling the site and not finding the content anywhere).
This really made lots of sense to me and also struck a personal chord… Our website was hit by a later Panda refresh back in March 2012, and ever since then we have been aggressive about cutting content and doing what we can to improve user experience.
When we cut pages, though, we used a different approach, doing all of the below steps:
1. We cut the pages
2. We set up permanent 301 redirects for all of them immediately.
3. And at the same time, we would always remove from our site all links pointing to these pages (to make sure users didn’t stumble upon the removed pages).
When we cut the content pages, we would either delete them or unpublish them, causing them to 404 or 403, but this is probably a moot point since we gave them 301 redirects every time anyway. We thought we could signal to Google that we removed the content while avoiding generating lots of errors that way…
I see that this is basically the exact opposite of Dr. Pete's advice and of the approach Kerry22 used to get a recovery, and meanwhile here we are, still trying to help our site recover. We've been feeling that our site should no longer be under the shadow of Panda.
So here is what I'm wondering, and I'd be very appreciative of advice or answers for the following questions:
1. Is it possible that Google still thinks we have this content on our site, and we continue to suffer from Panda because of this?
Could there be a residual taint caused by the way we removed it, or is it all water under the bridge at this point because Google would have figured out we removed it (albeit not in a preferred way)?
2. If there’s a possibility our former cutting process has caused lasting issues and affected how Google sees us, what can we do now (if anything) to correct the damage we did?
Thank you in advance for your help,
Eric
-
Thanks Dr Peter! I agree with you! Just wanted to feel sure about it.
Yes, Gary, you can also personalize a 410 page.
-
You should be able to customize a 410 just like you do a 404. The problem is that most platforms don't do that, by default, so you get the old-school status code page. That should be configurable, though, on almost all modern platforms.
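As a rough illustration of that point, here is a minimal Python sketch (standard library only) of a handler that sends a 410 status together with a friendly HTML body instead of a bare status page. The paths in GONE_PATHS, the page copy, and the /search link are invented for the example; on a real platform this would normally be done through its own error-page configuration.

```python
from wsgiref.simple_server import make_server

# Hypothetical URLs that were removed on purpose.
GONE_PATHS = {"/products/old-widget", "/products/retired-gadget"}

GONE_PAGE = (
    b"<html><body><h1>This product is no longer available</h1>"
    b'<p>Try a <a href="/search">search</a> for something similar.</p>'
    b"</body></html>"
)
HOME_PAGE = b"<html><body><h1>Store home</h1></body></html>"


def app(environ, start_response):
    """WSGI app: removed URLs get a 410 status line plus a custom body."""
    path = environ.get("PATH_INFO", "/")
    if path in GONE_PATHS:
        status, body = "410 Gone", GONE_PAGE
    else:
        # Everything else is treated as a live page in this sketch.
        status, body = "200 OK", HOME_PAGE
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response(status, headers)
    return [body]


def serve(port=8000):
    """Run locally for a manual check with a browser or curl."""
    make_server("127.0.0.1", port, app).serve_forever()
```

The visitor sees a normal-looking page with a way back into the store, while the status line still says the URL is gone.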
-
From a commerce perspective the biggest problem I have with the 410 is the user experience. If I tag a URL with a 410, when someone requests the page they get a white page that says GONE. They never even get the chance to see the store and maybe search for a similar product.
Would it work if I built a landing page that returns a 410 and then used the 301 to redirect the bad URL to the landing page? It would make the customer happy, they would be in the store with a message to search for something else. But would Google really associate the 410 with the redirected URL?
-
Hi Sandra, don't worry about the volume of 404s, because they won't hurt your rankings.
As for your issue, I understand that you want to be really clear with your users and not hurt their experience on the site. So create a custom 404 page that changes its content depending on which URL is returning it. If it's one of your old products, you can show a message or an article explaining why you decided to remove it and propose some alternatives. For all other errors you can just show a search box or products related to the one that is gone.
301s, IMHO, are not the way to go: if a URL is gone, its content has not moved anywhere, so a 301 will result in bad UX 99% of the time.
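The custom-404 idea above could be sketched like this in Python. It is only an illustration: RETIRED_PRODUCTS, the URLs in it, and the HTML snippets are hypothetical, and a real site would plug the same logic into its CMS or framework's error handler.

```python
# Hypothetical catalogue of retired product URLs and suggested replacements.
RETIRED_PRODUCTS = {
    "/products/blue-widget": ["/products/green-widget", "/products/red-widget"],
}

SEARCH_FORM = '<form action="/search"><input name="q" placeholder="Search the store"></form>'


def not_found_page(path):
    """Return (status_code, html) for a URL that no longer serves content."""
    alternatives = RETIRED_PRODUCTS.get(path)
    if alternatives:
        # Retired product: explain the removal and point to alternatives.
        links = "".join(f'<li><a href="{url}">{url}</a></li>' for url in alternatives)
        html = (
            "<h1>We no longer sell this product</h1>"
            f"<p>You may like these instead:</p><ul>{links}</ul>"
        )
    else:
        # Any other missing URL: generic message plus a search box.
        html = "<h1>Page not found</h1>" + SEARCH_FORM
    # Either way the status stays 404; only the body changes.
    return 404, html
```

The key design point is that the body varies per URL while the status code stays an honest 404.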
-
Hello,
I have a related question about 301 vs 410.
I have a client who wants to delete a whole product category from one site. It's a large number of products, so a large number of URLs, but this product line is not performing very well. So the decision is not SEO-related; it's more of a business decision. It's not for Panda.
If we think about the communication with the user, the best option would be to have a landing page explaining that we decided to remove that product.
Then the question is, do we 301-redirect all those URLs to this landing page? I am afraid that a big redirect like this, going from many URLs to a single one (even if that page is not created to rank on Google), can be seen as dodgy by Google. Am I right?
Or do I return a 410 for those pages and personalize the 410 page only for these URLs in order to communicate with the user (is that even possible)? But I am afraid, because we'll have many 4XX errors in WMT, and this may influence the rankings!
So I don't know what to do! It's a must that we delete this content and that we communicate it well with the users.
Thanks for your help,
-
100% agreed - 403 isn't really an appropriate alternative to 404. I know SEOs who claim that 410s are stronger/faster, but I haven't seen great evidence in the past couple of years. It's harmless to try 410s, but I wouldn't expect miracles.
-
Hi Eric, I'll try to answer your follow-up question even if I'm not an oracle like Pete.
First of all, thanks, Pete, for underlining that you can give Google only one response, since you can't return both a 301 and a 404. I was assuming that and didn't focus on that part of Eric's post.
Second, Eric: if your purpose is to give Google the ability to recrawl the old content and see that it has disappeared, you want to return a 404 or a 410, which mean "not found" and "gone" (permanently not found), respectively. There used to be a difference, but now they have almost the same value in Google's eyes (further reading). That way Google can access your page and see that the content is now gone.
In the case of a 403, access is denied to everyone, both Google and humans, so Google won't be able to access and recrawl the page. If your theory is that Google needs to recrawl your content and see that it has really gone (and I think you're on the right track), a 403 is not the response you should give it.
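For reference, the four status codes discussed in this thread can be looked up with Python's standard http.HTTPStatus enum. The short "meaning" strings below are a paraphrase of the reasoning in this thread, not a statement of how Google documents its handling of each code.

```python
from http import HTTPStatus

# Statuses that, per the reasoning in this thread, tell a crawler the page was removed.
REMOVAL_STATUSES = {HTTPStatus.NOT_FOUND, HTTPStatus.GONE}


def describe(code):
    """Summarise what a status code communicates about a removed page."""
    status = HTTPStatus(code)
    if status in REMOVAL_STATUSES:
        meaning = "says the content is not there"
    elif status == HTTPStatus.FORBIDDEN:
        meaning = "says access is denied, not that the content is gone"
    elif status == HTTPStatus.MOVED_PERMANENTLY:
        meaning = "says the content moved to another URL"
    else:
        meaning = "not covered by this sketch"
    return f"{status.value} {status.phrase}: {meaning}"


for code in (301, 403, 404, 410):
    print(describe(code))
```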
-
Hey there mememax - thank you for the reply! Reading your post and thinking back to our methodology, yes I think in hindsight we were a bit too afraid about generating errors when we removed content - we should have considered the underlying meaning of the different statuses more carefully. I appreciate your advice.
Eric
-
Hello Dr. Pete – thank you for the great info and advice!
I do have one follow-up question if that's ok – as we move forward cutting undesirable content and generate 4xx status for those pages, is there a difference in impact/effectiveness between a 403 and a 404? We use a CMS and un-publishing a page creates a 403 “Access denied” message. Deleting a page will generate a 404. I would love to hear your opinion about any practical differences from a Googlebot standpoint… does a 404 carry more weight when it comes to content removal, or are they the same to Googlebot? If there’s a difference and the 404 is better, we’ll go the 404 route moving forward.
Thanks again for all your help,
Eric
-
Let me jump in and clarify one small detail. If you delete a page, which would naturally result in a 404, but then 301-redirect that page/URL, there is no 404. I understand the confusion, but ultimately you can only have one HTTP status code. So, if the page properly 301s, it will never return a 404, even if it's technically deleted.
If the page 301s to a page that looks like a "not found" sort of page (content-wise), Google could consider that a "soft 404". Typically, though, once the 301 is in place, the 404 is moot.
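The "one status code per URL" point can be sketched in a few lines of Python. This is a toy model, assuming a server that checks its redirect rules before looking for the page itself (a common setup, but an assumption here); REDIRECTS and LIVE_PAGES are made-up data.

```python
# Hypothetical 301 rules and the set of pages that still exist.
REDIRECTS = {"/old-page": "/new-page"}
LIVE_PAGES = {"/", "/new-page"}


def resolve(path):
    """Return the single (status, location) a client sees for this path."""
    if path in REDIRECTS:
        # The redirect answers first, so the deleted page's 404 is never sent.
        return 301, REDIRECTS[path]
    if path in LIVE_PAGES:
        return 200, None
    return 404, None


print(resolve("/old-page"))   # deleted, but redirected
print(resolve("/missing"))    # deleted, no redirect
```

In this model "/old-page" no longer exists as a page, yet a crawler requesting it only ever receives the 301.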
For any change in status, the removal of crawl paths could slow Google re-processing those pages. Even if you delete a page, Google has to re-crawl it to see the 404. Now, if it's a high-authority page or has inbound (external) links, it could get re-crawled even if you cut the internal links. If it's a deep, low-value page, though, it may take Google a long time to get back and see those new signals. So, sometimes we recommend keeping the paths open.
There are other ways to kick Google to re-crawl, such as keeping an XML sitemap open with those pages in it (while removing the internal links). These signals aren't as powerful, but they can help the process along.
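A sitemap listing the removed URLs could be generated with a short script like the following Python sketch. The example.com URLs are placeholders, and the output uses only the required urlset, url, and loc elements of the sitemap protocol; whether and how quickly this prompts a re-crawl is not something the script can guarantee.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Build a minimal XML sitemap string listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Placeholder URLs standing in for pages that now return 404 or 410.
removed_urls = [
    "https://www.example.com/removed-page-1",
    "https://www.example.com/removed-page-2",
]
print(build_sitemap(removed_urls))
```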
As to your specific questions:
(1) It's very tricky, in practice, especially at large-scale. I think step 1 is to dig into your index/cache (slice and dice with the site: operator) and see if Google has removed these pages. There are cases where massive 301s, etc. can look fishy to Google, but usually, once a page is gone, it's gone. If Google has redirected/removed these pages, and you're still penalized, then you may be fixing the wrong problem or possibly haven't gone far enough.
(2) It really depends on the issue. If you cut too deep and somehow cut off crawl paths or stranded inbound links, then you may need to re-establish some links/pages. If you 301'ed a lot of low-value content (and possibly bad links), you may actually need to cut some of those 301s and let those pages die off. I agree with @mememax that sometimes a healthy combination of 301s/404s is a better bet - pages go away, and 404s are normal if there's really no good alternative to the page that's gone.
-
Hi Eric, in my experience I've always found 4xx responses better than 301s for solving this kind of issue.
Many people overuse 301s just because they want to show Google that their site doesn't have any 404s.
Just think about it a little: a 301 is a permanent redirect, meaning content that has simply moved from one place to another. If you have content you want to get rid of, do you want to give Google the message "hey, that low-quality content is not where you found it, but now it's here"? No. You want to give Google the message that the low-quality content has been improved or removed. And a 404 is the right message to give it if you deleted that content.
It's perfectly normal to have 404s on a website, and many 404s won't hurt your rankings. The exceptions are pages that were already ranking (so searchers will now land on a 404 instead) and pages that external sites link to; in those cases you may consider a 301.
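One way to read that rule of thumb, combined with Dr. Pete's point that 404s are fine when there is no good alternative page, is the small Python sketch below. The RemovedPage fields and the choice to treat "was ranking" and "has external links" as either/or conditions are assumptions made for illustration, not a rule stated by Google or by the posters.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RemovedPage:
    url: str
    was_ranking: bool                      # did it already rank in search?
    has_external_links: bool               # do other sites link to it?
    replacement_url: Optional[str] = None  # a genuinely relevant page, if any


def choose_response(page):
    """Return (status, location) for a removed page under this reading."""
    worth_preserving = page.was_ranking or page.has_external_links
    if worth_preserving and page.replacement_url:
        # Something of value points here and there is a real alternative.
        return 301, page.replacement_url
    # Otherwise let the page die off with an honest 404.
    return 404, None


thin = RemovedPage("/tag/misc-17", was_ranking=False, has_external_links=False)
linked = RemovedPage("/guide-2011", False, True, replacement_url="/guide")
print(choose_response(thin))
print(choose_response(linked))
```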
While I think that Google has a sort of blacklist (and a whitelist too), I don't think that it keeps a memory of the bad sites it encounters; if you fix your issues, you'll start to rank again.
The issue you may have is not that your site is tainted but that you still have some issues here and there which you didn't fix. Googlers have apparently said that Panda is now part of the algorithm, so if you fix your issues you won't need to wait for an update to start ranking again.
Hope this helps! Good luck!