How do I deal with duplicate content in a post-Panda world?
-
I want to fix the duplicate content issues on my eCommerce website.
I have read a very valuable blog post on SEOmoz about duplicate content in a post-Panda world and applied all of its strategies to my website.
Let me give one example to explain the problem.
http://www.vistastores.com/outdoor-umbrellas
Non-www version:
http://vistastores.com/outdoor-umbrellas redirects to the home page.
For HTTPS pages:
https://www.vistastores.com/outdoor-umbrellas
I have created a robots.txt file that blocks crawling of all HTTPS pages:
https://www.vistastores.com/robots.txt
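One common way to serve a blocking robots.txt only over HTTPS is an htaccess rewrite like this (a sketch; robots_ssl.txt is just an example filename):
# .htaccess: serve a separate robots file for HTTPS requests
RewriteEngine On
RewriteCond %{HTTPS} on
RewriteRule ^robots\.txt$ robots_ssl.txt [L]
# robots_ssl.txt then blocks everything:
User-agent: *
Disallow: /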
And I have set rel=canonical pointing to the HTTP page:
http://www.vistastores.com/outdoor-umbrellas
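The canonical tag in the <head> of the HTTPS pages looks like this:
<link rel="canonical" href="http://www.vistastores.com/outdoor-umbrellas" />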
Narrow-by-search pages:
My website has a narrow-by-search feature that generates pages with identical meta info, such as:
http://www.vistastores.com/outdoor-umbrellas?cat=7
http://www.vistastores.com/outdoor-umbrellas?manufacturer=Bond+MFG
http://www.vistastores.com/outdoor-umbrellas?finish_search=Aluminum
I have blocked all of the dynamic pages generated by narrow-by-search in robots.txt:
http://www.vistastores.com/robots.txt
And I have set rel=canonical to the base URL on each of these dynamic pages.
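Rules of this general form in robots.txt block those parameters (the patterns here are illustrative, not copied from my live file):
User-agent: *
Disallow: /*?cat=
Disallow: /*?manufacturer=
Disallow: /*?finish_search=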
Order-by pages:
http://www.vistastores.com/outdoor-umbrellas?dir=asc&order=name
I have blocked all of these pages with robots.txt and set rel=canonical to the base URL.
For pagination pages:
http://www.vistastores.com/outdoor-umbrellas?dir=asc&order=name&p=2
I have blocked all of these pages with robots.txt and set rel=next and rel=prev on all paginated pages.
I have also set rel=canonical to the base URL.
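On page 2, for example, the pagination tags look like this:
<link rel="prev" href="http://www.vistastores.com/outdoor-umbrellas?dir=asc&order=name" />
<link rel="next" href="http://www.vistastores.com/outdoor-umbrellas?dir=asc&order=name&p=3" />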
I have applied all of these SEO suggestions to my website, but Google is still crawling and indexing 21K+ pages. My website has only 9K product pages.
Google search results:
Over the last 7 days, my website's impressions and CTR have dropped by 75%.
I want to recover and perform as well as before.
I have explained my question at length because I want to recover my traffic as soon as possible.
-
Not a complete answer, but instead of rel-canonicaling your dynamic pages you may just want to block them in robots.txt with something like:
Disallow: /*?
This will prevent Google from crawling any version of the page that includes a ? in the URL. Canonical is a suggestion, whereas robots.txt is more of a command.
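In a complete robots.txt that rule sits under a user-agent line (Googlebot supports the * wildcard):
User-agent: *
Disallow: /*?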
As you can see from this query, Google has indexed 132 versions of that single page rather than following your rel=canonical suggestion.
To further enforce this, you may be able to use a fancy bit of PHP code to detect whether the URL is dynamic and apply a robots noindex, noarchive to only the dynamic renderings of the page.
This could be done with something like the following (a rough sketch; it assumes your pages are rendered through a PHP template and simply checks for a query string):
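<?php
// Rough sketch: if the request carries a query string, mark this
// rendering as noindex, noarchive so only the dynamic versions of
// the page are kept out of the index. Place in the page's <head>.
if (!empty($_SERVER['QUERY_STRING'])) {
    echo '<meta name="robots" content="noindex, noarchive">' . "\n";
}
?>
The clean, parameter-free URL is unaffected and stays indexable.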
I also believe there are URL parameter filtering tools for this right within Google Webmaster Tools. Worth a peek if your site is registered.
Additionally, where you are redirecting non-www subpages to the home page, you may instead want to redirect them to their www versions.
This can be done in .htaccess like this:
# Redirect non-www to www
RewriteEngine On
RewriteBase /
RewriteCond %{HTTP_HOST} ^yourdomain.com [NC]
RewriteRule ^(.*)$ http://www.yourdomain.com/$1 [L,R=301]
This will likely provide both a better user experience and a better solution in Google's eyes.
I'm sure some other folks will come in with some other great suggestions for you as well.