Duplicate content warning: Same page but different URLs?
-
Hi guys, a friend of mine has a site, and when I tested it with Moz I noticed there are 80 duplicate content warnings. For instance:
Page 1 is http://yourdigitalfile.com/signing-documents.html
The warning page is http://www.yourdigitalfile.com/signing-documents.html
Another example:
Page 1 is http://www.yourdigitalfile.com/
The second page is http://yourdigitalfile.com
I noticed that nearly every page on the site has another version at a different URL. Any ideas why the dev would do this? Also, the pages that have received the warnings are not redirected to the newer pages; you can go to either one.
Thanks very much.
-
Thanks Tim. Do you have any examples of what those problems might be? With such a large catalog, managing those rel canonical tags will be difficult (I don't even know if the store allows them; it's a hosted store solution and little code customization is allowed).
-
Hi there AspenFasteners, in this instance, rather than an .htaccess rule, I would suggest applying a rel canonical tag that points to the page you deem to be the original master source.
Using robots.txt to try to hide things could potentially cause you more issues, as your categories may struggle to be indexed correctly.
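For example, on the duplicate navigation path you could place something like this in the page's <head>, assuming 8314.htm is the version you want treated as the master:
<link rel="canonical" href="http://www.aspenfasteners.com/Self-Tapping-Sheet-Metal-s/8314.htm" />
That way both navigation paths still work for users, but search engines consolidate the signals onto one URL.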
-
We have a similar problem, but much more complex to handle as we have a massive catalog of 80,000 products and growing.
The problem occurs legitimately because our catalog is so large that we offer different navigation paths to the same content.
http://www.aspenfasteners.com/Self-Tapping-Sheet-Metal-s/8314.htm
http://www.aspenfasteners.com/Self-Tapping-Sheet-Metal-s/8315.htm
(If you look at the "You are here" breadcrumb trail, you will see the subtle differences in the navigation paths: with 8314.htm, the user went through Home > Screws; with 8315.htm, via Home > Security Fasteners > Screws.)
Our hosted web store does not give us access to .htaccess, so I am thinking of excluding the redundant navigation points via robots.txt.
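Something like this is what I have in mind, using the example above where 8315.htm is the redundant entry point:
User-agent: *
Disallow: /Self-Tapping-Sheet-Metal-s/8315.htm
Since the duplicate URLs don't share a distinct path pattern, each redundant entry point would need its own Disallow line.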
My question: is there any reason NOT to do this?
-
Oh, OK.
The only reason I was thinking it is duplicate content is the warnings I got on the Moz crawl, see below:
75 Duplicate Page Content
6 4xx Client Error
5 Duplicate Page Title
44 Missing Meta Description Tag
5 Title Element is Too Short
I have found over 80 typos, grammatical errors, punctuation errors, and pieces of incorrect information, which led me to believe the quality of the work and their attention to detail was rather poor, which is why I thought this was a possibility.
Thanks again for your time, it's really appreciated.
-
I wouldn't say that they have created two pages; it's just that because you have two versions of the domain and have not set a preferred version, it is getting indexed twice. .htaccess changes are under the hood of the website and could simply have been an oversight.
-
Hey Tim
Thanks for your answer. It's really weird. Other than laziness on the dev's part in not removing old or previous versions of pages, have you any idea why they would create multiple versions of the same page with different URLs? Is there any legit reason, like one serves mobile or something?
Just wondering. Thanks for replying.
-
OK, so in this instance the only issue you have is that you need to choose your preferred start point: www or non-www.
I would add a bit of code to your .htaccess file to redirect to your preferred choice. I personally prefer a www. domain. Something like the below would work:
RewriteEngine On
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
As your site is already indexed, I would also, for the time being and as more of a safety measure, add canonicals to the pages that point to the www. version of your site.
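For instance, on the signing-documents page from your example, the canonical in the <head> would be something like:
<link rel="canonical" href="http://www.yourdigitalfile.com/signing-documents.html" />
The same tag goes on both the www and non-www versions, so whichever one gets crawled points back to the www URL.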
Also, if you have a Google Search Console account, you can select your preferred domain prefix in there. This will again help with your indexation.
Hopefully I have covered most things.
Cheers
Tim