Duplicate Page Titles and Content
-
The SEOmoz crawler has flagged many URLs on my site of the form /?Letter=Letter, e.g. http://www.johnsearles.com/metal-art-tiles/?D=A. I believe it is finding multiple cached copies of the same page and identifying them as duplicates. Is there any way to screen out these multiple cache results?
-
I think I figured out what to add to robots.txt to screen out any URL with a '?' in it. I believe these '?' URLs are session IDs. I'll see what Rogerbot does next time it crawls my site.
Disallow: /*?
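For anyone wondering what that rule would catch, here is a minimal Python sketch of wildcard matching. It assumes the crawler follows the common convention where `*` matches any run of characters and a trailing `$` anchors the end of the URL; whether a given crawler honors wildcards is something to confirm in its own documentation. The helper names `robots_pattern_to_regex` and `is_blocked` are made up for illustration.

```python
import re


def robots_pattern_to_regex(pattern):
    """Translate a robots.txt path pattern using * and $ into a compiled regex.

    Assumption: '*' means "any run of characters" and a trailing '$'
    anchors the end of the URL. This is a convention, not part of the
    original robots.txt rules, so crawler support varies.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with ".*" for every '*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(pattern, path_and_query):
    """Return True if the Disallow pattern matches the given path plus query."""
    return robots_pattern_to_regex(pattern).match(path_and_query) is not None


# The rule from the post, checked against the example URL and its clean version.
print(is_blocked("/*?", "/metal-art-tiles/?D=A"))
print(is_blocked("/*?", "/metal-art-tiles/"))
```

Under those assumptions the rule matches every URL that contains a query string, so it would also block any '?' URLs that do serve unique content.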
-
Hey John,
My apologies for any issues that you are experiencing with our service. I would definitely like to address any other issues, besides this one, that you may be experiencing. You could either respond to this Q&A thread or submit a private customer support ticket to our help team. If you go to our help hub (www.seomoz.org/help) you can easily submit a ticket by clicking the contact help team button.
As for your duplicate content question, it is important to know that any time the same content is found on more than one URL, it is considered duplicate content. WordPress is a good example of a platform where duplicate content is often found but can be easily addressed.
In WordPress you could have your homepage at www.domain.com and an author page at www.domain.com/author/authorname. If your blog only has one author, though, this author page is going to be identical to your homepage, and the result is duplicate content on your site. There are a few ways to resolve this. The most popular is simply to prevent access to the author page and redirect it back to the homepage. This would prevent other sites from linking to the duplicate page, and they would instead link directly to the homepage.
Another option would be to use a meta robots tag with noindex, follow on the duplicate page, in this case the author page. This would prevent the page from being indexed while still allowing the links on the page to be found and crawled. You can also block access to these pages in your robots.txt file, and you can target our crawler specifically by using the user-agent rogerbot.
I hope that makes sense.
Let me know if you have any additional questions or concerns.
Kenny
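The robots.txt option Kenny describes can be tried out locally before it goes live. The sketch below is only an illustration: the `/author/` path and `www.domain.com` come from his WordPress example, the robots.txt text is an assumed rule set, and the check uses Python's standard `urllib.robotparser`, which does plain prefix matching and may not behave exactly like any particular crawler.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt: block only rogerbot from the author pages.
ROBOTS_TXT = """\
User-agent: rogerbot
Disallow: /author/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

author_url = "http://www.domain.com/author/authorname"
home_url = "http://www.domain.com/"

# rogerbot is kept away from the duplicate author page...
rogerbot_author = parser.can_fetch("rogerbot", author_url)
# ...but can still crawl the homepage.
rogerbot_home = parser.can_fetch("rogerbot", home_url)
# Other crawlers are unaffected because no rule group applies to them.
other_author = parser.can_fetch("Googlebot", author_url)

print(rogerbot_author, rogerbot_home, other_author)
```

Because the rule group names only rogerbot, it changes what shows up in the crawl report without affecting how search engines treat the page, so the redirect or the meta robots tag is still needed for those.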
-
Thanks Guy. I was thinking of subscribing to SeoMoz but the site reports have been less than useful. This is just one of 5 issues I've found.
-
So far, no. Until they fix that little error, you can use Google Webmaster Tools to double-check for real duplicate content.
The spider treats whatever.php?var=1 as a different page because some sites use index.php?p=103 for one page and index.php?p=102 for another, while others use the variables in the URL without changing the page.
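To get a feel for how many of the reported duplicates are just query-string variants of one page, the crawl URLs can be grouped by host and path. This is a rough sketch and not how the crawler itself works: `group_by_path` is a made-up helper, and apart from the ?D=A address from the first post the sample URLs are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def group_by_path(urls):
    """Group URLs that differ only in their query string.

    Returns a dict mapping (host, path) to the list of URLs sharing it,
    keeping only groups with more than one member.
    """
    groups = defaultdict(list)
    for url in urls:
        parts = urlsplit(url)
        groups[(parts.netloc.lower(), parts.path)].append(url)
    return {key: members for key, members in groups.items() if len(members) > 1}


# Hypothetical crawl output; only the ?D=A URL appears in the thread.
crawled = [
    "http://www.johnsearles.com/metal-art-tiles/",
    "http://www.johnsearles.com/metal-art-tiles/?D=A",
    "http://www.johnsearles.com/metal-art-tiles/?N=D",
    "http://www.johnsearles.com/some-other-page/",
]

for (host, path), members in group_by_path(crawled).items():
    print(host + path, "->", len(members), "variants")
```

A group only tells you the URLs share a path. On a site that uses index.php?p=103 style addresses the variants really are different pages, so the grouping is a starting point for review and not proof of duplication.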
Related Questions
-
Title tags drawn from breadcrumbs
My client has a Magento site that we've recently started working on. After the site was crawled by Moz a couple of times we noticed there was an issue with title tags being too long. However, when we looked closer at the data, Moz was seemingly picking up the breadcrumbs as the title tag. The actual tag is a small part of the breadcrumbs, but Moz was reporting they were the same. For example:
Breadcrumbs - home>products>category A of products
Title Tag - category A of products
I just wondered if anyone else has had this problem? Is it Moz's mistake or is the title tag auto generating from the breadcrumbs and cutting off the beginning somehow? Any information would be really helpful, thanks.
Moz Pro | | Stone_Junction0 -
Codeigniter - Controller and duplicate pages
Hi there, I use CodeIgniter as my framework and I have a question about duplicate pages. By default, a typical page in a CodeIgniter site looks something like this: http://www.domain.com/site/contact where site is the controller containing the contact function that points to the contact.html view... To get a better URL I use a trick with the "routes" that redirects any http://www.domain.com/contact to the original http://www.domain.com/site/contact Of course both are valid and both are... crawled! So I get a duplicate page. Is this something I have to manage, maybe with .htaccess? Any idea would be very much appreciated. Thanks for your precious time guys! Shella
Moz Pro | | CarloShellaMascella0 -
Since July 1, we've had a HUGE jump in errors on our weekly crawl. We don't think anything has changed on our website. Has MOZ changed something that would account for a large leap in duplicate content and duplicate title errors?
Our error report went from 1,900 to 18,000 in one swoop, starting right around the first of July. The errors are duplicate content and duplicate title, as if it does not see our 301 redirects. Any insights?
Moz Pro | | KristyFord0 -
Pages Crawled: 1 Why?
I have some campaigns which have only 1 page crawled, while some other campaigns, with very similar URLs (subdomains) and a similar number of keywords and pages, have all pages crawled... Why is that so? I have also waited a while and so far there is no change...
Moz Pro | | BritishCouncil0 -
Changing the way SEOmoz Detects Duplicate Content
Hey everyone, I wanted to highlight today's blog post in case you missed it. In short, we're using a different algorithm to detect duplicate pages. http://moz.com/blog/visualizing-duplicate-web-pages If you see a change in your crawl results and you haven't done anything, this is probably why. Here's more information taken directly from the post:
1. Fewer duplicate page errors: a general decrease in the number of reported duplicate page errors. However, it bears pointing out that:
We may still miss some near-duplicates. Like the current heuristic, only a subset of the near-duplicate pages is reported.
Completely identical pages will still be reported. Two pages that are completely identical will have the same simhash value, and thus a difference of zero as measured by the simhash heuristic. So, all completely identical pages will still be reported.
2. Speed, speed, speed: The simhash heuristic detects duplicates and near-duplicates approximately 30 times faster than the legacy fingerprints code. This means that soon, no crawl will spend more than a day working its way through post-crawl processing, which will facilitate significantly faster delivery of results for large crawls.
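For readers unfamiliar with simhash, the "difference of zero" point can be illustrated with a toy version in Python. This is a generic textbook-style sketch and not Moz's implementation: the whitespace tokenization, the MD5-based 64-bit token hash, and the function names are all assumptions chosen for brevity.

```python
import hashlib


def simhash(text, bits=64):
    """Compute a toy simhash fingerprint of the given text.

    Each token votes +1 or -1 on every bit position according to its own
    hash; the fingerprint keeps the bits whose vote total is positive.
    """
    counts = [0] * bits
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        token_hash = int.from_bytes(digest[:8], "big")
        for i in range(bits):
            counts[i] += 1 if (token_hash >> i) & 1 else -1
    value = 0
    for i, count in enumerate(counts):
        if count > 0:
            value |= 1 << i
    return value


def hamming_distance(a, b):
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


page_a = "Handmade metal art tiles for kitchens and baths"
page_b = "Handmade metal art tiles for kitchens and baths"
page_c = "Handmade metal art tiles for kitchens and patios"

print(hamming_distance(simhash(page_a), simhash(page_b)))
print(hamming_distance(simhash(page_a), simhash(page_c)))
```

Identical input always produces an identical fingerprint, which is why completely identical pages are still reported. How small the distance has to be for two pages to count as near-duplicates is a threshold choice that the quoted post does not state.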
Moz Pro | | KeriMorgret2 -
Duplicate page title
I own a store, www.mzube.co.uk, and the scan always says that I have duplicate page titles or duplicate pages. What happens is that I may have, for example, www.mzube.co.uk/allproducts/page1. If I have 20 pages, all that changes from page to page is the number at the end; the rest of the page name stays the same, but the pages really show different products. So the scan thinks I have 20 identical pages, but I haven't. Is this a concern? I don't think I can avoid it. Hope you can answer.
Moz Pro | | mzube0 -
Dynamic URL pages in Crawl Diagnostics
The crawl diagnostic has found errors for pages that do not exist within the site. These pages do not appear in the SERPs and are seemingly dynamic URL pages. Most of the URLs that appear are formatted http://mysite.com/keyword,%20_keyword_,%20key_word_/ which appear as dynamic URLs for potential search phrases within the site. The other popular variety among these pages has a URL format of http://mysite.com/tag/keyword/filename.xml?sort=filter and is only generated by a filter utility on the site. These pages comprise about 90% of the 401 errors, duplicate page content/title, overly-dynamic URL, missing meta description tag, etc. Many of the same pages appear in multiple errors/warnings/notices categories. So, why are these pages being pulled into the crawl test, and how do I stop it so I can get a better analysis of my site via SEOmoz?
Moz Pro | | Visually0 -
Duplicate page error from SEOmoz
SEOmoz's Crawl Diagnostics is complaining about a duplicate page error. I'm trying to use a rel=canonical but maybe I'm not doing it right. This page is the original, definitive version of the content: https://www.borntosell.com/covered-call-newsletter/sent-2011-10-01 This page is an alias that points to it (each month the alias is changed to point to the then current issue): https://www.borntosell.com/covered-call-newsletter/latest-issue The alias page above contains this tag (which is also updated each month when a new issue comes out) in the section: Is that not correct? Is the https (vs http) messing something up? Thanks!
Moz Pro | | scanlin0