Why does my crawl diagnostics show duplicate content
-
My crawl diagnostics show duplicate content at mysite.com and mysite.com/index.html which are essentially the same file.
-
Michel is right - Google doesn't care that they're one template - if both URLs are being crawled, then they'll see that as two "pages". Every unique, crawlable URL can become an indexed page. That's why duplicate content problems are so common.
The good news is that you can put a canonical tag on just the one template/file and it will cover all of the paths/URLs that land on that file. The tag goes in your section and looks like:
I'd check the internal links, though, and see if you're linking to both versions. It's best to use one, consistent URL in your internal links for any given page.
-
mysite.com is a domain not a file with mysite.com/index.html being the home page. Not sure how I would do what you suggest.
-
If the crawl report found those two URLs, then your website has at least one link to each of those URLs (otherwise Rogerbot wouldn't have found them).
You should follow Collin's advice to define the canonical page.
It also won't hurt to figure out where those links are being used in your content, and then make sure you only use one to point to your page.
Cheers
Michel
-
"Essentially" the same file isn't the same as "the same file." Your best bet is probably to mark one of them (probably mysite.com) with rel=canonical.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Crawl Diagnostics: Next crawl date is in the past
Hi - I have quite a few crawl diagnostic errors and warnings. I have attempted to fix many of them but noticed this note at the bottom of the crawl diagnostics chart: "Last Crawl Completed: Mar. 22nd, 2013 Next Crawl Starts: Mar. 29th, 2013" It looks like SEOMoz thinks the next crawl date is Mar 29th, 2013, which is two weeks ago. Is there any way to "force" the crawl and get it back on regular schedule? This may have happened when my account was disabled because my credit card expired...Thoughts?
Moz Pro | | 6thirty0 -
Campaign. Only 1 page is crawled
I have a campaign setup a couple weeks ago and noticed that only 1 page has been crawled. Is there something I need to do to get all pages crawled?
Moz Pro | | priceseo0 -
No Crawl data in dashboard
For the second straight week, I have had no crawl data in my dashboard. It seems like the crawler erased all my results in the pro dashboard. Is there a way to manually recrawl my site, since I will have to wait another week to see if it comes back to earth? Thanks
Moz Pro | | bedwards0 -
Changing the way SEOmoz Detects Duplicate Content
Hey everyone, I wanted to highlight today's blog post in case you missed it. In short, we're using a different algorithm to detect duplicate pages. http://moz.com/blog/visualizing-duplicate-web-pages If you see a change in your crawl results and you haven't done anything, this is probably why. Here's more information taken directly from the post: 1. Fewer duplicate page errors: a general decrease in the number of reported duplicate page errors. However, it bears pointing out that: **We may still miss some near-duplicates. **Like the current heuristic, only a subset of the near-duplicate pages is reported. **Completely identical pages will still be reported. **Two pages that are completely identical will have the same simhash value, and thus a difference of zero as measured by the simhash heuristic. So, all completely identical pages will still be reported. 2. Speed, speed, speed: The simhash heuristic detects duplicates and near-duplicates approximately 30 times faster than the legacy fingerprints code. This means that soon, no crawl will spend more than a day working its way through post-crawl processing, which will facilitate significantly faster delivery of results for large crawls.
Moz Pro | | KeriMorgret2 -
Error on duplicated content, but when checking shouldn't been possible
Dear all, Every week I look at the different crawl reports for our website, since the start of my SeoMoz membership the Errors for duplicated content and duplicated Title is rising. But if I take out the .csv file and look in more detail, and select a pages which is marked as duplicated content, a canonical is actually existing on this page. So it shouldn't be an warning, I have no idea what the issue could be. For example pagesare marked as duplicated content, <colgroup><col width="966"></colgroup>
Moz Pro | | Letty
| http://www.zylom.com/es/descargar-juegos/3-en-raya/?sortby=2 |
| http://www.zylom.com/es/descargar-juegos/3-en-raya/?startnumber=60&sortby=2 |
| http://www.zylom.com/es/descargar-juegos/3-en-raya/?startnumber=80&sortby=2 | the parameters after '?' (question mark) are necessary for our internal system. To overcome duplicated content we coded that a canonical tag onis placed on every page with parameters and the main page is http://www.zylom.com/es/descargar-juegos/3-en-raya/ but it doesn't seem to work, because my error warnings are still rising. Please advice me Kind regards, Ms Letty van Eembergen0 -
SEOMOZ Crawling Our Site
Hi there, We get a report from SEOMOZ every week which shows our performance within search. I noticed for our website www.unifor.com.au that it looks through over 10,000 pages, however our website sells less than 500 products so not sure why or how so many pages are trawled? If someone could let me know that would be great. It uses up a lot of bandwidth doing each of these searches so if the amount of pages being trawled reduced it would definitely assist. Thanks, Geoff
Moz Pro | | BeerCartel750 -
Why aren't canonical tags reducing duplicate page title/content?
We have canonical tags set up for a feature page on one of our sites. This site has an image gallery controlled by javascript. To aid the user experience the image can also be specified by a URL parameter (the javascript also uses this URL to fetch the images). The SEOMoz report complains that the links to these images have duplicate page titles and content. To try and combat this we set canonical tags to point only to the original page, without the slideshow parameter. e.g. http://www.example.com/feature-page/ http://www.example.com/feature-page/?slideshow=1 -> canonical tag set to http://www.example.com/feature-page/ http://www.example.com/feature-page/?slideshow=2 -> canonical tag set to http://www.example.com/feature-page/ The latest SEOMoz report has come back and the errors still exist. What can we do to remove these error messages? Thanks
Moz Pro | | TJSSEO1 -
How do you get Mozbot to crawl your website
I trying to get the mozbot to crawl my site so I can get new crawl diagnostics info. Anyone know how this can be done?
Moz Pro | | Romancing0