Duplicate content issue
-
Hi everyone,
I am having trouble determining what type of duplicate content I have.
www.example.com/index.php?mact=Calendar,m57663,default,1&m57663return_id=116&m57663detailpage=&m57663year=2011&m57663month=6&m57663day=19&m57663display=list&m57663return_link=1&m57663detail=1&m57663lang=en_GB&m57663returnid=116&page=116
Since I am not a coding expert, it looks to me like URL parameter duplicate content. Is it?
At the same time, "return_id" makes me think it could be session ID duplicate content. I am confused about how to tell the different types of duplicate content apart, even after reading the articles on SEOmoz about it: http://www.seomoz.org/learn-seo/duplicate-content.
Could someone help me recognize the different types of duplicate content?
Thank you!
-
Thank you guys for being so helpful!!:)
-
Hello Jeff, first I would like to say that lots of sites have duplicate content problems, and for the most part this is not a huge issue. When search engines find duplicate content, they choose one of the pages to list in the index and ignore the others. This assumes, of course, that the duplication is not so bad that the search engine would want to ban you, which can happen if a review of your situation leads them to believe you are deliberately trying to rank multiple times for the same search terms.
Here is an article on dealing with duplicate content:
http://www.seomoz.org/blog/duplicate-content-in-a-post-panda-world
-
Let me try.
1. The answer to your first question only matters if you're trying to figure out how to handle it programmatically. In that case you might have to ask the developer whether this is being done by a session ID. To me it looks more like a URL parameter, but without a live example I can't be sure; could you provide the website in question? If not, try visiting the website once, clear your cache and cookies, then visit again and see whether the number after "return_id" changes. If it changes, that is a session ID. If it stays the same, have a friend visit the website in the same manner and check whether the number stays the same; if it changes, there's a good chance it is a session ID.
Either way, "return_id" is technically a URL parameter; what would make it a session ID is its value being generated per visitor session.
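The visit-twice test described above can be sketched in Python: copy the URL from two separate visits (for example, before and after clearing cookies, or from a friend's browser) and compare their query parameters. This is a minimal sketch using only the standard library; the second URL, with its return_id value of 204, is a made-up example and not taken from the question.

```python
from urllib.parse import parse_qs, urlsplit


def changed_params(url_a: str, url_b: str) -> dict:
    """Return the query parameters whose values differ between two URLs.

    A parameter that changes between otherwise identical visits is a
    session-ID candidate; one that stays fixed is an ordinary URL parameter.
    """
    qa = parse_qs(urlsplit(url_a).query, keep_blank_values=True)
    qb = parse_qs(urlsplit(url_b).query, keep_blank_values=True)
    diffs = {}
    for key in sorted(set(qa) | set(qb)):
        # .get() returns None when a parameter appears in only one URL
        if qa.get(key) != qb.get(key):
            diffs[key] = (qa.get(key), qb.get(key))
    return diffs


# Shortened version of the URL from the question (first visit)...
visit_1 = ("http://www.example.com/index.php?mact=Calendar,m57663,default,1"
           "&m57663return_id=116&page=116")
# ...and a hypothetical second visit in which only return_id changed.
visit_2 = ("http://www.example.com/index.php?mact=Calendar,m57663,default,1"
           "&m57663return_id=204&page=116")

print(changed_params(visit_1, visit_2))
# {'m57663return_id': (['116'], ['204'])}
```

An empty result means every parameter stayed the same across the two visits, which points to an ordinary URL parameter rather than a session ID.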
2. The second question is still a bit vague, so let me check that I understand it: are you asking how to treat the duplicate content once you know what is causing it? If so, follow these rules.
If the content changes significantly in the presence of the session ID or parameter, then this is not duplicate content. If the content does not change, do the following:
- Use rel=canonical pointing to the root URL. In your example that would be: www.example.com/index.php?mact=Calendar
- Configure the URL parameter settings in Google's and Bing's webmaster tools so the parameter is treated correctly.
- When the parameter or session ID is present, add the noindex, follow robots meta tag. This lets the bots crawl through and still pass on link juice in case someone links to your parameter versions.
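The rules above can be applied in the page template. The sketch below (Python, standard library only) builds the head tags for a requested URL: the canonical URL keeps only the parameters on an allowlist, and the noindex, follow tag is added whenever any other parameter is present. The allowlist containing only mact is an assumption modeled on the example above; a real CMS would need its own list and would emit these tags from its templating code rather than a standalone function.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical allowlist: parameters that genuinely change page content.
# Only "mact" is kept, following the rel=canonical example above.
KEEP_PARAMS = frozenset({"mact"})


def head_tags(url: str, keep=KEEP_PARAMS) -> str:
    """Build canonical and robots tags for a possibly parameterized URL."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    kept = [(k, v) for k, v in pairs if k in keep]
    # Canonical URL: same scheme/host/path, allowlisted params only, no fragment.
    canonical = urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept, safe=","), "")
    )
    tags = [f'<link rel="canonical" href="{escape(canonical)}">']
    if len(kept) < len(pairs):
        # Extra parameters present: keep the page out of the index, follow links.
        tags.append('<meta name="robots" content="noindex, follow">')
    return "\n".join(tags)


print(head_tags("http://www.example.com/index.php"
                "?mact=Calendar,m57663,default,1&m57663return_id=116&page=116"))
# <link rel="canonical" href="http://www.example.com/index.php?mact=Calendar,m57663,default,1">
# <meta name="robots" content="noindex, follow">
```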
I think you have a larger issue: your website's code uses index.php to generate all of the pages (in your example, the calendar). This is a common shortcut, since programmers work to do things as quickly and efficiently as possible, and it's far easier to keep all of the code in one file than to create several dynamic files that work with each other.
If you don't have the ability to break this down and generate separate pages, you might be able to use URL rewrites so that browsers and bots see a distinct URL for each page.
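A URL rewrite maps a clean, page-like URL onto the single script that actually serves it. On a real server this is usually done by the web server itself (for example, Apache's mod_rewrite); the Python sketch below only illustrates the mapping. The /calendar/YYYY/MM/DD scheme is hypothetical, and the parameter names are copied from the URL in the question.

```python
import re
from typing import Optional
from urllib.parse import urlencode

# Hypothetical clean-URL scheme for the calendar: /calendar/2011/6/19
CALENDAR_RULE = re.compile(r"^/calendar/(\d{4})/(\d{1,2})/(\d{1,2})/?$")


def rewrite(path: str) -> Optional[str]:
    """Translate a clean calendar path into the internal index.php URL.

    Returns None when the path does not match, so the request would fall
    through to normal handling.
    """
    match = CALENDAR_RULE.match(path)
    if match is None:
        return None
    year, month, day = match.groups()
    query = urlencode(
        {
            "mact": "Calendar,m57663,default,1",
            "m57663year": year,
            "m57663month": month,
            "m57663day": day,
            "m57663display": "list",
        },
        safe=",",  # keep the commas in mact unescaped
    )
    return "/index.php?" + query


print(rewrite("/calendar/2011/6/19"))
# /index.php?mact=Calendar,m57663,default,1&m57663year=2011&m57663month=6&m57663day=19&m57663display=list
```

Provided the site's own links use the clean form, search engines only ever see /calendar/2011/6/19, while index.php keeps receiving the parameters it expects.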
-
Thank you for your answers, but I guess I didn't phrase my question properly.
My first question was: what kind of duplicate content is it?
- session ID
- or URL parameter
My second question is: how do you tell them apart? What do you look at to decide whether duplicate content comes from a session ID or from a URL parameter?
-
You can determine whether you have duplicate content in several ways. Search Google for site:example.com and see how many pages Google knows about on your website. Also, when you are on a page with this crazy URL, open the page source and see whether it has a rel="canonical" tag. For your page, that would be the best way to signal to robots that this is the same page as your index.php page.
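Checking the page source for a canonical tag can also be scripted. This sketch uses Python's built-in html.parser and assumes you already have the page's HTML as a string (for example, saved from your browser's View Source); it returns the href of the first link tag whose rel includes "canonical", or None if there is none.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag seen."""

    def __init__(self):
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing <link ... /> via handle_startendtag.
        if tag == "link" and self.canonical is None:
            attributes = dict(attrs)
            rels = (attributes.get("rel") or "").lower().split()
            if "canonical" in rels:
                self.canonical = attributes.get("href")


def find_canonical(html: str) -> Optional[str]:
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonical


page = ('<html><head><link rel="canonical" '
        'href="http://www.example.com/index.php"></head></html>')
print(find_canonical(page))
# http://www.example.com/index.php
```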
You can also try Xenu, a good, fast program for crawling your site to find duplicates.
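Once a crawler has fetched the HTML for each URL, a simple way to spot exact duplicates is to hash each page's content and group URLs that share a hash. This is a sketch of that idea in Python, not a description of how Xenu works internally. It only catches pages that are identical after whitespace is collapsed; near-duplicates would need a similarity measure instead. The sample pages are made up.

```python
import hashlib
import re
from collections import defaultdict
from typing import Dict, List


def duplicate_groups(pages: Dict[str, str]) -> List[List[str]]:
    """Group URLs whose HTML is identical after collapsing whitespace."""
    groups = defaultdict(list)
    for url, html in pages.items():
        normalized = re.sub(r"\s+", " ", html).strip()
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        groups[digest].append(url)
    # Only groups with more than one URL are duplicates.
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


pages = {
    "/index.php?mact=Calendar&page=116": "<p>June 2011</p>",
    "/index.php?mact=Calendar&page=116&m57663return_id=116": "<p>June  2011</p>\n",
    "/index.php": "<p>Home</p>",
}
print(duplicate_groups(pages))
# [['/index.php?mact=Calendar&page=116', '/index.php?mact=Calendar&page=116&m57663return_id=116']]
```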
Hope it helps. If you can share your website, we can take a look.
-
Hi Jeff,
index.php is the same file as index.php?something=something&anotherthing=somethingelse
Each page should have its own URL, like index.php and page.php, instead of everything going through index.php.
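A quick way to see the problem described above is to group a site's crawled URLs by path while ignoring the query string. If nearly every URL collapses onto /index.php, the site is serving all of its pages through one script. The sample URLs are illustrative.

```python
from collections import Counter
from urllib.parse import urlsplit


def urls_per_script(urls):
    """Count how many distinct URLs are served by each script path."""
    return Counter(urlsplit(url).path for url in set(urls))


crawled = [
    "http://www.example.com/index.php",
    "http://www.example.com/index.php?something=something&anotherthing=somethingelse",
    "http://www.example.com/index.php?mact=Calendar,m57663,default,1&page=116",
    "http://www.example.com/page.php",
]
print(urls_per_script(crawled))
# Counter({'/index.php': 3, '/page.php': 1})
```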
Related Questions
-
Duplicate content, although page has "noindex"
Hello, I had an issue with some pages being listed as duplicate content in my weekly Moz report. I've since discussed it with my web dev team and we decided to stop the pages from being crawled. The web dev team added this coding to the pages <meta name='robots' content='max-image-preview:large, noindex dofollow' />, but the Moz report is still reporting the pages as duplicate content. Note from the developer: "So as far as I can see we've added robots to prevent the issue but maybe there is some subtle change that's needed here. You could check in Google Search Console to see how it's seeing this content or you could ask Moz why they are still reporting this and see if we've missed something?" Any help much appreciated!
Technical SEO | | rj_dale0 -
Shopify Duplicate Content in products
Hello Moz Community, new to Moz and looking forward to beginning my journey towards SEO education and improving our clients' sites. Our client's website is a Shopify store: https://spiritsofthewestcoast.com/
Our first Moz reports show 686 duplicate content issues. I will show the first 4 as examples:
https://spiritsofthewestcoast.com/collections/native-earrings-and-studs-in-silver-and-gold/products/haida-eagle-teardrop-earrings
https://spiritsofthewestcoast.com/collections/native-earrings-and-studs-in-silver-and-gold/products/haida-orca-silver-earrings
https://spiritsofthewestcoast.com/collections/native-earrings-and-studs-in-silver-and-gold/products/silver-oval-earrings
https://spiritsofthewestcoast.com/collections/native-earrings-and-studs-in-silver-and-gold/products/haida-eagle-spirit-silver-earrings
As you can see, the URL titles are unique. But I know that those products have very similar, though not identical, product descriptions. Since they have been flagged as a site issue by Moz, I am guessing that the content is 95% duplicate. So can rel=canonical be the right solution for this type of duplicate content? Or should I be considering adding new content to each of the 686 products to drop below the 95% threshold? Or is there another solution that I may not be aware of? Thanks in advance for your assistance and expertise! Sean
Technical SEO | | TheUpdateCompany1 -
Duplicate Content Brainstorming
Hi, new here in the SEO world. Excellent resources here. We have an ecommerce website that sells presentation templates. Today our templates come in 3 flavours: for PowerPoint, for Keynote, and both (called Presentation Templates). So we've ended up with 3 URLs with similar content: same screenshots, similar description. Example:
https://www.improvepresentation.com/keynote-templates/social-media-keynote-template
https://www.improvepresentation.com/powerpoint-templates/social-media-powerpoint-template
https://www.improvepresentation.com/presentation-templates/social-media-presentation-template
I know what you're thinking: why not make a website with a template and give 3 download options, right? But what about:
https://www.improvepresentation.com/powerpoint-templates/
https://www.improvepresentation.com/keynote-templates/
These are powerful URLs in my opinion, taking into account that the strongest keyword in our field is "powerpoint templates". How would you solve this "problem"? Or maybe there is no problem at all.
Technical SEO | | slidescamp0 -
Duplicate content in product listing
We have a "duplicate content" warning in our Moz report, which mostly revolves around our product listing (eCommerce site), where various filters return 0 results (and hence show the same content on the page). Do you think those need to be addressed, and if so, how would you prevent product listing filters from appearing as duplicate content pages? Should we use rel=canonical or actually change the content on the page?
Technical SEO | | erangalp0 -
Duplicate content - Quickest way to recover?
We've recently been approached by a new client who's had a 60%+ drop in organic traffic. One of the major issues we found was around 60k+ pages of content duplicated across 3 separate domains. After much discussion and negotiation with them, we 301'd all the pages across to the best domain, but traffic is increasing very slowly. Given that the old sites are 60k+ pages each and don't get crawled very often, is it best to notify Google of the domain change through Google Webmaster Tools to try to give Google a 'nudge' to deindex the old pages, and hopefully recover from the traffic loss as quickly and as fully as possible?
Technical SEO | | Nathan.Smith0 -
Duplicate Content Issue: Google/Moz Crawler recognize Chinese?
Hi! I am using WordPress multisite, and the Chinese version of my website is at www.mysite.com/cn. Problem: I keep getting duplicate content errors within www.mysite.com/cn (NOT between www.mysite.com and www.mysite.com/cn). I have downloaded and checked the SEOmoz report and the duplicate_page_content list in the CSV file. I have no idea why it says these pages have the same content; their content has nothing in common. www.mysite.com is the English version of the website, and www.mysite.com/cn has the same structure. I don't have any duplicate content issues within www.mysite.com. Question: Does Google's crawler properly recognize Chinese content?
Technical SEO | | joony20080 -
Strange duplicate content issue
Hi there, SEOmoz crawler has identified a set of duplicate content that we are struggling to resolve. For example, the crawler picked up that this page www.creative-choices.co.uk/industry-insight/article/Advice-for-a-freelance-career is a duplicate of this page www.creative-choices.co.uk/develop-your-career/article/Advice-for-a-freelance-career. The latter page's content is the original and can be found in the CMS admin area whilst the former page is the duplicate and has no entry in the CMS. So we don't know where to begin if the "duplicate" page doesn't exist in the CMS. The crawler states that this page www.creative-choices.co.uk/industry-insight/inside/creative-writing is the referrer page. Looking at it, only the original page's link is showing on the referrer page, so how did the crawler get to the duplicate page?
Technical SEO | | CreativeChoices0 -
Canonical Link for Duplicate Content
A client of ours uses some unique keyword tracking for their landing pages: they append certain metrics in a query string and pull that information out dynamically to learn more about their traffic (kind of like Google's UTM tracking). Nonetheless, these query strings are now being indexed as separate pages in Google and Yahoo and are being flagged as duplicate content/title tags by the SEOmoz tools. For example:
Base page: www.domain.com/page.html
Tracking: www.domain.com/page.html?keyword=keyword#source=source
Now both of these are being indexed even though it is only one page. So I suggested placing a canonical link tag in the header pointing back to the base page to start discrediting the tracking URLs. But this means that the base pages will be pointing to themselves as well; would that be an issue? Is there a better way to solve this issue without removing the query tracking altogether? Thanks - Kyle Chandler
Technical SEO | | kchandler