Duplicate Content Issue
-
Very strange issue I noticed today: my SEOmoz campaigns are showing thousands of warnings and errors!
I noticed that any page on my website ending in .php can be duplicated by adding anything you want to the end of the URL, which seems to be causing these issues.
Ex:
Normal URL - www.example.com/testing.php
Duplicate URL - www.example.com/testing.php/helloworld
The duplicate URL displays the page without the images, but all the text and information is present, duplicating the Normal page.
I also found that many of my PDFs seem to be duplicated, buried in directory after directory that I never put in place.
Ex: www.example.com/catalog/pdfs/testing.pdf/pdfs/another.pdf/pdfs/more.pdfs/pdfs/ ...
when the PDFs are only located in a single pdfs directory!
I am very confused about how to fix this problem. Maybe with some sort of redirect?
-
Hi Hfranz,
I took a look at your campaign and was unable to duplicate the errors, which leads me to believe it was a blip in the crawling. I'm speculating here, but my suspicion is that there was a miscommunication between your web server and the SEOmoz crawler, the end result being the crawler got sent on a wrong crawl path, and your server kept delivering the bad URLs.
Hopefully, things will return to normal after the next crawl. If not, feel free to contact the help team at help@seomoz.org.
You can also double-check with Google Webmaster Tools to see if there has been an increase in crawl errors or HTML suggestions.
Finally, your web server should be configured to deliver a 404 for URLs that don't exist, such as www.example.com/testing.php/helloworld. Unfortunately, I can't tell you exactly how to do this, and you may need to find a developer with expertise in this area, but my guess is that very little damage, if any, has been done.
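To make the 404 suggestion concrete, here is a minimal sketch in Python of the rule the server would need to apply: only an exact match on a known script path is served, and anything with extra segments after the script name gets a 404. `KNOWN_SCRIPTS` and `status_for` are hypothetical names used only for illustration, not part of any real server. If the site happens to run on Apache, the `AcceptPathInfo` directive is the setting that usually governs this behavior, but that is an assumption about the setup.

```python
from urllib.parse import urlsplit

# Hypothetical list of scripts that really exist on the site.
KNOWN_SCRIPTS = {"/testing.php"}


def status_for(url: str) -> int:
    """Return 200 only when the path is exactly a known script.

    A known script followed by extra segments (for example
    /testing.php/helloworld) is treated as a page that does not exist.
    """
    # Accept URLs written without a scheme, as in the examples above.
    path = urlsplit(url if "://" in url else "//" + url).path
    return 200 if path in KNOWN_SCRIPTS else 404
```

With this rule, `status_for("www.example.com/testing.php")` is served normally, while `status_for("www.example.com/testing.php/helloworld")` is rejected, so a crawler has no duplicate page to index.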
-
To me it seems that this is more an issue with the PHP script.
- Is helloworld the testing.php?
The PDF issue could be a referencing issue within your script, even if the PDFs are loaded into the correct folder. Check the upload function, especially where you give the document its URI!
-
Read this
All you really need is a canonical tag.
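As a rough illustration of the canonical suggestion, the sketch below builds a `<link rel="canonical">` element that always points at the script itself, whatever was appended after `.php`. The function name `canonical_link` and the default `http` scheme are assumptions made for the example; the real site may use a different scheme or generate the tag inside its PHP templates.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_link(requested_url: str, scheme: str = "http") -> str:
    """Build a canonical link tag for a requested URL.

    Anything that follows ".php" in the path is dropped, and so are the
    query string and fragment, so every variant maps to one address.
    """
    # Accept URLs written without a scheme, as in the examples above.
    if "://" not in requested_url:
        requested_url = f"{scheme}://{requested_url}"
    parts = urlsplit(requested_url)

    path = parts.path
    marker = path.find(".php/")
    if marker != -1:
        path = path[: marker + len(".php")]

    canonical = urlunsplit((parts.scheme, parts.netloc, path, "", ""))
    return f'<link rel="canonical" href="{escape(canonical, quote=True)}">'
```

The returned element belongs in the page's `<head>`. Both the normal URL and the duplicate URL from the question would then declare the same canonical address.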
Related Questions
-
Duplicate content analysis
Hi all, We have some pages being flagged as duplicates by Google Search Console. However, we believe the content on these pages is distinctly different (for example, they have completely different search results returned, different headings, etc.). An example of two pages Google finds to be duplicates is below. If anyone can spot what might be causing the duplicate issue here, we would very much appreciate suggestions! Thanks in advance.
Examples: https://www.vouchedfor.co.uk/IFA-financial-advisor-mortgage/harborne
https://www.vouchedfor.co.uk/accountant/harborne
Technical SEO | Eric_S
-
Duplicate content. Wordpress and Website
Hi All, Will Google punish me for having duplicate blog posts on my website's blog and WordPress? Thanks
Technical SEO | Mike.NW
-
Advice on Duplicate Page Content
We have many pages on our website and they all have the same template (we use a CMS) and at the code level, they are 90% the same. But the page content, title, meta description, and image used are different for all of them. For example:
http://www.jumpstart.com/common/find-easter-eggs
http://www.jumpstart.com/common/recognize-the-rs
We have many such pages. Does Google look at them all as duplicate page content? If yes, how do we deal with this?
Technical SEO | jsmoz
-
Duplicate Content based on www.www
In trying to knock down the most common errors on our site, we've noticed we do have an issue with duplicate content; however, most of the duplicate content errors are due to our site being indexed with www.www and not just www. I am perplexed as to how this is happening. Searching through IIS, I see nothing that would be causing this, and we have no hostname records set up that are www.www. Does anyone know of any other things that may cause this and how we can go about remedying it?
Technical SEO | CredA
-
Link Structure & Duplicate Content
I am struggling with how I should handle the link structure on my site. Right now most of my pages are like this:
Home -> Department -> Service Groups -> Content Page
For example:
Home -> IT Solutions -> IT Support & Managed Services -> IT Support
Home -> IT Solutions -> IT Support & Managed Services -> Managed Services
Home -> IT Solutions -> IT Support & Managed Services -> Help Desk Services
Home -> IT Solutions -> Virtualization & Data Center Solutions -> Virtualization
Home -> IT Solutions -> Virtualization & Data Center Solutions -> Data Center Solutions
This structure lines up with our business and makes logical sense, but I am not sure how to handle the department and service group pages. Right now you can click them and it just brings you to a page with a small snippet for the links below. The real content is on the content pages. What I am worried about is that the snippets on those pages are just a paragraph or two of the content that's on the content page. Will this hurt me and get considered duplicate content? What is the best practice for dealing with this? Those department/service group pages have some good content on them, but it's just parts of other pages. Am I okay doing this because they are not direct duplicates of other pages, just parts of a few pages? Any help on this would be great. Thanks in advance.
Technical SEO | ZiaTG
-
How to get rid of duplicate content
I have duplicate content that looks like http://deceptionbytes.com/component/mailto/?tmpl=component&link=932fea0640143bf08fe157d3570792a56dcc1284 - however I have 50 of these all with different numbers on the end. Does this affect the search engine optimization and how can I disallow this in my robots.txt file?
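One possible robots.txt rule for URLs like the one above is a `Disallow` on the shared `/component/mailto/` prefix. The sketch below, using Python's standard `urllib.robotparser`, only checks that such a rule would match; the robots.txt content and the helper name `is_allowed` are assumptions for illustration, not the site's actual configuration. Keep in mind that robots.txt stops crawling of matching URLs, which is not quite the same as removing pages that are already indexed.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the path prefix comes from the URL above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /component/mailto/
"""


def is_allowed(url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Report whether a generic crawler may fetch the URL under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("*", url)
```

Because the rule matches by prefix, all 50 variants with different `link=` values would be covered by the single `Disallow` line, while the rest of the site stays crawlable.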
Technical SEO | Mishelm
-
Block Quotes and Citations for duplicate content
I've been reading about the proper use of block quotes and citations lately, and wanted to see if I was interpreting it the right way. This is what I read: http://www.pitstopmedia.com/sem/blockquote-cite-q-tags-seo So basically my question is, if I wanted to reference Amazon's or another store's product reviews, could I use the block quote and citation tags around their content so it doesn't look like duplicate content? I think it would be great for my visitors, and also for the source, as I am giving them credit. It would also be a good source to link to on my product pages, as I am not competing with the manufacturer for sales. I could also do this for product information right from the manufacturer. I want to do this for a contact lens site. I'd like to use Acuvue's reviews from their website, as well as some of their product descriptions. Of course I have my own user reviews and content for each product on my website, but I think some official copy could do well. Would this be the best method? Is this how Rottentomatoes.com does it? On every movie page they have 2-3 sentences from 50 or so reviews, and not much unique content of their own. Cheers, Vinnie
Technical SEO | vforvinnie
-
Duplicate content issues caused by our CMS
Hello fellow mozzers, Our in-house CMS - which is usually good for SEO purposes as it allows all the control over directories, filenames, browser titles etc. that prevents unwieldy / meaningless URLs and generic title tags - seems to have got itself into a bit of a tizz when it comes to one of our clients. We have tried solving the problem to no avail, so I thought I'd throw it open and see if anyone has a solution, or whether it's just a fault in our CMS. Basically, the SEs are indexing two identical pages, one ending with a / and the other ending /index.php, for one of our sites (www.signature-care-homes.co.uk). We have gone through the site and made sure the links all point to just one of these, and have done the same for off-site links, but there is still the duplicate content issue of both versions getting indexed. We also set up an htaccess file to redirect to the chosen version, but to no avail, and we're not sure canonical will work for this issue as / pages should redirect to /index.php anyway - and that's what we can't work out. We have set the htaccess file to point to index.php, and that is what should be happening anyway, but it isn't. Is there an alternative way of telling the SEs to only look at one of these two versions? Also, we are currently rewriting the content and changing the structure - will this change the situation we find ourselves in?
Technical SEO | themegroup