Duplicate content issue with trailing / ?
-
Hi, I ran an SEOmoz Crawl Test and found that most pages show up twice, for example:
A: www.website.com/index.php/dog/walk
B: www.website.com/index.php/dog/walk/
I've checked Google Analytics and 90% of organic search traffic arrives on the URLs with the trailing slash (B).
Question 1: Can I assume I've a duplicate content problem?
Question 2: Is it best to do 301 redirects from the 'non trailing slash' pages to the 'trailing slash' pages?
Question 3: For some reason every web page has '/index.php' in its URL (see A & B above). No idea why. Should it be an SEO concern?
Kind regards and thank you in advance
Nigel
-
Hi Nigel
You only need to 301 one of the pages. A 301 indicates a permanent move, so in the case you outlined above,
I would 301 A to B. The decision to use B is based solely on the value of the URL you indicated (B receives most of your organic traffic). If for any reason you prefer that the URLs not use a trailing slash, then use A.
It also would not hurt to add a canonical tag to B.
To be clear here, whether you use
website.com/index.php/dog/walk
or
website.com/index.php/dog/walk/
does not matter as far as SEO is concerned. I would base my decision on which URL has the highest position in Google, and apply that choice consistently throughout the site.
Hope that helps,
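
For anyone who wants to see the A-to-B rule spelled out, here is a minimal Python sketch of the decision. It is only an illustration: the function name `redirect_target` and the `trailing_slash` switch are made up for this example, and on a real site the 301 itself would be issued by the web server or the CMS, not by a script like this.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def redirect_target(url: str, trailing_slash: bool = True) -> Optional[str]:
    """Return the URL to 301 to, or None if `url` is already in the preferred form.

    Hypothetical helper: it only looks at the path and leaves the
    scheme, host, and query string untouched.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    if trailing_slash:
        # Policy "B": every path ends with a slash.
        wanted = path if path.endswith("/") else path + "/"
    else:
        # Policy "A": no trailing slash, except for the bare root "/".
        wanted = path.rstrip("/") or "/"
    if wanted == path:
        return None  # already canonical, no redirect needed
    return urlunsplit(parts._replace(path=wanted))


# A redirects to B; B itself needs no redirect.
print(redirect_target("http://www.website.com/index.php/dog/walk"))
print(redirect_target("http://www.website.com/index.php/dog/walk/"))
```

Whichever policy you pick, the point is that exactly one form of each URL answers with a 200 and the other form always 301s to it.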
-
Hi Irving
Thank you for your reply. You mention a good point regarding the sitemap.xml!
If I were to 301 redirect pages A & B to a new page, e.g. www.website.com/dog/walk/, then how would I also canonical A & B to the new page?
Surely once I have 301'd them, the A & B pages will be dead and redirecting traffic to the new page.
Kind regards, and my apologies for any confusion.
Nigel
-
Yes, index.php should never show, so use a 301 to remove it along with the trailing slash.
Definitely canonical all of the pages to the URL without the trailing slash.
Make sure your sitemap XML files and internal linking structure do not use the trailing slash. If they do, fix them to reflect the proper URL.
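
The sitemap check can be automated. Below is a rough Python sketch, assuming a standard sitemaps.org `<urlset>` file and the policy described above (no `/index.php`, no trailing slash). The function name `non_canonical_sitemap_urls` is invented for this example; adjust the test inside the loop if you settle on the trailing-slash form instead.

```python
import xml.etree.ElementTree as ET
from typing import List
from urllib.parse import urlsplit

# Namespace used by standard sitemaps.org sitemap files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def non_canonical_sitemap_urls(sitemap_xml: str) -> List[str]:
    """List <loc> URLs that contain /index.php or end with a trailing slash."""
    root = ET.fromstring(sitemap_xml)
    flagged = []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        path = urlsplit(url).path
        # The bare root "/" is allowed to keep its slash.
        if "/index.php" in path or (path.endswith("/") and path != "/"):
            flagged.append(url)
    return flagged
```

Any URL this returns should be corrected in the sitemap so that it matches the URL your 301s and canonical tags point to.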
-
Thank you Highland & Donford.
Re my 3rd question, can I just clarify: should I now 301 redirect both A & B URLs to a new URL, say www.website.com/dog/walk?
Many thanks!
-
Question 1: Can I assume I've a duplicate content problem?
- Yes.
Question 2: Is it best to do 301 redirects from the 'non trailing slash' pages to the 'trailing slash' pages?
- Yes, a 301 is best. Barring that, use rel="canonical" on the page you want indexed.
Question 3: For some reason every web page has a '/index.php' in it (see A & B above). No idea why. Should it be an SEO concern?
- Yes, this is a concern; use the same method to deal with it. A directory on the server is usually assumed to have an index file; if it has none, the server chooses what to display, which can sometimes be very bad. As such, most content management systems (CMSs) avoid the problem by generating content for the index.php or .html pages. However, this can create duplicate content, since two URLs then serve the same content. Use a 301 to get rid of index.php at directory level, or use canonical tags.
Hope that helps,
Don
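
To make the index.php clean-up and the canonical tag concrete, here is a small Python sketch. The helper names `strip_index_php` and `canonical_link_tag` are hypothetical, and the sketch only handles `/index.php` as the first path segment, as in Nigel's example URLs; a real site would do this in the server rewrite rules or the CMS templates.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

PREFIX = "/index.php"


def strip_index_php(url: str) -> str:
    """Drop a leading /index.php segment: /index.php/dog/walk -> /dog/walk."""
    parts = urlsplit(url)
    path = parts.path
    if path == PREFIX or path.startswith(PREFIX + "/"):
        # Fall back to "/" when nothing follows index.php.
        path = path[len(PREFIX):] or "/"
    return urlunsplit(parts._replace(path=path))


def canonical_link_tag(url: str) -> str:
    """Build the <link rel="canonical"> element pointing at the cleaned URL."""
    href = escape(strip_index_php(url), quote=True)
    return '<link rel="canonical" href="%s">' % href


print(canonical_link_tag("http://www.website.com/index.php/dog/walk/"))
```

The cleaned URL is the 301 target for the old address, and the same URL goes in the canonical tag of the page that stays live.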
-
1. Google can generally tell the difference between pages that have syntactically similar URLs, but it's considered best practice not to make any engine do guesswork whenever possible.
2. I would 301 one version just for uniformity, but you should be fine as-is right now.
3. There's nothing wrong with that being in the URL. Google sees it as part of the URL and nothing more. I don't consider it aesthetic or user friendly but that's a different matter.
Related Questions
-
Selling same products under separate brands and can't consolidate sites...duplicate content issues?
I have a client selling home goods online and in-store under two different brand names in separate regions of the country. Currently, the websites are completely identical aside from branding. It is unlikely that they would have the capacity to write unique titles and page content for each website (~25,000 pages each), and the business would never consolidate the sites. Would it make sense to use canonical tags pointing to the higher-performing website on category and product pages? This way we could continue to capture branded search to the lesser brand while consolidating authority on the better performing website. What would you do?
Technical SEO | | jluke.fusion0 -
Purchasing duplicate content
Morning all, I have a client who is planning to expand their product range (online dictionary sites) to new markets and are considering the acquisition of data sets from low ranked competitors to supplement their own original data. They are quite large content sets and would mean a very high percentage of the site (hosted on a new sub domain) would be made up of duplicate content. Just to clarify, the competitor's content would stay online as well. I need to lay out the pros and cons of taking this approach so that they can move forward knowing the full facts. As I see it, this approach would mean forgoing ranking for most of the site and would need a heavy dose of original content as well as supplementing the data on page to build around the data. My main concern would be that launching with this level of duplicate data would end up damaging the authority of the site and subsequently the overall domain. I'd love to hear your thoughts!
Technical SEO | | BackPack851 -
Duplicate content /index.php/ issues
I'm having some duplicate content issues with Google. I've already got my .htaccess file working just fine as far as I can tell. Rewriting works great, and by using the site you'd never end up on a page with /index.php. However, I do notice that on ANY page of the site you could add /index.php and get the same page, i.e. www.mysite.com/category/article and www.mysite.com/index.php/category/article would both return the same page. How can I 301 (or something similar) all /index.php pages to the non-index.php version? I have no desire for any page on my site to have index.php in it; there is no use for it. I'm having quite a hard time figuring this out. Again, this is basically just for the robots. The URLs the users see are perfect, and I've never had an issue with that. It's just SEOmoz reporting duplicate content, and I've verified that to be true.
Technical SEO | | b18turboef1 -
Development Website Duplicate Content Issue
Hi, we launched a client's website around 7th January 2013 (http://rollerbannerscheap.co.uk). We originally constructed the website on a development domain (http://dev.rollerbannerscheap.co.uk), which was active for around 6-8 months (the dev site was unblocked from search engines for the first 3-4 months, but then blocked again) before we migrated dev --> live. In late Jan 2013 we changed the robots.txt file to allow search engines to index the website. A week later I accidentally logged into the DEV website and also changed the robots.txt file to allow the search engines to index it. This obviously caused a duplicate content issue as both sites were identical. I realised what I had done a couple of days later and blocked the dev site from the search engines with the robots.txt file. Most of the pages from the dev site had been de-indexed from Google apart from 3: the home page (dev.rollerbannerscheap.co.uk) and two blog pages. The live site has 184 pages indexed in Google. So I thought the last 3 dev pages would disappear after a few weeks. I checked back in late February and the 3 dev site pages were still indexed in Google. I decided to 301 redirect the dev site to the live site to tell Google to rank the live site and to ignore the dev site content. I also checked the robots.txt file on the dev site and this was blocking search engines too. But still the dev site is being found in Google wherever the live site should be found.
When I do find the dev site in Google it displays this: Roller Banners Cheap » admin dev.rollerbannerscheap.co.uk/ A description for this result is not available because of this site's robots.txt – learn more. This is really affecting our client's SEO plan and we can't seem to remove the dev site or rank the live site in Google. Please can anyone help?
Technical SEO | | SO_UK0 -
A problem with duplicate content
I'm kind of new at this. My crawl analysis says that I have a problem with duplicate content. I set the site up so that web sections appear in a folder with an index page as a landing page for that section. The URL would look like: www.myweb.com/section/index.php The crawl analysis says that both that URL and its root, www.myweb.com/section/, have been indexed. So I appear to have a situation where the page has been indexed twice and is a duplicate of itself. What can I do to remedy this? And what steps should I take to get the pages re-indexed so that this type of duplication is avoided? I hope this makes sense! Any help gratefully received. Iain
Technical SEO | | iain0 -
Duplicate Footer Content
A client I just took over is having some duplicate content issues. At the top of each page he has about 200 words of unique content. Below this are three big tables of text that talk about his services, history, etc. These tables are pulled into the middle of every page using PHP. So, he has the exact same three big tables of text across every page. What should I do to eliminate the duplicate content? I thought about removing the script and then just rewriting the tables of text on every page... Is there a better solution? Any ideas would be greatly appreciated. Thanks!
Technical SEO | | BigStereo0 -
Multiple URLs in CMS - duplicate content issue?
So about a month ago, we finally ported our site over to a content management system called Umbraco. Overall, it's okay, and certainly better than what we had before (i.e. nothing - just static pages). However, I did discover a problem with the URL management within the system. We had a number of pages that existed as follows: sparkenergy.com/state/name However, they exist now within certain folders, like so: sparkenergy.com/about-us/service-map/name So we had an aliasing system set up whereby you could call the URL basically whatever you want, so that allowed us to retain the old URL structure. However, we have found that the alias does not override, but just adds another option to finding a page. Which means the same pages can open under at least two different URLs, such as http://www.sparkenergy.com/state/texas and http://www.sparkenergy.com/about-us/service-map/texas. I've tried pointing to the aliased URL in other parts of the site with the rel canonical tag, without success. How much of a problem is this with respect to duplicate content? Should we bite the bullet, remove the aliased URLs and do 301s to the new folder structure?
Technical SEO | | ufmedia0 -
Crawl issues / .htaccess issues
My site is getting crawl errors inside Google Webmaster Tools. Google believes a lot of my links point to index.html when they really do not. That is not the problem though; it's that Google can't give credit for those links to any of my pages. I know I need to create a rule in the .htaccess file, but the last time I did it I got an error. I need some assistance on how to go about doing this, as I really don't want to lose the weight of my links. Thanks
Technical SEO | | automart0