Long list of companies spread out over several pages - duplicate content?
-
Hi all,
I am currently working with a company formation agent. They have a list of every limited company spread over hundreds of pages. What do you guys think? Is there a need for Canonicals? The website is ranking pretty well but I want to make sure there aren't any problems in the future.
Here are two pages as examples:
http://www.formationsdirect.com/companysearchlist.aspx?start=MULLAGHBOY+CONSTRUCTION+LIMITED&next=1#
http://www.formationsdirect.com/companysearchlist.aspx?start=%40a+company+limited&next=1#
Also what about the actual company pages? See an example below
Thanks in advance
Aaron
-
Thanks George,
I'll think I'll take your advice and hold off for now.
Aaron
-
Hi Aaron,
First off, since your rankings haven't been affected I would definitely hold off changing anything in WMT unless you're sure as it might cause more harm than good. If you paginate what looks like potentially thousands of pages I'm not convince Google will look on this fondly. The URLs will probably also change regularly as more companies are incorporated because the pages are set to show fixed list lengths.
Resolving the duplicate content onsite is definitely the best course of action. The fact that Moz is crawling these duplicate pages indicates that it's picking up links from somewhere on your site. If you are able to stop exposing these links and only linking to the "preferred version" i.e. canonical then this will give you some control and a better understanding of the site's information architecture.
Regarding setting up of canonicals, I suspect that this will be a harder job as of the 3 duplicate URLs you provide, it's not immediately clear which one would be the canonical. There are probably also thousands of instances similar to this duplicate group across other company lists and Google will have picked at random which one it sees as the canonical on each one. Marking another URL in the group as the canonical stands to (at least temporarily) cause a drop in rankings and SEO visibility if done across thousands of pages simultaneously.
If I was you and I felt compelled to address the issue I would pick a sample ~10% of the duplicate groups, set a canonical on each of them and see what happens in terms of rankings over 3-6 weeks. I would also add the canonicals to a sitemap and try update any links on your website to make sure only the canonical is referenced.
It's risky though, as your rankings are good even though I understand the principle of what you're trying to achieve. When I've tended to do things like this it's when a website has had nothing to lose.
George
-
Hi George,
Thanks for your clear answer.
The reason I am worried is that MOZ is flagging up thousands of these links as duplicate. Looking at it again today I noticed that it is mainly the list pages that are duplicates. EG
http://www.formationsdirect.com/companysearchlist.aspx?start=%40a+company+limited&next=1
http://www.formationsdirect.com/companysearchlist.aspx?start=AAA+AUTOMOTIVE+LTD&back=1
http://www.formationsdirect.com/companysearchlist.aspx?start=A+LIMITED&next=1
These 3 bring up exactly the same page and it seems that every page in the list has 3 or 4 of these variations.
I did a check in WT and it seems that the 'companysearchlist' parameter has been listed but it is not actually affecting any URLs. Would changing the status to 'pagination' help with this? I imagine that it would be then completely ignored by Google. Or would it better to make a canonical for each duplicate issue so each page gets in once?
PS I left the '#' in the last URL by mistake. It is just a tracking parameter that is being used by the company.
Aaron
-
Hi Aaron,
The search experience on the website is a bit unconventional in that you search for a company name and it returns pages of results alphabetically listed with the name you are searching for hopefully in there somewhere!
You could make changes to the pagination using rel=next/previous, but what you're displaying isn't really "true" results pagination. I would therefore be cautious about changing it if the site is ranking well.
Canonicals would only be required if you were showing the same content on different URLs. A quick "site:" search like the below only returns one result, so either Google isn't showing the duplicate URLs (very likely given your question) or it isn't a problem for you:
site:www.formationsdirect.com inurl:companysearchlist.aspx?name=AMNA+CONSTRUCTION+LTD
You can look in webmaster tools to see which query string parameters it is picking up and configure the behaviour you want GoogleBot to take. You can also get some sense of the duplication if it is an issue.
Regarding the company page URL you gave, anything after the # in the URL won't get crawled so you don't need to worry about canonicalising those.
Again, if it's ranking well, be very careful about trying to solve a problem that doesn't exist. If you can find duplicate content then definitely redirect or canonicalise it and see what kind of impact it has. I would do this before taking on anything more significant like the website information architecture and navigation.
George
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Duplicate Page Content - default.html
I am showing a duplicate content error in moz. I have site.com and site.com/default.html How can I fix that? Should I use a canonical tag? If so, how would i do that?
On-Page Optimization | | bhsiao0 -
Locating Duplicate Pages
Hi, Our website consists of approximately 15,000 pages however according to our Google Webmaster Tools account Google has around 26,000 pages for us in their index. I have run through half a dozen sitemap generators and they all only discover the 15,000 pages that we know about. I have also thoroughly gone through the site to attempt to find any sections where we might be inadvertently generating duplicate pages without success. It has been over six months since we did any structural changes (at which point we did 301's to the new locations) and so I'd like to think that the majority of these old pages have been removed from the Google Index. Additionally, the number of pages in the index doesn't appear to be going down by any discernable factor week on week. I'm certain it's nothing to worry about however for my own peace of mind I'd like to just confirm that the additional 11,000 pages are just old results that will eventually disappear from the index and that we're not generating any duplicate content. Unfortunately there doesn't appear to be a way to download a list of the 26,000 pages that Google has indexed so that I can compare it against our sitemap. Obviously I know about site:domain.com however this only returned the first 1,000 results which all checkout fine. I was wondering if anybody knew of any methods or tools that we could use to attempt to identify these 11,000 extra pages in the Google index so we can confirm that they're just old pages which haven’t fallen out of the index yet and that they’re not going to be causing us a problem? Thanks guys!
On-Page Optimization | | ChrisHolgate0 -
High Volume Duplicate Title and Content Errors: Scale of 1-10 How bad is this?
I have 15k pages and 1.5K have duplicate title and content errors. the reason is I have a ring parent page with child pages for each of the size variations. On a scale of 1-10 how big an issue is this and does it need fixing?
On-Page Optimization | | Tippman0 -
Is reported duplication on the pages or their canonical pages?
There are several sections getting flagged for duplication on one of our sites: http://mysite.com/section-1/?something=X&confirmed=true
On-Page Optimization | | Safelincs
http://mysite.com/section-2/?something=X&confirmed=true
http://mysite.com/section-3/?something=X&confirmed=true Each of the above are showing as having duplicates of the other sections. Indeed, these pages are exactly the same (it's just an SMS confirmation page you enter your code in), however, they all have canonical links back to the section (without the query string), i.e. section-1, section-2 and section-3 respectively. These three sections have unique content and aren't flagged up for duplications themselves, so my questions are: Are the pages with the query strings the duplicates, and if so why are the canonical links being ignored? or Are the canonical pages without the query strings the duplicates, and if so why don't they appear as URLs in their own right in the duplicate content report? I am guessing it's the former, but I can't figure out why it would ignore the canonical links. Any ideas? Thanks0 -
Duplicate content
Hello, I have two pages showing dulicate content. They are: http://www.cedaradirondackchairs.net/ http://www.cedaradirondackchairs.net/index Not sure how to resolve this issue. Any help would be greatly appreciated! Thanks.
On-Page Optimization | | Ronb10230 -
Duplicate Title & Content in WordPress
I'm getting a lot of Crawl Errors due to duplicate content and duplicate title because of category and tag posts in WordPress. I rebuilt the sitemap and said to exclude category and tags, should that clear up the issue? I've also went through and did NO INDEX and NO FOLLOW for all categories and posts. Any thoughts on this issue?
On-Page Optimization | | seantgreen0 -
If a site has https versions of every page, will the search engines view them as duplicate pages?
A client's site has HTTPS versions of every page for their site and it is possible to view both http and https versions of the page. Do the search engines view this as duplicate content?
On-Page Optimization | | harryholmes0070 -
Duplicate content Issue
I'm getting a report of duplicate title and content on: http://www.website.com/ http://www.website.com/index.php Of course, they're the same pages but does this need to be corrected somehow. Thanks!
On-Page Optimization | | dbaxa-2613380