How to remove hundreds of duplicate pages
-
Hi - while i was checking duplicate links, am finding hundreds of duplicates pages :-
-
having undefined after domain name and before sub page url
-
having /%5C%22/ after domain name and before the sub page url
-
Due to Pagination limits
Its a joomla site - http://www.mycarhelpline.com
Any suggestions - shall we use:-
-
301 redirect
-
leave these as standdstill
-
and what to do of pagination pages (shall we create a separate title tag n meta description of every pagination page as unique one)
thanks
-
-
Okay, I took a look at the plugin Ben recommended, and took another look at the SH404SEF one. The free one Ben recommended (http://extensions.joomla.org/extensions/site-management/sef/1063) looks like it can help out with some duplicate content - but what I recommend doing is getting the SH404SEF here http://anything-digital.com/sh404sef/features.html because it allows for setting up canonical tags and also gives you the option to add the rel=next feature to your paginated pages, which is one of your problem areas.
One thing I noticed though is that it specifically states it "automatically adds canonical tags to non-html pages" - so that means it will apply it automatically to Joomla's defaul pdf view, etc. While this is helpful, it may not solve the full issue of your duplicate pages with the undefined and "/%5C%22/" issue.
It does however state that it "removes duplicate URLs" - how it identifies and removes these, I am not sure. You may want to try it out because it is useful for other optimization tasks - or contact the owner for more information.
If the tool doesn't recognize and remove the duplicate pages caused by /undefined/ and "/%5C%22/" then you should disallow crawling of these in your robots.txt file. While you are in your robots.txt file you should remove the /images/ because you want those to be crawled - Joomla adds that in by default.
Because a lot of these pages have already been crawled, you should do a 301 on the duplicate pages to their matching page. This sounds like it will be a long process - this may be aided by the sh404sef plugin - not sure.
I just want to also add that I am in no way affiliated with any of these plugins.
-
The only way to solve the duplication error you are getting is to make the URL's distinct. Googlebot comes to your site and looks at the URL's and if they are not distinct it may not index them very well. I understand your site is showing up fine in the SERP's so this may be one of those items you place on a lower priority until later.
I think R.May knows Joomla so I'll refer to him on how to accomplish this but it may be worth it to make the adjustment. You may find the end result of making your page URL more distinct will actually increase your current SERPs. Just a thought.
Other than that. If your site isn't hurting and the only thing you are concerned about is the report in SEOmoz then I would move on and just make a mental note of it for later.
-
Hi ben - changing url is not well required as the site is getting good serp, however - the duplicacy issue to saveguard us from any future issue - is what we seek for
-
Hi - thanks for replying
-
For the dynamic url - Yes - at the initial start - it was missed and as on its not reqd somehow - as the pages are getting indexed well and good in SERP
-
For pagination - Where we needs this is like in our used car section, discount section& news section where multiple pages are created. shall we create separate title & meta description for every pagination page. is it ideally reqd ?
http://www.mycarhelpline.com/index.php?option=com_usedcar&view=category&Itemid=3
- 'undefined' & /%5C%22/ is coming as per report of SEOmoz and is almost on every page of site (except of home page) with the dynamic url after domain name are preceded with these 2 strings as per moz report
how to get this corrected - want to be preventive from this duplicacy n avoid getting a hit in future even if its going well now -
-
-
I'm not a Joomla expert but to make your URL's search engine friendly you are going to need to add an extension like this. That will allow you to make more distinct URLs that will not be considered "duplicate" anymore.
-
Joomla has soo many dup content issues, you have to know Joomla really well to avoid most of them. The biggest issue is you didn't enable the SEF URLs from the start and left the default index.php?option=com on most of them, which stuffs your URLs full of ugly parameters.
You can still enable this in your global options and with a quick edit to htaccess - but it will change all of your current URLs and you will need to 301 all of them, so that isn't a great option unless you are really suffering - and depending on if you are using J 1.6 or under, this is a time consuming nasty process. Also this is unlikely to get rid of any existing duplicate pages - but may make dealing with them and finding them easier.
I don't see the specific examples you posted though, where are you seeing "undefined" and "%5C%22/ " ?
You should implement rel=canonical on the correct version of each page. I recommend SH404SEF which is a Joomla plugin and makes this process easier - but it isn't free. I don't know of a good free plugin that does this, and Joomla's templates make doing this manually difficult.
Looking at it quickly, I also didn't notice any articles that were paginated, but you should try to follow the rel="next" and rel="prev" for paginated pages. This is likely something you will have to edit your Joomla core files to do.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
How I improve my ON-PAGE?
Hi, My Tech related site zophra is not rank in google properly and traffic is not increasing. I think my website on-page is not suitable according to google algorithm. Kindly help me if anyone knows about on-page.
Intermediate & Advanced SEO | | igaoevale0 -
Can noindexed pages accrue page authority?
My company's site has a large set of pages (tens of thousands) that have very thin or no content. They typically target a single low-competition keyword (and typically rank very well), but the pages have a very high bounce rate and are definitely hurting our domain's overall rankings via Panda (quality ranking). I'm planning on recommending we noindexed these pages temporarily, and reindex each page as resources are able to fill in content. My question is whether an individual page will be able to accrue any page authority for that target term while noindexed. We DO want to rank for all those terms, just not until we have the content to back it up. However, we're in a pretty competitive space up against domains that have been around a lot longer and have higher domain authorities. Like I said, these pages rank well right now, even with thin content. The worry is if we noindex them while we slowly build out content, will our competitors get the edge on those terms (with their subpar but continually available content)? Do you think Google will give us any credit for having had the page all along, just not always indexed?
Intermediate & Advanced SEO | | THandorf0 -
301 redirect for page 2, page 3 etc of an article or feed
Hey guys, We're looking to move a blog feed we have to a new static URL page. We are using 301 redirects but I'm unsure of what to regarding page 2, page 3 etc. of the feed. How do I make sure those urls are being redirected as well? For example: Moving FloridaDentist.com/blog/dental-tips/ to a new page url FloridaDentist.com/dental-tips. So, we are using a 301 on that old url to the new one. My questions is what to do with the other pages like FloridaDentist.com/blog/dental-tips/page/3. How do we make sure that page is also 301'd to the new main url?
Intermediate & Advanced SEO | | RickyShockley0 -
Best way to remove low quality paginated search pages
I have a website that has around 90k pages indexed, but after doing the math I realized that I only have around 20-30k pages that are actually high quality, the rest are paginated pages from search results within my website. Every time someone searches a term on my site, that term would get its own page, which would include all of the relevant posts that are associated with that search term/tag. My site had around 20k different search terms, all being indexed. I have paused new search terms from being indexed, but what I want to know is if the best route would be to 404 all of the useless paginated pages from the search term pages. And if so, how many should I remove at one time? There must be 40-50k paginated pages and I am curious to know what would be the best bet from an SEO standpoint. All feedback is greatly appreciated. Thanks.
Intermediate & Advanced SEO | | WebServiceConsulting.com0 -
How much (%) of the content of a page is considered too much duplication?
Google is not fond of duplication, I have been very kindly told. So how much would you suggest is too much?
Intermediate & Advanced SEO | | simonberenyi0 -
SEOMOZ duplicate page result: True or false?
SEOMOZ say's: I have six (6) duplicate pages. Duplicate content tool checker say's (0) On the physical computer that hosts the website the page exists as one file. The casing of the file is irrelevant to the host machine, it wouldn't allow 2 files of the same name in the same directory. To reenforce this point, you can access said file by camel-casing the URI in any fashion (eg; http://www.agi-automation.com/Pneumatic-grippers.htm). This does not bring up a different file each time, the server merely processes the URI as case-less and pulls the file by it's name. What is happening in the example given is that some sort of indexer is being used to create a "dummy" reference of all the site files. Since the indexer doesn't have file access to the server, it does this by link crawling instead of reading files. It is the crawler that is making an assumption that the different casings of the pages are in fact different files. Perhaps there is a setting in the indexer to ignore casing. So the indexer is thinking that these are 2 different pages when they really aren't. This makes all of the other points moot, though they would certainly be relevant in the case of an actual duplicated page." ****Page Authority Linking Root Domains http://www.agi-automation.com/ 43 82 http://www.agi-automation.com/index.html 25 2 http://www.agi-automation.com/Linear-escapements.htm 21 1 www.agi-automation.com/linear-escapements.htm 16 1 http://www.agi-automation.com/Pneumatic-grippers.htm 30 3 http://www.agi-automation.com/pneumatic-grippers.htm 16 1**** Duplicate content tool estimates the following: www and non-www header response; Google cache check; Similarity check; Default page check; 404 header response; PageRank dispersion check (i.e. if www and non-www versions have different PR).
Intermediate & Advanced SEO | | AGIAutomation0 -
301 a page and then remove the 301
I have a real estate website that has a city hub page. All the homes for sale within a city are linked to from this hub page. Certain small cities may have one home on the market for a month and then not have any homes on the market for months or years. I call them "Ghost Cities". This problem happens across many cities at any point in time. The resulting city hub pages are left with little to no content. We are throwing around the idea of 301 redirecting these "Ghost City" pages to a page higher up in the hierarchy (Think state or county) until we get new homes for sale in the city. At that point we would remove the 301. Any thoughts on this strategy? Is it bad to turn 301s on and off like that? Thanks!
Intermediate & Advanced SEO | | ChrisKolmar0 -
Removing Duplicate Content Issues in an Ecommerce Store
Hi All OK i have an ecommerce store and there is a load of duplicate content which is pretty much the norm with ecommerce store setups e.g. this is my problem http://www.mystoreexample.com/product1.html
Intermediate & Advanced SEO | | ChriSEOcouk
http://www.mystoreexample.com/brandname/product1.html
http://www.mystoreexample.com/appliancetype/product1.html
http://www.mystoreexample.com/brandname/appliancetype/product1.html
http://www.mystoreexample.com/appliancetype/brandname/product1.html so all the above lead to the same product
I also want to keep the breadcrumb path to the product Here's my plan Add a canonical URL to the product page
e.g. http://www.mystoreexample.com/product1.html
This way i have a short product URL Noindex all duplicate pages but do follow the internal links so the pages are spidered What are the other options available and recommended? Does that make sense?
Is this what most people are doing to remove duplicate content pages? thanks 🙂0