Locating Duplicate Pages
-
Hi,
Our website consists of approximately 15,000 pages however according to our Google Webmaster Tools account Google has around 26,000 pages for us in their index.
I have run through half a dozen sitemap generators and they all only discover the 15,000 pages that we know about. I have also thoroughly gone through the site to attempt to find any sections where we might be inadvertently generating duplicate pages without success.
It has been over six months since we did any structural changes (at which point we did 301's to the new locations) and so I'd like to think that the majority of these old pages have been removed from the Google Index. Additionally, the number of pages in the index doesn't appear to be going down by any discernable factor week on week.
I'm certain it's nothing to worry about however for my own peace of mind I'd like to just confirm that the additional 11,000 pages are just old results that will eventually disappear from the index and that we're not generating any duplicate content.
Unfortunately there doesn't appear to be a way to download a list of the 26,000 pages that Google has indexed so that I can compare it against our sitemap. Obviously I know about site:domain.com however this only returned the first 1,000 results which all checkout fine.
I was wondering if anybody knew of any methods or tools that we could use to attempt to identify these 11,000 extra pages in the Google index so we can confirm that they're just old pages which haven’t fallen out of the index yet and that they’re not going to be causing us a problem?
Thanks guys!
-
It's cool. Sorry, the point I was making is that irrespective of what you search for the page that is returned is http://www.refreshcartridges.co.uk/advanced_search_result.php (with nothing after the .php) and as such the search results page couldn't spurn multiple pages which could be indexed by Google.
-
Hmm, I'm not too knowledgeable about php pages. Sorry!
-
Sorry, I'm not sure what happened to that bit.ly address - The actual address of the website is www.refreshcartridges.co.uk.
Ah, I see what you mean about the search results now however this hopefully shouldn't be an issue as for security (our web guy said something about injections) the URL that is returned irrespective of what is searched for is http://www.refreshcartridges.co.uk/advanced_search_result.php
Thanks again!
-
I can't get that link to work.
What I said before still applies with physical input (this is what I assumed when I said it).
For example, user inputs the words "snakes and dogs" and clicks search. The new URL is "www.yoursite.com/search?q=snakes and dogs" All these weird URL pages need noindex meta tags or Google will flag them as duplicate content because, for example, this page and the result for "dogs and snakes" generate almost the same page.
Does that make sense?
It is in Google's Webmaster Guidelines that you should noindex these pages. -
Many thanks for your input on this. I have actually looked at this through the HTML improvements section of GWMT however I am showing only a few dozen duplicated titles / descriptions and this is simply due to the product categories being almost identical (for example HP Deskjet 500 and HP Deskjet 500+)
-
Many thanks for your response. Our site is an eCommerce site that doesn't employ tags as such and our categories are all accounted for in the 15,000 page figure.
-
We did have this at the beginning of the year when we used a ?dispmode=grid and ?dispmode=list to change the way our results were displayed. This has been rectified however by us completely removing the option and any instances of dispmode present in the URL force a 301 to the correct master page. There are still a few hundred instances of this dispmode being present in the Google index but 99% of them have fallen out now.
I have checked and double checked and we don't seem to have any issues like this at present.
-
I'm not certain if this is the case as our search engine requires physical input in order to yield a result. I don't know if it helps but the URL is http://bit.ly/4Cogchww if you fancy taking a look
-
Thanks for your reply. Indeed our website does force www. if someone were to attempt to navigate to us without prefixing www.
-
Hi Chris,
Google Webmaster has a tool that helps identify duplicate HTMLs and maybe you can use that to see if the 11,000 pages are duplicate. IF they are, I am assuming they should have the duplicate Title Tag and etc. which the tool may discover.
-
Have you checked for instances where a page parameter is being seen as another version of the same page? One of the sites I work for had an issue a few months back where every instance of a product page was being flagged as duplicate content because of an oversight. We had one of our coders write a clause into the page where every time a page loaded with a parameter such as ?color=72 it would canonicalize it to the page minus the parameter. This decreased our duplicate content warnings quickly and effectively.
-
it could be that your tags and categories are considered individual pages and therefore creating their own permalink: ex: http:www.example.com/keyword, and http://www.example.com/tag/keyword and http://www.example.com/category/keyword. Another way would be to check the sitemaps you have in webmaster tools and compare those to each other. Just a suggestion.
-
Does your website force 'www.'?
Both yourdomain.com and www.yourdomain.com are separate sites and can have different pages spidered.
-
Be sure to try different combinations of 'site:www.domain.com' and 'site:domain.com'. They will all yield different results.
Sounds to me like you probably have an internal search engine that is generating search results pages based off the search term, and each different results page is a piece of duplicate content.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
On Page Optimization Report
Does this tool also guard against an instance of over-optimization or keyword-spamming?
On-Page Optimization | | webfeatus0 -
Duplicate Page Content on Empty Manufacturer Pages
I work for an internet retailer that specializes in pet supplies and medications. I was going through the Crawl Diagnostics for our website, and I saw in the Duplicate Page Content section that some of our manufacturer pages were getting flagged. The way our site is set up is that when products are discontinued we mark them as discontinued and use 301 redirects to redirect their URLs to other relevant products, brands, or our homepage. We do the same thing with brand and manufacturer pages if all of their products are discontinued. 90% of the time, this is a manual process. However, the other 10% of the time certain products come and go automatically as part of our inventory system with one of our fulfillment partners. This can sometimes create empty manufacturer pages. I can't redirect these empty pages because there's a chance that products will be brought back in stock and the page will be populated again. What can we do so that these pages won't get marked as duplicates while they're empty? Write unique short descriptions about the companies? Would the placement of these short descriptions matter--top of the page under the category name vs bottom of the page underneath where the products would go? The links in the left sidebar, top, and in the footer our part of our site architecture, so those are always going to be the same. To contrast, here's what a manufacturer page with products looks like: Thanks! http://www.vetdepot.com/littermaid-manufacturer.html
On-Page Optimization | | ElDude0 -
How should I rephrase these pages to avoid Phrase duplication within Title Tags
How should I rephrase these pages to avoid Phrase duplication within Title Tags Duplicate Page Title Page1-http://organicfruitbasketsflorist.com/index-2.html Page2- http://organicfruitbasketsflorist.com/Fruit_Baskets_Organic_Fruit_Baskets_New_York_NY.html Page3- http://organicfruitbasketsflorist.com/Fruit_Baskets_Edible_Fruit_Baskets_New_York_NY.html Page4organicfruitbasketsflorist.com/Fruit_Baskets_Business_Fruit_Baskets_New_York_NY.html Page5-http://organicfruitbasketsflorist.com/Fruit_Baskets_Fresh_Flowers_Delivered_New_York_NY.html Page6-http://organicfruitbasketsflorist.com/Coupons.htmlAmi
On-Page Optimization | | amydiamond0 -
Summarize your question.Images being seen as duplicate content/pages
My images suddenly are appearing in my crawl reports as duplicate content, without meta tags, this happened over night and cant figure out why.
On-Page Optimization | | RBYoung0 -
Break-up content into individual pages or keep on one page
I am working on a dental website. Under menu item "services" lists everything he does like.. Athletic Sports Guards
On-Page Optimization | | Czubmeister
An athletic sports guard is a resilient plastic appliance that is worn to protect the teeth and gum tissues by absorbing the forces generated by traumatic blows during sports or other activities. Digital X-Rays We use state of the art digital x-rays and digital cameras to help with an accurate diagnosis of any concerns. Digital Imaging On initial visits, and recall visits, we take a series of digital photographs to aid us in diagnosis as well as to give you a close-up view of your mouth and any oral conditions. Smile Makeovers
We offer a number of different options including bleaching, bonding, porcelain veeners, and in some cases, implants and/or orthodontic care is utilized in our smile makeover planning. Nitrous oxide for your Comfort Would it be better to break these services up into individual pages? I was thinking I would because then I could add more pictures and expand on the topic and try to get an "A" grade on each page. I'm not sure how I could rank a page if I have 35 services listed on the page. That would be an awfully big H1! Suggestions?0 -
Too many links on a page?
On my blog posts, I have links to all the categories and months, dating back 5-6 years. This make the number of links on each blog page well over 100, which I understand might decrease the value of each page. Is there a problem with having more than 100 links on a page?
On-Page Optimization | | rdreich492 -
Duplicate content on video pages
Hi guys, We have a video section on our site containing about 50 videos, grouped by category/difficulty. On each video page except for the embedded player, a sentence or two describing the video and a list of related video links, there's pretty much nothing else. All of those appear as duplicate content by category. What should we do here? How long a description should be for those pages to appear unique for crawlers? Thanks!
On-Page Optimization | | lgrozeva0 -
Too many on page links
Our home page (and 1400 of our other pages) have well over 100 links, going beyond the recommend amount. Our competitors have less on page links (to other pages on their site) and way more link popularity so we are trying to figure out the best solution for this without hurting our sites conversions and usbaility.
On-Page Optimization | | iAnalyst.com0