Locating Duplicate Pages
-
Hi,
Our website consists of approximately 15,000 pages; however, according to our Google Webmaster Tools account, Google has around 26,000 pages for us in its index.
I have run through half a dozen sitemap generators and they all only discover the 15,000 pages that we know about. I have also thoroughly gone through the site to attempt to find any sections where we might be inadvertently generating duplicate pages without success.
It has been over six months since we made any structural changes (at which point we set up 301 redirects to the new locations), so I'd like to think that the majority of these old pages have been removed from the Google index. Additionally, the number of pages in the index doesn't appear to be going down at any discernible rate week on week.
I'm certain it's nothing to worry about; however, for my own peace of mind I'd like to confirm that the additional 11,000 pages are just old results that will eventually disappear from the index and that we're not generating any duplicate content.
Unfortunately there doesn't appear to be a way to download a list of the 26,000 pages that Google has indexed so that I can compare it against our sitemap. Obviously I know about site:domain.com; however, this only returns the first 1,000 results, which all check out fine.
I was wondering if anybody knew of any methods or tools that we could use to attempt to identify these 11,000 extra pages in the Google index so we can confirm that they're just old pages which haven’t fallen out of the index yet and that they’re not going to be causing us a problem?
Thanks guys!
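For anyone wanting to do the comparison described above, here is a minimal sketch in Python. It assumes you have managed to collect some list of URLs that Google appears to know about (how you obtain that list is left open; it is not something this thread supplies) and that your sitemap follows the standard sitemaps.org XML format. The function names `normalise`, `sitemap_urls` and `unexpected_urls` are illustrative only.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit, urlunsplit

# Namespace used by standard sitemaps.org XML sitemaps.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def normalise(url: str) -> str:
    """Lower-case scheme and host and drop the fragment so that
    trivially different spellings of one URL compare equal."""
    parts = urlsplit(url.strip())
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query, "")
    )


def sitemap_urls(sitemap_xml: str) -> set:
    """Return the normalised set of <loc> entries in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return {normalise(loc.text) for loc in root.findall(".//sm:loc", SITEMAP_NS) if loc.text}


def unexpected_urls(indexed: list, sitemap_xml: str) -> list:
    """URLs in the collected list that the sitemap does not account for."""
    return sorted({normalise(u) for u in indexed} - sitemap_urls(sitemap_xml))


if __name__ == "__main__":
    # Hypothetical data, for illustration only.
    sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url><loc>http://www.example.com/</loc></url>
      <url><loc>http://www.example.com/ink/hp-deskjet-500</loc></url>
    </urlset>"""
    indexed = [
        "http://www.example.com/",
        "http://WWW.example.com/ink/hp-deskjet-500",
        "http://www.example.com/ink/hp-deskjet-500?dispmode=grid",
    ]
    for url in unexpected_urls(indexed, sitemap):
        print(url)
```

Anything this prints is a URL worth checking by hand: it may be an old page that has not yet dropped out, or a parameterised variant of a page you already know about.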
-
It's cool. Sorry, the point I was making is that, irrespective of what you search for, the page that is returned is http://www.refreshcartridges.co.uk/advanced_search_result.php (with nothing after the .php), and as such the search results page couldn't spawn multiple pages which could be indexed by Google.
-
Hmm, I'm not too knowledgeable about php pages. Sorry!
-
Sorry, I'm not sure what happened to that bit.ly address - The actual address of the website is www.refreshcartridges.co.uk.
Ah, I see what you mean about the search results now. Hopefully this shouldn't be an issue, though: for security (our web guy said something about injections), the URL that is returned irrespective of what is searched for is http://www.refreshcartridges.co.uk/advanced_search_result.php
Thanks again!
-
I can't get that link to work.
What I said before still applies with physical input (this is what I assumed when I said it).
For example, a user types the words "snakes and dogs" and clicks search. The new URL is "www.yoursite.com/search?q=snakes and dogs". All of these generated URLs need noindex meta tags or Google will flag them as duplicate content because, for example, this page and the result page for "dogs and snakes" are almost identical.
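To make that concrete, here is a rough Python sketch of the idea, not a drop-in fix for any particular shopping cart. It assumes search results live at a known path (`/search` from my example, plus the `/advanced_search_result.php` path mentioned in this thread) and that the search term is carried in a `q` parameter. The names `SEARCH_PATHS`, `robots_meta` and `same_terms` are made up for illustration.

```python
from urllib.parse import urlsplit, parse_qs

# Assumed paths that serve internal search results.
SEARCH_PATHS = {"/search", "/advanced_search_result.php"}


def robots_meta(url: str) -> str:
    """Return the tag to emit in <head>: noindex for search result
    pages, an empty string for everything else."""
    if urlsplit(url).path in SEARCH_PATHS:
        return '<meta name="robots" content="noindex">'
    return ""


def same_terms(url_a: str, url_b: str, param: str = "q") -> bool:
    """True when two search URLs contain the same words in any order,
    which is when their result pages are likely to be near-duplicates."""
    def terms(url: str) -> list:
        values = parse_qs(urlsplit(url).query).get(param, [""])
        return sorted(values[0].lower().split())
    return terms(url_a) == terms(url_b)


if __name__ == "__main__":
    a = "http://www.yoursite.com/search?q=snakes+and+dogs"
    b = "http://www.yoursite.com/search?q=dogs+and+snakes"
    print(same_terms(a, b))   # two distinct URLs, same set of words
    print(robots_meta(a))     # tag that keeps the page out of the index
```

The two URLs differ as strings, so a crawler treats them as two pages even though the results are practically the same. That is the case the noindex tag is meant to cover.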
Does that make sense?
It is in Google's Webmaster Guidelines that you should noindex these pages.
-
Many thanks for your input on this. I have actually looked at this through the HTML Improvements section of GWMT; however, it shows only a few dozen duplicated titles / descriptions, and this is simply due to the product categories being almost identical (for example HP Deskjet 500 and HP Deskjet 500+).
-
Many thanks for your response. Our site is an eCommerce site that doesn't employ tags as such and our categories are all accounted for in the 15,000 page figure.
-
We did have this at the beginning of the year, when we used ?dispmode=grid and ?dispmode=list to change the way our results were displayed. This has since been rectified: we removed the option completely, and any instance of dispmode in the URL now forces a 301 to the correct master page. There are still a few hundred instances of dispmode URLs in the Google index, but 99% of them have fallen out now.
I have checked and double checked and we don't seem to have any issues like this at present.
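In case it helps anyone with the same problem, the redirect rule amounts to something like the Python sketch below. This is not our actual server code (the site itself is PHP); it only illustrates the logic, and the function name `redirect_target` is made up for the example.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def redirect_target(url: str, param: str = "dispmode"):
    """Return the URL to 301 to when `param` is present in the query
    string, or None when the URL is already the master page."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    kept = [(k, v) for k, v in pairs if k != param]
    if len(kept) == len(pairs):
        return None  # parameter not present, serve the page as normal
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


if __name__ == "__main__":
    # Hypothetical URLs, for illustration only.
    print(redirect_target("http://www.example.com/ink?dispmode=grid"))
    print(redirect_target("http://www.example.com/ink?page=2&dispmode=list"))
    print(redirect_target("http://www.example.com/ink"))
```

Other parameters are kept on purpose, so a paginated listing redirects to the same page number rather than back to page one.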
-
I'm not certain if this is the case as our search engine requires physical input in order to yield a result. I don't know if it helps but the URL is http://bit.ly/4Cogchww if you fancy taking a look
-
Thanks for your reply. Indeed our website does force www. if someone were to attempt to navigate to us without prefixing www.
-
Hi Chris,
Google Webmaster Tools has a report that helps identify duplicate HTML, and maybe you can use that to see if the 11,000 pages are duplicates. If they are, I assume they would have duplicate title tags and so on, which the tool may discover.
-
Have you checked for instances where a page parameter is being seen as another version of the same page? One of the sites I work for had an issue a few months back where every instance of a product page was being flagged as duplicate content because of an oversight. We had one of our coders write a clause into the page where every time a page loaded with a parameter such as ?color=72 it would canonicalize it to the page minus the parameter. This decreased our duplicate content warnings quickly and effectively.
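Roughly, the clause our coder added works like the Python sketch below. It is a simplified illustration rather than the real code: it assumes that dropping the whole query string gives the right canonical URL, which may not hold if some of your parameters select genuinely different content. The name `canonical_link` is made up for the example.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_link(url: str) -> str:
    """Build a rel=canonical tag that points at the URL with its
    query string and fragment removed."""
    parts = urlsplit(url)
    clean = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    # Escape the href so the tag stays valid HTML whatever the path contains.
    return f'<link rel="canonical" href="{escape(clean, quote=True)}">'


if __name__ == "__main__":
    # Hypothetical product URL, for illustration only.
    print(canonical_link("http://www.example.com/product/widget?color=72"))
```

Unlike a 301, the parameterised page still loads for the visitor; the tag only tells search engines which version to treat as the main one.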
-
It could be that your tags and categories are being treated as individual pages and are therefore creating their own permalinks, e.g. http://www.example.com/keyword, http://www.example.com/tag/keyword and http://www.example.com/category/keyword. Another approach would be to check the sitemaps you have in Webmaster Tools and compare them with each other. Just a suggestion.
-
Does your website force 'www.'?
Both yourdomain.com and www.yourdomain.com are separate sites and can have different pages spidered.
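If the site does not already force one hostname, the rule looks something like this Python sketch. It is only an illustration of the logic, with a made-up function name `www_redirect`; in practice this is normally handled in the web server configuration rather than in application code.

```python
from urllib.parse import urlsplit, urlunsplit


def www_redirect(url: str):
    """Return the www. version of the URL to 301 to, or None when the
    host already starts with www."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        return None
    return urlunsplit(
        (parts.scheme, "www." + host, parts.path or "/", parts.query, parts.fragment)
    )


if __name__ == "__main__":
    print(www_redirect("http://yourdomain.com/page"))
    print(www_redirect("http://www.yourdomain.com/page"))
```

With a rule like this in place only one of the two hostnames can be spidered, so the same page cannot be counted twice.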
-
Be sure to try different combinations of 'site:www.domain.com' and 'site:domain.com'. They will all yield different results.
Sounds to me like you probably have an internal search engine that is generating search results pages based off the search term, and each different results page is a piece of duplicate content.