More Indexed Pages than URLs on site.
-
According to webmaster tools, the number of pages indexed by Google on my site doubled yesterday (gone from 150K to 450K). Usually I would be jumping for joy but now I have more indexed pages than actual pages on my site.
I have checked for duplicate URLs pointing to the same product page but can't see any, pagination in category pages doesn't seem to be indexed nor does parameterisation in URLs from advanced filtration.
Using the site: operator we get a different result on google.com (450K) to google.co.uk (150K).
Anyone got any ideas?
-
Hi David,
Its tough to say without some more digging and information, it certainly looks like you have most of the common problem areas covered from what I can see. I will throw out an idea: I see you have a few 301 redirects in place switching from .html to non html versions. If this was done on a massive scale then possibly you have a google index with both versions of the pages in the index? If so it might not really be a big issue and over the next weeks/months the old .html versions will fall out of the index and your numbers will begin to look more normal again, Just a thought.
-
Thanks Lynn. The 31,000 was a bit of a legacy of issue and something we have solved. The robots file was changed a couple of weeks ago. So fingers crossed Google will deindex them soon. We get the same result when using inurl: where.
Any idea where the rest have come from?
-
Hi Irving
We checked everything obvious and cannot explain what is going on. I cannot see any major duplicate content issues and we do not have any subdomains active. The Moz crawler also doesn't highlight any major duplicate content issues.
-
Hi David,
Not sure why they started showing up now (some recent changes to the site?) but I suspect your problem is indexed urls that you are trying to block with robots.txt but are finding their way into the index somehow.
If you do a search for: site:nicontrols.com inurl:/manufacturer/ and then click on the show omitted results you will see a whole bunch (31000!) of 'content blocked by robots.txt' notices but the urls are still in the index. If you do a couple more similar searches looking for other likely url paths you will likely find some more.
If you can get a no-index meta tag into these pages I think it will be more effective in keeping them out of the index. If you have in mind some recent changes you have done to the site that might have introduced internal links to these pages then it would be worth looking to see if you can get the links removed or replaced with the 'proper' link format.
Hope that helps!
-
Can you see in the search the pages which are indexed and look for duplicates or technical issues causing improper indexing? Do you have other sites like subdomains Google might be counting as pages.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Better to 301 or de-index 403 pages
Google WMT recently found and called out a large number of old unpublished pages as access denied errors. The pages are tagged "noindex, follow." These old pages are in Google's index. At this point, would it better to 301 all these pages or submit an index removal request or what? Thanks... Darcy
Intermediate & Advanced SEO | | 945010 -
Canonical URL on search result pages
Hi there, Our company sells educational videos to Nurses via subscription. I've been looking at their video search results page:
Intermediate & Advanced SEO | | 9868john
http://www.nursesfornurses.com.au/cpd When you click on a category, the URL appears like this:
http://www.nursesfornurses.com.au/cpd?view=category&cat=9&name=Acute+Surgical+Nursing
http://www.nursesfornurses.com.au/cpd?view=category&cat=6&name=Medications Would this be an instance where i'd use the canonical tag to redirect each search results page? Bearing in mind the /cpd page is under /Nursing cpd, and that /Nursing cpd is our best performing page in search engines, would it be better to refer it to the 'Nursing CPD' rather than 'CPD' page? Any advice is very welcome,
Thanks,
John0 -
Webmaster Tools Not Indexing New Pages
Hi there Mozzers, Running into a small issue. After a homepage redesign (from a list of blog posts to a product page), it seems that blog posts are buried on the http://OrangeOctop.us/ site. The latest write-up on "how to beat real madrid in FIFA 15", http://orangeoctop.us/against-real-madrid-fifa-15/ , has yet to be indexed. It would normally take about a day naturally for pages to be indexed or instantly with a manual submission. I have gone into webmaster tools and manually submitted the page for crawls multiple times on multiple devices. Still not showing up in the search results. Can anybody advise?
Intermediate & Advanced SEO | | orangeoctop.us0 -
How can I get a list of every url of a site in Google's index?
I work on a site that has almost 20,000 urls in its site map. Google WMT claims 28,000 indexed and a search on Google shows 33,000. I'd like to find what the difference is. Is there a way to get an excel sheet with every url Google has indexed for a site? Thanks... Mike
Intermediate & Advanced SEO | | 945010 -
Are pages with a canonical tag indexed?
Hello here, here are my questions for you related to the canonical tag: 1. If I put online a new webpage with a canonical tag pointing to a different page, will this new page be indexed by Google and will I be able to find it in the index? 2. If instead I apply the canonical tag to a page already in the index, will this page be removed from the index? Thank you in advance for any insights! Fabrizio
Intermediate & Advanced SEO | | fablau0 -
Getting Pages Requiring Login Indexed
Somehow certain newspapers' webpages show up in the index but require login. My client has a whole section of the site that requires a login (registration is free), and we'd love to get that content indexed. The developer offered to remove the login requirement for specific user agents (eg Googlebot, et al.). I am afraid this might get us penalized. Any insight?
Intermediate & Advanced SEO | | TheEspresseo0 -
Page URL keywords
Hello everybody, I've read that it's important to put your keywords at the front of your page title, meta tag etc, but my question is about the page url. Say my target keywords are exotic, soap, natural, and organic. Will placing the keywords further behind the URL address affect the SEO ranking? If that's the case what's the first n number of words Google considers? For example, www.splendidshop.com/gift-set-organic-soap vs www.splendidshop.com/organic-soap-gift-set Will the first be any less effective than the second one simply because the keywords are placed behind?
Intermediate & Advanced SEO | | ReferralCandy0 -
How do I increase rankings when the indexed page is the homepage?
Hi Forum, This is a two-part question. The first is: "what may be the cause of some rank declines?" and the second is "how do I bring them back up when the indexed page is the homepage?" Over the last week I noticed some declines in several of my top keywords, many of which point to the site's homepage. The site itself is an eCommerce site, which had less visits last week than normal (holidays it seems, since the data jibes with key dates). Can a decline in traffic cause ranking declines? Any other ideas of where to look? Secondly, for those keywords that link to the homepage, how do we bring these back up since a homepage can't be optimized for every single keyword? We sell yoga products and can't have a homepage that is optimized for keywords like "yoga mat," "yoga blocks," "yoga pilates clothing," and several others, as these are our category pages' keywords. Any thoughts? Thanks!
Intermediate & Advanced SEO | | pano0