More Indexed Pages than URLs on site.
-
According to Webmaster Tools, the number of pages Google has indexed on my site shot up yesterday, from 150K to 450K. Usually I would be jumping for joy, but now I have more indexed pages than actual pages on my site.
I have checked for duplicate URLs pointing to the same product page but can't see any. Pagination in category pages doesn't seem to be indexed, nor do the URL parameters from advanced filtering.
Using the site: operator, we get a different result on google.com (450K) than on google.co.uk (150K).
Anyone got any ideas?
-
Hi David,
It's tough to say without more digging and information; it certainly looks like you have most of the common problem areas covered from what I can see. I will throw out an idea: I see you have a few 301 redirects in place switching from .html to non-.html versions. If this was done on a massive scale, then possibly Google has both versions of the pages in its index? If so, it might not really be a big issue, and over the next weeks or months the old .html versions will fall out of the index and your numbers will begin to look more normal again. Just a thought.
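To test the double-indexing idea, one option is to export the indexed URLs (from a crawl or a scraped site: query) and group them after stripping the .html suffix. The sketch below is only illustrative: the function names and the example.com URLs are made up, and it assumes the old and new versions differ only by that suffix.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit


def strip_html_suffix(url: str) -> str:
    """Return the URL with a trailing '.html' removed from its path."""
    parts = urlsplit(url)
    path = parts.path
    if path.endswith(".html"):
        path = path[: -len(".html")]
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


def find_double_indexed(urls):
    """Group URLs by their suffix-free form and keep groups with more than one variant."""
    groups = defaultdict(set)
    for url in urls:
        groups[strip_html_suffix(url)].add(url)
    return {key: sorted(variants) for key, variants in groups.items() if len(variants) > 1}


# Hypothetical sample of indexed URLs.
indexed = [
    "https://example.com/widgets/a.html",
    "https://example.com/widgets/a",
    "https://example.com/widgets/b",
]
for canonical, variants in find_double_indexed(indexed).items():
    print(canonical, "->", variants)
```

If a large share of the indexed URLs show up in pairs like this, the inflated count is probably the redirect lag described above rather than a new duplicate content problem.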
-
Thanks Lynn. The 31,000 was a bit of a legacy issue and something we have solved. The robots file was changed a couple of weeks ago. So fingers crossed Google will deindex them soon. We get the same result when using inurl: where.
Any idea where the rest have come from?
-
Hi Irving
We checked everything obvious and cannot explain what is going on. I cannot see any major duplicate content issues and we do not have any subdomains active. The Moz crawler also doesn't highlight any major duplicate content issues.
-
Hi David,
Not sure why they started showing up now (some recent changes to the site?), but I suspect your problem is URLs that you are trying to block with robots.txt but that are finding their way into the index anyway.
If you search for site:nicontrols.com inurl:/manufacturer/ and then click 'show omitted results', you will see a whole bunch (31,000!) of 'content blocked by robots.txt' notices, but the URLs are still in the index. If you run a couple more similar searches for other likely URL paths, you will probably find some more.
If you can get a noindex meta tag into these pages, I think it will be more effective at keeping them out of the index. Bear in mind that Google has to be able to crawl a page to see the tag, so the robots.txt block on those URLs would need to be lifted for it to take effect. If you have in mind some recent changes to the site that might have introduced internal links to these pages, then it would be worth looking to see if you can get the links removed or replaced with the 'proper' link format.
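As a rough way to audit this, the sketch below checks two things for a given URL: whether a robots.txt rule blocks Googlebot from crawling it, and whether the page HTML carries a noindex robots meta tag. It uses only the Python standard library. The robots.txt rule, the example.com URL, and the helper names are assumptions for illustration; only the /manufacturer/ path comes from the search above.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Records whether a <meta name="robots"> (or googlebot) tag contains noindex."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        content = (attr.get("content") or "").lower()
        if name in ("robots", "googlebot") and "noindex" in content:
            self.noindex = True


def has_noindex(html: str) -> bool:
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.noindex


def is_blocked(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """True if the given robots.txt text disallows `agent` from fetching `url`."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return not rules.can_fetch(agent, url)


def diagnose(robots_txt: str, url: str, html: str) -> str:
    blocked = is_blocked(robots_txt, url)
    noindex = has_noindex(html)
    if blocked and noindex:
        return "noindex present but crawling is blocked, so the tag cannot be seen"
    if blocked:
        return "crawling blocked; URL can still be indexed from links"
    if noindex:
        return "crawlable with noindex; should drop out of the index"
    return "crawlable and indexable"


# Assumed robots.txt rule and page markup, for illustration only.
ROBOTS_TXT = "User-agent: *\nDisallow: /manufacturer/\n"
PAGE = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(diagnose(ROBOTS_TXT, "https://example.com/manufacturer/acme", PAGE))
```

The first case is the one to watch for: a page that is both disallowed and tagged noindex will tend to stay in the index as a bare URL, because the crawler never fetches the page to read the tag.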
Hope that helps!
-
Can you look through the indexed pages in the search results and check for duplicates or technical issues causing improper indexing? Do you have other sites, such as subdomains, that Google might be counting as pages?