Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Investigating a huge spike in indexed pages
-
I've noticed an enormous spike in pages indexed through WMT in the last week. Now I know WMT can be a bit (OK, a lot) off base in its reporting but this was pretty hard to explain. See, we're in the middle of a huge campaign against dupe content and we've put a number of measures in place to fight it. For example:
-
Implemented a strong canonicalization effort
-
NOINDEX'd content we know to be duplicate programatically
-
Are currently fixing true duplicate content issues through rewriting titles, desc etc.
So I was pretty surprised to see the blow-up. Any ideas as to what else might cause such a counter intuitive trend? Has anyone else see Google do something that suddenly gloms onto a bunch of phantom pages?
-
-
I haven't contacted the forum yet but that's my next step.
Pages indexed: 91k
Blocked by robots.txt: 8.4million
I don't even know how you could create 8.4 million indexable pages from our content.
-
Have you contacted the Google Webmaster Help forums? As that seems to be a glitch in Google.
How many pages are scraped by Mozbot? If the amount that mozbot shows is different, then you should either sit and wait until Google removes those indexed pages or create a conversation on the forums so someone at google can give you a hint of what is going on.
-
Any help out there? Since the original question was posted, I've seen some improvement but even with aggressive canonicalization and noindexing, I'm still seeing a boatload of indexed pages. I am still seeing pages indexed that I've asked explicitly to be omitted by robots.txt (/search.aspx and */filter). I'm guessing it's just going to take a while to deindex what's there. Still, 91k pages indexed is quite a lot when you consider we only have about 3-4k pages and some articles.
Is anyone aware of any significant releases by Google?
-
Quite recent. We were actually seeing a nice downward trend in the huge number of pages indexed and then the number tripled. Crazy is an understatement. I would have thought the number of pages would fall given the number of pages that now use canonicals.
-
How long have you waited since you applied all the rules to avoid duplicate content, as if it was just recently, then Google should be "rebuilding" the index of your site and stats may be a little crazy while that is happening.
If it was over 2 month ago and you are seeing the increase now, then I'd suggest you revise the rules you created to see if your own Website isn't creating all those new pages.
Hope that helps.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Escort directory page indexing issues
Re; escortdirectory-uk.com, escortdirectory-usa.com, escortdirectory-oz.com.au,
Technical SEO | | ZuricoDrexia
Hi, We are an escort directory with 10 years history. We have multiple locations within the following countries, UK, USA, AUS. Although many of our locations (towns and cities) index on page one of Google, just as many do not. Can anyone give us a clue as to why this may be?0 -
Google has deindexed a page it thinks is set to 'noindex', but is in fact still set to 'index'
A page on our WordPress powered website has had an error message thrown up in GSC to say it is included in the sitemap but set to 'noindex'. The page has also been removed from Google's search results. Page is https://www.onlinemortgageadvisor.co.uk/bad-credit-mortgages/how-to-get-a-mortgage-with-bad-credit/ Looking at the page code, plus using Screaming Frog and Ahrefs crawlers, the page is very clearly still set to 'index'. The SEO plugin we use has not been changed to 'noindex' the page. I have asked for it to be reindexed via GSC but I'm concerned why Google thinks this page was asked to be noindexed. Can anyone help with this one? Has anyone seen this before, been hit with this recently, got any advice...?
Technical SEO | | d.bird0 -
Getting high priority issue for our xxx.com and xxx.com/home as duplicate pages and duplicate page titles can't seem to find anything that needs to be corrected, what might I be missing?
I am getting high priority issue for our xxx.com and xxx.com/home as reporting both duplicate pages and duplicate page titles on crawl results, I can't seem to find anything that needs to be corrected, what am I be missing? Has anyone else had a similar issue, how was it corrected?
Technical SEO | | tgwebmaster0 -
Why is Google Webmaster Tools showing 404 Page Not Found Errors for web pages that don't have anything to do with my site?
I am currently working on a small site with approx 50 web pages. In the crawl error section in WMT Google has highlighted over 10,000 page not found errors for pages that have nothing to do with my site. Anyone come across this before?
Technical SEO | | Pete40 -
Is it better to use XXX.com or XXX.com/index.html as canonical page
Is it better to use 301 redirects or canonical page? I suspect canonical is easier. The question is, which is the best canonical page, YYY.com or YYY.com/indexhtml? I assume YYY.com, since there will be many other pages such as YYY.com/info.html, YYY.com/services.html, etc.
Technical SEO | | Nanook10 -
De-indexing millions of pages - would this work?
Hi all, We run an e-commerce site with a catalogue of around 5 million products. Unfortunately, we have let Googlebot crawl and index tens of millions of search URLs, the majority of which are very thin of content or duplicates of other URLs. In short: we are in deep. Our bloated Google-index is hampering our real content to rank; Googlebot does not bother crawling our real content (product pages specifically) and hammers the life out of our servers. Since having Googlebot crawl and de-index tens of millions of old URLs would probably take years (?), my plan is this: 301 redirect all old SERP URLs to a new SERP URL. If new URL should not be indexed, add meta robots noindex tag on new URL. When it is evident that Google has indexed most "high quality" new URLs, robots.txt disallow crawling of old SERP URLs. Then directory style remove all old SERP URLs in GWT URL Removal Tool This would be an example of an old URL:
Technical SEO | | TalkInThePark
www.site.com/cgi-bin/weirdapplicationname.cgi?word=bmw&what=1.2&how=2 This would be an example of a new URL:
www.site.com/search?q=bmw&category=cars&color=blue I have to specific questions: Would Google both de-index the old URL and not index the new URL after 301 redirecting the old URL to the new URL (which is noindexed) as described in point 2 above? What risks are associated with removing tens of millions of URLs directory style in GWT URL Removal Tool? I have done this before but then I removed "only" some useless 50 000 "add to cart"-URLs.Google says themselves that you should not remove duplicate/thin content this way and that using this tool tools this way "may cause problems for your site". And yes, these tens of millions of SERP URLs is a result of a faceted navigation/search function let loose all to long.
And no, we cannot wait for Googlebot to crawl all these millions of URLs in order to discover the 301. By then we would be out of business. Best regards,
TalkInThePark0 -
How to get Google to index another page
Hi, I will try to make my question clear, although it is a bit complex. For my site the most important keyword is "Insurance" or at least the danish variation of this. My problem is that Google are'nt indexing my frontpage on this, but are indexing a subpage - www.mydomain.dk/insurance instead of www.mydomain.dk. My link bulding will be to subpages and to my main domain, but i wont be able to get that many links to www.mydomain.dk/insurance. So im interested in making my frontpage the page that is my main page for the keyword insurance, but without just blowing the traffic im getting from the subpage at the moment. Is there any solutions to do this? Thanks in advance.
Technical SEO | | Petersen110 -
What's the difference between a category page and a content page
Hello, Little confused on this matter. From a website architectural and content stand point, what is the difference between a category page and a content page? So lets say I was going to build a website around tea. My home page would be about tea. My category pages would be: White Tea, Black Tea, Oolong Team and British Tea correct? ( I Would write content for each of these topics on their respective category pages correct?) Then suppose I wrote articles on organic white tea, white tea recipes, how to brew white team etc...( Are these content pages?) Do I think link FROM my category page ( White Tea) to my ( Content pages ie; Organic White Tea, white tea receipes etc) or do I link from my content page to my category page? I hope this makes sense. Thanks, Bill
Technical SEO | | wparlaman0