Investigating a huge spike in indexed pages
-
I've noticed an enormous spike in pages indexed through WMT in the last week. Now I know WMT can be a bit (OK, a lot) off base in its reporting but this was pretty hard to explain. See, we're in the middle of a huge campaign against dupe content and we've put a number of measures in place to fight it. For example:
-
Implemented a strong canonicalization effort
-
Programmatically NOINDEX'd content we know to be duplicate
-
Are currently fixing true duplicate content issues by rewriting titles, descriptions, etc.
So I was pretty surprised to see the blow-up. Any ideas as to what else might cause such a counterintuitive trend? Has anyone else seen Google suddenly glom onto a bunch of phantom pages?
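The canonical and NOINDEX measures listed above can be spot-checked page by page. Below is a minimal sketch, assuming you already have a page's HTML as a string; the class name `IndexSignalParser`, the helper `index_signals`, and the example.com URL are all hypothetical and only there for illustration. It reads the `rel="canonical"` link and any `robots`/`googlebot` meta tag, and does not look at HTTP headers such as `X-Robots-Tag`.

```python
from html.parser import HTMLParser


class IndexSignalParser(HTMLParser):
    """Collects the canonical URL and meta robots directives from a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.robots = []

    def handle_starttag(self, tag, attrs):
        # Attribute names arrive lowercased; values may be None.
        attrs = {k: (v or "") for k, v in attrs}
        if tag == "link" and "canonical" in attrs.get("rel", "").lower().split():
            self.canonical = attrs.get("href")
        elif tag == "meta" and attrs.get("name", "").lower() in ("robots", "googlebot"):
            content = attrs.get("content", "")
            self.robots += [d.strip().lower() for d in content.split(",")]


def index_signals(html):
    """Return the canonical URL (or None) and whether the page is NOINDEX'd."""
    parser = IndexSignalParser()
    parser.feed(html)
    return {"canonical": parser.canonical, "noindex": "noindex" in parser.robots}


# Hypothetical sample page, not taken from the site discussed here.
sample = """<html><head>
<link rel="canonical" href="https://www.example.com/widgets">
<meta name="robots" content="NOINDEX, follow">
</head><body></body></html>"""

print(index_signals(sample))
```

Running this over a sample of the URLs Google reports as indexed would show whether each one actually carries the canonical or NOINDEX signal you expect.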
-
-
I haven't contacted the forum yet but that's my next step.
Pages indexed: 91k
Blocked by robots.txt: 8.4 million
I don't even know how you could create 8.4 million indexable pages from our content.
-
Have you contacted the Google Webmaster Help forums? This looks like a glitch on Google's side.
How many pages does Mozbot crawl? If the number Mozbot reports is different, you should either wait until Google removes those indexed pages or start a thread on the forums so someone at Google can give you a hint about what is going on.
-
Any help out there? Since the original question was posted, I've seen some improvement but even with aggressive canonicalization and noindexing, I'm still seeing a boatload of indexed pages. I am still seeing pages indexed that I've asked explicitly to be omitted by robots.txt (/search.aspx and */filter). I'm guessing it's just going to take a while to deindex what's there. Still, 91k pages indexed is quite a lot when you consider we only have about 3-4k pages and some articles.
Is anyone aware of any significant releases by Google?
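To see which indexed URLs the two robots.txt rules mentioned above (/search.aspx and */filter) actually cover, a rough matcher like the one below may help. It is a sketch under assumptions: it handles only Disallow-style path rules with Google-style `*` and `$` wildcards, ignores Allow rules and user-agent groups, and the function names and example.com URLs are hypothetical. One thing worth keeping in mind: robots.txt controls crawling rather than indexing, so a URL blocked there can remain in the index, and Google cannot read a NOINDEX tag on a page it is not allowed to fetch.

```python
import re
from urllib.parse import urlsplit


def rule_to_regex(rule):
    """Translate a robots.txt path rule with * and $ into a compiled regex."""
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    # Escape literal pieces and let each * match any run of characters.
    pattern = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile("^" + pattern + ("$" if anchored else ""))


def is_disallowed(url, disallow_rules):
    """True if the URL's path (plus query string) matches any Disallow rule."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return any(rule_to_regex(r).match(target) for r in disallow_rules if r)


# The two rules from the post; the URLs below are made up for illustration.
rules = ["/search.aspx", "*/filter"]

for url in [
    "https://www.example.com/search.aspx?q=shoes",
    "https://www.example.com/shoes/filter/red",
    "https://www.example.com/shoes/red",
]:
    print(url, is_disallowed(url, rules))
```

Any indexed URL that comes back as disallowed here is one where a NOINDEX tag or canonical on the page itself would not be seen by the crawler.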
-
Quite recent. We were actually seeing a nice downward trend in the huge number of pages indexed and then the number tripled. Crazy is an understatement. I would have thought the number of pages would fall given the number of pages that now use canonicals.
-
How long has it been since you applied all the rules to avoid duplicate content? If it was just recently, Google is probably still "rebuilding" the index of your site, and the stats may be a little crazy while that is happening.
If it was over 2 months ago and you are only seeing the increase now, then I'd suggest you review the rules you created to check whether your own website is creating all those new pages.
Hope that helps.
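One way to check whether the site itself is generating the extra pages is to take a list of crawled or indexed URLs and count how many distinct URLs collapse onto the same path once query strings and trailing slashes are ignored. This is a minimal sketch; the function name `variants_per_path` and the example.com URLs are hypothetical, and it assumes the URL list comes from your own crawl export or server logs.

```python
from collections import Counter
from urllib.parse import urlsplit


def variants_per_path(urls):
    """Count how many distinct URLs collapse onto each host and path."""
    counts = Counter()
    for url in set(urls):
        parts = urlsplit(url)
        # Ignore the query string and any trailing slash when grouping.
        key = (parts.netloc.lower(), parts.path.rstrip("/") or "/")
        counts[key] += 1
    return counts


# Made-up URLs for illustration only.
crawled = [
    "https://www.example.com/shoes?sort=price",
    "https://www.example.com/shoes?sort=name",
    "https://www.example.com/shoes/",
    "https://www.example.com/about",
]

for (host, path), n in variants_per_path(crawled).most_common():
    print(host, path, n)
```

Paths with a high count are likely places where sort, filter, or search parameters are multiplying a few thousand real pages into many more crawlable URLs.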