Problems with too many indexed pages

Inevo

A client of ours has not been able to rank very well over the last few years. They are a big brand in our country, have more than 100 offline stores, and have plenty of inbound links.

Our main issue has been that they have too many indexed pages. Before we started, they had around 750.000 pages in the Google index. After a bit of work we got it down to 400-450.000. During our latest push we used the robots meta tag with "noindex, nofollow" on all pages we wanted to get out of the index, along with a canonical to the correct URL. Nothing was done to robots.txt to block the crawlers from entering the pages we wanted out.

Our aim is to get it down to a little over 5000 pages. They just passed 5000 products + 100 categories.

I added this about 10 days ago, but nothing has happened yet. Is there anything I can do to speed up the process of getting all the pages out of the index?

The page is vita.no if you want to have a look!

Everett

Great! Please let us know how it goes so we can all learn more about it.

Thanks!

Inevo

Thanks for that! What you are saying makes sense, so I'm going to go ahead and give it a try.

Everett

"Google: Do Not No Index Pages With Rel Canonical Tags"
https://www.seroundtable.com/noindex-canonical-google-18274.html

https://productforums.google.com/forum/?hl=en#!category-topic/webmasters/crawling-indexing--ranking/0sqRrolO_Ss

This is still being debated by people and I'm not saying it is "definitely" your problem. But if you're trying to figure out why those noindexed pages aren't coming out of the index this could be one thing to look into.

John Mueller (see screenshot below) is a Webmaster Trends Analyst for Google.

Good luck.

[Screenshot: Noindex-no-follow-rel-canonical-same-page-1395178226.png]

Inevo

Isn't the whole point of using canonical to give Google a pointer to what the original page is meant to be?

So if you have a category on shop.com/sub..

Using filters and/or pagination you then get:

shop.com/sub?p=1
shop.com/sub?color=blue

.. and so on! Both of those pages need a canonical, and we don't want them indexed either, so by using both canonical and noindex we tell Google "don't index this page (noindex), here is the original version of it (canonical)".

Or did I misunderstand something?

Everett

Hello Inevo,

Most of the time when this happens it's just because Google hasn't gotten around to recrawling the pages and updating their index after seeing the new robots meta tag. It can take several months for this to happen on a large site. Submit an XML sitemap and/or create an HTML sitemap that makes it easy for them to get to these pages if you need it to go faster.

I had a look and see some conflicting instructions that Google could possibly be having a problem with.

The paginated version ( e.g. http://www.vita.no/duft?p=2 ) of the page has a rel canonical tag pointing to the first page (e.g. http://www.vita.no/duft/ ). Yet it also has a noindex tag while the canonical page has an index tag. And each page has its own unique title (Side 2 ... Side 3 | ...) . I would remove the rel canonical tag on the paginated pages since they probably don't have any pagerank worth giving to the canonical page. This way it is even more clear to Google that the canonical page is to be indexed, and the others are not to be - instead of saying they are the same page. The same is true of filter pages: http://www.vita.no/gavesett/herre/filter/price-400-/ .
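To find other pages that send the same mixed signal, something like the following could be run against the page source. It is only a minimal sketch using Python's standard library: the class and function names are made up for illustration, and it assumes "conflict" means a page that carries noindex while its rel canonical points to a different URL.

```python
from html.parser import HTMLParser


class IndexingSignals(HTMLParser):
    """Collects the robots meta directives and the rel=canonical URL of a page."""

    def __init__(self):
        super().__init__()
        self.robots = set()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            # "noindex, nofollow" -> {"noindex", "nofollow"}
            self.robots.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )
        elif tag == "link" and "canonical" in (attrs.get("rel") or "").lower().split():
            self.canonical = attrs.get("href")


def has_conflict(html, page_url):
    """True if the page is noindexed AND canonicalised to a different URL."""
    parser = IndexingSignals()
    parser.feed(html)
    points_elsewhere = (
        parser.canonical is not None
        and parser.canonical.rstrip("/") != page_url.rstrip("/")
    )
    return "noindex" in parser.robots and points_elsewhere
```

Calling has_conflict() with the HTML of http://www.vita.no/duft?p=2 and that URL would flag the page as long as it still has both tags.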

I don't know if that has anything to do with your issue of index bloat, but it's worth a try. I did find some paginated pages in the index.

There also appears to be about 520 blog tag pages indexed. I typically set those to be noindex,follow.

Also remove all paginated pages and any other page that you don't want indexed from your XML sitemaps if you haven't already.
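A rough sketch of that sitemap cleanup, in Python. It assumes paginated pages are the ones with a "p" query parameter and filter pages are the ones with /filter/ in the path, as in the example URLs above; the function names are hypothetical and the rules would need adjusting to the real URL scheme.

```python
from urllib.parse import urlparse, parse_qs


def should_be_in_sitemap(url):
    """Keep a URL only if it is neither a paginated nor a filter page."""
    parsed = urlparse(url)
    is_paginated = "p" in parse_qs(parsed.query)  # e.g. /duft?p=2
    is_filter = "/filter/" in parsed.path         # e.g. /gavesett/herre/filter/price-400-/
    return not (is_paginated or is_filter)


def clean_sitemap_urls(urls):
    """Return the URLs that should stay in the XML sitemap, in original order."""
    return [u for u in urls if should_be_in_sitemap(u)]
```

The cleaned list can then be written back out as the new XML sitemap.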

At least for the filter pages, since /filter/ is its own directory, you can use the URL removal tool in GWT. It does have a directory-level removal feature. Of course there are only 75 of these indexed at this moment.

MoosaHemani

My advice would be to create a fresh sitemap and upload it to Google Webmaster Tools. I'm not sure about the time frame, but I will second Donna: it will take time for the pages to drop out of the Google index.

There is one hack that I used for one page on my website but not sure if it will work for 1000+ pages.

I actually removed a page on my website using Google’s temporary removal request. It kicked the page out of the index for 90 days, and in the meantime I added the URL to the robots.txt file, so it was gone quickly and never returned to the Google listings.
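If you try this, it may be worth checking that the robots.txt rule really blocks the URL you removed. Below is a small sketch with Python's standard urllib.robotparser; the example.com URLs, the /old-page/ rule and the is_blocked name are placeholders, not anything from my actual site.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt, url, user_agent="Googlebot"):
    """True if the given robots.txt content disallows this URL for the user agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content blocking one removed page
robots_txt = "User-agent: *\nDisallow: /old-page/\n"
print(is_blocked(robots_txt, "http://www.example.com/old-page/"))
print(is_blocked(robots_txt, "http://www.example.com/other-page/"))
```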

Hope this helps.

DonnaDuncan

Hi Inevo,

I had a similar situation last year and am not aware of a faster way to get pages deindexed. You're feeding WMT an updated sitemap, right?

It took 8 months for the excess pages to get dropped off my client's site. I'll be listening to hear if anyone knows a faster way.
