Problems with to many indexed pages
-
A client of our have not been able to rank very well the last few years. They are a big brand in our country, have more than 100+ offline stores and have plenty of inbound links.
Our main issue has been that they have to many indexed pages. Before we started we they had around 750.000 pages in the Google index. After a bit of work we got it down to 400-450.000. During our latest push we used the robots meta tag with "noindex, nofollow" on all pages we wanted to get out of the index, along with canonical to correct URL - nothing was done to robots.txt to block the crawlers from entering the pages we wanted out.
Our aim is to get it down to roughly 5000+ pages. They just passed 5000 products + 100 categories.
I added this about 10 days ago, but nothing has happened yet. Is there anything I can to do speed up the process of getting all the pages out of index?
The page is vita.no if you want to have a look!
-
Great! Please let us know how it goes so we can all learn more about it.
Thanks!
-
Thanks for that! What you are saying makes sense, so I'm going to go ahead and give it a try.
-
"Google: Do Not No Index Pages With Rel Canonical Tags"
https://www.seroundtable.com/noindex-canonical-google-18274.htmlThis is still being debated by people and I'm not saying it is "definitely" your problem. But if you're trying to figure out why those noindexed pages aren't coming out of the index this could be one thing to look into.
John Mueller (see screenshot below) is a Webmaster Trends Analyst for Google.
Good luck.
-
Isn't the whole point of using canonical to give Google a pointer of what page it is originally meant to be?
So if you have a category on shop.com/sub..
Using filter and/or pagenation you then get:
shop.com/sub?p=1
shop.com/sub?color=blue.. and so on! Both those pages then need canonical and neither do we want them index, so we by using both canonical and noindex tell Google to "don't index this page (noindex), here is the original version of it (canonical)".
Or did I misunderstand something?
-
Hello Inevo,
Most of the time when this happens it's just because Google hasn't gotten around to recrawling the pages and updating their index after seeing the new robots meta tag. It can take several months for this to happen on a large site. Submit an XML sitemap and/or create an HTML sitemap that makes it easy for them to get to these pages if you need it to go faster.
I had a look and see some conflicting instructions that Google could possibly be having a problem with.
The paginated version ( e.g. http://www.vita.no/duft?p=2 ) of the page has a rel canonical tag pointing to the first page (e.g. http://www.vita.no/duft/ ). Yet it also has a noindex tag while the canonical page has an index tag. And each page has its own unique title (Side 2 ... Side 3 | ...) . I would remove the rel canonical tag on the paginated pages since they probably don't have any pagerank worth giving to the canonical page. This way it is even more clear to Google that the canonical page is to be indexed, and the others are not to be - instead of saying they are the same page. The same is true of filter pages: http://www.vita.no/gavesett/herre/filter/price-400-/ .
I don't know if that has anything to do with your issue of index bloat, but it's worth a try. I did find some paginated pages in the index.
There also appears to be about 520 blog tag pages indexed. I typically set those to be noindex,follow.
Also remove all paginated pages and any other page that you don't want indexed from your XML sitemaps if you haven't already.
At least for the filter pages, since /filter/ is its own directory, you can use the URL removal tool in GWT. It does have a directory-level removal feature. Of course there are only 75 of these indexed at this moment.
-
My advice would be to include a fresh sitemap and upload it Google Webmaster tool. Not sure about time but I will second Donna, this will take time for the pages to get out of the Google Index.
There is one hack that I used for one page on my website but not sure if it will work for 1000+ pages.
I actually removed a page on my website using Google’s temporary removal request. It kicked the page out of the index for 90 days and in the mean time I added the link in the robots.txt file so it gone quickly and never returned back in the Google listing.
Hope this helps.
-
Hi lnevo,
I had a similar situation last year and am not aware of a faster way to get pages deindexed. You're feeding WMT an updated sitemap right?
It took 8 months for the excess pages to get dropped off my client's site. I'll be listening to hear if anyone knows a faster way.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Very wierd pages. 2900 403 errors in page crawl for a site that only has 140 pages.
Hi there, I just made a crawl of the website of one of my clients with the crawl tool from moz. I have 2900 403 errors and there is only 140 pages on the website. I will give an exemple of what the crawl error gives me. | http://www.mysite.com/en/www.mysite.com/en/en/index.html#?lang=en | http://www.mysite.com/en/www.mysite.com/en/en/en/index.html#?lang=en | http://www.mysite.com/en/www.mysite.com/en/en/en/en/index.html#?lang=en | http://www.mysite.com/en/www.mysite.com/en/en/en/en/en/index.html#?lang=en | http://www.mysite.com/en/www.mysite.com/en/en/en/en/en/en/index.html#?lang=en | http://www.mysite.com/en/www.mysite.com/en/en/en/en/en/en/index.html#?lang=en | http://www.mysite.com/en/www.mysite.com/en/en/en/en/en/en/en/en/en/en/en/en/index.html#?lang=en | http://www.mysite.com/en/www.mysite.com/en/en/en/en/en/en/en/en/en/en/en/en/en/index.html#?lang=en | http://www.mysite.com/en/www.mysite.com/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/index.html#?lang=en | http://www.mysite.com/en/www.mysite.com/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/index.html#?lang=en | | | | | | | | | | There are 2900 pages like this. I have tried visiting the pages and they work, but they are only html pages without CSS. Can you guys help me to see what the problems is. We have experienced huge drops in traffic since Septembre.
Technical SEO | | H.M.N.0 -
Number of index pages in web master is different from site:mydomainname
Google says one to discover whether my pages is index in Google is site:domain name of my website: https://support.google.com/webmasters/answer/34444?hl=enas mention in web page above so basically according to that i can know totally pages indexed for my website right:it shows me when type (site:domain name ) 300 but it says in Google web master that i have 100000so which is the real number of index page 300 or 1000000 as web master says and why i get 300 when using site:domain name even Google mention that it is way to discover index paged
Technical SEO | | Jamalon0 -
Why wont google Index this page?
A week ago i accidentally changed this page settings in my CMS to "disable & dont index" as i was going to replace this page with another, but this didnt happen, but i forgot to switch the settings back! http://www.over50choices.co.uk/funeral-planning/funeral-plans Anyhow in an effort to get it back up quickly i submitted in GWTs but its still not indexed. When i use several SEO on page checking tools it has the Meta Title data as "Form" and not the correct title. Any ideas please? Yours frustrated Ash
Technical SEO | | AshShep10 -
Too many on-page links vs. UX issue
I am having an issue with many of our pages having too many on-page links. I have gotten many of them below the 100 page limit that is suggested and I understand this is not a critical factor with SEO, but my issue is this: Many important pages I am trying to optimize are buried at a "3rd" level which is actually not accessible from the home page navigation dropdown due to our outdated CMS. I am trying to decide if we should develop our site to display these pages on-hover from the main navigation. This would make a lot of sense since users would find these pages easier, however adding this functionality would increase on-page links by a lot more. So in your opinion, would it be worth it to spend the money to have this functionality developed? Or would it end up hurting our SEO standings?
Technical SEO | | isret_efront0 -
41.000 pages indexed two years after it was redirected to a new domain
Hi!Two years ago, we changed the domain elmundodportivo.es to mundodeportivo.com. Apparently, everything was OK, but more than two years later, there are still 41.000 pages indexed in Google (https://www.google.com/search?q=site%3Aelmundodeportivo.es) even though all the domains have been redirected with a 301 redirect. I detected some problems with redirections that were 303 instead of 301, but we fixed that one month ago.A secondary problem is that the pagerank for elmundodportivo.es is 7 yet and mundodeportivo.com is 3.What I'm doing wrong?Thank you all,Oriol
Technical SEO | | MundoDeportivo0 -
Why is the Page Authority of my product pages so low?
My domain authority is 35 (homepage Page Authority = 45) and my website has been up for years: www.rainchainsdirect.com Most random pages on my site (like this one) have a Page Authority of around 20. However, as a whole, the individual pages of my products rank exceptionally low. Like these: http://www.rainchainsdirect.com/products/copper-channel-link-rain-chain (Page Authority = 1) http://www.rainchainsdirect.com/collections/todays-deals/products/contempo-chain (Page Authority = 1) I was thinking that for whatever reason they have such low authority, that it may explain why these pages rank lower in google for specific searches using my exact product name (in other words, other sites that are piggybacking of my unique products are ranking higher for my product in a specific name search than the original product itself on my site) In any event, I'm trying to get some perspective on why these pages remain with the same non-existent Page Authority. Can anyone help to shed some light on why and what can be done about it? Thanks!
Technical SEO | | csblev0 -
.co.uk/index.html or just .co.uk - my on-page reports are different for both - why?
It looks like the same thing, yet it has a different on-page report for each version - why is this. Please share your ideas with me on this. The original url is http://bath.waspkilluk.co.uk/index.html. Many Thanks - Simon.
Technical SEO | | simonberenyi0 -
GWT indexing wrong pages
Hi SEOMoz I have a listings site. In a part of the page, I have 3 comboboxes, for state, county and city. On the change event, the javascript redirects the user to the page of the selected location. Parameters are passed via GET, and my URL is rewrited via htaccess. Example: http:///www.site.com/state/county/city.html The problem is, there is A LOT(more than 10k) of 404 errors. It is happenning because the crawler is trying to index the pages, sometimes WITHOUT a parameter, like http:///www.site.com/state//city.html I don't know how to stop it, and I don't wanna remove it, once it's very clicked by the users. What should I do?
Technical SEO | | elias990