Problems with to many indexed pages
-
A client of our have not been able to rank very well the last few years. They are a big brand in our country, have more than 100+ offline stores and have plenty of inbound links.
Our main issue has been that they have to many indexed pages. Before we started we they had around 750.000 pages in the Google index. After a bit of work we got it down to 400-450.000. During our latest push we used the robots meta tag with "noindex, nofollow" on all pages we wanted to get out of the index, along with canonical to correct URL - nothing was done to robots.txt to block the crawlers from entering the pages we wanted out.
Our aim is to get it down to roughly 5000+ pages. They just passed 5000 products + 100 categories.
I added this about 10 days ago, but nothing has happened yet. Is there anything I can to do speed up the process of getting all the pages out of index?
The page is vita.no if you want to have a look!
-
Great! Please let us know how it goes so we can all learn more about it.
Thanks!
-
Thanks for that! What you are saying makes sense, so I'm going to go ahead and give it a try.
-
"Google: Do Not No Index Pages With Rel Canonical Tags"
https://www.seroundtable.com/noindex-canonical-google-18274.htmlThis is still being debated by people and I'm not saying it is "definitely" your problem. But if you're trying to figure out why those noindexed pages aren't coming out of the index this could be one thing to look into.
John Mueller (see screenshot below) is a Webmaster Trends Analyst for Google.
Good luck.
-
Isn't the whole point of using canonical to give Google a pointer of what page it is originally meant to be?
So if you have a category on shop.com/sub..
Using filter and/or pagenation you then get:
shop.com/sub?p=1
shop.com/sub?color=blue.. and so on! Both those pages then need canonical and neither do we want them index, so we by using both canonical and noindex tell Google to "don't index this page (noindex), here is the original version of it (canonical)".
Or did I misunderstand something?
-
Hello Inevo,
Most of the time when this happens it's just because Google hasn't gotten around to recrawling the pages and updating their index after seeing the new robots meta tag. It can take several months for this to happen on a large site. Submit an XML sitemap and/or create an HTML sitemap that makes it easy for them to get to these pages if you need it to go faster.
I had a look and see some conflicting instructions that Google could possibly be having a problem with.
The paginated version ( e.g. http://www.vita.no/duft?p=2 ) of the page has a rel canonical tag pointing to the first page (e.g. http://www.vita.no/duft/ ). Yet it also has a noindex tag while the canonical page has an index tag. And each page has its own unique title (Side 2 ... Side 3 | ...) . I would remove the rel canonical tag on the paginated pages since they probably don't have any pagerank worth giving to the canonical page. This way it is even more clear to Google that the canonical page is to be indexed, and the others are not to be - instead of saying they are the same page. The same is true of filter pages: http://www.vita.no/gavesett/herre/filter/price-400-/ .
I don't know if that has anything to do with your issue of index bloat, but it's worth a try. I did find some paginated pages in the index.
There also appears to be about 520 blog tag pages indexed. I typically set those to be noindex,follow.
Also remove all paginated pages and any other page that you don't want indexed from your XML sitemaps if you haven't already.
At least for the filter pages, since /filter/ is its own directory, you can use the URL removal tool in GWT. It does have a directory-level removal feature. Of course there are only 75 of these indexed at this moment.
-
My advice would be to include a fresh sitemap and upload it Google Webmaster tool. Not sure about time but I will second Donna, this will take time for the pages to get out of the Google Index.
There is one hack that I used for one page on my website but not sure if it will work for 1000+ pages.
I actually removed a page on my website using Google’s temporary removal request. It kicked the page out of the index for 90 days and in the mean time I added the link in the robots.txt file so it gone quickly and never returned back in the Google listing.
Hope this helps.
-
Hi lnevo,
I had a similar situation last year and am not aware of a faster way to get pages deindexed. You're feeding WMT an updated sitemap right?
It took 8 months for the excess pages to get dropped off my client's site. I'll be listening to hear if anyone knows a faster way.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Pages are Indexed but not Cached by Google. Why?
Hello, We have magento 2 extensions website mageants.com since 1 years google every 15 days cached my all pages but suddenly last 15 days my websites pages not cached by google showing me 404 error so go search console check error but din't find any error so I have cached manually fetch and render but still most of pages have same 404 error example page : - https://www.mageants.com/free-gift-for-magento-2.html error :- http://webcache.googleusercontent.com/search?q=cache%3Ahttps%3A%2F%2Fwww.mageants.com%2Ffree-gift-for-magento-2.html&rlz=1C1CHBD_enIN803IN804&oq=cache%3Ahttps%3A%2F%2Fwww.mageants.com%2Ffree-gift-for-magento-2.html&aqs=chrome..69i57j69i58.1569j0j4&sourceid=chrome&ie=UTF-8 so have any one solutions for this issues
Technical SEO | | vikrantrathore0 -
Drop in Indexed Page + Organic Traffic
Hey Moz Community, I've been seeing a steady decrease in search console of pages being indexed by Google for our eCommerce site. This is corresponding to lower impressions and traffic in general this year. We started with around a million pages being indexed in Nov of 2015 down to 18,000 pages this Nov. I realized that since we don't have around 3,000 or so products year round this is mostly likely a good thing. I've checked to make sure our main landing pages are being indexed which they are and our sitemap was updated several times this year, although we're in the process of updating it again to resubmit. I also checked our robots.txt and there's nothing out of the ordinary. In the last month we've recently gotten rid of some duplicate content issues caused by pagination by using canonical tags but that's all we've done to reduce the number of pages crawled. We have seen some soft 404's and some server errors coming up in our crawl error report that we've either fixed or are trying to fix. Not really sure where to start looking to find a solution to the problem or if it's even a huge issue, but the drop in traffic is also not great. The drop in traffic corresponded to lose in rankings as well so there could be correlation or none. Any ideas here?
Technical SEO | | znotes0 -
23,000 pages indexed, I think bad
Thank you Thank you Moz People!! I have a successful vacation rental company that has terrible seo but getting better. When I first ran Moz crawler and page grader, I had 35,000 errors and all f's.... tons of problem with duplicate page content and titles because not being consistent with page names... mainly capitalization and also rel canonical errors... with that said, I have now maybe 2 or 3 errors from time to time, but I fix every other day. Problem Maybe My site map shows in Google Webmaster submitted 1155
Technical SEO | | nickcargill
1541 indexed But google crawl shows 23,000 pages probably because of duplicate errors or possibly database driven url parameters... How bad is this and how do I get this to be accurate, I have seen google remove tool but I do not think this is right? 2) I have hired a full time content writer and I hope this works My site in google was just domain.com but I had put a 301 in to www.domain.com becauses www. had a page authority where the domain.com did not. But in webmasters I had domain.com just listed. So I changed that to www.domain.com (as preferred domain name) and ask for the first time to crawl. www.domain.com . Anybody see any problems with this? THank you MOZ people, Nick0 -
How to block text on a page to be indexed?
I would like to block the spider indexing a block of text inside a page , however I do not want to block the whole page with, for example , a noindex tag. I have tried already with a tag like this : chocolate pudding chocolate pudding However this is not working for my case, a travel related website. thanks in advance for your support. Best regards Gianluca
Technical SEO | | CharmingGuy0 -
Web page is showing up on Google but doesn't show when it was cached, so is it indexed?
Hey everyone So I created a new page on a WordPress website, it was live for a few hours till I changed my mind & switched it back to a draft. Just out of curiosity I did the Site:www.example.com/Example search on Google to see if it had been indexed & apparently it had but when I click on cached to see what time it got indexed at exactly it's showing me an error. So does this mean it is indexed or not?
Technical SEO | | conversiontactics0 -
Too Many On-Page Links?
How much would this affect my page ranks performance? There are many Too Many On-Page Links? warning on my campaign. should I address this issue right away to fix it or leave it as it would not matter seriously ? I've looked at some of the pages and think all of them are necessary. Could someone help me? Thanks!
Technical SEO | | LauraHT0 -
When Is It Good To Redirect Pages on Your Site to Another Page?
Suppose you have a page on your site that discusses a topic that is similar to another page but targets a different keyword phrase. The page has medium quality content, no inbound links, and the attracts little traffic. Should you 301 redirect the page to a stronger page?
Technical SEO | | ProjectLabs1 -
Problems with changing the page title on Wordpress Site...
The website is http://www.masterpieceinteriors.co.uk/ I'm not sure why the homepage title is reading 'masterpieces' in Google but looks fine here. There seem to be 2 versions of the homepage coming up in google - you can see them both by searching 'masterpieceinteriors' and then 'masterpiece interiors'. Some help would be hugely appreciated!
Technical SEO | | Opiyo0