Huge number of indexed pages with no content
-
Hi,
We have accidentally had Google indexed lots os our pages with no useful content at all on them.
The site in question is a directory site, where we have tags and we have cities. Some cities have suppliers for almost all the tags, but there are lots of cities, where we have suppliers for only a handful of tags.
The problem occured, when we created a page for each cities, where we list the tags as links.
Unfortunately, our programmer listed all the tags, so not only the ones, where we have businesses, offering their services, but all of them!
We have 3,142 cities and 542 tags. I guess, that you can imagine the problem this caused!
Now I know, that Google might simply ignore these empty pages and not crawl them again, but when I check a city (city site:domain) with only 40 providers, I still have 1,050 pages indexed. (Yes, we have some issues between the 550 and the 1050 as well, but first things first:))
These pages might not be crawled again, but will be clicked, and bounces and the whole user experience in itself will be terrible.
My idea is, that I might use meta noindex for all of these empty pages and perhaps also have a 301 redirect from all the empty category pages, directly to the main page of the given city.
Can this work the way I imagine? Any better solution to cut this really bad nightmare short?
Thank you in advance.
Andras
-
Thank you again, John. I will fix this, based on our discussion.
-
NoIndex I think is slightly superfluous as the 301 will take care of it and also point people to a proper result and give Google a redirected result.
However SEOMoz's Robots information page page suggests:
"In most cases, meta robots with parameters
"noindex, follow"
should be employed as a way to to restrict crawling or indexation."- So maybe consider that...
As for Robots, you can check out SEOMoz's Robots information page where it has information on wildcards, which you could use, which I THINK would work (i.e. http://domain.com/*/tags ?
Not quite sure on that last bit though...
-
Thank you for your reply, Josh.
I will then use the 301, but should I also use the noindex tag for these pages to be removed from the index?
Does it make an emphasis on my intention, or it adds no extra to the process? Perhaps, they should not be used together at all, as basically they are meant for different tasks.
(Unfortunatyly, robots.txt is not really a solution, as we have the following url structure:
Since all the cities have at least a couple of valid tags, I can't specify the path to be excluded from indexing. I would also try not to add 2,000+ cities individually.
As for GWT, url removal for this number of pages might also not be an option, as I have minimum 100,000+ no-value pages to be removed (the limit is 500 per month).)
-
I would agree, just setup a 301 redirect so that users don't bounce and actually get directed to something remotely useful, even just a listing of all the tags around the site or a home page or something (even if you do the below, to ensure users who stumble on these pages are still happy).
You could also use a robots.txt file to show which ones you don't want to be indexed, and finally you may also use Google's Webmaster Tools to manually remove particular pages!
A combo of all of those will work a treat!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
New Pages in my Shopify website is not indexing
Hi The Service area pages created on my Shopify website is not indexing on google for a long time, Tried indexing the pages manually and also submitted the sitemap but still the pages doesn't seem to get indexed.
Technical SEO | | Bhisshaun
Thanks in Advance.0 -
How to check if an individual page is indexed by Google?
So my understanding is that you can use site: [page url without http] to check if a page is indexed by Google, is this 100% reliable though? Just recently Ive worked on a few pages that have not shown up when Ive checked them using site: but they do show up when using info: and also show their cached versions, also the rest of the site and pages above it (the url I was checking was quite deep) are indexed just fine. What does this mean? thank you p.s I do not have WMT or GA access for these sites
Technical SEO | | linklander0 -
Need Help On Proper Steps to Take To De-Index Our Search Results Pages
So, I have finally decided to remove our Search Results pages from Google. This is a big dealio, but our traffic has consistently been declining since 2012 and it's the only thing I can think of. So, the reason they got indexed is back in 2012, we put linked tags on our product pages, but they linked to our search results pages. So, over time we had hundreds of thousands of search results pages indexed. By tag pages I mean: Keywords: Kittens, Doggies, Monkeys, Dog-Monkeys, Kitten-Doggies Each of these would be linked to our search results pages, i.e. http://oursite.com/Search.html?text=Kitten-Doggies So, I really think these pages being indexed are causing much of our traffic problems as there are many more Search Pages indexed than actual product pages. So, my question is... Should I go ahead and remove the links/tags on the product pages first? OR... If I remove those, will Google then not be able to re-crawl all of the search results pages that it has indexed? Or, if those links are gone will it notice that they are gone, and therefore remove the search results pages they were previously pointing to? So, Should I remove the links/tags from the product page (or at least decrease them down to the top 8 or so) as well as add the no-follow no-index to all the Search Results pages at the same time? OR, should I first no-index, no-follow ALL the search results pages and leave those tags on the product pages there to give Google a chance to go back and follow those tags to all of the Search Results pages so that it can get to all of those Search Results pages in order to noindex,. no follow them? Otherwise will Google not be able find these pages? Can someone comment on what might be the best, safest, or fastest route? Thanks so much for any help you might offer me!! Craig So, I wanted to see if you have a suggestion on the best way to handle it? Should I remove the links/tags from the product page (or at least decrease them down to the top 8 or so) as well as add the no-follow no-index to all the Search Results pages at the same time? OR, should I first no-index, no-follow ALL the search results pages and leave those tags on the product pages there to give Google a chance to go back and follow those tags to all of the Search Results pages so that it can get to all of those Search Results pages in order to noindex,. no follow them? Otherwise will Google not be able find these pages? Can you tell me which would be the best, fastest and safest routes?
Technical SEO | | TheCraig0 -
Advice on Duplicate Page Content
We have many pages on our website and they all have the same template (we use a CMS) and at the code level, they are 90% the same. But the page content, title, meta description, and image used are different for all of them. For example - http://www.jumpstart.com/common/find-easter-eggs
Technical SEO | | jsmoz
http://www.jumpstart.com/common/recognize-the-rs We have many such pages. Does Google look at them all as duplicate page content? If yes, how do we deal with this?0 -
Advice on improve this content page for seo and google
Hi, i use joomla and i am looking for some help to find out what i should be doing to make my content pages better for seo and google. I would be grateful if people would look at the following page as an example http://www.in2town.co.uk/trip-advisor/top-american-ski-resorts-for-over-50s and let me know what i should be doing to make it better for seo and for google so people can find the page. I am using the above page as an example so i can learn from it. I would be grateful if people could look at the source code for the page to see if there is anything that should be in their that is not and if i should be looking at any joomla plugins for the content pages to improve the seo of the page. Any help to improve my seo for my content pages would be great. many thanks
Technical SEO | | ClaireH-1848860 -
How do I fix this type of duplicate page content problem?
Sample URLs with this Duplicate Page Content URLs Internal Links External Links Page Authority Linking Root Domains http://rogerelkindlaw.com/index.html 30 0 26 1 http://www.rogerelkindlaw.com/index.html 30 0 20 1 http://www.rogerelkindlaw.com/ | 1,630 | 613 | 43 | 110 | As you can see there are three duplicate pages; http://rogerelkindlaw.com/index.html http://www.rogerelkindlaw.com/index.html http://www.rogerelkindlaw.com/ What would be the best and most efficient way to fix this problem and also how to prevent this from happening? Thank you.
Technical SEO | | brianhughes0 -
Is this 404 page indexed?
I have a URL that when searched for shows up in the Google index as the first result but does not have any title or description attached to it. When you click on the link it goes to a 404 page. Is it simply that Google is removing it from the index and is in some sort of transitional phase or could there be another reason.
Technical SEO | | bfinternet0 -
How do https pages affect indexing?
Our site involves e-commerce transactions that we want users to be able to complete via javascript popup/overlay boxes. in order to make the credit card form secure, we need the referring page to be secure, so we are considering making the entire site secure so all of our site links wiould be https. (PayPal works this way.) Do you think this will negatively impact whether Google and other search engines are able to index our pages?
Technical SEO | | seozeelot0