To noindex or not to noindex
-
Our website lets users test whether any given URL or keyword is censored in China. For each URL and keyword that a user looks up, a page is created, such as https://en.greatfire.org/facebook.com and https://zh.greatfire.org/keyword/freenet. From a search engine's perspective, all these pages look very similar. For this reason we have implemented a noindex function based on certain rules. Basically, only highly ranked websites are allowed to be indexed - all other URLs are tagged as noindex (for example https://en.greatfire.org/www.imdb.com). However, we are not sure that this is a good strategy, so we are asking: what should a website with a lot of similar content do?
- Don't noindex anything - let Google decide what's worth indexing and what's not.
- Noindex most content, but allow some popular pages to be indexed. This is our current approach. If you recommend this one, we would like to know what we can do to improve it.
- Noindex all the similar content. In our case, only allow pages with unique content (overview pages, blog posts etc.) to be indexed.
Another factor in our case is that our website is multilingual. All pages are available (and equally indexed) in Chinese and English. Should that affect our strategy?
References:
https://zh.greatfire.org
https://en.greatfire.org
https://www.google.com/search?q=site%3Agreatfire.org
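To make the current approach (option 2) concrete, here is a minimal sketch of a rule that only lets highly ranked sites be indexed. The function name, the `site_rank` input and the cutoff of 1000 are all assumptions made for illustration; the actual rules used on the site are not described in this post.

```python
# Hypothetical cutoff: sites ranked at or above this position count as "highly ranked".
INDEX_RANK_CUTOFF = 1000


def robots_meta(site_rank, cutoff=INDEX_RANK_CUTOFF):
    """Return the robots meta tag for a test-result page, or "" if none is needed.

    site_rank is an assumed popularity rank of the tested site
    (1 = most popular), or None when the site is unranked.
    """
    indexable = site_rank is not None and site_rank <= cutoff
    if indexable:
        # Pages are indexable by default, so no tag is emitted.
        return ""
    return '<meta name="robots" content="noindex">'
```

With this sketch, a page for a site ranked 2 gets no tag, while a page for an unranked site gets the noindex tag. Changing `cutoff` is the single knob that moves the site between "index almost everything" and "index almost nothing".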
-
1. Yes - if you noindex all but 20 pages, those 20 pages would get a boost in rankings. You would end up losing the long-tail searches from the other thousands of pages, so you'll need to do a cost/benefit analysis on that.
2. You'll need to do a cost/benefit analysis on this one too. Are most of the visitors to your site searching in Chinese or English? Are your search terms mainly in Chinese or mainly in English? Are your Chinese-speaking visitors more likely to want to visit the zh subdomain?
You could publish 20 to 50 pages on each subdomain, and then focus on doing some link building. If you have strong rankings across those 40 to 100 pages, then you could start adding more pages slowly over time.
-
Nope, there is no need to add the noindex tag. The canonical tag already tells Google which pages are the originals that the search engine should crawl and index, so all pages other than the category pages will still be crawled automatically.
-
Hi Moosa. Thanks very much for your reply and great suggestions. If I add canonical tags on each URL page referencing the category page where it belongs, should I also add noindex tags on it? Should then actually all URL pages have noindex tags and only allow category pages to be indexed?
-
Thanks for the suggestions. I have some follow-up questions and would really like to know what you think about the following:
- The "page rank will get shared to all of the pages that you have across your site". In general, does this mean that if I add noindex tags to all but a few pages, they will be ranked much higher? Currently thousands of pages are indexed. Is it correct to say that if only say 20 pages were indexed that would greatly improve their ranking?
- The zh and en versions of the website have different templates and most of the text content is also translated (with the main exception of old blog posts). We could add noindex on all of the zh website or all except the main pages. Would you recommend that?
-
OK, I might sound completely stupid here, as I have never come across this case before, but here is my hypothesis...
When searching for a keyword or URL, you could add another field (maybe a checkbox) that represents the category of the search.
Then, once the new URL is generated, it will fall under that specific category automatically.
Customize the category pages so that they look different from each other.
Index the category pages, and on every newly generated URL add a canonical tag pointing to its category page. For example, if the new page is generated at www.yourwebsite.com/movies/ice-age-3/, this page should have a canonical tag pointing to http://www.yourwebsite.com/movies/
Why?
Creating category pages will give you more unique pages that can be indexed in the SERPs without the duplicate content issue. Adding a canonical tag on all other URLs tells Google that the category pages are the real pages it should consider.
This might give you more chances to earn search traffic from Google.
**This is only my assumption of what should work!
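The canonical idea above could be sketched roughly as follows. This assumes generated URLs are shaped like /<category>/<item>/, as in the movies example; the helper names are hypothetical and are not taken from any existing codebase.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def category_canonical(page_url):
    """Return the category URL that a generated page should point to.

    Assumes the first path segment is the category, e.g. /movies/ice-age-3/.
    """
    parts = urlsplit(page_url)
    segments = [s for s in parts.path.split("/") if s]
    if not segments:
        # Home page: there is no category to point at, so keep the URL itself.
        return page_url
    category_path = "/" + segments[0] + "/"
    # Drop any query string and fragment from the canonical URL.
    return urlunsplit((parts.scheme, parts.netloc, category_path, "", ""))


def canonical_tag(page_url):
    """Build the <link rel="canonical"> element for a generated page."""
    href = escape(category_canonical(page_url), quote=True)
    return '<link rel="canonical" href="%s">' % href
```

For the example page, `canonical_tag("http://www.yourwebsite.com/movies/ice-age-3/")` yields a tag pointing at http://www.yourwebsite.com/movies/. A category page passed through the same function points at itself, which is harmless.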
-
Creating a page every time someone performs a search could probably spiral out of control pretty quickly. If you have a certain amount of 'page rank', based on all of the back links you have, that page rank will get shared to all of the pages that you have across your site.
One way you could more naturally control what gets indexed is through what you link to from your home page. For instance, if you track the most blocked big sites, as well as the most blocked keywords, and put those pages one link away from your homepage, you could expect them to get indexed naturally when your site is spidered.
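As a rough sketch of that homepage idea, the snippet below picks the pages to link from the home page. How "most blocked" is measured is not stated in this thread, so the `block_counts` mapping (page path to number of tests that found it blocked) and the limit of 20 are assumptions.

```python
def homepage_links(block_counts, limit=20):
    """Return up to `limit` page paths, most blocked first.

    block_counts is a hypothetical mapping of page path -> block count.
    Ties are broken alphabetically so the output is stable between runs.
    """
    ranked = sorted(block_counts.items(), key=lambda item: (-item[1], item[0]))
    return [path for path, _ in ranked[:limit]]


# The counts below are invented purely to show the call shape.
example_counts = {"/facebook.com": 90, "/keyword/freenet": 40, "/www.imdb.com": 5}
top_pages = homepage_links(example_counts, limit=2)
```

The returned paths would then be rendered as plain links on the home page, so that each of those pages sits one click from the most-crawled page on the site.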
As you get more links from other sites, and your trust from the search engines and page rank grows, you should be able to support more pages getting indexed across your site.
There is the issue of your site contents potentially being regarded as 'thin content', since many of the pages appear to be the same from page to page.
One question I had - I saw your site hosts both Chinese language words and English language words, and checks whether those words are being filtered. Perhaps it would make more sense to only show the words in Chinese characters on the zh. subdomain, and the English words on the en. subdomain? Just a thought. Is there any difference between the zh and en subdomains, aside from the language of the template?
Really interesting website.