Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Should I use meta noindex and robots.txt disallow?
-
Hi, we have an alternate "list view" version of every one of our search results pages
The list view has its own URL, indicated by a URL parameter
I'm concerned about wasting our crawl budget on all these list view pages, which effectively doubles the amount of pages that need crawling
When they were first launched, I had the noindex meta tag be placed on all list view pages, but I'm concerned that they are still being crawled
Should I therefore go ahead and also apply a robots.txt disallow on that parameter to ensure that no crawling occurs? Or, will Googlebot/Bingbot also stop crawling that page over time? I assume that noindex still means "crawl"...
Thanks
-
Hi,
Thanks, I will do some testing to confirm that this behaves how I would like it to
-
if all pages are 100#5 not indexed then I would block it in robots.txt, Google's John Muller confirmed to me that Googlebot will continue to crawl every link to check to see if a nofollow or noindex has changed status.
So as a result we blocked our pages with robots.txt and saw a great increases in index/crawl rates on pages we want Google to pay attention to. It also reduces waste in server resources.
However if there are any pages that are index, if you block them in robots.txt then Googlebot will never be able to crawl the link to determine that it should be noindex. This means it could stay in a permanent stage of indexed.
I hope that answers all your questions?
-
When you say:
nofollow will tell the crawlers to not crawl the page
I believe you mean to say that this will tell the crawlers not to crawl the links on the page, the page itself is itself still "crawled" is it not?
But yes, you are right to say, that once robots.txt disallow is in place, the meta tag will not be seen and thus be moot (at which point I may as well take it off).
It would be nice to be able to say "don't crawl this and don't put it in the index"... but is there a way?
-
noindex only tells the search crawlers to not include the page in the index but still allows for them to crawl the page. nofollow will tell the crawlers to not crawl the page.
robots.txt will accomplish this as well but both I think would be overkill.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Using hreflang for international pages - is this how you do it?
My client is trying to achieve a global presence in select countries, and then track traffic from their international pages in Google Analytics. The content for the international pages is pretty much the same as for USA pages, but the form and a few other details are different due to how product licensing has to be set up. I don’t want to risk losing ranking for existing USA pages due to issues like duplicate content etc. What is the best way to approach this? This is my first foray into this and I’ve been scanning the MOZ topics but a number of the conversations are going over my head,so suggestions will need to be pretty simple 🙂 Is it a case of adding hreflang code to each page and creating different URLs for tracking. For example:
Intermediate & Advanced SEO | | Caro-O
URL for USA: https://company.com/en-US/products/product-name/
URL for Canada: https://company.com/en-ca/products/product-name /
URL for German Language Content: https://company.com/de/products/product-name /
URL for rest of the world: https://company.com/en/products/product-name /1 -
Will disallowing URL's in the robots.txt file stop those URL's being indexed by Google
I found a lot of duplicate title tags showing in Google Webmaster Tools. When I visited the URL's that these duplicates belonged to, I found that they were just images from a gallery that we didn't particularly want Google to index. There is no benefit to the end user in these image pages being indexed in Google. Our developer has told us that these urls are created by a module and are not "real" pages in the CMS. They would like to add the following to our robots.txt file Disallow: /catalog/product/gallery/ QUESTION: If the these pages are already indexed by Google, will this adjustment to the robots.txt file help to remove the pages from the index? We don't want these pages to be found.
Intermediate & Advanced SEO | | andyheath0 -
What do you add to your robots.txt on your ecommerce sites?
We're looking at expanding our robots.txt, we currently don't have the ability to noindex/nofollow. We're thinking about adding the following: Checkout Basket Then possibly: Price Theme Sortby other misc filters. What do you include?
Intermediate & Advanced SEO | | ThomasHarvey0 -
Should I disallow all URL query strings/parameters in Robots.txt?
Webmaster Tools correctly identifies the query strings/parameters used in my URLs, but still reports duplicate title tags and meta descriptions for the original URL and the versions with parameters. For example, Webmaster Tools would report duplicates for the following URLs, despite it correctly identifying the "cat_id" and "kw" parameters: /Mulligan-Practitioner-CD-ROM
Intermediate & Advanced SEO | | jmorehouse
/Mulligan-Practitioner-CD-ROM?cat_id=87
/Mulligan-Practitioner-CD-ROM?kw=CROM Additionally, theses pages have self-referential canonical tags, so I would think I'd be covered, but I recently read that another Mozzer saw a great improvement after disallowing all query/parameter URLs, despite Webmaster Tools not reporting any errors. As I see it, I have two options: Manually tell Google that these parameters have no effect on page content via the URL Parameters section in Webmaster Tools (in case Google is unable to automatically detect this, and I am being penalized as a result). Add "Disallow: *?" to hide all query/parameter URLs from Google. My concern here is that most backlinks include the parameters, and in some cases these parameter URLs outrank the original. Any thoughts?0 -
"noindex, follow" or "robots.txt" for thin content pages
Does anyone have any testing evidence what is better to use for pages with thin content, yet important pages to keep on a website? I am referring to content shared across multiple websites (such as e-commerce, real estate etc). Imagine a website with 300 high quality pages indexed and 5,000 thin product type pages, which are pages that would not generate relevant search traffic. Question goes: Does the interlinking value achieved by "noindex, follow" outweigh the negative of Google having to crawl all those "noindex" pages? With robots.txt one has Google's crawling focus on just the important pages that are indexed and that may give ranking a boost. Any experiments with insight to this would be great. I do get the story about "make the pages unique", "get customer reviews and comments" etc....but the above question is the important question here.
Intermediate & Advanced SEO | | khi50 -
All In One SEO PACK Configuration - Index or Noindex?
I'm finding conflicting information about the right way to configure the All in One SEO Pack wordpress plugin. Do I index or noindex for the items below? Use noindex for Categories - yes or no? Use noindex for Archives - yes or no? Use noindex for Tag Archives - yes or no?
Intermediate & Advanced SEO | | webestate0 -
Soft 404's from pages blocked by robots.txt -- cause for concern?
We're seeing soft 404 errors appear in our google webmaster tools section on pages that are blocked by robots.txt (our search result pages). Should we be concerned? Is there anything we can do about this?
Intermediate & Advanced SEO | | nicole.healthline4 -
Noindex,follow is a waste of link juice?
On my wordpress shopping cart plugin, I have three pages /account, /checkout and /terms on which I have added “noindex,follow” attribute. But I think I may be wasting link juice on these pages as they are not to be indexed anyway, so is there any point giving them any link juice? I can add “noindex,nofollow” on to the page itself. However, the actual text/anchor link to these pages on the site header will remain “follow” as I have no means of amending that right now. So this presents the following two scenarios – No juice flows from homepage to these 3 pages (GOOD) – This would be perfect then, as the pages themselves have nofollow attribute. Juice flows from homepage to these pages (BAD) - This may mean that the juice flows from homepage anchor text links to these 3 pages BUT then STOPS there as they have “nofollow” attribute on that page. This will be a bigger problem and if this is the case and I cant stop the juice from flowing in, then ill rather let it flow out to other pages. Hope you understand my question, any input is very much appreciated. Thanks
Intermediate & Advanced SEO | | SamBuck1