Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Should I use meta noindex and robots.txt disallow?
-
Hi, we have an alternate "list view" version of every one of our search results pages
The list view has its own URL, indicated by a URL parameter
I'm concerned about wasting our crawl budget on all these list view pages, which effectively doubles the amount of pages that need crawling
When they were first launched, I had the noindex meta tag be placed on all list view pages, but I'm concerned that they are still being crawled
Should I therefore go ahead and also apply a robots.txt disallow on that parameter to ensure that no crawling occurs? Or, will Googlebot/Bingbot also stop crawling that page over time? I assume that noindex still means "crawl"...
Thanks
-
Hi,
Thanks, I will do some testing to confirm that this behaves how I would like it to
-
if all pages are 100#5 not indexed then I would block it in robots.txt, Google's John Muller confirmed to me that Googlebot will continue to crawl every link to check to see if a nofollow or noindex has changed status.
So as a result we blocked our pages with robots.txt and saw a great increases in index/crawl rates on pages we want Google to pay attention to. It also reduces waste in server resources.
However if there are any pages that are index, if you block them in robots.txt then Googlebot will never be able to crawl the link to determine that it should be noindex. This means it could stay in a permanent stage of indexed.
I hope that answers all your questions?
-
When you say:
nofollow will tell the crawlers to not crawl the page
I believe you mean to say that this will tell the crawlers not to crawl the links on the page, the page itself is itself still "crawled" is it not?
But yes, you are right to say, that once robots.txt disallow is in place, the meta tag will not be seen and thus be moot (at which point I may as well take it off).
It would be nice to be able to say "don't crawl this and don't put it in the index"... but is there a way?
-
noindex only tells the search crawlers to not include the page in the index but still allows for them to crawl the page. nofollow will tell the crawlers to not crawl the page.
robots.txt will accomplish this as well but both I think would be overkill.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
How many images should I use in structured data for a product?
We have a basic printing website that offers business cards. Each type of business card has a few product images. Should we use structured data for all the images, or just the main image? What is your opinion about this? Thanks in advance.
Intermediate & Advanced SEO | | Choice0 -
Why is our noindex tag not working?
Hi, I have the following page where we've implemented a no index tag. But when we run this page in screaming frog or this tool here to verify the noidex is present and functioning, it shows that it's not. But if you view the source of the page, the code is present in the head tag. And unfortunately we've seen instances where Google is indexing pages we've noindexed. Any thoughts on the example above or why this is happening in Google? Eddy
Intermediate & Advanced SEO | | eddys_kap0 -
Are rel=author and rel=publisher meta tags currently in use?
Hello, Do these meta tags have any current usage? <meta name="author" content="Author Name"><meta name="publisher" content="Publisher Name"> I have also seen this usage linking to a companies Google+ Page:Thank you
Intermediate & Advanced SEO | | srbello0 -
Are ALL CAPS construed as spamming if they are used in a meta description tag call to action?
I know this seems like an old school question. As a long time SEO I would never use ALL CAPS in a title tag (unless a brand name is capitalized). However I recently came across a Moz video about creating better calls to action in the meta description tags. Some of the examples had CTAs that were using all caps (i.e. CALL NOW! or LOWEST QUOTES!) I realize there is a debate about the user experience implications. However I'm more concerned about search engines penalizing websites that are using ALL CAPS CTAs in their meta description tags. Any feedback/advice would be appreciated. Thanks
Intermediate & Advanced SEO | | RosemaryB0 -
Will disallowing URL's in the robots.txt file stop those URL's being indexed by Google
I found a lot of duplicate title tags showing in Google Webmaster Tools. When I visited the URL's that these duplicates belonged to, I found that they were just images from a gallery that we didn't particularly want Google to index. There is no benefit to the end user in these image pages being indexed in Google. Our developer has told us that these urls are created by a module and are not "real" pages in the CMS. They would like to add the following to our robots.txt file Disallow: /catalog/product/gallery/ QUESTION: If the these pages are already indexed by Google, will this adjustment to the robots.txt file help to remove the pages from the index? We don't want these pages to be found.
Intermediate & Advanced SEO | | andyheath0 -
Should I Keep adding 301s or use a noindex,follow/canonical or a 404 in this situation?
Hi Mozzers, I feel I am facing a double edge sword situation. I am in the process of migrating 4 domains into one. I am in the process of creating URL redirect mapping The pages I am having the most issues are the event pages that are past due but carry some value as they generally have one external followed link. www.example.com/event-2008 301 redirect to www.newdomain.com/event-2016 www.example.com/event-2007 301 redirect to www.newdomain.com/event-2016 www.example.com/event-2006 301 redirect to www.newdomain.com/event-2016 Again these old events aren't necessarily important in terms of link equity but do carry some and at the same time keep adding multiple 301s pointing to the same page may not be a good ideas as it will increase the page speed load time which will affect the new site's performance. If i add a 404 I will lose the bit of equity in those. No index,follow may work since it won't index the old domain nor the page itself but still not 100% sure about it. I am not sure how a canonical would work since it would keep the old domain live. At this point I am not sure which direction I should follow? Thanks for your answers!
Intermediate & Advanced SEO | | Ideas-Money-Art0 -
Block in robots.txt instead of using canonical?
When I use a canonical tag for pages that are variations of the same page, it basically means that I don't want Google to index this page. But at the same time, spiders will go ahead and crawl the page. Isn't this a waste of my crawl budget? Wouldn't it be better to just disallow the page in robots.txt and let Google focus on crawling the pages that I do want indexed? In other words, why should I ever use rel=canonical as opposed to simply disallowing in robots.txt?
Intermediate & Advanced SEO | | YairSpolter0 -
Block an entire subdomain with robots.txt?
Is it possible to block an entire subdomain with robots.txt? I write for a blog that has their root domain as well as a subdomain pointing to the exact same IP. Getting rid of the option is not an option so I'd like to explore other options to avoid duplicate content. Any ideas?
Intermediate & Advanced SEO | | kylesuss12