Role of Robots.txt and Search Console parameters settings
-
Hi, I'm wondering if anyone can point me to resources or explain the difference between these two. If a site has URL parameters disallowed in robots.txt, is it redundant to change the Search Console parameter settings to anything other than "Let Googlebot Decide"?
-
Thank you! That helps a lot.
-
So, regarding NOINDEX vs. DISALLOW, there is a significant difference there.
If you disallow a page in robots.txt, you are asking the search engine not to crawl that page at all, whereas if you add NOINDEX in the page head, the search engine may still crawl the page but should not index it.
This difference has a few consequences. For one, if you use NOINDEX but still allow the search engine to FOLLOW, it may discover pages that otherwise might not have been discovered (if that page has unique links, for example). So in this case, you might prefer (NOINDEX, FOLLOW) if you want that discovery to happen. On the other hand, if you have many pages and you are trying to use the search engine's crawl "budget" wisely, you might in some cases prefer to disallow some paths in the robots.txt file.
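To make the two signals concrete, here is a minimal Python sketch using only the standard library. The robots.txt content, the page HTML, and the helper names `crawl_allowed` and `meta_directives` are made up for illustration. It only shows where each signal lives (the robots.txt file versus the page head), not how any particular search engine behaves.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that asks crawlers not to fetch one path at all.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

# Hypothetical page that may be crawled but asks not to be indexed.
PAGE_HTML = """\
<html><head>
<meta name="robots" content="noindex, follow">
</head><body><a href="/deep-page">link</a></body></html>
"""


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def crawl_allowed(robots_txt, user_agent, url):
    """Return True if the robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def meta_directives(html):
    """Return the set of robots meta directives declared in the page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives
```

Note that the meta directives can only be read after the page has been fetched, which is why a page that is disallowed in robots.txt never gets the chance to present a noindex tag to the crawler.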
It's also common to use robots.txt to disallow files where you do not have control over the response: non-HTML files, where you might not be able to easily administer noindex directives, or dynamic pages that your web application serves but does not let you administer head tags for.
All of that said, robots.txt files have been shrinking ever since the search engines began to render JavaScript, because they now need access to many resource files that they previously did not. Much of the old advice about disallowing scripts and admin folder paths may now be obsolete if those files are needed to render pages properly.
-
Thanks so much for the reply. I am still struggling to understand when it's best to use robots.txt.
I think I understand that URL parameters are best handled in the Search Console parameters tool, and that if you want to keep a page out of the index, it's best to use meta noindex rather than blocking it in robots.txt.
What would be an example of when you would want to disallow something in robots.txt?
-
For one, the GSC functionality is much easier to use for dealing with URLs that have multiple query string parameters. In robots.txt, overlapping rules are resolved by precedence (Google applies the most specific matching rule, while some other parsers take the first match in file order), so you often have to set up a broad disallow plus more specific allows to achieve a result that can be managed more easily in GSC.
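As a rough illustration of the "broad disallow plus specific allow" pattern, here is a small Python sketch. It assumes the precedence Google documents for Googlebot (the longest matching pattern wins, and an allow wins a tie) rather than file order, and it supports only the `*` and trailing `$` wildcards. The rule set and the `is_allowed` helper are hypothetical, not taken from any real site or library.

```python
import re

# Hypothetical rule set: block every URL with a query string,
# then re-allow URLs whose first parameter is "page".
RULES = [
    ("disallow", "/*?"),
    ("allow", "/*?page="),
]


def _pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # '*' matches any run of characters; everything else is literal.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules, path):
    """Apply longest-match precedence to (kind, pattern) rules."""
    best_len = -1
    best_allow = True  # no matching rule means the path may be crawled
    for kind, pattern in rules:
        if not pattern:
            continue  # an empty pattern matches nothing
        if _pattern_to_regex(pattern).match(path):
            allow = kind == "allow"
            length = len(pattern)
            # Longer pattern wins; on equal length, allow wins.
            if length > best_len or (length == best_len and allow):
                best_len, best_allow = length, allow
    return best_allow
```

With these rules, `/shoes?page=2` is crawlable but `/shoes?color=red&page=2` is not, because `?page=` no longer appears literally once another parameter comes first. Covering every parameter order takes more patterns, which is the kind of bookkeeping that gets awkward with multiple parameters.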
GSC is also useful for its "representative URL" setting, for cases where your pages are not necessarily crawled without the parameter, but you want only one version of the page indexed when the crawler encounters several. This is a little like a dynamic canonical, except that you do not specify which version.
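The "representative URL" idea can be sketched as grouping URLs that differ only in a parameter that does not change the content, then keeping one URL per group. This is only a sketch: the URLs, the `sessionid` parameter, and the "first one seen" choice are assumptions for illustration, since the setting itself leaves the choice of version to the crawler.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def content_key(url, ignored_params):
    """Return the URL with ignored parameters removed and the rest sorted."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in ignored_params
    )
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def pick_representatives(urls, ignored_params):
    """Map each content key to the first URL seen with that key."""
    representatives = {}
    for url in urls:
        representatives.setdefault(content_key(url, ignored_params), url)
    return representatives


# Hypothetical crawl list: sessionid does not change what the page shows.
CRAWLED = [
    "https://example.com/shoes?sessionid=abc",
    "https://example.com/shoes?sessionid=xyz",
    "https://example.com/shoes?sessionid=abc&page=2",
]
```

Calling `pick_representatives(CRAWLED, {"sessionid"})` leaves one URL for the plain listing and one for page 2, without anyone naming a canonical version up front.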