Role of Robots.txt and Search Console parameters settings
-
Hi, wondering if anyone can point me to resources or explain the difference between these two. If a site has url parameters disallowed in Robots.txt is it redundant to edit settings in Search Console parameters to anything other than "Let Googlebot Decide"?
-
Thank you! That helps a lot.
-
So, regarding NOINDEX vs. DISALLOW, there is a significant difference there.
If you disallow in robots, you are asking the search engine to not even crawl that page. Whereas if you NOINDEX in the page head, then the search engine may still crawl the page but should not index it.
There are a few impacts of this difference. For one, if you use NOINDEX but still allow the search engine to FOLLOW, then it may discover pages which otherwise might not have been discovered (if that page has unique links, for example). So in this case, you might prefer to use (NOINDEX, FOLLOW) if you want that discovery to happen. On the other hand, if you have many pages and you are trying to wisely use the search engine's crawl "budget", then you might in some cases prefer to disallow some paths in the robots.txt file.
It's also common to use robots.txt to disallow some files where you do not have control over the response. Non-html files, where you might not be able to easily administer noindex directives. Or dynamic pages your web application may serve but not allow you to administer head tags for.
All of that said, robots.txt files have been shrinking ever since the search engines began to render javascript, since now they need access to a lot of resource files which they previously did not. Much of the old advice of disallowing scripts and admin folder paths may be obsolete now, if those files are needed to properly render pages.
-
Thanks so much for the reply. I am still struggling to understand when it's best to use robots.txt
I think I understand that url parameters are best handled in the search console parameters tool, and if you want to keep a page out of the index, it's best to use meta noindex rather than blocking it in robots.txt
What would be an example of when you would want to disallow something in robots.txt?
-
For one, the GSC functionality is much easier to use for dealing with URLs having multiple query string parameters. robots.txt processes the statements in order, so you often have to set up a broad disallow, followed by more specific allows, to achieve the same result which can be more easily managed in GSC.
Also, GSC is useful for the "representative URL" setting, if your pages don't necessarily get crawled without the parameter present at all, but you only want one version of the page indexed if the crawler encounters multiple versions. So, this is a little like a dynamic canonical, except you are not specifying which version.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Sitelink search in google search for Brand name redirect me to 404, how?
Hi All, When I search my brand name in google and in google search result my site appears with sitelink and in site link there is option of search when I search any keyword in that search then that search redirect me to 404 page of my site. I found I have implemented wrong schema at category page for search action and then I fixed the bug but 5 days passed away still google showing 404 of my search action. I have not implemented schema for search action at homepage. Now please let me know what is the issue?
Technical SEO | | amu1230 -
Robots.txt blocking Addon Domains
I have this site as my primary domain: http://www.libertyresourcedirectory.com/ I don't want to give spiders access to the site at all so I tried to do a simple Disallow: / in the robots.txt. As a test I tried to crawl it with Screaming Frog afterwards and it didn't do anything. (Excellent.) However, there's a problem. In GWT, I got an alert that Google couldn't crawl ANY of my sites because of robots.txt issues. Changing the robots.txt on my primary domain, changed it for ALL my addon domains. (Ex. http://ethanglover.biz/ ) From a directory point of view, this makes sense, from a spider point of view, it doesn't. As a solution, I changed the robots.txt file back and added a robots meta tag to the primary domain. (noindex, nofollow). But this doesn't seem to be having any effect. As I understand it, the robots.txt takes priority. How can I separate all this out to allow domains to have different rules? I've tried uploading a separate robots.txt to the addon domain folders, but it's completely ignored. Even going to ethanglover.biz/robots.txt gave me the primary domain version of the file. (SERIOUSLY! I've tested this 100 times in many ways.) Has anyone experienced this? Am I in the twilight zone? Any known fixes? Thanks. Proof I'm not crazy in attached video. robotstxt_addon_domain.mp4
Technical SEO | | eglove0 -
Retaining Image Search Rankings After Migration
Hi There, I have a client with a very interesting dilemma out there. If you do an image search his images appear quite high in the rankings. However the way he achieved this isn't exactly within Google's guidelines. He is basically hiding the images within CSS. The reason behind this is that the pages have changed over the years and the images didn't fit in with the new existing text but he still wanted to maintain the high image search rankings. He is now changing to a brand new site and so this page he has been able to tweak successfully before, will no longer exist. He want's to know what is the best way to maintain his image search rankings. will a 301 redirect be enough? I know the morality issues of hiding images, but I want to know if he did what would be the best way to preserve his current image rankings. Kind Regards Neil
Technical SEO | | nezona0 -
How can I best handle parameters?
Thank you for your help in advance! I've read a ton of posts on this forum on this subject and while they've been super helpful I still don't feel entirely confident in what the right approach I should take it. Forgive my very obvious noob questions - I'm still learning! The problem: I am launching a site (coursereport.com) which will feature a directory of schools. The directory can be filtered by a handful of fields listed below. The URL for the schools directory will be coursereport.com/schools. The directory can be filtered by a number of fields listed here: Focus (ex: “Data Science”) Cost (ex: “$<5000”) City (ex: “Chicago”) State/Province (ex: “Illinois”) Country (ex: “Canada”) When a filter is applied to the directories page the CMS produces a new page with URLs like these: coursereport.com/schools?focus=datascience&cost=$<5000&city=chicago coursereport.com/schools?cost=$>5000&city=buffalo&state=newyork My questions: 1) Is the above parameter-based approach appropriate? I’ve seen other directory sites that take a different approach (below) that would transform my examples into more “normal” urls. coursereport.com/schools?focus=datascience&cost=$<5000&city=chicago VERSUS coursereport.com/schools/focus/datascience/cost/$<5000/city/chicago (no params at all) 2) Assuming I use either approach above isn't it likely that I will have duplicative content issues? Each filter does change on page content but there could be instance where 2 different URLs with different filters applied could produce identical content (ex: focus=datascience&city=chicago OR focus=datascience&state=illinois). Do I need to specify a canonical URL to solve for that case? I understand at a high level how rel=canonical works, but I am having a hard time wrapping my head around what versions of the filtered results ought to be specified as the preferred versions. For example, would I just take all of the /schools?focus=X combinations and call that the canonical version within any filtered page that contained other additional parameters like cost or city? Should I be changing page titles for the unique filtered URLs? I read through a few google resources to try to better understand the how to best configure url params via webmaster tools. Is my best bet just to follow the advice on the article below and define the rules for each parameter there and not worry about using rel=canonical ? https://support.google.com/webmasters/answer/1235687 An assortment of the other stuff I’ve read for reference: http://www.wordtracker.com/academy/seo-clean-urls http://www.practicalecommerce.com/articles/3857-SEO-When-Product-Facets-and-Filters-Fail http://www.searchenginejournal.com/five-steps-to-seo-friendly-site-url-structure/59813/ http://googlewebmastercentral.blogspot.com/2011/07/improved-handling-of-urls-with.html
Technical SEO | | alovallo0 -
Are my Canonical Links set up correctly?
I have Enable Canonical Links (recommended) on my web site. However, I also have THIS checked: Enable full URL for Home Page Canonical Link (include /default.asp) Is it hurting me??? Keep getting dinged on our report card. We are using the Volusion shopping cart software/platform.
Technical SEO | | GreenFarmParts0 -
Explain this search result
Hi folks, I came across a strange search result. Search on Google Australia for "income portfolio". http://www.google.com.au/search?sourceid=chrome&ie=UTF-8&q=income+portfolio See the first result? It's a login page. How is that search result showing? And in position #1! Where is it getting its title and descriptions tags from? Does Google have a way to somehow see what is behind the login? Appreciate your thought.
Technical SEO | | scotennis0 -
Should I set up a disallow in the robots.txt for catalog search results?
When the crawl diagnostics came back for my site its showing around 3,000 pages of duplicate content. Almost all of them are of the catalog search results page. I also did a site search on Google and they have most of the results pages in their index too. I think I should just disallow the bots in the /catalogsearch/ sub folder, but I'm not sure if this will have any negative effect?
Technical SEO | | JordanJudson0 -
Www or no www in search results??
I am working with a client, and when I check on SERP placement, I never see the "www" in the SERP's only nameofcustomer.com not www.nameofcustomer.com. Of course there is a redirect going on...Question is...should this matter at all? I dont understand the relationship between this kind of redirect and SEO. Thank Mozzers
Technical SEO | | Giggy0