Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. We’re not completely removing the content, and many posts will remain viewable, but we have locked both new posts and new replies.
Role of Robots.txt and Search Console parameters settings
-
Hi, wondering if anyone can point me to resources or explain the difference between these two. If a site has URL parameters disallowed in robots.txt, is it redundant to change the Search Console URL parameter settings to anything other than "Let Googlebot Decide"?
-
Thank you! That helps a lot.
-
So, regarding NOINDEX vs. DISALLOW, there is a significant difference there.
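If you disallow a page in robots.txt, you are asking the search engine not to crawl that page at all. If you instead use NOINDEX in the page head, the search engine may still crawl the page but should not index it. Keep in mind that a crawler can only see a NOINDEX tag on a page it is allowed to crawl.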
There are a few impacts of this difference. For one, if you use NOINDEX but still allow the search engine to FOLLOW, then it may discover pages which otherwise might not have been discovered (if that page has unique links, for example). So in this case, you might prefer to use (NOINDEX, FOLLOW) if you want that discovery to happen. On the other hand, if you have many pages and you are trying to wisely use the search engine's crawl "budget", then you might in some cases prefer to disallow some paths in the robots.txt file.
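To make the (NOINDEX, FOLLOW) idea concrete, here is a minimal sketch of how a crawler-side check might read the robots meta tag from a page head. The class name `RobotsMetaParser`, the helper `robots_meta_directives`, and the sample page are all hypothetical and only for illustration; real search engines handle more cases (for example, bot-specific meta names).

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. a bare attribute), so default to "".
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        for token in attr_map.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_meta_directives(html):
    """Return the set of lower-cased robots meta directives found in the HTML."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page that should stay out of the index but still pass on its links.
page = '<html><head><meta name="robots" content="NOINDEX, FOLLOW"></head><body></body></html>'

directives = robots_meta_directives(page)
may_index = "noindex" not in directives
may_follow_links = "nofollow" not in directives
print(may_index, may_follow_links)  # False True
```

A check like this only works if the page can be fetched in the first place, which is why a robots.txt disallow and a meta NOINDEX do not combine well on the same URL.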
It's also common to use robots.txt to disallow files where you do not have control over the response: non-HTML files, where you might not be able to easily administer noindex directives, or dynamic pages that your web application serves without letting you administer head tags.
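As a rough illustration of that use case, the sketch below feeds a small robots.txt to Python's standard `urllib.robotparser` and checks a few URLs. The paths `/downloads/` and `/app/export/` and the `example.com` URLs are made up for the example, and this parser only does simple prefix matching, so treat it as an approximation of what a given search engine does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block a folder of PDFs and an export endpoint
# where adding a meta noindex tag is not practical.
ROBOTS_TXT = """\
User-agent: *
Disallow: /downloads/
Disallow: /app/export/
"""


def build_parser(robots_txt):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

for url in (
    "https://example.com/downloads/price-list.pdf",
    "https://example.com/app/export/report?id=7",
    "https://example.com/products/widget",
):
    # can_fetch returns True when the given user agent may crawl the URL.
    print(url, parser.can_fetch("*", url))
```

With these rules the first two URLs come back as not fetchable and the product page as fetchable.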
All of that said, robots.txt files have been shrinking ever since the search engines began to render JavaScript, because they now need access to many resource files that they previously did not. Much of the old advice about disallowing scripts and admin folder paths may be obsolete now, if those files are needed to render pages properly.
-
Thanks so much for the reply. I am still struggling to understand when it's best to use robots.txt.
I think I understand that URL parameters are best handled in the Search Console parameters tool, and that if you want to keep a page out of the index, it's best to use meta noindex rather than blocking it in robots.txt.
What would be an example of when you would want to disallow something in robots.txt?
-
For one, the GSC functionality is much easier to use for URLs that have multiple query string parameters. In robots.txt you often have to set up a broad disallow plus more specific allows to achieve the same result, and precedence differs between parsers: some apply the first matching rule in file order, while Google applies the most specific (longest) matching rule regardless of order. The same outcome can be more easily managed in GSC.
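Here is a small sketch of how a broad disallow and a more specific allow can be resolved against each other. It assumes the longest-match rule (the most specific matching path wins, with allow winning ties), which is how I understand Google and RFC 9309 to work; it handles plain path prefixes only, no wildcards, and the function name `is_allowed` and the sample rules are hypothetical.

```python
def is_allowed(path, rules):
    """Decide whether `path` may be crawled.

    `rules` is a list of (directive, prefix) pairs where directive is
    "allow" or "disallow". The longest matching prefix wins; on a tie,
    allow wins. With no matching rule, the path is allowed.
    """
    best_length = -1
    allowed = True
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue
        is_allow = directive.lower() == "allow"
        if len(prefix) > best_length or (len(prefix) == best_length and is_allow):
            best_length = len(prefix)
            allowed = is_allow
    return allowed


# Hypothetical rules: block the whole catalog except the featured section.
rules = [
    ("disallow", "/catalog/"),
    ("allow", "/catalog/featured/"),
]

print(is_allowed("/catalog/featured/shoes", rules))   # True
print(is_allowed("/catalog/list?sort=price", rules))  # False
print(is_allowed("/about", rules))                    # True

# Under longest-match, reversing the rule order gives the same answers.
print(is_allowed("/catalog/featured/shoes", list(reversed(rules))))  # True
```

A first-match parser would give different answers if the broad disallow came first, which is one reason hand-maintained parameter rules in robots.txt get fiddly.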
Also, GSC is useful for the "representative URL" setting when your pages are not necessarily crawled without the parameter, but you want only one version of the page indexed if the crawler encounters several. This is a little like a dynamic canonical, except that you are not specifying which version.
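To picture what "representative URL" means, the sketch below groups URLs that differ only in parameters you have marked as not changing the content, and treats the first URL seen in each group as the representative. This is only my approximation of the idea, not how Google implements it; the parameter names in `IGNORED_PARAMS` and the `example.com` URLs are made up.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters that reorder or track but do not change page content.
IGNORED_PARAMS = {"sort", "sessionid"}


def representative_key(url, ignored=IGNORED_PARAMS):
    """Return the URL with ignored parameters removed and the rest sorted."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in ignored
    ]
    kept.sort()
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def group_variants(urls):
    """Group URLs that collapse to the same key, preserving crawl order."""
    groups = defaultdict(list)
    for url in urls:
        groups[representative_key(url)].append(url)
    return dict(groups)


crawled = [
    "https://example.com/shoes?sort=price",
    "https://example.com/shoes?sort=name",
    "https://example.com/shoes?color=red&sort=price",
]

groups = group_variants(crawled)
for key, variants in groups.items():
    # The first variant encountered stands in for the whole group.
    print(variants[0], "represents", len(variants), "URL(s)")
```

Note that nothing here says which variant should win, only that one per group does, which matches the "dynamic canonical without choosing the version" description.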