Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Role of Robots.txt and Search Console parameters settings
-
Hi, wondering if anyone can point me to resources or explain the difference between these two. If a site has url parameters disallowed in Robots.txt is it redundant to edit settings in Search Console parameters to anything other than "Let Googlebot Decide"?
-
Thank you! That helps a lot.
-
So, regarding NOINDEX vs. DISALLOW, there is a significant difference there.
If you disallow in robots, you are asking the search engine to not even crawl that page. Whereas if you NOINDEX in the page head, then the search engine may still crawl the page but should not index it.
There are a few impacts of this difference. For one, if you use NOINDEX but still allow the search engine to FOLLOW, then it may discover pages which otherwise might not have been discovered (if that page has unique links, for example). So in this case, you might prefer to use (NOINDEX, FOLLOW) if you want that discovery to happen. On the other hand, if you have many pages and you are trying to wisely use the search engine's crawl "budget", then you might in some cases prefer to disallow some paths in the robots.txt file.
It's also common to use robots.txt to disallow some files where you do not have control over the response. Non-html files, where you might not be able to easily administer noindex directives. Or dynamic pages your web application may serve but not allow you to administer head tags for.
All of that said, robots.txt files have been shrinking ever since the search engines began to render javascript, since now they need access to a lot of resource files which they previously did not. Much of the old advice of disallowing scripts and admin folder paths may be obsolete now, if those files are needed to properly render pages.
-
Thanks so much for the reply. I am still struggling to understand when it's best to use robots.txt
I think I understand that url parameters are best handled in the search console parameters tool, and if you want to keep a page out of the index, it's best to use meta noindex rather than blocking it in robots.txt
What would be an example of when you would want to disallow something in robots.txt?
-
For one, the GSC functionality is much easier to use for dealing with URLs having multiple query string parameters. robots.txt processes the statements in order, so you often have to set up a broad disallow, followed by more specific allows, to achieve the same result which can be more easily managed in GSC.
Also, GSC is useful for the "representative URL" setting, if your pages don't necessarily get crawled without the parameter present at all, but you only want one version of the page indexed if the crawler encounters multiple versions. So, this is a little like a dynamic canonical, except you are not specifying which version.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Robots.txt Tester - syntax not understood
I've looked in the robots.txt Tester and I can see 3 warnings: There is a 'syntax not understood' warning for each of these. XML Sitemaps:
Technical SEO | | JamesHancocks1
https://www.pkeducation.co.uk/post-sitemap.xml
https://www.pkeducation.co.uk/sitemap_index.xml How do I fix or reformat these to remove the warnings? Many thanks in advance.
Jim0 -
Discrepancy in actual indexed pages vs search console
Hi support, I checked my search console. It said that 8344 pages from www.printcious.com/au/sitemap.xml are indexed by google. however, if i search for site:www.printcious.com/au it only returned me 79 results. See http://imgur.com/a/FUOY2 https://www.google.com/search?num=100&safe=off&biw=1366&bih=638&q=site%3Awww.printcious.com%2Fau&oq=site%3Awww.printcious.com%2Fau&gs_l=serp.3...109843.110225.0.110430.4.4.0.0.0.0.102.275.1j2.3.0....0...1c.1.64.serp..1.0.0.htlbSGrS8p8 Could you please advise why there is discrepancy? Thanks.
Technical SEO | | Printcious0 -
Does an Apostrophe affect searches?
Does Google differentiate between keyphrase structures such as Mens Sunglasses & Men**'**s Sunglasses? I.e. does the inclusion/exclusion of an apostrophe make any difference when optimising your main keyword/phrase for a page? Keyword explorer appears to give different results..... I.e. no data for Men's Sunglasses, but data appears for Mens sunglasses. So if I optimise my page to include the apostrophe, will it screw the potential success for that page? Thanks 🙂 Bob
Technical SEO | | SushiUK1 -
Why has my search traffic suddenly tanked?
On 6 June, Google search traffic to my Wordpress travel blog http://www.travelnasia.com tanked completely. There are no warnings or indicators in Webmaster Tools that suggest why this happened. Traffic from search has remained at zero since 6 June and shows no sign of recovering. Two things happened on or around 6 June. (1) I dropped my premium theme which was proving to be not mobile friendly and replaced it with the ColorMag theme which is responsive. (2) I relocated off my previous hosting service which was showing long server lag times to a faster host. Both of these should have improved my search performance, not tanked it. There were some problems with the relocation to the new web host which resulted in a lot of "out of memory" errors on the website for 3-4 days. The allowed memory was simply not enough for the complexity of the site and the volume of traffic. After a few days of trying to resolve these problems, I moved the site to another web host which allows more PHP memory and the site now appears reliably accessible for both desktop and mobile. But my search traffic has not recovered. I am wondering if in all of this I've done something that Google considers to be a cardinal sin and I can't see it. The clues I'm seeing include: Moz Pro was unable to crawl my site last Friday. It seems like every URL it tried to crawl was of the form http://www.travelnasia.com/wp-login.php?action=jetpack-sso&redirect_to=http://www.travelnasia.com/blog/bangkok-skytrain-bts-mrt-lines which resulted in a 500 status error. I don't know why this happened but I have disabled the Jetpack login function completely, just in case it's the problem. GWT tells me that some of my resource files are not accessible by GoogleBot due to my robots.txt file denying access to /wp-content/plugins/. I have removed this restriction after reading the latest advice from Yoast but I still can't get GWT to fetch and render my posts without some resource errors. On 6 June I see in Structured Data of GWT that "items" went from 319 to 1478 and "items with errors" went from 5 to 214. There seems to be a problem with both hatom and hcard microformats but when I look at the source code they seem to be OK. What I can see in GWT is that each hcard has a node called "n [n]" which is empty and Google is generating a warning about this. I see that this is because the author vcard URL class now says "url fn n" but I don't see why it says this or how to fix it. I also don't see that this would cause my search traffic to tank completely. I wonder if anyone can see something I'm missing on the site. Why would Google completely deny search traffic to my site all of a sudden without notifying any kind of penalty? Note that I have NOT changed the content of the site in any significant way. And even if I did, it's unlikely to result in a complete denial of traffic without some kind of warning.
Technical SEO | | Gavin.Atkinson1 -
Parked domain is first in search results
We have several brand related domains which are parked and pointing to our main website. Some of these websites are redirecting using a 302 (don't ask, that's a whole other story), but these are being changed. But it shouldn't matter what type of redirect they are no? Since there has never been any traffic and they are not indexed? But it seems that one of them was indexed: exotravel.vn. A search for our brand name or the previous brand name (exotravel and exotissimo) brings up this parked domain first! How can that be? The domain has never been used and has no backlinks. exotravel.vn is redirecting and I submitted a change of address weeks ago to Google, but its still coming up first in all brand name searches for exotissimo or exotravel.
Technical SEO | | Exotissimo0 -
Blocking Affiliate Links via robots.txt
Hi, I work with a client who has a large affiliate network pointing to their domain which is a large part of their inbound marketing strategy. All of these links point to a subdomain of affiliates.example.com, which then redirects the links through a 301 redirect to the relevant target page for the link. These links have been showing up in Webmaster Tools as top linking domains and also in the latest downloaded links reports. To follow guidelines and ensure that these links aren't counted by Google for either positive or negative impact on the site, we have added a block on the robots.txt of the affiliates.example.com subdomain, blocking search engines from crawling the full subddomain. The robots.txt file is the following code: User-agent: * Disallow: / We have authenticated the subdomain with Google Webmaster Tools and made certain that Google can reach and read the robots.txt file. We know they are being blocked from reading the affiliates subdomain. However, we added this affiliates subdomain block a few weeks ago to the robots.txt, but links are still showing up in the latest downloads report as first being discovered after we added the block. It's been a few weeks already, and we want to make sure that the block was implemented properly and that these links aren't being used to negatively impact the site. Any suggestions or clarification would be helpful - if the subdomain is being blocked for the search engines, why are the search engines following the links and reporting them in the www.example.com subdomain GWMT account as latest links. And if the block is implemented properly, will the total number of links pointing to our site as reported in the links to your site section be reduced, or does this not have an impact on that figure?From a development standpoint, it's a much easier fix for us to adjust the robots.txt file than to change the affiliate linking connection from a 301 to a 302, which is why we decided to go with this option.Any help you can offer will be greatly appreciated.Thanks,Mark
Technical SEO | | Mark_Ginsberg0 -
Links under Meta Description when performing a search
Doing research for clients, I have came across seeing sites displaying hyperlinks underneath their own meta description. keywords that I have googled that result with hyperlinks displaying under meta descriptions: Google'd: iacquire (brand) bmw wheels (Beyern Wheels, position 1) aftermarket bmw wheels (MMR Wheels, position 2) These companys have hyperlinks underneath their descriptions. Anyone have any ideas why this happens or how it happens?
Technical SEO | | frnprz0 -
Should I nofollow search results pages
I have a customer site where you can search for products they sell url format is: domainname/search/keywords/ keywords being what the user has searched for. This means the number of pages can be limitless as the client has over 7500 products. or should I simply rel canonical the search page or simply no follow it?
Technical SEO | | spiralsites0