Google robots.txt test - not picking up syntax errors?
-
I just ran a robots.txt file through "Google robots.txt Tester" as there was some unusual syntax in the file that didn't make any sense to me...
e.g. /url/?*
/url/?
/url/*and so on. I would use ? and not ? for example and what is ? for! - etc.
Yet "Google robots.txt Tester" did not highlight the issues...
I then fed the sitemap through http://www.searchenginepromotionhelp.com/m/robots-text-tester/robots-checker.php and that tool actually picked up my concerns.
Can anybody explain why Google didn't - or perhaps it isn't supposed to pick up such errors?
Thanks, Luke
-
Many thanks Beau - much appreciated.
-
Hey Luke,
It appears that in each of the three examples, there was a plausible case for each example. Let's cover each:
- For /url/?* , it can be expressed that a URL can offer a trailing slash and then a query string, see examples here.
- with /url/? , this covers examples of the above and in addition, would plausibly block product pages that generate query strings, similar to this example from H&M. In essence, only allowing the product page to be seen.
- /url/* , well, that's just anything and everything after the trailing slash.
I guess the question you should ask yourself is "Is this the best approach for the issue?"
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Google related searches
Hello, Are the related searches, the words that I should use when writing my content. For ex : when I type online spreadsheet in google, in the related searches it list online spreadsheet open source and spreasheet download. Does it means that when writing content I should included those terms in order to be relevant on the keyword online spreadsheet ? because they are considered closely related by google ?
Intermediate & Advanced SEO | | seoanalytics0 -
Robots.txt, Disallow & Indexed-Pages..
Hi guys, hope you're well. I have a problem with my new website. I have 3 pages with the same content: http://example.examples.com/brand/brand1 (good page) http://example.examples.com/brand/brand1?show=false http://example.examples.com/brand/brand1?show=true The good page has rel=canonical & it is the only page should be appear in Search results but Google has indexed 3 pages... I don't know how should do now, but, i am thinking 2 posibilites: Remove filters (true, false) and leave only the good page and show 404 page for others pages. Update robots.txt with disallow for these parameters & remove those URL's manually Thank you so much!
Intermediate & Advanced SEO | | thekiller990 -
Robots.txt for Facet Results
Hi Does anyone know how to properly add facets URL's to Robots txt? E.g. of our facets URL - http://www.key.co.uk/en/key/platform-trolleys-trucks#facet:-10028265807368&productBeginIndex:0&orderBy:5&pageView:list& Everything after the # will need to be blocked on all pages with a facet. Thank you
Intermediate & Advanced SEO | | BeckyKey0 -
Robots.txt - Do I block Bots from crawling the non-www version if I use www.site.com ?
my site uses is set up at http://www.site.com I have my site redirected from non- www to the www in htacess file. My question is... what should my robots.txt file look like for the non-www site? Do you block robots from crawling the site like this? Or do you leave it blank? User-agent: * Disallow: / Sitemap: http://www.morganlindsayphotography.com/sitemap.xml Sitemap: http://www.morganlindsayphotography.com/video-sitemap.xml
Intermediate & Advanced SEO | | morg454540 -
Google Manual Action Disappear
Hi Guys, I have a website that received unnatural link message. We had started link removal process, disavowed not approachable links and file reconsideration four times but all the times Google sent some samples of unnatural links and rejected our reconsideration. Last week we again disavowed non approachable links and planned to file reconsideration after a week but today when i tried to file reconsideration, Manual action message was disappeared. We haven't received any message from Google. The same case was happened with one more of our site earlier. Manual action disappeared means it has been revoked or something else??
Intermediate & Advanced SEO | | RuchiPardal0 -
Is there any delay between crawling a page by google and displaying of the ratings in rich snippet of the results in google?
Is there any delay between crawling a page by google and displaying of the ratings in rich snippet of the results in google?
Intermediate & Advanced SEO | | NEWCRAFT0 -
Should we block urls like this - domainname/shop/leather-chairs.html?brand=244&cat=16&dir=ascℴ=price&price=1 within the robots.txt?
I've recently added a campaign within the SEOmoz interface and received an alarming number of errors ~9,000 on our eCommerce website. This site was built in Magento, and we are using search friendly url's however most of our errors were duplicate content / titles due to url's like: domainname/shop/leather-chairs.html?brand=244&cat=16&dir=asc&order=price&price=1 and domainname/shop/leather-chairs.html?brand=244&cat=16&dir=asc&order=price&price=4. Is this hurting us in the search engines? Is rogerbot too good? What can we do to cut off bots after the ".html?" ? Any help would be much appreciated 🙂
Intermediate & Advanced SEO | | MonsterWeb280 -
Google.ca vs Google.com Ranking
I have a site I would like to rank high for particular keywords in the Google.ca searches and don't particularly care about the Google.com searches (it's a Canadian service). I have logged into Google Webmaster Tools and targeted Canada. Currently my site is ranking on the third page for my desired keywords on Google.com, but is on the 20th page for Google.ca. Previously this change happened quite quickly -- within 4 weeks -- but it doesn't seem to be taking here (12 weeks out and counting). My optimization seems to be fine since I'm ranking well on Google.com: not sure why it's not translating to Google.ca. Any help or thoughts would be appreciated.
Intermediate & Advanced SEO | | seorm0