Robots.txt question
-
Hello,
What does the following command mean -
User-agent: *
Allow: /
Does it mean that we are blocking all spiders? Is Allow supported in robots.txt?
Thanks
-
It's a good idea to have an XML sitemap and make sure the search engines know where it is. It's part of the protocol that they will look in the robots.txt file for the location of your sitemap.
-
I was assuming that by including / after Allow, we were blocking the spiders, and I also thought that Allow was not supported by search engines.
Thanks for the clarification. A better approach would be
User-Agent: *
Allow:
right ?
The best one of course is
User-agent: *
Disallow:
-
That's not really necessary unless there are URLs or directories you're disallowing after the Allow in your robots.txt. Allow is a directive supported by the major search engines, but search engines assume they're allowed to crawl everything they find unless you specifically disallow it in your robots.txt.
The following is universally accepted by bots and essentially means the same thing as what I think you're trying to say, allowing bots to crawl everything:
User-agent: *
Disallow:
There's a sample use of the Allow directive on the Wikipedia robots.txt page.
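If you want to sanity-check this yourself, here's a quick sketch using Python's built-in robots.txt parser (which does support Allow). The example.com URL and bot name are just placeholders; the point is that `Allow: /` and an empty `Disallow:` behave identically:

```python
from urllib.robotparser import RobotFileParser

# Two robots.txt bodies that should both let every bot crawl everything.
allow_all = "User-agent: *\nAllow: /\n"
disallow_nothing = "User-agent: *\nDisallow:\n"

for body in (allow_all, disallow_nothing):
    rp = RobotFileParser()
    rp.parse(body.splitlines())
    # Any URL should be fetchable under either ruleset.
    print(rp.can_fetch("AnyBot", "http://www.example.com/some/page.html"))
```

Both print True, so neither version blocks anything.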
-
There's more information about robots.txt from SEOmoz at http://www.seomoz.org/learn-seo/robotstxt
SEOmoz and the robots.txt site suggest the following for allowing robots to see everything and listing your sitemap:
User-agent: *
Disallow:
Sitemap: http://www.example.com/none-standard-location/sitemap.xml
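As a side note, you can also check programmatically that a parser picks the Sitemap line out of a robots.txt. Python's standard-library parser exposes it via site_maps() (Python 3.8+); the body below is the suggested robots.txt from above, with the example.com placeholder URL:

```python
from urllib.robotparser import RobotFileParser

# The suggested robots.txt, with each directive on its own line.
body = (
    "User-agent: *\n"
    "Disallow:\n"
    "Sitemap: http://www.example.com/none-standard-location/sitemap.xml\n"
)

rp = RobotFileParser()
rp.parse(body.splitlines())
print(rp.site_maps())
# ['http://www.example.com/none-standard-location/sitemap.xml']
```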
-
Any particular reason for doing so?
-
That robots.txt should be fine.
But you should also add your XML sitemap to the robots.txt file, example:
User-Agent: *
Allow: /
Sitemap: http://www.website.com/sitemap.xml
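For completeness, the case where Allow actually earns its keep is when it carves an exception out of a Disallow rule. Here's a hedged sketch with made-up /private/ paths; note that Python's parser applies rules in file order, so the Allow line comes first (Google instead honors the most specific match, which gives the same result for these paths):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block /private/ but carve out one page with Allow.
body = (
    "User-agent: *\n"
    "Allow: /private/public-page.html\n"
    "Disallow: /private/\n"
)

rp = RobotFileParser()
rp.parse(body.splitlines())
print(rp.can_fetch("AnyBot", "http://www.website.com/private/public-page.html"))  # True
print(rp.can_fetch("AnyBot", "http://www.website.com/private/secret.html"))       # False
```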