Robots file set up
-
The robots file looks like it has been set up in a very messy way.
I understand the # will comment out a line, does this mean the sitemap would
not be picked up?
Disallow: /js/ should this be allowed like /*.js$
Disallow: /media/wysiwyg/ - this seems to be causing alerts in webmaster tools as it can not access
the images within.
Can anyone help me clean this up please
#Sitemap: https://examplesite.com/sitemap.xml
Crawlers Setup
User-agent: *
Crawl-delay: 10Allowable Index
Mind that Allow is not an official standard
Allow: /index.php/blog/
Allow: /catalog/seo_sitemap/category/Allow: /catalogsearch/result/
Allow: /media/catalog/
Directories
Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /errors/
Disallow: /includes/
Disallow: /js/
Disallow: /lib/
Disallow: /magento/Disallow: /media/
Disallow: /media/captcha/
Disallow: /media/catalog/
#Disallow: /media/css/
#Disallow: /media/css_secure/
Disallow: /media/customer/
Disallow: /media/dhl/
Disallow: /media/downloadable/
Disallow: /media/import/
#Disallow: /media/js/
Disallow: /media/pdf/
Disallow: /media/sales/
Disallow: /media/tmp/
Disallow: /media/wysiwyg/
Disallow: /media/xmlconnect/
Disallow: /pkginfo/
Disallow: /report/
Disallow: /scripts/
Disallow: /shell/
#Disallow: /skin/
Disallow: /stats/
Disallow: /var/Paths (clean URLs)
Disallow: /index.php/
Disallow: /catalog/product_compare/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/
Disallow: /catalog/product/gallery/
Disallow: */catalog/product/upload/
Disallow: /catalogsearch/
Disallow: /checkout/
Disallow: /control/
Disallow: /contacts/
Disallow: /customer/
Disallow: /customize/
Disallow: /newsletter/
Disallow: /poll/
Disallow: /review/
Disallow: /sendfriend/
Disallow: /tag/
Disallow: /wishlist/Files
Disallow: /cron.php
Disallow: /cron.sh
Disallow: /error_log
Disallow: /install.php
Disallow: /LICENSE.html
Disallow: /LICENSE.txt
Disallow: /LICENSE_AFL.txt
Disallow: /STATUS.txt
Disallow: /get.php # Magento 1.5+Paths (no clean URLs)
#Disallow: /.js$
#Disallow: /.css$
Disallow: /.php$
Disallow: /?SID=
Disallow: /rss*
Disallow: /*PHPSESSIDDisallow: /:
Disallow: /User-agent: Fatbot
Disallow: /User-agent: TwengaBot-2.0
Disallow: / -
To add to this, I'd also recommend having a look around in /lib/ just to make sure you aren't blocking important javascript and css files (I've been bitten by this!).
More guidance here: https://developers.google.com/webmasters/mobile-sites/mobile-seo/common-mistakes/blocked-resources?hl=en
-
Looks like your intuitions are pretty good! I would remove the # before sitemap, as you have indicated. I would remove the line about /js/ as Google needs access to javascript these days and will throw a fit if you don't. I wouldnt worry about the wysiwyg directory if it only has images that you dont care about ranking.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Where to put 301 redirects in my Wordpress htaccess file?
I have about 25 301 redirects in my Wordpress htaccess file, that look like this: <code>Redirect301/store/index.html https://www.notesinspanish.com/store-home/</code> At the moment they are at the bottom of my htaccess file, below the usual Wordpress rewrite rules: <code># BEGIN WordPress <ifmodulemod_rewrite.c>RewriteEngine On RewriteBase / RewriteRule ^index\.php$ - [L] RewriteCond %{REQUEST_FILENAME} !-f RewriteCond %{REQUEST_FILENAME} !-d RewriteRule . /index.php [L] # END WordPress</ifmodulemod_rewrite.c></code> So they are below all that. Above my WP rewrite rules I have a number of other rules from plugins (caching, ssl). Are my 301's OK where they are at the very bottom of that file? They are working, and redircting pages correctly. Should they be somewhere else? Many thanks for any help. Thanks for any help.
Technical SEO | | Benspain0 -
Little confused regarding robots.txt
Hi there Mozzers! As a newbie, I have a question that what could happen if I write my robots.txt file like this... User-agent: * Allow: / Disallow: /abc-1/ Disallow: /bcd/ Disallow: /agd1/ User-agent: * Disallow: / Hope to hear from you...
Technical SEO | | DenorL0 -
Robots.txt Download vs Cache
We made an update to the Robots.txt file this morning after the initial download of the robots.txt file. I then submitted the page through Fetch as Google bot to get the changes in asap. The cache time stamp on the page now shows Sep 27, 2013 15:35:28 GMT. I believe that would put the cache time stamp at about 6 hours ago. However the Blocked URLs tab in Google WMT shows the robots.txt last downloaded at 14 hours ago - and therefore it's showing the old file. This leads me to believe for the Robots.txt the cache date and the download time are independent. Is there anyway to get Google to recognize the new file other than waiting this out??
Technical SEO | | Rich_A0 -
New domain's Sitemap.xml file loaded to old domain - how does this effect SEO?
I have a client who recently changed their domain when they redesigned their site. The client wanted the old site to remain live for existing customers with links to the new domain. I guess as a workaround, the developer loaded the new domain's sitemap.xml file to the old domain. What SEO ramifications would this have if any on the primary (new) domain?
Technical SEO | | julesae0 -
Blocked by meta-robots but there is no robots file
OK, I'm a little frustred here. I've waited a week for the next weekly index to take place after changing the privacy setting in a wordpress website so Google can index, but I still got the same problem. Blocked by meta-robots, no index, no follow. But I do not see a robot file anywhere and the privacy setting in this Wordpress site is set to allow search engines to index this site. Website is www.marketalert.ca What am I missing here? Why can't I index the rest of the website and is there a faster way to test this rather than wait another week just to find out it didn't work again?
Technical SEO | | Twinbytes0 -
Way to find how many sites within a given set link to a specific site?
Hi, Does anyone have an idea on how to determine how many sites within a list of 50 sites link to a specific site? Thanks!
Technical SEO | | SparkplugDigital0 -
High pr doc files
I saw that the website www.comunicatedepresa.net outranks www.comunicatedepresa.ro for the therm "comunicate de presa" in google.ro SERP even though .ro beats .net in every seo indicator (links, domains linking, fb likes, g+, onpage etc) I saw that site:www.comunicatedepresa.net returns a lot of *.doc files with a title that contains the kw ("comunicate de presa"). Ex: www.comunicatedepresa.net/worddoc/1485/ It seems a little suspicios to me.Did anyone see this before (google giving higher importance to doc files)? Does anyone know why .net site is ranking better?
Technical SEO | | seo.academy0 -
Submitting Sitemap File vs Sitemap Index File
Is it better to submit all sitemap files contained in a Sitemap Index File manually to Google or is it about the same as just submitting the Master Sitemap Index File.
Technical SEO | | AU-SEO0