Robots file set up
-
The robots file looks like it has been set up in a very messy way.
I understand that # comments out a line. Does this mean the sitemap would
not be picked up?
Disallow: /js/ - should this be allowed instead, e.g. with /*.js$?
Disallow: /media/wysiwyg/ - this seems to be causing alerts in Webmaster Tools, as Google cannot access
the images within.
Can anyone help me clean this up, please?
#Sitemap: https://examplesite.com/sitemap.xml
# Crawlers Setup
User-agent: *
Crawl-delay: 10
# Allowable Index
# Mind that Allow is not an official standard
Allow: /index.php/blog/
Allow: /catalog/seo_sitemap/category/
Allow: /catalogsearch/result/
Allow: /media/catalog/
# Directories
Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /errors/
Disallow: /includes/
Disallow: /js/
Disallow: /lib/
Disallow: /magento/
Disallow: /media/
Disallow: /media/captcha/
Disallow: /media/catalog/
#Disallow: /media/css/
#Disallow: /media/css_secure/
Disallow: /media/customer/
Disallow: /media/dhl/
Disallow: /media/downloadable/
Disallow: /media/import/
#Disallow: /media/js/
Disallow: /media/pdf/
Disallow: /media/sales/
Disallow: /media/tmp/
Disallow: /media/wysiwyg/
Disallow: /media/xmlconnect/
Disallow: /pkginfo/
Disallow: /report/
Disallow: /scripts/
Disallow: /shell/
#Disallow: /skin/
Disallow: /stats/
Disallow: /var/
# Paths (clean URLs)
Disallow: /index.php/
Disallow: /catalog/product_compare/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/
Disallow: /catalog/product/gallery/
Disallow: */catalog/product/upload/
Disallow: /catalogsearch/
Disallow: /checkout/
Disallow: /control/
Disallow: /contacts/
Disallow: /customer/
Disallow: /customize/
Disallow: /newsletter/
Disallow: /poll/
Disallow: /review/
Disallow: /sendfriend/
Disallow: /tag/
Disallow: /wishlist/
# Files
Disallow: /cron.php
Disallow: /cron.sh
Disallow: /error_log
Disallow: /install.php
Disallow: /LICENSE.html
Disallow: /LICENSE.txt
Disallow: /LICENSE_AFL.txt
Disallow: /STATUS.txt
Disallow: /get.php # Magento 1.5+
# Paths (no clean URLs)
#Disallow: /.js$
#Disallow: /.css$
Disallow: /.php$
Disallow: /?SID=
Disallow: /rss*
Disallow: /*PHPSESSID
Disallow: /:
Disallow: /
User-agent: Fatbot
Disallow: /
User-agent: TwengaBot-2.0
Disallow: /
-
To add to this, I'd also recommend having a look around in /lib/ just to make sure you aren't blocking important JavaScript and CSS files (I've been bitten by this!).
More guidance here: https://developers.google.com/webmasters/mobile-sites/mobile-seo/common-mistakes/blocked-resources?hl=en
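To make that check less manual, here is a minimal Python sketch of how a Google-style matcher would treat a few of the rules from the file above. It assumes the precedence Google documents for its own crawler: the longest matching pattern wins, Allow wins a tie, `*` matches any run of characters, and a trailing `$` anchors the end of the path. Other crawlers may behave differently. The sample file names (`app.js`, `jquery.js`, and so on) and the helper names are made up for illustration, and this is not a full robots.txt parser.

```python
import re

def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_allowed(path, rules):
    """Return True if `path` may be crawled under `rules`.

    `rules` is a list of (directive, pattern) pairs for one user-agent group.
    Assumed precedence: longest matching pattern wins, Allow wins a tie.
    """
    allowed, best_len = True, -1
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty pattern matches nothing
        if not pattern_to_regex(pattern).match(path):
            continue
        is_allow = directive.lower() == "allow"
        if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
            allowed, best_len = is_allow, len(pattern)
    return allowed

# A few rules copied from the posted file (User-agent: * group).
RULES = [
    ("Allow", "/media/catalog/"),
    ("Disallow", "/js/"),
    ("Disallow", "/lib/"),
    ("Disallow", "/media/"),
    ("Disallow", "/media/catalog/"),
    ("Disallow", "/media/wysiwyg/"),
]

# Hypothetical resource paths a page on the site might load.
SAMPLE_PATHS = [
    "/js/app.js",
    "/lib/jquery/jquery.js",
    "/media/catalog/product/shoe.jpg",
    "/media/wysiwyg/banner.jpg",
]

# The same rules plus the wildcard the question asks about.
WITH_JS_ALLOWED = RULES + [("Allow", "/*.js$")]

for path in SAMPLE_PATHS:
    before = "allowed" if is_allowed(path, RULES) else "blocked"
    after = "allowed" if is_allowed(path, WITH_JS_ALLOWED) else "blocked"
    print(f"{path}: {before} -> {after} with Allow: /*.js$")
```

Under these assumptions, `Allow: /media/catalog/` and `Disallow: /media/catalog/` tie on length, so the Allow wins, and an added `Allow: /*.js$` is longer than `Disallow: /js/` or `Disallow: /lib/`, so it would unblock the scripts in both directories.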
-
Looks like your intuitions are pretty good! I would remove the # before the Sitemap line, as you have indicated, so that crawlers can pick it up. I would also remove the Disallow: /js/ line, as Google needs access to JavaScript these days and will throw a fit if it can't get to it. I wouldn't worry about the wysiwyg directory if it only has images that you don't care about ranking.
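If you want to sanity-check those changes before deploying them, a rough sketch with Python's standard `urllib.robotparser` is shown here. It compares a trimmed-down copy of the current rules with a revised copy. The revised rules are only one possible cleanup, the sample paths are made up, and `site_maps()` needs Python 3.8 or newer. Note that this parser does plain prefix matching in file order and does not interpret `*` or `$` wildcards, so it only approximates what Googlebot does.

```python
from urllib.robotparser import RobotFileParser

SITE = "https://examplesite.com"

# Trimmed-down copy of the posted rules (the sitemap line is commented out).
CURRENT_RULES = """\
#Sitemap: https://examplesite.com/sitemap.xml
User-agent: *
Disallow: /js/
Disallow: /media/wysiwyg/
Disallow: /checkout/
"""

# One possible cleanup: sitemap uncommented, /js/ and /media/wysiwyg/ unblocked.
REVISED_RULES = """\
Sitemap: https://examplesite.com/sitemap.xml
User-agent: *
Disallow: /checkout/
"""

# Hypothetical paths used only for this check.
SAMPLE_PATHS = ["/js/app.js", "/media/wysiwyg/banner.jpg", "/checkout/cart/"]

def check(rules):
    """Parse robots.txt text; return (per-path access dict, sitemap list or None)."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    access = {p: parser.can_fetch("Googlebot", SITE + p) for p in SAMPLE_PATHS}
    return access, parser.site_maps()

current_access, current_sitemaps = check(CURRENT_RULES)
revised_access, revised_sitemaps = check(REVISED_RULES)

print("current:", current_access, current_sitemaps)
print("revised:", revised_access, revised_sitemaps)
```

With the # in place the parser reports no sitemap at all, which matches the concern in the question: a commented-out Sitemap line is ignored.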