Need Help With Robots.txt on Magento eCommerce Site
-
Hello, I am having difficulty configuring my robots.txt file properly. I am getting error emails from Google products stating that they can't view our products because they are being blocked, and this past week, in my SEO dashboard, the number of URLs receiving search traffic dropped by almost 40%.
Can anyone suggest a good template robots.txt file that I can use for a Magento eCommerce website?
The one I am currently using was found at this site here: e-commercewebdesign.co.uk/blog/magento-seo/magento-robots-txt-seo.php - However, I am getting problems from Google now because of it.
I searched and found this thread here: http://www.magentocommerce.com/wiki/multi-store_set_up/multiple_website_setup_with_different_document_roots#the_root_folder_robots.txt_file - But I felt like maybe I should get some additional help on properly configuring a robots.txt file for a Magento site.
Thanks in advance for any help. Please, let me know if you need more info to provide assistance.
-
You'd better back up your DB before doing that. Anyway, take a look at this MagentoConnect extension: http://www.magentocommerce.com/magento-connect/MageWorx.com/extension/2852/seo-suite-enterprise#overview
or this one (it's by the same company):
http://www.mageworx.com/seo-suite-pro-magento-extension.html
-
Thank you very much. We'll give that a shot and see how it goes. What started us tinkering with the robots.txt file in the first place is that Bing Shopping told us it couldn't crawl our product images. Plus, our PDF files for product specs and manuals are all stored within the media folder. Do you have a suggestion for this? I think we should remove "Disallow: /media/" and replace it with the following (what do you think?):
Disallow: /media/aitmanufacturers/
Disallow: /media/bigtom_media/
Disallow: /media/css/
Disallow: /media/downloadable/
Disallow: /media/easybanner/
Disallow: /media/geoip/
Disallow: /media/icons/
Disallow: /media/import/
Disallow: /media/js/
Disallow: /media/productsfeed/
Disallow: /media/sales/
Disallow: /media/tmp/
Disallow: /media/UPS/
-
Hello,
Below is what I use. You need to have mod_rewrite enabled if you are going to disallow index.php, and even then it's still very risky. This may be part of the issue. Robots.txt is so important, but you need to know what you are doing, especially when disallowing as much as that UK site does.
Tyler
User-agent: *
Disallow: /*?
Disallow: /*.js$
Disallow: /*.css$
Disallow: /checkout/
Disallow: /catalogsearch/
Disallow: /review/
Disallow: /app/
Disallow: /downloader/
Disallow: /images/
Disallow: /js/
Disallow: /lib/
Disallow: /media/
Disallow: /*.php$
Disallow: /pkginfo/
Disallow: /report/
Disallow: /skin/
Disallow: /var/
Disallow: /customer/
Disallow: /enable-cookies/
Sitemap: http://domain.com/sitemap.xml
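Before deploying a rule set like the one above, it can help to check a few sample URLs against it. The Python sketch below is one rough way to do that, assuming the commonly documented wildcard semantics ("*" matches any run of characters, a trailing "$" anchors the end of the URL). It is a simplification, not a crawler's actual implementation: it ignores Allow rules, per-user-agent groups, and longest-match precedence. The function names and the sample paths are hypothetical, chosen only for illustration.

```python
import re

# Disallow patterns copied from the robots.txt above, in the same order.
DISALLOW = [
    "/*?", "/*.js$", "/*.css$", "/checkout/", "/catalogsearch/",
    "/review/", "/app/", "/downloader/", "/images/", "/js/", "/lib/",
    "/media/", "/*.php$", "/pkginfo/", "/report/", "/skin/", "/var/",
    "/customer/", "/enable-cookies/",
]


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    Assumed semantics: '*' matches any run of characters and a
    trailing '$' anchors the match to the end of the URL path.
    """
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    # Escape the literal pieces and join them with '.*' for each '*'.
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + (r"\Z" if anchored else ""))


def blocking_rule(path, disallow=DISALLOW):
    """Return the first Disallow pattern that matches path, or None."""
    for pattern in disallow:
        # re.match anchors at the start, like robots.txt prefix matching.
        if pattern_to_regex(pattern).match(path):
            return pattern
    return None


if __name__ == "__main__":
    # Hypothetical sample paths; substitute real URLs from your own store.
    samples = [
        "/media/catalog/product/w/i/widget.jpg",
        "/media/manuals/widget-spec.pdf",
        "/some-product.html",
        "/some-category.html?p=2",
        "/index.php",
    ]
    for path in samples:
        print(path, "->", blocking_rule(path) or "allowed")
```

With these rules, anything under /media/ is matched by "Disallow: /media/", so product images and PDF spec sheets stored there are blocked, which fits the crawl complaints described earlier in the thread. Swapping in a narrower list of /media/ subfolders and re-running the same sample paths shows whether they become crawlable.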