Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
What do you add to your robots.txt on your ecommerce sites?
-
We're looking at expanding our robots.txt, we currently don't have the ability to noindex/nofollow. We're thinking about adding the following:
- Checkout
- Basket
Then possibly:
- Price
- Theme
- Sortby
- other misc filters.
What do you include?
-
I'm on this same path since we too cannot use noindex / nofollow due to limited backend interaction with Bigcommerce.
I like to block all cart related pages, which for ecommerce sites can be a boat load.
- /cart.php
- /checkout.php
- /finishorder.php
- /*login.php
just to name a few, then you have the sorting and compare pages, they have to be blocked or a mess unfolds.
- Disallow: /*sort=newest
- Disallow: /*sort=bestselling
- Disallow: /*?page= ( Big duplicate page issue if you don't block this one with a wildcard, and cannot access your .htaccess file or the backend properly to noindex / nofollow )
Just to name a few, in my case, I only want the meat of the site to be indexed and rank for. Otherwise one client's site was ranking terms that more related to web development than the niche industry they lived in. Plus with a limited index budget, why would you want google or anyone else to crawl pages on your site with no SEO value towards your niche?
Unless you sold carts as in web developed carts for ecommerce sites you wouldn't want much of that indexed anyways, and even in that case, those pages aren't too useful for ranking. At least from what I've gathered in the niche industries.
-
Hi,
It sounds like you're going down the right path. Disallow and section of the site that has personal information, as there's no value in having bots crawl that, keep them on important content longer! In addition to Checkout and Basket/Cart, you should also disallow the My Account area if your site has one.
Your next grouping, I'm assuming these are the parameters by which you pages can be sorted. If so, yes, disallow all of those, they're only going to cause duplicate content flags for you in the future. I'm not sure which CMS you are using, but some eComm platforms also have 'email to a friend' URLs that are a major source for dupes and can often be identified and disallowed by another parameter.
Hope this helps narrow it down for you!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
If Robots.txt have blocked an Image (Image URL) but the other page which can be indexed has this image, how is the image treated?
Hi MOZers, This probably is a dumb question but I have a case where the robots.tags has an image url blocked but this image is used on a page (lets call it Page A) which can be indexed. If the image on Page A has an Alt tags, then how is this information digested by crawlers? A) would Google totally ignore the image and the ALT tags information? OR B) Google would consider the ALT tags information? I am asking this because all the images on the website are blocked by robots.txt at the moment but I would really like website crawlers to crawl the alt tags information. Chances are that I will ask the webmaster to allow indexing of images too but I would like to understand what's happening currently. Looking forward to all your responses 🙂 Malika
Intermediate & Advanced SEO | | Malika11 -
My site shows 503 error to Google bot, but can see the site fine. Not indexing in Google. Help
Hi, This site is not indexed on Google at all. http://www.thethreehorseshoespub.co.uk Looking into it, it seems to be giving a 503 error to the google bot. I can see the site I have checked source code Checked robots Did have a sitemap param. but removed it for testing GWMT is showing 'unreachable' if I submit a site map or fetch Any ideas on how to remove this error? Many thanks in advance
Intermediate & Advanced SEO | | SolveWebMedia0 -
Moving to a new site while keeping old site live
For reasons I won't get into here, I need to move most of my site to a new domain (DOMAIN B) while keeping every single current detail on the old domain (DOMAIN A) as it is. Meaning, there will be 2 live websites that have mostly the same content, but I want the content to appear to search engines as though it now belongs to DOMAIN B. Weird situation. I know. I've run around in circles trying to figure out the best course of action. What do you think is the best way of going about this? Do I simply point DOMAIN A's canonical tags to the copied content on DOMAIN B and call it good? Should I ask sites that link to DOMAIN A to change their links to DOMAIN B, or start fresh and cut my losses? Should I still file a change of address with GWT, even though I'm not going to 301 redirect anything?
Intermediate & Advanced SEO | | kdaniels0 -
Best way to implement canonical tags on an ecommerce site with many filter options?
What would be the best way to add canonical tags to an ecommerce site with many filter options, for example, http://teacherexpress.scholastic.com? Should I include a canonical tag for all filter options under a category even though the pages don't have the same content? Thanks for reading!
Intermediate & Advanced SEO | | DA20130 -
Duplicate content on ecommerce sites
I just want to confirm something about duplicate content. On an eCommerce site, if the meta-titles, meta-descriptions and product descriptions are all unique, yet a big chunk at the bottom (featuring "why buy with us" etc) is copied across all product pages, would each page be penalised, or not indexed, for duplicate content? Does the whole page need to be a duplicate to be worried about this, or would this large chunk of text, bigger than the product description, have an effect on the page. If this would be a problem, what are some ways around it? Because the content is quite powerful, and is relavent to all products... Cheers,
Intermediate & Advanced SEO | | Creode0 -
Splitting a Site into Two Sites for SEO Purposes
I have a client that owns a business that really could be easily divided into two separate business in terms of SEO. Right now his web site covers both divisions of his business. He gets about 5500 visitors a month. The majority go to one part of his business and around 600 each month go to the other. So about 11% I'm considering breaking off this 11% and putting it on an entirely different domain name. I think I could rank better for this 11%. The site would only be SEO'd for this particular division of the company. The keywords would not be in competition with each other. I would of course link the two web sites and watch that I don't run into any duplicate content issues. I worry about placing the redirects from the pages that I remove to the new pages. I know Google is not a fan of redirects. Then I also worry about the eventual drop in traffic to the main site now. How big of a factor is traffic in rankings? Other challenges include that the business services 4 major metropolitan areas. Would you do this? Have you done this? How did it work? Any suggestions?
Intermediate & Advanced SEO | | MSWD0 -
Franchise sites on subdomains
I've been asked by a client to optimise a a webpage for a location i.e. London. Turns out that the location is actually a franchise of the main company. When the company launch a new franchise, so far they have simply added a new page to the main site, for example: mysite.co.uk/sub-folder/london They have so far done this for 10 or so franchises and task someone with optimising that page for their main keyword + location. I think I know the answer to this, but would like to get a back up / additional info on it in terms of ranking / seo benefits. I am going to suggest the idea of using a subdomain for each location, example: london.mysite.co.uk Would this be the correct approach. If you think yes, why? Many thanks,
Intermediate & Advanced SEO | | Webrevolve0 -
Using 2 wildcards in the robots.txt file
I have a URL string which I don't want to be indexed. it includes the characters _Q1 ni the middle of the string. So in the robots.txt can I use 2 wildcards in the string to take out all of the URLs with that in it? So something like /_Q1. Will that pickup and block every URL with those characters in the string? Also, this is not directly of the root, but in a secondary directory, so .com/.../_Q1. So do I have to format the robots.txt as //_Q1* as it will be in the second folder or just using /_Q1 will pickup everything no matter what folder it is on? Thanks.
Intermediate & Advanced SEO | | seo1234560