Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies.
What do you add to your robots.txt on your ecommerce sites?
-
We're looking at expanding our robots.txt, since we currently don't have the ability to noindex/nofollow. We're thinking about adding the following:
- Checkout
- Basket
Then possibly:
- Price
- Theme
- Sortby
- other misc filters.
What do you include?
-
I'm on this same path, since we too cannot use noindex/nofollow due to limited backend access with BigCommerce.
I like to block all cart-related pages, which for ecommerce sites can be a boatload:
- /cart.php
- /checkout.php
- /finishorder.php
- /*login.php
Those are just a few. Then you have the sorting and compare pages, which have to be blocked or a mess unfolds:
- Disallow: /*sort=newest
- Disallow: /*sort=bestselling
- Disallow: /*?page= (a big duplicate-page issue if you don't block this one with a wildcard and can't access your .htaccess file or the backend properly to noindex/nofollow)
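To see what those wildcard patterns actually catch, here is a minimal Python sketch. It assumes the matching behaviour major crawlers such as Googlebot document (`*` matches any run of characters, a trailing `$` pins the end of the URL, everything else is a prefix match) and it ignores Allow rules and rule precedence. The function names and sample paths are made up for illustration; only the patterns come from the list above.

```python
import re

# Patterns quoted from the post above.
DISALLOW = [
    "/cart.php",
    "/checkout.php",
    "/finishorder.php",
    "/*login.php",
    "/*sort=newest",
    "/*sort=bestselling",
    "/*?page=",
]


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex anchored at the start.

    '*' becomes '.*', a trailing '$' anchors the end of the URL, and all
    other characters are matched literally.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(path: str, patterns=DISALLOW) -> bool:
    """Return True if any Disallow pattern matches the path plus query string."""
    return any(pattern_to_regex(p).match(path) for p in patterns)


if __name__ == "__main__":
    # Hypothetical paths, purely for illustration.
    for path in [
        "/cart.php",
        "/account/login.php",
        "/widgets/?sort=newest",
        "/widgets/?page=2",
        "/widgets/?color=blue&page=2",
        "/widgets/blue-widget/",
    ]:
        print(path, "->", "blocked" if is_blocked(path) else "crawlable")
```

One thing this makes visible: `/*?page=` only matches when `page` is the first parameter. A URL like `/widgets/?color=blue&page=2` slips through unless you add a second rule such as `/*&page=`.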
Again, that's just a sample. In my case, I only want the meat of the site to be indexed and to rank. Otherwise you end up like one client's site, which was ranking for terms more related to web development than to the niche industry they lived in. Plus, with a limited crawl budget, why would you want Google or anyone else to crawl pages on your site that have no SEO value for your niche?
Unless you sell carts, as in web-developed carts for ecommerce sites, you wouldn't want much of that indexed anyway, and even in that case those pages aren't too useful for ranking. At least, that's what I've gathered in the niche industries.
-
Hi,
It sounds like you're going down the right path. Disallow any section of the site that holds personal information: there's no value in having bots crawl it, and blocking it keeps them on your important content longer. In addition to Checkout and Basket/Cart, you should also disallow the My Account area if your site has one.
For your next grouping, I'm assuming these are the parameters by which your pages can be sorted and filtered. If so, yes, disallow all of those; otherwise they're only going to cause duplicate content flags for you in the future. I'm not sure which CMS you are using, but some ecommerce platforms also have 'email to a friend' URLs that are a major source of dupes and can often be identified and disallowed by another parameter.
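As a rough illustration of the advice above, this Python sketch generates a robots.txt from a list of private paths and filter parameters. All of the names are assumptions: `/checkout`, `/basket` and `/account` are hypothetical paths, and `price`, `theme` and `sortby` are guessed query-parameter names based on the filters in the question, so swap in whatever your platform actually emits.

```python
# Hypothetical names; replace with the paths and parameters your site uses.
PRIVATE_PATHS = ["/checkout", "/basket", "/account"]
FILTER_PARAMS = ["price", "theme", "sortby"]


def build_robots_txt(paths=PRIVATE_PATHS, params=FILTER_PARAMS) -> str:
    """Build robots.txt text that blocks private paths and filter parameters."""
    lines = ["User-agent: *"]
    for path in paths:
        # Plain rules are prefix matches, so /checkout also covers /checkout.php.
        lines.append(f"Disallow: {path}")
    for name in params:
        # One rule for the parameter in first position, one for later positions.
        lines.append(f"Disallow: /*?{name}=")
        lines.append(f"Disallow: /*&{name}=")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_robots_txt())
```

Emitting both the `?name=` and `&name=` forms is a deliberate choice: a shorter pattern like `/*price` would also block any product URL that happens to contain the word "price".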
Hope this helps narrow it down for you!