What do you add to your robots.txt on your ecommerce sites?
-
We're looking at expanding our robots.txt because we currently don't have the ability to noindex/nofollow. We're thinking about adding the following:
- Checkout
- Basket
Then possibly:
- Price
- Theme
- Sortby
- other misc filters.
What do you include?
-
I'm on the same path, since we also can't use noindex/nofollow due to limited backend access with Bigcommerce.
I like to block all cart-related pages, which on ecommerce sites can be a boatload:
- /cart.php
- /checkout.php
- /finishorder.php
- /*login.php
Those are just a few. Then you have the sorting and compare pages, which have to be blocked or a mess unfolds:
- Disallow: /*sort=newest
- Disallow: /*sort=bestselling
- Disallow: /*?page= (big duplicate-page issue if you don't block this one with a wildcard and can't access your .htaccess file or the backend properly to noindex/nofollow)
Again, those are just a few. In my case, I only want the meat of the site to be indexed and to rank. Without these blocks, one client's site was ranking for terms more related to web development than to the niche industry they were in. Plus, with a limited crawl budget, why would you want Google or anyone else to crawl pages on your site that have no SEO value for your niche?
Unless you sell carts, as in web-developed carts for ecommerce sites, you wouldn't want much of that indexed anyway, and even then those pages aren't very useful for ranking, at least from what I've gathered in niche industries.
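To sanity-check rules like the ones listed above before deploying them, here is a minimal Python sketch, assuming Google-style pattern matching (`*` matches any run of characters, a trailing `$` anchors the end, everything else is a prefix match). The helper names `rule_to_regex` and `is_blocked` are made up for illustration. Python's built-in `urllib.robotparser` is not used because, as far as I know, it only does plain prefix matching and would not expand the wildcards.

```python
import re

# Disallow patterns quoted from the answer above. The .php paths are
# Bigcommerce examples and may differ on other platforms.
RULES = [
    "/cart.php",
    "/checkout.php",
    "/finishorder.php",
    "/*login.php",
    "/*sort=newest",
    "/*sort=bestselling",
    "/*?page=",
]

def rule_to_regex(rule):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' becomes '.*', a trailing '$' anchors the end of the URL, and the
    rest is matched literally from the start of the path.
    """
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    pattern = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile("^" + pattern + ("$" if anchored else ""))

def is_blocked(path, rules=RULES):
    """Return True if any Disallow pattern matches the path plus query string."""
    return any(rule_to_regex(rule).match(path) for rule in rules)

if __name__ == "__main__":
    for path in ["/cart.php", "/account/login.php", "/shoes/?page=2",
                 "/shoes/?color=red&page=2", "/shoes/red-sneakers/"]:
        print(path, "blocked" if is_blocked(path) else "allowed")
```

One thing this makes visible: `/*?page=` only matches when `page` is the first query parameter. A URL such as `/shoes/?color=red&page=2` slips through, so a second pattern like `/*&page=` may be worth considering if your platform builds URLs that way.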
-
Hi,
It sounds like you're going down the right path. Disallow any section of the site that has personal information, as there's no value in having bots crawl that; keep them on important content longer! In addition to Checkout and Basket/Cart, you should also disallow the My Account area if your site has one.
For your next grouping, I'm assuming these are the parameters by which your pages can be sorted. If so, yes, disallow all of those; they're only going to cause duplicate content flags for you in the future. I'm not sure which CMS you are using, but some ecommerce platforms also have 'email to a friend' URLs that are a major source of dupes and can often be identified and disallowed by another parameter.
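To see which parameters are actually producing duplicates before writing Disallow rules for them, a rough Python sketch along these lines could be run over a crawl export or a list of URLs from your logs. The parameter names (`price`, `theme`, `sortby`) are assumptions taken from the filters listed in the question, and `strip_params` and `duplicate_groups` are hypothetical helper names; adjust them to whatever your platform really uses.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed parameter names, based on the filters named in the question.
NON_CONTENT_PARAMS = {"price", "theme", "sortby"}

def strip_params(url, drop=NON_CONTENT_PARAMS):
    """Return the URL with sorting/filter parameters removed."""
    parts = urlsplit(url)
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key.lower() not in drop]
    return urlunsplit(parts._replace(query=urlencode(kept)))

def duplicate_groups(urls):
    """Group URLs that collapse to the same page once those parameters are dropped."""
    groups = {}
    for url in urls:
        groups.setdefault(strip_params(url), []).append(url)
    # Keep only pages reachable through more than one URL.
    return {base: hits for base, hits in groups.items() if len(hits) > 1}

if __name__ == "__main__":
    sample = [
        "https://example.com/shoes?sortby=price",
        "https://example.com/shoes?theme=dark&sortby=name",
        "https://example.com/shoes",
        "https://example.com/hats?id=3",
    ]
    for base, hits in duplicate_groups(sample).items():
        print(base, "has", len(hits), "variants")
```

Any parameter that shows up in these groups is a candidate for a wildcard Disallow line.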
Hope this helps narrow it down for you!