What do you add to your robots.txt on your ecommerce sites?
-
We're looking at expanding our robots.txt, we currently don't have the ability to noindex/nofollow. We're thinking about adding the following:
- Checkout
- Basket
Then possibly:
- Price
- Theme
- Sortby
- other misc filters.
What do you include?
-
I'm on this same path since we too cannot use noindex / nofollow due to limited backend interaction with Bigcommerce.
I like to block all cart related pages, which for ecommerce sites can be a boat load.
- /cart.php
- /checkout.php
- /finishorder.php
- /*login.php
just to name a few, then you have the sorting and compare pages, they have to be blocked or a mess unfolds.
- Disallow: /*sort=newest
- Disallow: /*sort=bestselling
- Disallow: /*?page= ( Big duplicate page issue if you don't block this one with a wildcard, and cannot access your .htaccess file or the backend properly to noindex / nofollow )
Just to name a few, in my case, I only want the meat of the site to be indexed and rank for. Otherwise one client's site was ranking terms that more related to web development than the niche industry they lived in. Plus with a limited index budget, why would you want google or anyone else to crawl pages on your site with no SEO value towards your niche?
Unless you sold carts as in web developed carts for ecommerce sites you wouldn't want much of that indexed anyways, and even in that case, those pages aren't too useful for ranking. At least from what I've gathered in the niche industries.
-
Hi,
It sounds like you're going down the right path. Disallow and section of the site that has personal information, as there's no value in having bots crawl that, keep them on important content longer! In addition to Checkout and Basket/Cart, you should also disallow the My Account area if your site has one.
Your next grouping, I'm assuming these are the parameters by which you pages can be sorted. If so, yes, disallow all of those, they're only going to cause duplicate content flags for you in the future. I'm not sure which CMS you are using, but some eComm platforms also have 'email to a friend' URLs that are a major source for dupes and can often be identified and disallowed by another parameter.
Hope this helps narrow it down for you!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Not sure how we're blocking homepage in robots.txt; meta description not shown
Hi folks! We had a question come in from a client who needs assistance with their robots.txt file. Metadata for their homepage and select other pages isn't appearing in SERPs. Instead they get the usual message "A description for this result is not available because of this site's robots.txt – learn more". At first glance, we're not seeing the homepage or these other pages as being blocked by their robots.txt file: http://www.t2tea.com/robots.txt. Does anyone see what we can't? Any thoughts are massively appreciated! P.S. They used wildcards to ensure the rules were applied for all locale subdirectories, e.g. /en/au/, /en/us/, etc.
Intermediate & Advanced SEO | | SearchDeploy0 -
Huge increase in server errors and robots.txt
Hi Moz community! Wondering if someone can help? One of my clients (online fashion retailer) has been receiving huge increase in server errors (500's and 503's) over the last 6 weeks and it has got to the point where people cannot access the site because of server errors. The client has recently changed hosting companies to deal with this, and they have just told us they removed the DNS records once the name servers were changed, and they have now fixed this and are waiting for the name servers to propagate again. These errors also correlate with a huge decrease in pages blocked by robots.txt file, which makes me think someone has perhaps changed this and not told anyone... Anyone have any ideas here? It would be greatly appreciated! 🙂 I've been chasing this up with the dev agency and the hosting company for weeks, to no avail. Massive thanks in advance 🙂
Intermediate & Advanced SEO | | labelPR0 -
How are these sites ranking!?!
One of our clients is in the insurance industry and over the last 12 months we have seen an increasing number of low quality, newly registered, spammy sites achieving top 5 rankings for major keywords, which in turn is having an adverse effect on the rankings for our client. Does anyone have any idea how the following sites have managed to do this: http://www.multiquotetaxi.co.uk/ - 2nd for taxi insurance http://www.motortradefast.co.uk/ - 1st for motor trade insurance http://www.traders-insurance.com/ - 3rd for motor trade insurance http://www.multiquotefleet.co.uk/ - 1st for fleet insurance We have tried reporting the above sites, tried holding out to see if they get penalised and tried figuring out what they have done ourselves but cannot see how they have managed it. Any ideas at all?
Intermediate & Advanced SEO | | instinctive0 -
Robots.txt: how to exclude sub-directories correctly?
Hello here, I am trying to figure out the correct way to tell SEs to crawls this: http://www.mysite.com/directory/ But not this: http://www.mysite.com/directory/sub-directory/ or this: http://www.mysite.com/directory/sub-directory2/sub-directory/... But with the fact I have thousands of sub-directories with almost infinite combinations, I can't put the following definitions in a manageable way: disallow: /directory/sub-directory/ disallow: /directory/sub-directory2/ disallow: /directory/sub-directory/sub-directory/ disallow: /directory/sub-directory2/subdirectory/ etc... I would end up having thousands of definitions to disallow all the possible sub-directory combinations. So, is the following way a correct, better and shorter way to define what I want above: allow: /directory/$ disallow: /directory/* Would the above work? Any thoughts are very welcome! Thank you in advance. Best, Fab.
Intermediate & Advanced SEO | | fablau1 -
Robots.txt, does it need preceding directory structure?
Do you need the entire preceding path in robots.txt for it to match? e.g: I know if i add Disallow: /fish to robots.txt it will block /fish
Intermediate & Advanced SEO | | Milian
/fish.html
/fish/salmon.html
/fishheads
/fishheads/yummy.html
/fish.php?id=anything But would it block?: en/fish
en/fish.html
en/fish/salmon.html
en/fishheads
en/fishheads/yummy.html
**en/fish.php?id=anything (taken from Robots.txt Specifications)** I'm hoping it actually wont match, that way writing this particular robots.txt will be much easier! As basically I'm wanting to block many URL that have BTS- in such as: http://www.example.com/BTS-something
http://www.example.com/BTS-somethingelse
http://www.example.com/BTS-thingybob But have other pages that I do not want blocked, in subfolders that also have BTS- in, such as: http://www.example.com/somesubfolder/BTS-thingy
http://www.example.com/anothersubfolder/BTS-otherthingy Thanks for listening0 -
Ecommerce website consolidation
I have a large ecommerce site and several smaller nitche ecommerce sites. All have the same products, but the smaller sites are loosing traffic. I want to combine all the sites to the larger site so it will be easier to manage, but I don't want to loose any rank on the smaller sites. Example: www.yourpromopeople.com - This is the large site I want to use. www.logocoolies.com www.fourcolormagnets.com - These are a couple of the smaller sites I want to combine with the larger one. Questions: What are the pros and cons in doing this? What would be the best way to do this? Would redirecting the URL's to the larger site's product pages do the trick or is there a better option? Thanks for the help.
Intermediate & Advanced SEO | | JHSpecialty0 -
On-Site Optimization Tips for Job site?
I am working on a job site that only ranks well for the homepage with very low ranking internal pages. My job pages do not rank what so ever and are database driven and often times turn to 404 pages after the job has been filled. The job pages have to no content either. Anybody have any technical on-site recommendations for a job site I am working on especially regarding my internal pages? (Cross Country Allied.com)
Intermediate & Advanced SEO | | Melia0 -
SEO for Log in Sites
Hello, I just lunched a website where you have to sign up and to log in in order to use it. So I have the home, also a blog but then the rest of the pages are let's say it "hidden".How would you do the seo for it? I have been cheking facebook, foursquare and some others and they use different approaches. Facebook uses the same description in every single page for example. My site is similar to foursquare users have profile, stats, history, ranking. Well, what is your advice?? Thanks a lot
Intermediate & Advanced SEO | | antorome0