Should I block robots from URLs containing query strings?
-
I'm about to block off all URLs that have a query string using robots.txt. They're mostly URLs with coremetrics tags and other referrer info. I figured that search engines don't need to see these as they're always better off with the original URL.
Might there be any downside to this that I need to consider?
Appreciate your help / experiences on this one.
Thanks
Jenni
-
Thanks for your suggestions. I've already got canonical tags on every page, but they're not all being adhered to and lots of URLs with query strings are still getting organic traffic.
Passing referrer info behind scenes isn't an option with Coremetrics I don't think. Is it?
Interested to know more about number 1 though. How would you do that in WMT other than blocking with robots.txt?
Thanks
-
Instead of blocking them with robots.txt (which isn't very effective), try using the canonical tag instead.
For instance, a URL like this:
http://wwww.testdomain.com/page.html?utm_source=Google&utm_medium=Banner&utm_campaign=CampaignYou could add this canonical tag in the head:
With this solution you don't have to worry about losing quality links OR having your query tracking show up in any of the major search engines.
Cheers- Kyle
-
The downside to this would be if someone linked to the page with the query string, the search engines wouldn't crawl the page and flow link juice properly to the rest of your site.
Other options:
-
Use Google and Bing WMT to ignore those parameter query strings.
-
Make sure the canoncial tag is on those pages, pointing back to the version without the query string
-
Try to pass referrer info behind the scenes if possible
-
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
What to do with existing URL when replatforming and new URL is the same?
We are changing CMS from WordPress to Uberflip. If there is a URL that remains the same I believe we should not create a redirect. However, what happens to the old page? Should it be deleted?
Technical SEO | | maland0 -
Trailing Slashes on URLs
Hi everyone I have a question on trailing slashes in URL. The crux of it is this: is having both: example.com/subdirectory/ and: example.com/subdirectory on all of your subdirectories considered duplicate content by Google - or in some other way really bad? We have done a heck a lot of research into this, and it would seem...no one knows for sure (it is easy to get lost in a sea of Webmaster tool forums from 2012). Google itself has both URLs for it's subdirectories (try https://www.google.co.uk/maps and https://www.google.co.uk/maps/) as does Moz; and yet there are some rumblings on the internet of people who think you must put a 'redirect' (although not really a redirect as it isn't a 301) in your htaccess file to one or the other (so for example.com/subdirectory/ would 'forward' to example.com/subdirectory); and this is what bbc.co.uk do. We tried putting this htaccess 'forward' in as an experiment, but I noticed our site then stopped being fully crawled by Google bot, so we reversed it. Can any one shed any light?
Technical SEO | | NickOrbital0 -
Robots file set up
The robots file looks like it has been set up in a very messy way.
Technical SEO | | mcwork
I understand the # will comment out a line, does this mean the sitemap would
not be picked up?
Disallow: /js/ should this be allowed like /*.js$
Disallow: /media/wysiwyg/ - this seems to be causing alerts in webmaster tools as it can not access
the images within.
Can anyone help me clean this up please #Sitemap: https://examplesite.com/sitemap.xml Crawlers Setup User-agent: *
Crawl-delay: 10 Allowable Index Mind that Allow is not an official standard Allow: /index.php/blog/
Allow: /catalog/seo_sitemap/category/ Allow: /catalogsearch/result/ Allow: /media/catalog/ Directories Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /errors/
Disallow: /includes/
Disallow: /js/
Disallow: /lib/
Disallow: /magento/ Disallow: /media/ Disallow: /media/captcha/ Disallow: /media/catalog/ #Disallow: /media/css/
#Disallow: /media/css_secure/
Disallow: /media/customer/
Disallow: /media/dhl/
Disallow: /media/downloadable/
Disallow: /media/import/
#Disallow: /media/js/
Disallow: /media/pdf/
Disallow: /media/sales/
Disallow: /media/tmp/
Disallow: /media/wysiwyg/
Disallow: /media/xmlconnect/
Disallow: /pkginfo/
Disallow: /report/
Disallow: /scripts/
Disallow: /shell/
#Disallow: /skin/
Disallow: /stats/
Disallow: /var/ Paths (clean URLs) Disallow: /index.php/
Disallow: /catalog/product_compare/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/
Disallow: /catalog/product/gallery/
Disallow: */catalog/product/upload/
Disallow: /catalogsearch/
Disallow: /checkout/
Disallow: /control/
Disallow: /contacts/
Disallow: /customer/
Disallow: /customize/
Disallow: /newsletter/
Disallow: /poll/
Disallow: /review/
Disallow: /sendfriend/
Disallow: /tag/
Disallow: /wishlist/ Files Disallow: /cron.php
Disallow: /cron.sh
Disallow: /error_log
Disallow: /install.php
Disallow: /LICENSE.html
Disallow: /LICENSE.txt
Disallow: /LICENSE_AFL.txt
Disallow: /STATUS.txt
Disallow: /get.php # Magento 1.5+ Paths (no clean URLs) #Disallow: /.js$
#Disallow: /.css$
Disallow: /.php$
Disallow: /?SID=
Disallow: /rss*
Disallow: /*PHPSESSID Disallow: /:
Disallow: /😘 User-agent: Fatbot
Disallow: / User-agent: TwengaBot-2.0
Disallow: /0 -
URL structure
Hello Guys, Quick Question regarding URL strucutre One of our client is an hotel chain, thye have a group site www.example.com and each property is located in a subfolder: www.example.com/example-boston.html , www.example.com/example-ny.html etc. My quesion is : where is better to place the language extension at a subfolder level?
Technical SEO | | travelclickseo
Should i go for www.example.com/en/example-ny.html or it is preferable to specify the language after the property name www.example.com/example-ny/en/accommodation.html? Thanks and Regards, Alessio0 -
Robots.txt
Hello, My client has a robots.txt file which says this: User-agent: * Crawl-delay: 2 I put it through a robots checker which said that it must have a **disallow command**. So should it say this: User-agent: * Disallow: crawl-delay: 2 What effect (if any) would not having a disallow command make? Thanks
Technical SEO | | AL123al0 -
Robots.txt and joomla
Hello, I use joomla for my website and automatically all those files are blocked is that good or bad, so I remove anything and if so why ? User-agent: *
Technical SEO | | seoanalytics
Disallow: /administrator/
Disallow: /cache/
Disallow: /components/
Disallow: /images/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /libraries/
Disallow: /media/
Disallow: /modules/
Disallow: /plugins/
Disallow: /templates/
Disallow: /tmp/
Disallow: /xmlrpc/ I also added to my robots.txt files my email address ( is that useful, I am afraid google passes PR to the email address )
and a javascript: void (0) because I have tabs on my webpage ( is that useful )
as well as a .pdf ( is it also useful ) any comments ? does anything need to be changed or is it ok ? Thank you,0 -
Google (GWT) says my homepage and posts are blocked by Robots.txt
I guys.. I have a very annoying issue.. My Wordpress-blog over at www.Trovatten.com has some indexation-problems.. Google Webmaster Tools data:
Technical SEO | | FrederikTrovatten22
GWT says the following: "Sitemap contains urls which are blocked by robots.txt." and shows me my homepage and my blogposts.. This is my Robots.txt: http://www.trovatten.com/robots.txt
"User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/ Do you have any idea why it says that the URL's are being blocked by robots.txt when that looks how it should?
I've read a couple of places that it can be because of a Wordpress Plugin that is creating a virtuel robots.txt, but I can't validate it.. 1. I have set WP-Privacy to crawl my site
2. I have deactivated all WP-plugins and I still get same GWT-Warnings. Looking forward to hear if you have an idea that might work!0 -
Is it terrible to not have robots.txt ?
I was under the impression that you really should have a robots.txt page, and not having one is pretty bad. However, hubspot (which I'm not impressed with) does not have the capability of properly implementing one. Will this hurt the site?
Technical SEO | | StandUpCubicles1