Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Should I block robots from URLs containing query strings?
-
I'm about to block off all URLs that have a query string using robots.txt. They're mostly URLs with coremetrics tags and other referrer info. I figured that search engines don't need to see these as they're always better off with the original URL.
Might there be any downside to this that I need to consider?
Appreciate your help / experiences on this one.
Thanks
Jenni
-
Thanks for your suggestions. I've already got canonical tags on every page, but they're not all being adhered to and lots of URLs with query strings are still getting organic traffic.
Passing referrer info behind scenes isn't an option with Coremetrics I don't think. Is it?
Interested to know more about number 1 though. How would you do that in WMT other than blocking with robots.txt?
Thanks
-
Instead of blocking them with robots.txt (which isn't very effective), try using the canonical tag instead.
For instance, a URL like this:
http://wwww.testdomain.com/page.html?utm_source=Google&utm_medium=Banner&utm_campaign=CampaignYou could add this canonical tag in the head:
With this solution you don't have to worry about losing quality links OR having your query tracking show up in any of the major search engines.
Cheers- Kyle
-
The downside to this would be if someone linked to the page with the query string, the search engines wouldn't crawl the page and flow link juice properly to the rest of your site.
Other options:
-
Use Google and Bing WMT to ignore those parameter query strings.
-
Make sure the canoncial tag is on those pages, pointing back to the version without the query string
-
Try to pass referrer info behind the scenes if possible
-
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Good robots txt for magento
Dear Communtiy, I am trying to improve the SEO ratings for my website www.rijwielcashencarry.nl (magento). My next step will be implementing robots txt to exclude some crawling pages.
Technical SEO | | rijwielcashencarry040
Does anybody have a good magento robots txt for me? And what need i copy exactly? Thanks everybody! Greetings, Bob0 -
Vanity URLs are being indexed in Google
We are currently using vanity URLs to track offline marketing, the vanity URL is structured as www.clientdomain.com/publication, this URL then is 302 redirected to the actual URL on the website not a custom landing page. The resulting redirected URL looks like: www.clientdomain.com/xyzpage?utm_source=print&utm_medium=print&utm_campaign=printcampaign. We have started to notice that some of the vanity URLs are being indexed in Google search. To prevent this from happening should we be using a 301 redirect instead of a 302 and will the Google index ignore the utm parameters in the URL that is being 301 redirect to? If not, any suggestions on how to handle? Thanks,
Technical SEO | | seogirl221 -
How do I deindex url parameters
Google indexed a bunch of our URL parameters. I'm worried about duplicate content. I used the URL parameter tool in webmaster to set it so future parameters don't get indexed. What can I do to remove the ones that have already been indexed? For example, Site.com/products and site.com/products?campaign=email have both been indexed as separate pages even though they are the same page. If I use a no index I'm worried about de indexing the product page. What can I do to just deindexed the URL parameter version? Thank you!
Technical SEO | | BT20090 -
Should I block Map pages with robots.txt?
Hello, I have a website that was started in 1999. On the website I have map pages for each of the offices listed on my site, for which there are about 120. Each of the 120 maps is in a whole separate html page. There is no content in the page other than the map. I know all of the offices love having the map pages so I don't want to remove the pages. So, my question is would these pages with no real content be hurting the rankings of the other pages on our site? Therefore, should I block the pages with my robots.txt? Would I also have to remove these pages (in webmaster tools?) from Google for blocking by robots.txt to really work? I appreciate your feedback, thanks!
Technical SEO | | imaginex0 -
Google insists robots.txt is blocking... but it isn't.
I recently launched a new website. During development, I'd enabled the option in WordPress to prevent search engines from indexing the site. When the site went public (over 24 hours ago), I cleared that option. At that point, I added a specific robots.txt file that only disallowed a couple directories of files. You can view the robots.txt at http://photogeardeals.com/robots.txt Google (via Webmaster tools) is insisting that my robots.txt file contains a "Disallow: /" on line 2 and that it's preventing Google from indexing the site and preventing me from submitting a sitemap. These errors are showing both in the sitemap section of Webmaster tools as well as the Blocked URLs section. Bing's webmaster tools are able to read the site and sitemap just fine. Any idea why Google insists I'm disallowing everything even after telling it to re-fetch?
Technical SEO | | ahockley0 -
Duplicate Content and URL Capitalization
I have multiple URLs that SEOMoz is reporting as duplicate content. The reason is that there are characters in the URL that may, or may not, be capitalized depending on user input. A couple examples are: www.househitz.com/Pennsylvania/Houses-for-sale www.househitz.com/Pennsylvania/houses-for-sale www.househitz.com/Pennsylvania/Houses-for-rent www.househitz.com/Pennsylvania/houses-for-rent There are currently thousands of instances of this on the site. Is this something I should spend effort to try and resolve (may not be minor effort), or should I just ignore it and move on?
Technical SEO | | Jom0 -
Block Baidu crawler?
Hello! One of our websites receives a large amount of traffic from the Baidu crawler. We do not have any Chinese content or do any business with China since our market is Uk. Is it a good idea to block the Baidu crawler in the robots.txt or could it have any adverse effects on SEO of our site? What do you suggest?
Technical SEO | | AJPro0 -
OK to block /js/ folder using robots.txt?
I know Matt Cutts suggestions we allow bots to crawl css and javascript folders (http://www.youtube.com/watch?v=PNEipHjsEPU) But what if you have lots and lots of JS and you dont want to waste precious crawl resources? Also, as we update and improve the javascript on our site, we iterate the version number ?v=1.1... 1.2... 1.3... etc. And the legacy versions show up in Google Webmaster Tools as 404s. For example: http://www.discoverafrica.com/js/global_functions.js?v=1.1
Technical SEO | | AndreVanKets
http://www.discoverafrica.com/js/jquery.cookie.js?v=1.1
http://www.discoverafrica.com/js/global.js?v=1.2
http://www.discoverafrica.com/js/jquery.validate.min.js?v=1.1
http://www.discoverafrica.com/js/json2.js?v=1.1 Wouldn't it just be easier to prevent Googlebot from crawling the js folder altogether? Isn't that what robots.txt was made for? Just to be clear - we are NOT doing any sneaky redirects or other dodgy javascript hacks. We're just trying to power our content and UX elegantly with javascript. What do you guys say: Obey Matt? Or run the javascript gauntlet?0