Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Blocking Dynamic URLs with Robots.txt
-
Background:
My e-commerce site uses a lot of layered navigation and sorting links. While this is great for users, it ends up in a lot of URL variations of the same page being crawled by Google. For example, a standard category page:
...which uses a "Price" layered navigation sidebar to filter products based on price also produces the following URLs which link to the same page:
http://www.mysite.com/widgets.html?price=1%2C250
http://www.mysite.com/widgets.html?price=2%2C250
http://www.mysite.com/widgets.html?price=3%2C250
As there are literally thousands of these URL variations being indexed, so I'd like to use Robots.txt to disallow these variations.
Question:
-
Is this a wise thing to do? Or does Google take into account layered navigation links by default, and I don't need to worry.
-
To implement, I was going to do the following in Robots.txt:
User-agent: *
Disallow: /*?
Disallow: /*=
....which would prevent any dynamic URL with a '?" or '=' from being indexed. Is there a better way to do this, or is this a good solution?
Thank you!
-
-
If you are happy with any URLs with query strings not being indexed your robots.txt will work fine.
Do any or your URLs with question marks in them have links to them? If so you might want to be careful blocking google from indexing them. I would think you'd lose the benefits those links would pass to your site.
-
Tait,
Thanks for the answer. I think the canonical tag would be ideal, but in terms of implementation, it would require some substantial code modification to the site / PHP code as I have a lot of categories, and adding this manually to each one would be very time consuming.
Would preventing the spiders from indexing any URLs with a "?" or "&" (which would only be dynamic URLs variations) cause any problems? Or is this just not an ideal best practice?
Thanks!
-
I don't know if there's a good solution with robots.txt given your URL structure. However, you could use the rel=canonical link tag in the header to force google to treat many of your URLs the same way. This would help you avoid duplicate content penalties.
More on rel=canonical:
http://www.google.com/support/webmasters/bin/answer.py?answer=139394
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
URL Rewriting Best Practices
Hey Moz! I’m getting ready to implement URL rewrites on my website to improve site structure/URL readability. More specifically I want to: Improve our website structure by removing redundant directories. Replace underscores with dashes and remove file extensions for our URLs. Please see my example below: Old structure: http://www.widgets.com/widgets/commercial-widgets/small_blue_widget.htm New structure: https://www.widgets.com/commercial-widgets/small-blue-widget I've read several URL rewriting guides online, all of which seem to provide similar but overall different methods to do this. I'm looking for what's considered best practices to implement these rewrites. From what I understand, the most common method is to implement rewrites in our .htaccess file using mod_rewrite (which will find the old URLs and rewrite them according to the rewrites I implement). One question I can't seem to find a definitive answer to is when I implement the rewrite to remove file extensions/replace underscores with dashes in our URLs, do the webpage file names need to be edited to the new format? From what I understand the webpage file names must remain the same for the rewrites in the .htaccess to work. However, our internal links (including canonical links) must be changed to the new URL format. Can anyone shed light on this? Also, I'm aware that implementing URL rewriting improperly could negatively affect our SERP rankings. If I redirect our old website directory structure to our new structure using this rewrite, are my bases covered in regards to having the proper 301 redirects in place to not affect our rankings negatively? Please offer any advice/reliable guides to handle this properly. Thanks in advance!
Intermediate & Advanced SEO | | TheDude0 -
Duplicate URLs ending with #!
Hi guys, Does anyone know why a site can contain duplicate URLs ending with hastag & exclamation mark e.g. https://site.com.au/#! We are finding a lot of these URLs (as duplicates) and i was wondering what they are from developer standpoint? And do you think it's worth the time and effort adding a rel canonical tag or 301 to these URLs eventhough they're not getting indexed by Google? Cheers, Chris
Intermediate & Advanced SEO | | jayoliverwright0 -
Robots.txt, does it need preceding directory structure?
Do you need the entire preceding path in robots.txt for it to match? e.g: I know if i add Disallow: /fish to robots.txt it will block /fish
Intermediate & Advanced SEO | | Milian
/fish.html
/fish/salmon.html
/fishheads
/fishheads/yummy.html
/fish.php?id=anything But would it block?: en/fish
en/fish.html
en/fish/salmon.html
en/fishheads
en/fishheads/yummy.html
**en/fish.php?id=anything (taken from Robots.txt Specifications)** I'm hoping it actually wont match, that way writing this particular robots.txt will be much easier! As basically I'm wanting to block many URL that have BTS- in such as: http://www.example.com/BTS-something
http://www.example.com/BTS-somethingelse
http://www.example.com/BTS-thingybob But have other pages that I do not want blocked, in subfolders that also have BTS- in, such as: http://www.example.com/somesubfolder/BTS-thingy
http://www.example.com/anothersubfolder/BTS-otherthingy Thanks for listening0 -
How to Disallow Tag Pages With Robot.txt
Hi i have a site which i'm dealing with that has tag pages for instant - http://www.domain.com/news/?tag=choice How can i exclude these tag pages (about 20+ being crawled and indexed by the search engines with robot.txt Also sometimes they're created dynamically so i want something which automatically excludes tage pages from being crawled and indexed. Any suggestions? Cheers, Mark
Intermediate & Advanced SEO | | monster990 -
Changing a url from .html to .com
Hello, I have a client that has a site with a .html plugin and I have read that its best to not have this. We currently have pages ranking with this .html plug in. However If we take the plug in out will we lose rankings? would we need a 301 or something?
Intermediate & Advanced SEO | | SEODinosaur0 -
URL Structure for Directory Site
We have a directory that we're building and we're not sure if we should try to make each page an extension of the root domain or utilize sub-directories as users narrow down their selection. What is the best practice here for maximizing your SERP authority? Choice #1 - Hyphenated Architecture (no sub-folders): State Page /state/ City Page /city-state/ Business Page /business-city-state/
Intermediate & Advanced SEO | | knowyourbank
4) Location Page /locationname-city-state/ or.... Choice #2 - Using sub-folders on drill down: State Page /state/ City Page /state/city Business Page /state/city/business/
4) Location Page /locationname-city-state/ Again, just to clarify, I need help in determining what the best methodology is for achieving the greatest SEO benefits. Just by looking it would seem that choice #1 would work better because the URL's are very clear and SEF. But, at the same time it may be less intuitive for search. I'm not sure. What do you think?0 -
How do you implement dynamic SEO-friendly URLs using Ajax without using hashbangs?
We're building a new website platform and are using Ajax as the method for allowing users to select from filters. We want to dynamically insert elements into the URL as the filters are selected so that search engines will index multiple combinations of filters. We're struggling to see how this is possible using symfony framework. We've used www.gizmodo.com as an example of how to achieve SEO and user-friendly URLs but this is only an example of achieving this for static content. We would prefer to go down a route that didn't involve hashbangs if possible. Does anyone have any experience using hashbangs and how it affected their site? Any advice on the above would be gratefully received.
Intermediate & Advanced SEO | | Sayers1 -
Url with hypen or.co?
Given a choice, for your #1 keyword, would you pick a .com with one or two hypens? (chicago-real-estate.com) or a .co with the full name as the url (chicagorealestate.co)? Is there an accepted best practice regarding hypenated urls and/or decent results regarding the effectiveness of the.co? Thank you in advance!
Intermediate & Advanced SEO | | joechicago0