Blocking Dynamic URLs with Robots.txt
-
Background:
My e-commerce site uses a lot of layered navigation and sorting links. While this is great for users, it ends up in a lot of URL variations of the same page being crawled by Google. For example, a standard category page:
...which uses a "Price" layered navigation sidebar to filter products based on price also produces the following URLs which link to the same page:
http://www.mysite.com/widgets.html?price=1%2C250
http://www.mysite.com/widgets.html?price=2%2C250
http://www.mysite.com/widgets.html?price=3%2C250
As there are literally thousands of these URL variations being indexed, so I'd like to use Robots.txt to disallow these variations.
Question:
-
Is this a wise thing to do? Or does Google take into account layered navigation links by default, and I don't need to worry.
-
To implement, I was going to do the following in Robots.txt:
User-agent: *
Disallow: /*?
Disallow: /*=
....which would prevent any dynamic URL with a '?" or '=' from being indexed. Is there a better way to do this, or is this a good solution?
Thank you!
-
-
If you are happy with any URLs with query strings not being indexed your robots.txt will work fine.
Do any or your URLs with question marks in them have links to them? If so you might want to be careful blocking google from indexing them. I would think you'd lose the benefits those links would pass to your site.
-
Tait,
Thanks for the answer. I think the canonical tag would be ideal, but in terms of implementation, it would require some substantial code modification to the site / PHP code as I have a lot of categories, and adding this manually to each one would be very time consuming.
Would preventing the spiders from indexing any URLs with a "?" or "&" (which would only be dynamic URLs variations) cause any problems? Or is this just not an ideal best practice?
Thanks!
-
I don't know if there's a good solution with robots.txt given your URL structure. However, you could use the rel=canonical link tag in the header to force google to treat many of your URLs the same way. This would help you avoid duplicate content penalties.
More on rel=canonical:
http://www.google.com/support/webmasters/bin/answer.py?answer=139394
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Robots.txt - Googlebot - Allow... what's it for?
Hello - I just came across this in robots.txt for the first time, and was wondering why it is used? Why would you have to proactively tell Googlebot to crawl JS/CSS and why would you want it to? Any help would be much appreciated - thanks, Luke User-Agent: Googlebot Allow: /.js Allow: /.css
Intermediate & Advanced SEO | | McTaggart0 -
Robots.txt
Hi all, Happy New Year! I want to block certain pages on our site as they are being flagged (according to my Moz Crawl Report) as duplicate content when in fact that isn't strictly true, it is more to do with the problems faced when using a CMS system... Here are some examples of the pages I want to block and underneath will be what I believe to be the correct robots.txt entry... http://www.XYZ.com/forum/index.php?app=core&module=search&do=viewNewContent&search_app=members&search_app_filters[forums][searchInKey]=&period=today&userMode=&followedItemsOnly= Disallow: /forum/index.php?app=core&module=search http://www.XYZ.com/forum/index.php?app=core&module=reports&rcom=gallery&imageId=980&ctyp=image Disallow: /forum/index.php?app=core&module=reports http://www.XYZ.com/forum/index.php?app=forums&module=post§ion=post&do=reply_post&f=146&t=741&qpid=13308 Disallow: /forum/index.php?app=forums&module=post http://www.XYZ.com/forum/gallery/sizes/182-promenade/small/ http://www.XYZ.com/forum/gallery/sizes/182-promenade/large/ Disallow: /forum/gallery/sizes/ Any help \ advice would be much appreciated. Many thanks Andy
Intermediate & Advanced SEO | | TomKing0 -
Product or Shop in URL
What do you think is better for seo and for sale, I am using woo-ecommerce for health products website. websitename.com/product/keyword OR websitename.com/shop/keyword
Intermediate & Advanced SEO | | MasonBaker0 -
Received "Googlebot found an extremely high number of URLs on your site:" but most of the example URLs are noindexed.
An example URL can be found here: http://symptom.healthline.com/symptomsearch?addterm=Neck%20pain&addterm=Face&addterm=Fatigue&addterm=Shortness%20Of%20Breath A couple of questions: Why is Google reporting an issue with these URLs if they are marked as noindex? What is the best way to fix the issue? Thanks in advance.
Intermediate & Advanced SEO | | nicole.healthline0 -
Good idea? Dynamic to Perkalink URLs for over 10k pages.
Hey guys, We're looking for someone who has already done this. We know permalinks are SEO best practices but changing site stucture can cause alot of ranking and traffic loss. Is it a wise idea to make the transition? How long will we see a dip in rankings before bouncing back?
Intermediate & Advanced SEO | | SherisRanch0 -
Should I change wordpress urls?
Should I change my wordpress permalinks to include the keyword? For examples at the minute my url is http://www.musicliveuk.com/home/wedding-singer. Is it better to be http://www.musicliveuk.com/live-bands/wedding-singer. 'home' is not relevant so surely 'live-bands' would be better? If I change the urls won't I lose 'link juice' as external links will all point to a url that no longer exists? Or will wordpress automatically redirect the old url to the new one? Finally, if I should change the url as described how do I do it on wordpress? I can only see how to edit the last bit of the url and not the middle bit.
Intermediate & Advanced SEO | | SamCUK0 -
Block all but one URL in a directory using robots.txt?
Is it possible to block all but one URL with robots.txt? for example domain.com/subfolder/example.html, if we block the /subfolder/ directory we want all URLs except for the exact match url domain.com/subfolder to be blocked.
Intermediate & Advanced SEO | | nicole.healthline0 -
Best Product URL For Indexing
My proposed URL: mydomain.com/products/category/subcategory/product detail Puts my products 4 levels deep. Is this too deep to get my products indexed?
Intermediate & Advanced SEO | | waynekolenchuk0