Blocking Dynamic URLs with Robots.txt
-
Background:
My e-commerce site uses a lot of layered navigation and sorting links. While this is great for users, it ends up in a lot of URL variations of the same page being crawled by Google. For example, a standard category page:
...which uses a "Price" layered navigation sidebar to filter products based on price also produces the following URLs which link to the same page:
http://www.mysite.com/widgets.html?price=1%2C250
http://www.mysite.com/widgets.html?price=2%2C250
http://www.mysite.com/widgets.html?price=3%2C250
As there are literally thousands of these URL variations being indexed, so I'd like to use Robots.txt to disallow these variations.
Question:
-
Is this a wise thing to do? Or does Google take into account layered navigation links by default, and I don't need to worry.
-
To implement, I was going to do the following in Robots.txt:
User-agent: *
Disallow: /*?
Disallow: /*=
....which would prevent any dynamic URL with a '?" or '=' from being indexed. Is there a better way to do this, or is this a good solution?
Thank you!
-
-
If you are happy with any URLs with query strings not being indexed your robots.txt will work fine.
Do any or your URLs with question marks in them have links to them? If so you might want to be careful blocking google from indexing them. I would think you'd lose the benefits those links would pass to your site.
-
Tait,
Thanks for the answer. I think the canonical tag would be ideal, but in terms of implementation, it would require some substantial code modification to the site / PHP code as I have a lot of categories, and adding this manually to each one would be very time consuming.
Would preventing the spiders from indexing any URLs with a "?" or "&" (which would only be dynamic URLs variations) cause any problems? Or is this just not an ideal best practice?
Thanks!
-
I don't know if there's a good solution with robots.txt given your URL structure. However, you could use the rel=canonical link tag in the header to force google to treat many of your URLs the same way. This would help you avoid duplicate content penalties.
More on rel=canonical:
http://www.google.com/support/webmasters/bin/answer.py?answer=139394
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Disallowed "Search" results with robots.txt and Sessions dropped
Hi
Intermediate & Advanced SEO | | Frankie-BTDublin
I've started working on our website and I've found millions of "Search" URL's which I don't think should be getting crawled & indexed (e.g. .../search/?q=brown&prefn1=brand&prefv1=C.P. COMPANY|AERIN|NIKE|Vintage Playing Cards|BIALETTI|EMMA PAKE|QUILTS OF DENMARK|JOHN ATKINSON|STANCE|ISABEL MARANT ÉTOILE|AMIRI|CLOON KEEN|SAMSONITE|MCQ|DANSE LENTE|GAYNOR|EZCARAY|ARGOSY|BIANCA|CRAFTHOUSE|ETON). I tried to disallow them on the Robots.txt file, but our Sessions dropped about 10% and our Average Position on Search Console dropped 4-5 positions over 1 week. Looks like over 50 Million URL's have been blocked, and all of them look like all of them are like the example above and aren't getting any traffic to the site. I've allowed them again, and we're starting to recover. We've been fixing problems with getting the site crawled properly (Sitemaps weren't added correctly, products blocked from spiders on Categories pages, canonical pages being blocked from Crawlers in robots.txt) and I'm thinking Google were doing us a favour and using these pages to crawl the product pages as it was the best/only way of accessing them. Should I be blocking these "Search" URL's, or is there a better way about going about it??? I can't see any value from these pages except Google using them to crawl the site.0 -
Robots.txt advice
Hey Guys, Have you ever seen coding like this in a robots.txt, I have never seen a noindex rule in a robots.txt file before - have you? user-agent: AhrefsBot User-agent: trovitBot
Intermediate & Advanced SEO | | eLab_London
User-agent: Nutch
User-agent: Baiduspider
Disallow: / User-agent: *
Disallow: /WebServices/
Disallow: /*?notfound=
Disallow: /?list=
Noindex: /?*list=
Noindex: /local/
Disallow: /local/
Noindex: /handle/
Disallow: /handle/
Noindex: /Handle/
Disallow: /Handle/
Noindex: /localsites/
Disallow: /localsites/
Noindex: /search/
Disallow: /search/
Noindex: /Search/
Disallow: /Search/
Disallow: ? I have never seen a noindex rule in a robots.txt file before - have you?
Any pointers?0 -
Should I disallow all URL query strings/parameters in Robots.txt?
Webmaster Tools correctly identifies the query strings/parameters used in my URLs, but still reports duplicate title tags and meta descriptions for the original URL and the versions with parameters. For example, Webmaster Tools would report duplicates for the following URLs, despite it correctly identifying the "cat_id" and "kw" parameters: /Mulligan-Practitioner-CD-ROM
Intermediate & Advanced SEO | | jmorehouse
/Mulligan-Practitioner-CD-ROM?cat_id=87
/Mulligan-Practitioner-CD-ROM?kw=CROM Additionally, theses pages have self-referential canonical tags, so I would think I'd be covered, but I recently read that another Mozzer saw a great improvement after disallowing all query/parameter URLs, despite Webmaster Tools not reporting any errors. As I see it, I have two options: Manually tell Google that these parameters have no effect on page content via the URL Parameters section in Webmaster Tools (in case Google is unable to automatically detect this, and I am being penalized as a result). Add "Disallow: *?" to hide all query/parameter URLs from Google. My concern here is that most backlinks include the parameters, and in some cases these parameter URLs outrank the original. Any thoughts?0 -
Robots.txt
Hi all, Happy New Year! I want to block certain pages on our site as they are being flagged (according to my Moz Crawl Report) as duplicate content when in fact that isn't strictly true, it is more to do with the problems faced when using a CMS system... Here are some examples of the pages I want to block and underneath will be what I believe to be the correct robots.txt entry... http://www.XYZ.com/forum/index.php?app=core&module=search&do=viewNewContent&search_app=members&search_app_filters[forums][searchInKey]=&period=today&userMode=&followedItemsOnly= Disallow: /forum/index.php?app=core&module=search http://www.XYZ.com/forum/index.php?app=core&module=reports&rcom=gallery&imageId=980&ctyp=image Disallow: /forum/index.php?app=core&module=reports http://www.XYZ.com/forum/index.php?app=forums&module=post§ion=post&do=reply_post&f=146&t=741&qpid=13308 Disallow: /forum/index.php?app=forums&module=post http://www.XYZ.com/forum/gallery/sizes/182-promenade/small/ http://www.XYZ.com/forum/gallery/sizes/182-promenade/large/ Disallow: /forum/gallery/sizes/ Any help \ advice would be much appreciated. Many thanks Andy
Intermediate & Advanced SEO | | TomKing0 -
Robots.txt help
Hi Moz Community, Google is indexing some developer pages from a previous website where I currently work: ddcblog.dev.examplewebsite.com/categories/sub-categories Was wondering how I include these in a robots.txt file so they no longer appear on Google. Can I do it under our homepage GWT account or do I have to have a separate account set up for these URL types? As always, your expertise is greatly appreciated, -Reed
Intermediate & Advanced SEO | | IceIcebaby0 -
URL for offline use.
Hi there, We currently have a url www.example.com/health/back-pain/ We are wanting to promote this page on our product packaging however making the URL simpler www.example.com/back-pain/ is it just a case of using a 301? are there any issues here? Thanks for any feedback
Intermediate & Advanced SEO | | Paul780 -
Optimising a Dynamic website ?
A client has bought the Nostalgia wp theme. I've installed Yoast but because the website is ajax based and the content for the pages are dynamically loaded the plugin won't work. Or at least not to my knowledge? The developer doesn't currently have a solution, which from previous expereience it will never be supported. So I need some possible solutions here. Create a mobile site? Cons more time, more money etc Create non dynamic pages linked in footer area. Cons page duplication etc. It's a small niche so having the basic elements is imperative to getting it ranking.
Intermediate & Advanced SEO | | StephenForde0 -
Why should I add URL parameters where Meta Robots NOINDEX available?
Today, I have checked Bing webmaster tools and come to know about Ignore URL parameters. Bing webmaster tools shows me certain parameters for URLs where I have added META Robots with NOINDEX FOLLOW syntax. I can see canopy_search_fabric parameter in suggested section. It's due to following kind or URLs. http://www.vistastores.com/patio-umbrellas?canopy_fabric_search=1728 http://www.vistastores.com/patio-umbrellas?canopy_fabric_search=1729 http://www.vistastores.com/patio-umbrellas?canopy_fabric_search=1730 http://www.vistastores.com/patio-umbrellas?canopy_fabric_search=2239 But, I have added META Robots NOINDEX Follow to disallow crawling. So, why should it happen?
Intermediate & Advanced SEO | | CommercePundit0