Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Overly-Dynamic URL
-
Hi,
We have over 5000 pages showing under
Overly-Dynamic URL error
Our ecommerce site uses Ajax and we have several different filters like, Size, Color, Brand and we therefor have many different urls like,
http://www.dellamoda.com/Designer-Pumps.html?sort=price&sort_direction=1&use_selected_filter=Y
http://www.dellamoda.com/Designer-Accessories.html?sort=title&use_selected_filter=Y&view=all
http://www.dellamoda.com/designer-handbags.html?use_selected_filter=Y&option=manufacturer%3A&page3
Could we use the robots.txt file to disallow these from showing as duplicate content? and do we need to put the whole url in there?
like:
Disallow: /*?sort=price&sort_direction=1&use_selected_filter=Y
if not how far into the url should be disallowed?
So far we have added the following to our robots,txt
Disallow: /?sort=title
Disallow: /?use_selected_filter=Y
Disallow: /?sort=price
Disallow: /?clearall=Y
Just not sure if they are correct.
Any help would be greatly appreciated.
Thank you,Kami
-
Hi Kami,
It's unfortunate, but a number of modern day e-commerce platforms still suffer from poor canonicalisation and multiple URL's.
If possible, rather than blocking off access to those queries via robots.txt or meta, I'd start by trying to specify a canonical URL when a query is created.
E.G
Query: http://www.dellamoda.com/Designer-Accessories.html?sort=title&use_selected_filter=Y&view=all
Canonical: http://www.dellamoda.com/Designer-Accessories.htmlFailing that I'd try to implement a "follow,noindex" meta tag or via x-robots if you're any good with apache.
If that's still a no go, then try GWT, Google is getting much better at handling dynamic URL's within e-commerce platforms and you can specify which queries Google should ignore directly within GWT
There's a great post on Moz that deals with e-commerce and canonicalisation - http://www.seomoz.org/blog/qa-from-ecommerce-seo-fix-and-avoid-common-issues-webinar - I'd suggest starting there!
As a last resort I'd look to block the URL's within robots.txt, but this prevents crawlers from flowing freely within your site and can result in poor indexation.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
SEO on dynamic website
Hi. I am hoping you can advise. I have a client in one of my training groups and their site is a golf booking engine where all pages are dynamically created based on parameters used in their website search. They want to know what is the best thing to do for SEO. They have some landing pages that Google can see but there is only a small bit of text at the top and the rest of the page is dynamically created. I have advised that they should create landing pages for each of their locations and clubs and use canonicals to handle what Google indexes.Is this the right advice or should they noindex? Thanks S
Intermediate & Advanced SEO | | bedynamic0 -
How important is the file extension in the URL for images?
I know that descriptive image file names are important for SEO. But how important is it to include .png, .jpg, .gif (or whatever file extension) in the url path? i.e. https://example.com/images/golden-retriever vs. https://example.com/images/golden-retriever.jpg Furthermore, since you can set the filename in the Content-Disposition response header, is there any need to include the descriptive filename in the URL path? Since I'm pulling most of our images from a database, it'd be much simpler to not care about simulating a filename, and just reference an image id in my templates. Example: 1. Browser requests GET /images/123456
Intermediate & Advanced SEO | | dsbud
2. Server responds with image setting both Content-Disposition, and Link (canonical) headers Content-Disposition: inline; filename="golden-retriever"
Link: <https: 123456="" example.com="" images="">; rel="canonical"</https:>1 -
Sanity Check: NoIndexing a Boatload of URLs
Hi, I'm working with a Shopify site that has about 10x more URLs in Google's index than it really ought to. This equals thousands of urls bloating the index. Shopify makes it super easy to make endless new collections of products, where none of the new collections has any new content... just a new mix of products. Over time, this makes for a ton of duplicate content. My response, aside from making other new/unique content, is to select some choice collections with KW/topic opportunities in organic and add unique content to those pages. At the same time, noindexing the other 90% of excess collections pages. The thing is there's evidently no method that I could find of just uploading a list of urls to Shopify to tag noindex. And, it's too time consuming to do this one url at a time, so I wrote a little script to add a noindex tag (not nofollow) to pages that share various identical title tags, since many of them do. This saves some time, but I have to be careful to not inadvertently noindex a page I want to keep. Here are my questions: Is this what you would do? To me it seems a little crazy that I have to do this by title tag, although faster than one at a time. Would you follow it up with a deindex request (one url at a time) with Google or just let Google figure it out over time? Are there any potential negative side effects from noindexing 90% of what Google is already aware of? Any additional ideas? Thanks! Best... Mike
Intermediate & Advanced SEO | | 945010 -
Sitemap generator which only includes canonical urls
Does anyone know of a 3rd party sitemap generator that will only include the canonical url's? Creating a sitemap with geo and sorting based parameters isn't the most ideal way to generate sitemaps. Please let me know if anyone has any ideas. Mind you we have hundreds of thousands of indexed url's and this can't be done with a simple text editor.
Intermediate & Advanced SEO | | recbrands0 -
Duplicate URLs ending with #!
Hi guys, Does anyone know why a site can contain duplicate URLs ending with hastag & exclamation mark e.g. https://site.com.au/#! We are finding a lot of these URLs (as duplicates) and i was wondering what they are from developer standpoint? And do you think it's worth the time and effort adding a rel canonical tag or 301 to these URLs eventhough they're not getting indexed by Google? Cheers, Chris
Intermediate & Advanced SEO | | jayoliverwright0 -
Attack of the dummy urls -- what to do?
It occurs to me that a malicious program could set up thousands of links to dummy pages on a website: www.mysite.com/dynamicpage/dummy123 www.mysite.com/dynamicpage/dummy456 etc.. How is this normally handled? Does a developer have to look at all the parameters to see if they are valid and if not, automatically create a 301 redirect or 404 not found? This requires a table lookup of acceptable url parameters for all new visitors. I was thinking that bad url names would be rare so it would be ok to just stop the program with a message, until I realized someone could intentionally set up links to non existent pages on a site.
Intermediate & Advanced SEO | | friendoffood1 -
Using Canonical URL to poin to an external page
I was wondering if I can use a canonical URL that points to a page residing on external site? So a page like:
Intermediate & Advanced SEO | | llamb
www.site1.com/whatever.html will have a canonical link in its header to www.site2.com/whatever.html. Thanks.0 -
Blocking Dynamic URLs with Robots.txt
Background: My e-commerce site uses a lot of layered navigation and sorting links. While this is great for users, it ends up in a lot of URL variations of the same page being crawled by Google. For example, a standard category page: www.mysite.com/widgets.html ...which uses a "Price" layered navigation sidebar to filter products based on price also produces the following URLs which link to the same page: http://www.mysite.com/widgets.html?price=1%2C250 http://www.mysite.com/widgets.html?price=2%2C250 http://www.mysite.com/widgets.html?price=3%2C250 As there are literally thousands of these URL variations being indexed, so I'd like to use Robots.txt to disallow these variations. Question: Is this a wise thing to do? Or does Google take into account layered navigation links by default, and I don't need to worry. To implement, I was going to do the following in Robots.txt: User-agent: * Disallow: /*? Disallow: /*= ....which would prevent any dynamic URL with a '?" or '=' from being indexed. Is there a better way to do this, or is this a good solution? Thank you!
Intermediate & Advanced SEO | | AndrewY1