Robots.txt for Facet Results
-
Hi
Does anyone know how to properly add facet URLs to robots.txt?
Here is an example of one of our facet URLs -
Everything after the # will need to be blocked on all pages with a facet.
Thank you
-
Great, thank you!
-
This is the right answer.
A great way to check is to see whether you have multiple versions of that URL indexed - which you don't: https://www.google.com/search?q=site:http://www.key.co.uk/en/key/platform-trolleys-trucks
-
Google ignores everything after the hash to begin with, so there is nothing you need to block. The fragment is never even sent to the server, which makes it a clever way to pass parameters without having to worry about Google getting lost.
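For what it's worth, this is also why a robots.txt rule could never target a fragment: robots.txt patterns are matched against the path and query string only, so a rule containing a # will simply never match anything. If the facets were ever moved to query parameters instead, blocking them would look something like this - a hypothetical sketch with a made-up colour parameter, not something the site above needs today:

User-agent: *
Disallow: /*?colour=
Disallow: /*&colour=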
Related Questions
-
Negative News in SERP results?
Hey guys, we did reputation management back in March 2017. We basically built high-quality links to online assets such as LinkedIn, Twitter, Facebook, positive PR articles and other web properties in order to rank them higher than the negative PR. However, there was zero change (the link building was solid), and the negative PR remains in the top 10 alongside positive new articles about the site. At this point, I believe that Google is keeping the negative PR in the top 10 to keep the SERP results balanced. Does anyone know if this is something Google does to balance positive and negative results? Cheers.
Intermediate & Advanced SEO | cathywix
-
How to get product info into Google Search Result box
Hi, in the last couple of weeks I have been getting more and more search results that show a product with retailer prices below it (see sample attached). Are there Schema.org properties one could use to improve the chance of appearing there? Thanks in advance, Dieter Lang
Intermediate & Advanced SEO | Storesco
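For reference, the product-with-price results the question describes are rich results driven by schema.org Product markup. A minimal JSON-LD sketch - every value below is a made-up placeholder, not taken from the question:

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Example Table Lamp",
  "offers": {
    "@type": "Offer",
    "price": "49.99",
    "priceCurrency": "EUR",
    "availability": "https://schema.org/InStock"
  }
}
</script>

Markup makes a page eligible for the treatment; Google still decides whether to show it.
-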
Search Results Pages Blocked in Robots.txt?
Hi, I am reviewing our robots.txt file and wondered whether search results pages should be blocked from crawling. We currently have this in the file: /searchterm* Is it a good thing for SEO?
Intermediate & Advanced SEO | BeckyKey
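As a general note, blocking internal search results from crawling is common practice - Google's guidelines discourage letting internal search pages be indexed. A minimal sketch, assuming every search URL on the site begins with /searchterm (the trailing * in the question is harmless but redundant, since robots.txt rules already match by prefix):

User-agent: *
Disallow: /searchterm
-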
SSL and robots.txt question - confused by Google guidelines
I noticed "Don’t block your HTTPS site from crawling using robots.txt" here: http://googlewebmastercentral.blogspot.co.uk/2014/08/https-as-ranking-signal.html Does this mean you can't use robots.txt anywhere on the site - even parts of a site you want to noindex, for example?
Intermediate & Advanced SEO | McTaggart
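Worth unpacking: robots.txt is fetched separately per protocol and host, so the HTTPS site has its own file, and Google's advice is about not disallowing the whole HTTPS site - selective disallows remain fine. A sketch using a placeholder domain:

# served at https://www.example.com/robots.txt
User-agent: *
Disallow: /private-section/

Also note that robots.txt controls crawling, not indexing: a page you want noindexed needs a meta robots noindex tag, and that tag only works if the page is NOT blocked in robots.txt, because Google has to crawl the page to see it.
-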
Duplicate site (disaster recovery) being crawled and creating two indexed search results
I have a primary domain, toptable.co.uk, and a disaster recovery site for this primary domain named uk-www.gtm.opentable.com. In the event of a disaster, toptable.co.uk would get CNAMEd (DNS alias) to the .gtm site. Naturally, the .gtm disaster recovery domain is an exact match of the toptable.co.uk domain. Unfortunately, Google has crawled the uk-www.gtm.opentable site, and it's showing up in search results. In most cases the gtm URLs don't get redirected to toptable - they actually appear as an entirely separate domain to the user. The strong feeling is that this duplicate content is hurting toptable.co.uk, especially as .gtm.ot is part of the .opentable.com domain, which has significant authority. So we need a way of stopping Google from crawling gtm. There seem to be two potential fixes - which is best for this case? 1) Use robots.txt to block Google from crawling the .gtm site, or 2) canonicalize the gtm URLs to toptable.co.uk. In general Google seems to recommend the canonical approach, but in this special case it seems the robots.txt change could be best. Thanks in advance to the SEOmoz community!
Intermediate & Advanced SEO | OpenTable
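One consideration that favours the canonical here: once the .gtm site is blocked in robots.txt, Google can no longer crawl it to see that it duplicates toptable.co.uk, and the already-indexed URLs can linger in the index as URL-only results. A cross-domain canonical lets Google consolidate the two instead. A sketch, with /some-page and the protocol standing in as placeholders:

<!-- on uk-www.gtm.opentable.com/some-page -->
<link rel="canonical" href="http://www.toptable.co.uk/some-page" />
-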
Disallow my store in robots.txt?
Should I disallow my store directory in robots.txt? Here is the URL: https://www.stdtime.com/store/ Here are my reasons for suggesting this: 1) SEOmoz finds crawl "errors" in there that I don't care about. 2) I don't think I care if the search engines index those pages. 3) I only have one product, and it is not an impulse buy. 4) My product has a 60-day sales cycle, so price is less important than features.
Intermediate & Advanced SEO | raywhite
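For what it's worth, if you do decide to block it, the rule itself is a one-liner - a sketch against the /store/ path shown above:

User-agent: *
Disallow: /store/

Just keep in mind that disallowing the directory stops crawling, not necessarily indexing - URLs linked from elsewhere can still appear in results without a snippet.
-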
Techniques to fix eCommerce faceted navigation
Hi everyone, I've read a lot about different techniques to fix duplicate content problems caused by eCommerce faceted navigation (e.g. redundant URL combinations of colors, sizes, etc.). From what I've seen, suggested methods include using AJAX or JavaScript to make the links functional for users only and prevent bots from crawling through them. I was wondering if this technique would work instead: if we detect that the user is a robot, instead of displaying a link, we simply display its anchor text. So what would be, for a human:
COLOR
<li><a href=red>red</a></li>
<li><a href=blue>blue</a></li>
would be, for a robot:
COLOR
<li>red</li>
<li>blue</li>
Any reason I shouldn't do this? Thanks!
*** edit: Another reason to fix this is crawl budget, since robots can waste their time going through every possible combination of facet. This is also something I'm looking to fix.
Intermediate & Advanced SEO | anthematic
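A caution worth attaching to this one: serving different HTML to Googlebot than to users is user-agent-based cloaking, which carries webmaster-guidelines risk even with good intentions. A lower-risk variant that keeps the markup identical for everyone is to serve the same links to all visitors but mark them nofollow - a sketch with hypothetical facet URLs:

COLOR
<li><a href="/lamps?colour=red" rel="nofollow">red</a></li>
<li><a href="/lamps?colour=blue" rel="nofollow">blue</a></li>

nofollow discourages crawling of the links but does not guarantee the facet URLs stay out of the index, so it is often paired with a robots.txt disallow on the facet parameters or a meta noindex on the facet pages.
-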
How to enable crawling for dynamically generated search result pages?
I want to enable crawling for the dynamically generated search result pages produced by Magento Solr search. You can see more about it at the following URLs:
http://code.google.com/p/magento-solr/
http://www.vistastores.com/catalogsearch/result/?q=bamboo+table+lamp
http://www.vistastores.com/catalogsearch/result/?q=ceramic+table+lamp
http://www.vistastores.com/catalogsearch/result/?q=green+patio+umbrella
Right now, Google is not crawling the search result pages because I have added the following syntax to the robots.txt file:
Disallow: /*?q=
So, how do I enable crawling of the search result pages following best SEO practice? Any other input in the same direction would help me get it done.
Intermediate & Advanced SEO | CommercePundit
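The simplest change is to remove the Disallow: /*?q= line. If other ?q= URLs elsewhere on the site should stay blocked, Google also honours an Allow directive, with the more specific (longer) rule winning - a sketch built around the /catalogsearch/result/ path shown above:

User-agent: *
Allow: /catalogsearch/result/?q=
Disallow: /*?q=

That said, Google generally recommends keeping internal search results pages out of the index, so it is worth being sure you want them crawled at all.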