Block session id URLs with robots.txt
-
Hi,
I would like to block all URLs with the parameter '?filter=' from being crawled by including them in the robots.txt.
Which directive should I use:
User-agent: *
Disallow: ?filter=or
User-agent: *
Disallow: /?filter=In other words, is the forward slash in the beginning of the disallow directive necessary?
Thanks!
-
Hi Martijn,
Thanks for the answer. Regarding the forward slash in the beginning, is it necessary to use this?
In the robots text from Zalando for example, you can see that they don't use it for a lot of filters.
-
Uhh, that's not what the requester is looking for and could actually cause tons of problems if you would apply this on a site that you're unaware of. I would always go with the most limiting robots.txt that you can and in this case, I would go with: /?filter=
-
Hi,
The following should suffice as it will black any URL with a "?" in it
User-agent: * Disallow: /*?
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Why a certain URL ( a category URL ) disappears?
the page hasn't been spammed. - links are natural - onpage grader is perfect - there are useful high ranking articles linking to the page...pretty much everything is okay.....also all of my websites pages are okay and none of them has disappeared only this one ( the most important category of my site. )
Intermediate & Advanced SEO | | mohamadalieskandariii0 -
How Does Yelp Create URLs?
Hi all, How does Yelp (or other sites) go about creating URLs for just about every service and city possible ending with the search? in the URL like this https://www.yelp.com/search?cflt=chiropractors&find_loc=West+Palm+Beach%2C+FL. They clearly aren't creating all of these pages, so how do you go about setting a meta title/optimization formula that allows these pages to exist AND to be crawled by search engines and indexed?
Intermediate & Advanced SEO | | RickyShockley0 -
Is there a problems with putting encoding into the subdomain of a URL?
We are looking at changing our URL structure for tracking various affiliates from: https://sub.domain.com/quote/?affiliate_id=xxx to https://aff_xxx_affname.domain.com/quote/ Both would allow us to track affiliates, but the second would allow us to use cookies to track. Does anyone know if this could possibly cause SEO concerns? Also, For the site we want to rank for, we will use a reverse proxy to change the URL from https://aff_xxx.maindomain.com/quote/ to https://www.maindomain.com/quote/ would that cause any SEO issues. Thank you.
Intermediate & Advanced SEO | | RoxBrock0 -
Robots.txt Blocking - Best Practices
Hi All, We have a web provider who's not willing to remove the wildcard line of code blocking all agents from crawling our client's site (user-agent: *, Disallow: /). They have other lines allowing certain bots to crawl the site but we're wondering if they're missing out on organic traffic by having this main blocking line. It's also a pain because we're unable to set up Moz Pro, potentially because of this first line. We've researched and haven't found a ton of best practices regarding blocking all bots, then allowing certain ones. What do you think is a best practice for these files? Thanks! User-agent: * Disallow: / User-agent: Googlebot Disallow: Crawl-delay: 5 User-agent: Yahoo-slurp Disallow: User-agent: bingbot Disallow: User-agent: rogerbot Disallow: User-agent: * Crawl-delay: 5 Disallow: /new_vehicle_detail.asp Disallow: /new_vehicle_compare.asp Disallow: /news_article.asp Disallow: /new_model_detail_print.asp Disallow: /used_bikes/ Disallow: /default.asp?page=xCompareModels Disallow: /fiche_section_detail.asp
Intermediate & Advanced SEO | | ReunionMarketing0 -
Should I disallow all URL query strings/parameters in Robots.txt?
Webmaster Tools correctly identifies the query strings/parameters used in my URLs, but still reports duplicate title tags and meta descriptions for the original URL and the versions with parameters. For example, Webmaster Tools would report duplicates for the following URLs, despite it correctly identifying the "cat_id" and "kw" parameters: /Mulligan-Practitioner-CD-ROM
Intermediate & Advanced SEO | | jmorehouse
/Mulligan-Practitioner-CD-ROM?cat_id=87
/Mulligan-Practitioner-CD-ROM?kw=CROM Additionally, theses pages have self-referential canonical tags, so I would think I'd be covered, but I recently read that another Mozzer saw a great improvement after disallowing all query/parameter URLs, despite Webmaster Tools not reporting any errors. As I see it, I have two options: Manually tell Google that these parameters have no effect on page content via the URL Parameters section in Webmaster Tools (in case Google is unable to automatically detect this, and I am being penalized as a result). Add "Disallow: *?" to hide all query/parameter URLs from Google. My concern here is that most backlinks include the parameters, and in some cases these parameter URLs outrank the original. Any thoughts?0 -
Robots.txt - blocking JavaScript and CSS, best practice for Magento
Hi Mozzers, I'm looking for some feedback regarding best practices for setting up Robots.txt file in Magento. I'm concerned we are blocking bots from crawling essential information for page rank. My main concern comes with blocking JavaScript and CSS, are you supposed to block JavaScript and CSS or not? You can view our robots.txt file here Thanks, Blake
Intermediate & Advanced SEO | | LeapOfBelief0 -
How to Block Google Preview?
Hi, Our site is very good for Javascript-On users, however many pages are loaded via AJAX and are inaccessible with JS-off. I'm looking to make this content available with JS-off so Search Engines can access them, however we don't have the Dev time to make them 'pretty' for JS-off users. The idea is to make them accessible with JS-off, but when requested by a user with JS-on the user is forwarded to the 'pretty' AJAX version. The content (text, images, links, videos etc) is exactly the same but it's an enormous amount of effort to make the JS-off version 'pretty' and I can't justify the development time to do this. The problem is that Googlebot will index this page and show a preview of the ugly JS-off page in the preview on their results - which isn't good for the brand. Is there a way or meta code that can be used to stop the preview but still have it cached? My current options are to use the meta noarchive or "Cache-Control" content="no-cache" to ask Google to stop caching the page completely, but wanted to know if there was a better way of doing this? Any ideas guys and girls? Thanks FashionLux
Intermediate & Advanced SEO | | FashionLux0 -
202 error page set in robots.txt versus using crawl-able 404 error
We currently have our error page set up as a 202 page that is unreachable by the search engines as it is currently in our robots.txt file. Should the current error page be a 404 error page and reachable by the search engines? Is there more value or is it a better practice to use 404 over a 202? We noticed in our Google Webmaster account we have a number of broken links pointing the site, but the 404 error page was not accessible. If you have any insight that would be great, if you have any questions please let me know. Thanks, VPSEO
Intermediate & Advanced SEO | | VPSEO0