Robots.txt: Syntax URL to disallow
-
Did someone ever experience some "collateral damages" when it's about "disallowing" some URLs?
Some old URLs are still present on our website and while we are "cleaning" them off the site (which takes time), I would like to to avoid their indexation through the robots.txt file.
The old URLs syntax is "/brand//13" while the new ones are "/brand/samsung/13." (note that there is 2 slash on the URL after the word "brand")
Do I risk to erase from the SERPs the new good URLs if I add to the robots.txt file the line "Disallow: /brand//" ?
I don't think so, but thank you to everyone who will be able to help me to clear this out
-
You could inadvertently block /brand/ altogether. Just because you use a // doesn't mean Google follows the same rules when crawling.
-
"I wouldn't risk telling a spider to ignore /brand// because it might have adverse results."
Which adverse results could be expected?
-
(because of the 404 error pages being constantly found in our pages)
Think of it this way:
Which is better? Re-routing traffic when it's congested or putting up a road block to back up even more traffic?Yes, it's more work to do the 301 redirects but if you have "pages being constantly found" you should give instructions to spiders to take the different path.
Now, if you are talking about an error such as:
/brand//samsung/13 SHOULD go to
/brand/samsung/13
Then you could EASILY solve this with HTACCESS redirects. I wouldn't risk telling a spider to ignore /brand// because it might have adverse results. -
Hi guys,
Thank you for your answers
I understand (and agree) with your SEO point of view (301 redirection) but I should have mentioned that these old URLs are leading to a 404 error page for a long time now, we are not considering anymore their SEO strength anymore...
My goal right now is to find a quick and simple way to tell search engines to not consider this type of old URLs (because of the 404 error pages being constantly found in our pages) : doing the 301 redirection to the right page would be a bit more complex at the moment.
So: do you think there is a risk that the second slash won't be "considered" in the robots.txt about the "disallow" line I want to add ? (= do search engines will stop to crawl URLs like "/brand/samsung/13" if I add the line "Disallow: /brand//" ?)
-
I'll further what Highland and Alex Chan are telling you. If you are using Apache (Linux) then you can redirect your old site links using a 301 redirect and .htaccess which is a very powerful tool. Otherwise, if you are using a IIS server, web.config is what you want to use.
A really good resource for .htassess is CSS-Tricks: http://css-tricks.com/snippets/htaccess/301-redirects/
-
Yup like Highland mentioned, using your robots.txt for this isn't a good idea. The robots.txt file isn't guaranteed to work anyway. The only sure fire way to get it working is to move all the URLs from the old structure to the new one, then 301 all the old URLs into the new URLs. The 301 minimizes loss to your SEO.
-
You really don't need a robots for that. I would either 301 the old URL (preferred) or have the old URL return a 404. Both will cause the old URL to be removed from the index. A robots nofollow simply leaves it up but tells the robots not to crawl it.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
URL Parameters
Hi Moz Community, I'm working on a website that has URL parameters. After crawling the site, I've implemented canonical tags to all these URLs to prevent them from getting indexed by Google. However, today I've found out that Google has indexed plenty of URL parameters.. 1-Some of these URLs has canonical tags yet they are still indexed and live. 2- Some can't be discovered through site crawling and they are result in 5xx server error. Is there anything else that I can do (other than adding canonical tags) + how can I discover URL parameters indexed but not visible through site crawling? Thanks in advance!
Intermediate & Advanced SEO | | bbop330 -
301 vs Canonical - With A Side of Partial URL Rewrite and Google URL Parameters-OH MY
Hi Everyone, I am in the middle of an SEO contract with a site that is partially HTML pages and the rest are PHP and part of an ecommerce system for digital delivery of college classes. I am working with a web developer that has worked with this site for many years. In the php pages, there are also 6 different parameters that are currently filtered by Google URL parameters in the old Google Search Console. When I came on board, part of the site was https and the remainder was not. Our first project was to move completely to https and it went well. 301 redirects were already in place from a few legacy sites they owned so the developer expanded the 301 redirects to move everything to https. Among those legacy sites is an old site that we don't want visible, but it is extensively linked to the new site and some of our top keywords are branded keywords that originated with that site. Developer says old site can go away, but people searching for it are still prevalent in search. Biggest part of this project is now to rewrite the dynamic urls of the product pages and the entry pages to the class pages. We attempted to use 301 redirects to redirect to the new url and prevent the draining of link juice. In the end, according to the developer, it just isn't going to be possible without losing all the existing link juice. So its lose all the link juice at once (a scary thought) or try canonicals. I am told canonicals would work - and we can switch to that. My questions are the following: 1. Does anyone know of a way that might make the 301's work with the URL rewrite? 2. With canonicals and Google parameters, are we safe to delete the parameters after we have ensures everything has a canonical url (parameter pages included)? 3. If we continue forward with 301's and lose all the existing links, since this only half of the pages in the site (if you don't count the parameter pages) and there are only a few links per page if that, how much of an impact would it have on the site and how can I avoid that impact? 4. Canonicals seem to be recommended heavily these days, would the canonical urls be a better way to go than sticking with 301's. Thank you all in advance for helping! I sincerely appreciate any insight you might have. Sue (aka Trudy)
Intermediate & Advanced SEO | | TStorm1 -
How should you determine the preferred URL structure?
Hi Guys, When migrating to a new CMS which include new pages how should you determine the URL structure, specifically: So should we include www. or without it? Should the URL have a trailing slash? How would you determine the answer to these questions? Cheers.
Intermediate & Advanced SEO | | kayl870 -
How To Shorten Long URLS
Hi I want to shorten some URLs, if possible, that Moz is reporting as too long. They are all the same page but different categories - the page advertises jobs but the client requires various links to types of jobs on the menu. So the menu will have: Job type 1
Intermediate & Advanced SEO | | Ann64
Job type 2
Job Type 3 I'm getting the links by going to the page, clicking a dropdown to filter the Job type, then copying the resulting URL from the address bar. Bu these are really long & cumbersome. I presume if I used a URL shortener, this would count as redirects and alsonot be good for SEO. Any thoughts? Thanks
Ann0 -
URL Optimisation Dilemma
First of all, I fully appreciate that I may be over analysing this, so feel free to highlight if you think I’m going overboard on this one. I’m currently trying to optimise the URLs for a group of new pages that we have recently launched. I would usually err on the side of leaving the urls as they are so that any incoming links are not diluted through the 301 re-direct. In this case, however, there are very few links to these pages, so I don’t think that changing URLs will harm them. My main question is between short URLs vs. long URLs (I have already read Dr. Pete’s post on this). Note: the URLs I have listed below are not the actual URLs, but very similar examples that I have created. The URLs currently exist in a similar format to the examples below: http://www.company.com/products/dlm/hire-ca My first response was that we could put a few descriptive keywords in the url, with something like the following: http://www.company/products/debt-lifecycle-management/hire-collection-agents - I’m worried though that the URL will get too long for any pages sitting under this. As a compromise, I am considering the following: http://www.company/products/dlm/hire-collection-agents My feeling is that the second approach will give the best balance between having the keywords for the products and trying to ensure good user experience. My only concern is whether the /dlm/ category page would suffer slightly, but this would have ‘debt-lifecycle-management’ in the title tag. Does this sound like a good approach to people? Or do you think I’m being a little obsessive about this? Any help would be appreciated 🙂
Intermediate & Advanced SEO | | RG_SEO0 -
Capitals in URLs
Hello Mozzers. I've just been looking at a site with capitals in the URL - capitals are used in the product descriptions, so you'll have a URL structure like this: www.company.com/directory1/Double-Beds-Luxury (such URLs do not work if I lower the case of the capitals). There are 50,000 such products on the site. Clearly one drawback is potential customers might type in, or link to, the lower case of the URL and get a "not found" result (though the urls are relatively long so not that likely I'm thinking). Are there any additional drawbacks with the use of capitals outlined here?
Intermediate & Advanced SEO | | McTaggart0 -
Robots.txt
What would be a perfect robots.txt file my site is propdental.es Can i just place: User-agent: * Or should i write something more???
Intermediate & Advanced SEO | | maestrosonrisas0 -
How important is it to canonicalize mobile URLs to desktop URLs?
I know many SEO's prefer a stylesheet and single URL, but if you use m.domain.com, do you canonicalize to your desktop URLS?
Intermediate & Advanced SEO | | nicole.healthline0