Robots Disallow Backslash - Is it the right command?
-
I'm a bit skeptical about this. Because of dynamic URLs and some other linking issue, Google has crawled URLs containing a backslash and a double-quote character.
ex - www.xyz.com/\/index.php?option=com_product
www.xyz.com/\"/index.php?option=com_product
Now %5c is the encoded version of \ (backslash) and %22 is the encoded version of " (double quote).
I need to know about this command:
User-agent: *
Disallow: \
As I am disallowing all backslash URLs through this, will it remove only the backslash URLs, which are duplicates, or the entire site?
-
Thanks, you seem lucky for me. After almost 2 months I have got the code that makes all these encoded URLs redirect correctly. Finally, if one now types
http://www.mycarhelpline.com/\"/index.php?option=com_latestnews&view=list&Itemid=10
then they are redirected through a 301 to the correct URL
http://www.mycarhelpline.com/index.php?option=com_latestnews&view=list&Itemid=10
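The actual redirect code is not shown in this thread. Purely as an illustration, the rule could be sketched in Python like this. The function name `redirect_target` and the approach of dropping any path segment made up only of backslashes and double quotes are assumptions, not the code in use on the site.

```python
from urllib.parse import unquote, urlsplit, urlunsplit

STRAY_CHARS = {"\\", '"'}  # characters that make up the unwanted path segment


def redirect_target(url):
    """Return the cleaned URL to 301 to, or None if no redirect is needed.

    Hypothetical helper: drops path segments that, once percent-decoded,
    consist only of backslashes and double quotes.
    """
    parts = urlsplit(url)
    segments = parts.path.split("/")
    kept = [s for s in segments if not (s and set(unquote(s)) <= STRAY_CHARS)]
    if kept == segments:
        return None  # nothing stray in the path, serve the page as usual
    cleaned_path = "/".join(kept)
    return urlunsplit(
        (parts.scheme, parts.netloc, cleaned_path, parts.query, parts.fragment)
    )


encoded = "http://www.mycarhelpline.com/%5C%22/index.php?option=com_latestnews&view=list&Itemid=10"
print(redirect_target(encoded))
```

A server would send the returned value in the Location header of a 301 response and serve the page normally when the function returns None.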
-
Hello Gagan,
I think the best way to handle this would be using the rel canonical tag or rewriting the URLs to get rid of the parameters and replace them with something more user-friendly.
The rel canonical tag would be the easier of those two options. I notice that the version without the encoding (e.g. http://www.mycarhelpline.com/index.php?option=com_latestnews&view=list&Itemid=10 ) has a rel canonical tag that correctly references itself as the canonical version. However, the encoded URL (e.g. http://www.mycarhelpline.com/%5C%22/index.php?option=com_latestnews&view=list&Itemid=10, which is actually http://www.mycarhelpline.com/\"/index.php?option=com_latestnews&view=list&Itemid=10) does NOT have a rel canonical tag.
If the version with the backslash had a rel canonical tag stating that the following URL is canonical, it would solve your issue, I think.
Canonical URL:
http://www.mycarhelpline.com/index.php?option=com_latestnews&view=list&Itemid=10
-
Sure. Here are some of the URLs as they were crawled.
Sample incorrect URLs, crawled and reported as duplicates in Google Webmaster Tools and in Moz too:
http://www.mycarhelpline.com/\"/index.php?option=com_latestnews&view=list&Itemid=10
http://www.mycarhelpline.com/\"/index.php?option=com_newcar&view=category&Itemid=2
Correct URLs:
http://www.mycarhelpline.com/index.php?option=com_latestnews&view=list&Itemid=10
http://www.mycarhelpline.com/index.php?option=com_newcar&view=search&Itemid=2
What we found online
Since URLs often contain characters outside the ASCII set, the URL has to be converted into a valid ASCII format. URL encoding replaces unsafe ASCII characters with a "%" followed by two hexadecimal digits. URLs cannot contain spaces.
%22 represents " (double quote) and %5c represents \ (backslash).
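As a quick check of that mapping, a minimal Python sketch using the standard library's urllib.parse can encode and decode these two characters. The path used here is only an illustration modelled on the URLs in this thread.

```python
from urllib.parse import quote, unquote

# Percent-encode the two characters that show up in the duplicate URLs.
backslash_encoded = quote("\\")  # backslash
quote_encoded = quote('"')       # double quote

# Hex digits in an escape are case-insensitive, so %5c and %5C decode alike.
decoded_path = unquote("/%5c%22/index.php")

print(backslash_encoded, quote_encoded, decoded_path)
```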
We intend to remove these duplicates, which were created with %22 and %5c in them.
Many thanks
-
I am not entirely sure I understood your question as intended, but I will do my best to answer.
I would not put this in my robots.txt file because it could possibly be misunderstood as a forward slash, in which case your entire domain would be blocked:
Disallow: \
We can possibly provide you with some alternative suggestions on how to keep Google from crawling those pages if you could share some real examples.
It may be best to rewrite or redirect those URLs instead, since they don't seem to be the canonical version you intend to present to the user.
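To show why the leading character of the rule matters, here is a simplified model of robots.txt matching, in which a Disallow value is treated as a plain prefix of the URL path. It is a sketch only: real crawlers also handle wildcards, Allow rules and encoding normalisation, and both the `is_disallowed` helper and the `/%5C` rule are hypothetical, chosen for illustration rather than taken from this thread.

```python
def is_disallowed(path, disallow_rules):
    """Return True if any non-empty Disallow value is a prefix of the path."""
    return any(rule and path.startswith(rule) for rule in disallow_rules)


clean = "/index.php?option=com_latestnews&view=list&Itemid=10"
duplicate = "/%5C%22/index.php?option=com_latestnews&view=list&Itemid=10"

# "Disallow: /" is a prefix of every path, so the whole site is blocked.
blocks_everything = [is_disallowed(p, ["/"]) for p in (clean, duplicate)]

# A narrower (hypothetical) rule only matches paths that begin with /%5C.
blocks_duplicates_only = [is_disallowed(p, ["/%5C"]) for p in (clean, duplicate)]

print(blocks_everything, blocks_duplicates_only)
```

Note that this simplified comparison is case-sensitive, so a rule written with %5c would not match a path containing %5C here. How a given crawler treats that difference should be checked with its own robots.txt testing tool.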