Robots.txt - What is the correct syntax?
-
Hello everyone
I have the following link:
http://mywebshop.dk/index.php?option=com_redshop&view=send_friend&pid=39&tmpl=component&Itemid=167
I want to prevent google from indiexing everything that is related to "view=send_friend"
The problem is that its giving me dublicate content, and the content of the links has no SEO value of any sort.
My problem is how i disallow it correctly via robots.txt
I tried this syntax:
Disallow: /view=send_friend/
However after doing a crawl on request the 200+ dublicate links that contains view=send_friend is still present in the CSV crawl report.
What is the correct syntax if i want to prevent google from indexing everything that is related to this kind of link?
-
I added your suggestion to robots.txt and requested a crawl again.
I only have 3 pages with dublicate page content now
So your suggestion seemes to have worked.
Thanks for your reply.. it worked!
-
you are right. misinterpreted the explanation. Apologies
-
Jarno,
The $ would suggest this parameter is always on the end of a URL. And within Henrik's example it's already somewhere in the middle of the URL.
-
Henrik,
i think you should be looking into something like this:
User-agent: Googlebot
Disallow: /*view=send_friend$hope this helps
Kind regards
Jarno
-
Hi Henrik,
I would suggest trying: Disallow: &view=send_friend
Optional you could try this without the & as I'm not sure this is always at the start of this parameter.Hope this helps!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Removing robots.txt on WordPress site problem
Hi..am a little confused since I ticked the box in WordPress to allow search engines to now crawl my site (previously asked for them not to) but Google webmaster tools is telling me I still have robots.txt blocking them so am unable to submit the sitemap. Checked source code and the robots instruction has gone so a little lost. Any ideas please?
Technical SEO | | Wallander0 -
Correct Way to Write Meta
OK so this is a really, really basic question. However, I'm seeing some meta written differently to normal and I'm wondering if a) this is correct and b) whether there is any benefit. Normally it's like this: However, I am seeing it written like this is some places: So, the content= and name= are swapped around. I assume the people that did this were thinking that bringing the content forward would mean that Google reads keywords first. Just wondering if anybody knows whether this is good practice or not? Just spiked my interest so apologies for the basic nature of the question!
Technical SEO | | RiceMedia0 -
Getting home page content at top of what robots see
When I click on the text-only cache of nlpca(dot)com on the home page http://webcache.googleusercontent.com/search?q=cache:UIJER7OJFzYJ:www.nlpca.com/&hl=en&gl=us&strip=1 our H1 and body content are at the very bottom. How do we get the h1 and content at the top of what the robots see? Thanks!
Technical SEO | | BobGW0 -
Search engines have been blocked by robots.txt., how do I find and fix it?
My client site royaloakshomesfl.com is coming up in my dashboard as having Search engines have been blocked by robots.txt, only I have no idea where to find it and fix the problem. Please help! I do have access to webmaster tools and this site is a WP site, if that helps.
Technical SEO | | LeslieVS0 -
Backtracking from verification meta tag to the correct Google account is difficult
A Google verification meta tag was created and implemented on a site that I am now responsible for (I took over an SEO project after a long lapse), but no one seems to know what Google account was used to create the meta tag in the first place. I'm finding it very difficult to backtrack from verification meta tag to the Google account, and all the online help is for those having trouble moving forward with the verification. Any suggestions or advice?
Technical SEO | | MaryDoherty0 -
Quick robots.txt check
We're working on an SEO update for http://www.gear-zone.co.uk at the moment, and I was wondering if someone could take a quick look at the new robots file (http://gearzone.affinitynewmedia.com/robots.txt) to make sure we haven't missed anything? Thanks
Technical SEO | | neooptic0 -
Subdomain Robots.txt
If I have a subdomain (a blog) that is having tags and categories indexed when they should not be, because they are creating duplicate content. Can I block them using a robots.txt file? Can I/do I need to have a separate robots file for my subdomain? If so, how would I format it? Do I need to specify that it is a subdomain robots file, or will the search engines automatically pick this up? Thanks!
Technical SEO | | JohnECF0 -
How do I use the Robots.txt "disallow" command properly for folders I don't want indexed?
Today's sitemap webinar made me think about the disallow feature, seems opposite of sitemaps, but it also seems both are kind of ignored in varying ways by the engines. I don't need help semantically, I got that part. I just can't seem to find a contemporary answer about what should be blocked using the robots.txt file. For example, I have folders containing site comps for clients that I really don't want showing up in the SERPS. Is it better to not have these folders on the domain at all? There are also security issues I've heard of that make sense, simply look at a site's robots file to see what they are hiding. It makes it easier to hunt for files when they know the directory the files are contained in. Do I concern myself with this? Another example is a folder I have for my xml sitemap generator. I imagine google isn't going to try to index this or count it as content, so do I need to add folders like this to the disallow list?
Technical SEO | | SpringMountain0