Robots.txt - What is the correct syntax?
-
Hello everyone
I have the following link:
http://mywebshop.dk/index.php?option=com_redshop&view=send_friend&pid=39&tmpl=component&Itemid=167
I want to prevent google from indiexing everything that is related to "view=send_friend"
The problem is that its giving me dublicate content, and the content of the links has no SEO value of any sort.
My problem is how i disallow it correctly via robots.txt
I tried this syntax:
Disallow: /view=send_friend/
However after doing a crawl on request the 200+ dublicate links that contains view=send_friend is still present in the CSV crawl report.
What is the correct syntax if i want to prevent google from indexing everything that is related to this kind of link?
-
I added your suggestion to robots.txt and requested a crawl again.
I only have 3 pages with dublicate page content now
So your suggestion seemes to have worked.
Thanks for your reply.. it worked!
-
you are right. misinterpreted the explanation. Apologies
-
Jarno,
The $ would suggest this parameter is always on the end of a URL. And within Henrik's example it's already somewhere in the middle of the URL.
-
Henrik,
i think you should be looking into something like this:
User-agent: Googlebot
Disallow: /*view=send_friend$hope this helps
Kind regards
Jarno
-
Hi Henrik,
I would suggest trying: Disallow: &view=send_friend
Optional you could try this without the & as I'm not sure this is always at the start of this parameter.Hope this helps!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Utilizing one robots.txt for two sites
I have two sites that are facilitated hosting in similar CMS. Maybe than having two separate robots.txt records (one for every space), my web office has made one which records the sitemaps for the two sites, similar to this:
Technical SEO | | eulabrant0 -
Robots.txt | any SEO advantage to having one vs not having one?
Neither of my sites has a robots.txt file. I guess I have never been bothered by any particular bot enough to exclude it. Is there any SEO advantage to having one anyways?
Technical SEO | | GregB1230 -
Google is indexing blocked content in robots.txt
Hi,Google is indexing some URLs that i don't want to be indexed and also is indexing the same URLs with https. This URLs are blocked in the file robots.txt.I've tried to block this URLs through Google WebmasterTools but Google doesn't let me do it because this URL are httpsThe file robots.txt is correct so, what can i do to avoid this content to be indexed?
Technical SEO | | elisainteractive0 -
Meta-robots Nofollow
I don't understand Meta-robots Nofollow. Wordpress has my homepage set to this according to SEOMoz tool. Is this really bad?
Technical SEO | | hopkinspat1 -
Should I add my blog posts to my sitemap.txt file?
This seems like it should be an obvious no, just because of the amount of work that would entail, and then remembering to do it every time I make a post, but since I couldn't find anything on Google about it and have never heard anyone mention it, I figured I'd ask.
Technical SEO | | UnderRugSwept0 -
No follow syntax
is ref=nofollow the same as rel=nofollow? in other words, does ref=nofollow not pass any link juice?
Technical SEO | | SoulSurfer80 -
How do I use the Robots.txt "disallow" command properly for folders I don't want indexed?
Today's sitemap webinar made me think about the disallow feature, seems opposite of sitemaps, but it also seems both are kind of ignored in varying ways by the engines. I don't need help semantically, I got that part. I just can't seem to find a contemporary answer about what should be blocked using the robots.txt file. For example, I have folders containing site comps for clients that I really don't want showing up in the SERPS. Is it better to not have these folders on the domain at all? There are also security issues I've heard of that make sense, simply look at a site's robots file to see what they are hiding. It makes it easier to hunt for files when they know the directory the files are contained in. Do I concern myself with this? Another example is a folder I have for my xml sitemap generator. I imagine google isn't going to try to index this or count it as content, so do I need to add folders like this to the disallow list?
Technical SEO | | SpringMountain0