Robots.txt - What is the correct syntax?
-
Hello everyone
I have the following link:
http://mywebshop.dk/index.php?option=com_redshop&view=send_friend&pid=39&tmpl=component&Itemid=167
I want to prevent Google from indexing everything that is related to "view=send_friend".
The problem is that it's giving me duplicate content, and the content of the links has no SEO value of any sort.
My problem is how I disallow it correctly via robots.txt.
I tried this syntax:
Disallow: /view=send_friend/
However, after requesting a crawl, the 200+ duplicate links that contain view=send_friend are still present in the CSV crawl report.
What is the correct syntax if I want to prevent Google from indexing everything that is related to this kind of link?
-
I added your suggestion to robots.txt and requested a crawl again.
I only have 3 pages with duplicate page content now.
So your suggestion seems to have worked.
Thanks for your reply, it worked!
-
You are right. I misinterpreted the explanation. Apologies.
-
Jarno,
The $ would suggest this parameter is always at the end of a URL. But in Henrik's example it sits somewhere in the middle of the URL, so a rule ending in $ would not match it.
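To make the point about $ concrete, here is a minimal Python sketch of wildcard matching as the major search engines document it: rules are compared from the start of the path, * stands for any run of characters, and a trailing $ pins the rule to the end of the URL. The helper name rule_matches is made up for this illustration, and the function is a simplification, not any crawler's actual implementation.

```python
import re

# The link from the question, reduced to path plus query string.
URL_PATH = "/index.php?option=com_redshop&view=send_friend&pid=39&tmpl=component&Itemid=167"


def rule_matches(pattern: str, path: str) -> bool:
    """Simplified robots.txt matching (hypothetical helper).

    '*' matches any run of characters; a trailing '$' anchors the
    rule to the end of the URL; otherwise the rule is a prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each '*'.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += r"\Z"
    # re.match compares from the start of the path, like a prefix rule.
    return re.match(regex, path) is not None


for pattern in ("/view=send_friend/", "/*view=send_friend$", "/*view=send_friend"):
    print(pattern, "->", rule_matches(pattern, URL_PATH))
```

Under these assumptions only the last rule matches Henrik's link: the first fails because the path starts with /index.php, and the second fails because the URL continues after view=send_friend.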
-
Henrik,
I think you should be looking into something like this:
User-agent: Googlebot
Disallow: /*view=send_friend$
Hope this helps
Kind regards
Jarno
-
Hi Henrik,
I would suggest trying: Disallow: &view=send_friend
Optionally, you could try this without the &, as I'm not sure the parameter is always preceded by one. Hope this helps!
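The difference between the two variants can be sketched in Python. This assumes the prefix-plus-wildcard matching that the major search engines document, so both rules are written with a leading /* here; how a particular crawler treats a rule that starts with & directly is not something this sketch covers. The second URL, where view is the first parameter, is a made-up example, and wildcard_prefix_match is a hypothetical helper.

```python
import re


def wildcard_prefix_match(pattern: str, path: str) -> bool:
    """Hypothetical helper: prefix match where '*' matches any run of characters."""
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.match(regex, path) is not None


# Path and query string of the link from the question.
from_question = "/index.php?option=com_redshop&view=send_friend&pid=39&tmpl=component&Itemid=167"
# Made-up variant where the parameter comes first, so it follows '?' instead of '&'.
view_first = "/index.php?view=send_friend&pid=39"

for rule in ("/*&view=send_friend", "/*view=send_friend"):
    for path in (from_question, view_first):
        print(rule, "|", path, "->", wildcard_prefix_match(rule, path))
```

With the & included, the rule catches the link from the question but misses the variant where view is the first parameter; without the &, it catches both.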