Best use of robots.txt for "garbage" links from Joomla!
-
I recently started out on SEOmoz and am trying to do some cleanup based on the campaign report I received.
One of my biggest gripes is the "Duplicate Page Content" item.
Right now I have over 200 pages with duplicate page content.
This is triggered because SEOmoz has picked up auto-generated links from my site.
My site has a "send to friend" feature, and every time someone wants to send an article or a product to a friend via email, a pop-up appears.
It seems the pop-up pages have been picked up by the SEOmoz spider; however, these pages are something I would never want indexed in Google.
So I just want to get rid of them.
Now to my question:
I guess the best solution is to make a general rule via robots.txt, so that these pages are not indexed or considered by Google at all.
But how do I do this? What should my syntax be?
A lot of the links look like this, but have different ID numbers according to the product being sent:
http://mywebshop.dk/index.php?option=com_redshop&view=send_friend&pid=39&tmpl=component&Itemid=167
I guess I need a rule that catches the following and makes Google ignore links that contain it:
view=send_friend
-
Hi Henrik,
It can take up to a week for SEOmoz crawlers to process your site, which may be an issue if you recently added the tag. Did you remember to include all user agents in your first line?
User-agent: *
Be sure to test your robots.txt file in Google Webmaster Tools to ensure everything is correct.
Couple of other things you can do:
1. Add a rel="nofollow" on your send to friend links.
2. Add a meta robots "noindex" to the head of the popup html.
3. And/or add a canonical tag to the popup. Since I don't have a working example, I don't know what to point the canonical at (whatever content it is duplicating), but this is also an option.
-
I just tried to add
Disallow: /view=send_friend
I removed the trailing /, but a new crawl gave me the duplicate content problem again.
Is my syntax wrong?
-
The second one, "Disallow: /*view=send_friend", will prevent Googlebot from crawling any URL with that string in it, so that should take care of your problem.
-
So my link example would look like this in robots.txt?
Disallow: /index.php?option=com_redshop&view=send_friend&pid=&tmpl=component&Itemid=
Or
Disallow: /view=send_friend/
-
You're right. I would disallow via robots.txt with a wildcard (*) wherever a unique item ID could be generated.
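
To see why some of the rules tried in this thread match the send-to-friend URL and others don't, here is a rough Python sketch of the wildcard matching Google documents for robots.txt: the pattern is compared from the start of the path, `*` matches any run of characters, and a trailing `$` anchors the end. The helper `rule_matches` is a hypothetical name made up for illustration, not part of any library, and it ignores things like Allow/Disallow precedence and percent-encoding, so treat it as an approximation rather than Googlebot's actual behavior.

```python
import re


def rule_matches(pattern: str, url_path: str) -> bool:
    """Return True if a Disallow pattern matches a URL path plus query string.

    Hypothetical helper: approximates wildcard matching, where the pattern
    is anchored at the start of the path, '*' matches any characters, and
    a trailing '$' anchors the end of the URL.
    """
    anchored_end = pattern.endswith("$")
    if anchored_end:
        pattern = pattern[:-1]
    # Escape every literal piece, then join the pieces with '.*' for each '*'.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored_end:
        regex += "$"
    # re.match only matches at the beginning of the string.
    return re.match(regex, url_path) is not None


# Path and query string of the example link from the question.
url = "/index.php?option=com_redshop&view=send_friend&pid=39&tmpl=component&Itemid=167"

# No wildcard: the URL would have to start with "/view=send_friend".
print(rule_matches("/view=send_friend", url))

# Leading wildcard: matches the string anywhere after the first "/".
print(rule_matches("/*view=send_friend", url))

# Full URL with the IDs simply left out: "pid=" is followed by "39", not "&".
print(rule_matches("/index.php?option=com_redshop&view=send_friend&pid=&tmpl=component&Itemid=", url))

# Full URL with a wildcard wherever a unique ID appears.
print(rule_matches("/index.php?option=com_redshop&view=send_friend&pid=*&tmpl=component&Itemid=*", url))
```

Under these assumptions, only the two rules containing a wildcard match the example link, which is consistent with the advice above. The shorter `/*view=send_friend` form is probably the easier one to maintain, since it does not depend on parameter order.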