Best use of robots.txt for "garbage" links from Joomla!
-
I recently started out on SEOmoz and am trying to do some cleanup according to the campaign report I received.
One of my biggest gripes is the "Duplicate Page Content" issue.
Right now I have over 200 pages with duplicate page content.
This is triggered because SEOmoz has picked up auto-generated links from my site.
My site has a "send to friend" feature, and every time someone wants to send an article or a product to a friend via email, a pop-up appears.
It seems the pop-up pages have been picked up by the SEOmoz spider; however, these pages are something I would never want indexed in Google.
So I just want to get rid of them.
Now to my question:
I guess the best solution is to make a general rule via robots.txt, so that these pages are not indexed or considered by Google at all.
But how do I do this? What should my syntax be?
A lot of the links look like this, but have different ID numbers according to the product that is being sent:
http://mywebshop.dk/index.php?option=com_redshop&view=send_friend&pid=39&tmpl=component&Itemid=167
I guess I need a rule that catches the following and makes Google ignore links that contain it:
view=send_friend
-
Hi Henrik,
It can take up to a week for the SEOmoz crawlers to reprocess your site, which may explain the result if you only recently added the rule. Did you remember to address all user agents in the first line of your robots.txt?
User-agent: *
Be sure to test your robots.txt file in Google Webmaster Tools to ensure everything is correct.
A couple of other things you can do:
1. Add rel="nofollow" to your send-to-friend links.
2. Add a meta robots "noindex" tag to the head of the popup HTML.
3. And/or add a canonical tag to the popup. Since I don't have a working example, I don't know what to point the canonical to (whatever content it is duplicating), but this is also an option.
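If you go with option 2, a small Python sketch like the one below can confirm that the popup really carries the tag. The popup markup and the class name `RobotsMetaFinder` are made up for illustration; the real template that Joomla! or redSHOP renders will look different.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Directives are comma separated, e.g. "noindex, follow".
            self.directives.update(d.strip().lower() for d in content.split(","))


# Hypothetical popup markup, for illustration only.
popup_html = """
<html><head>
  <title>Send to a friend</title>
  <meta name="robots" content="noindex, follow">
</head><body>...</body></html>
"""

finder = RobotsMetaFinder()
finder.feed(popup_html)
print("noindex" in finder.directives)
```

Keep in mind that a crawler can only read this tag if it is allowed to fetch the page, so a page blocked in robots.txt will never have its noindex seen.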
-
I just tried to add
Disallow: /view=send_friend
I removed the trailing /.
However, a crawl gave me the duplicate content problem again.
Is my syntax wrong?
-
The second one, "Disallow: /*view=send_friend", will prevent Googlebot from crawling any URL with that string in it. So that should take care of your problem.
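To see why the rule without the wildcard did not help, here is a minimal Python sketch of wildcard matching as Google documents it for robots.txt: `*` matches any run of characters, a trailing `$` anchors the end, and everything else is a prefix match against the path plus query string. The function name `rule_matches` is made up for illustration, this is not how any particular crawler is implemented, and other crawlers may treat wildcards differently.

```python
import re
from urllib.parse import urlsplit


def rule_matches(rule: str, url: str) -> bool:
    """Return True if a Disallow rule matches the URL (Google-style wildcards)."""
    parts = urlsplit(url)
    # Rules are compared against the path plus the query string.
    target = parts.path + ("?" + parts.query if parts.query else "")
    # A trailing "$" anchors the rule to the end of the URL.
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    # "*" matches any run of characters; everything else is literal.
    pattern = ".*".join(re.escape(piece) for piece in body.split("*"))
    if anchored:
        pattern += "$"
    # re.match anchors at the start, so a rule without "*" is a plain prefix test.
    return re.match(pattern, target) is not None


url = ("http://mywebshop.dk/index.php?option=com_redshop"
       "&view=send_friend&pid=39&tmpl=component&Itemid=167")

for rule in ("/view=send_friend", "/view=send_friend/", "/*view=send_friend"):
    print(rule, "->", "blocked" if rule_matches(rule, url) else "not blocked")
```

Run as written, only the last rule reports the example URL as blocked. The first two must match from the very start of the path, which here is /index.php, so they never reach the view=send_friend parameter.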
-
So my link example would look like this in robots.txt?
Disallow: /index.php?option=com_redshop&view=send_friend&pid=&tmpl=component&Itemid=
Or
Disallow: /view=send_friend/
-
You're right. I would disallow via robots.txt, with a wildcard (*) wherever a unique item ID could be generated.
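As a rough check of that advice, the Python sketch below compares the long rule with empty ID values against the same rule with a wildcard in the pid slot. It assumes Google-style matching (`*` matches any run of characters, rules match as prefixes); the helper `disallow_matches` and the second pid/Itemid pair are hypothetical.

```python
import re


def disallow_matches(rule: str, path_and_query: str) -> bool:
    # "*" stands for any run of characters; the rest is matched literally,
    # anchored at the start of the path (prefix matching).
    pattern = ".*".join(re.escape(piece) for piece in rule.split("*"))
    return re.match(pattern, path_and_query) is not None


BASE = "/index.php?option=com_redshop&view=send_friend"
no_wildcard = BASE + "&pid=&tmpl=component&Itemid="
with_wildcard = BASE + "&pid=*&tmpl=component&Itemid="

# The first pair comes from the example link; the second is hypothetical.
paths = [BASE + f"&pid={pid}&tmpl=component&Itemid={item}"
         for pid, item in ((39, 167), (512, 8))]

for path in paths:
    print(disallow_matches(no_wildcard, path), disallow_matches(with_wildcard, path))
```

The rule with empty values fails because the literal text "pid=&tmpl" never occurs once a real ID sits between the two. No wildcard is needed after the final Itemid=, since rules already match as prefixes. A long rule like this also depends on the parameters always appearing in the same order, which is one reason the shorter /*view=send_friend form is more forgiving.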