Best use of robots.txt for "garbage" links from Joomla!
-
I recently started out on SEOmoz and am trying to do some cleanup based on the campaign report I received.
One of my biggest gripes is the "Duplicate Page Content" issue.
Right now I have over 200 pages flagged with duplicate page content.
Now... this is triggered because SEOmoz has picked up auto-generated links from my site.
My site has a "send to friend" feature, and every time someone wants to send an article or a product to a friend via email, a pop-up appears.
It seems these pop-up pages have been picked up by the SEOmoz spider; however, they are pages I would never want indexed in Google.
So I just want to get rid of them.
Now to my question:
I guess the best solution is a general rule in robots.txt, so that these pages are not indexed or considered by Google at all.
But how do I do this? What should my syntax be?
A lot of the links look like this, but with different ID numbers depending on the product being sent:
http://mywebshop.dk/index.php?option=com_redshop&view=send_friend&pid=39&tmpl=component&Itemid=167
I guess I need a rule that makes Google ignore any link containing the following:
view=send_friend
-
Hi Henrik,
It can take up to a week for SEOmoz crawlers to process your site, which may be an issue if you recently added the tag. Did you remember to include all user agents in your first line?
User-agent: *
Be sure to test your robots.txt file in Google Webmaster Tools to ensure everything is correct.
Couple of other things you can do:
1. Add a rel="nofollow" on your send-to-friend links.
2. Add a meta robots "noindex" to the head of the popup HTML.
3. And/or add a canonical tag to the popup. Since I don't have a working example, I don't know what it should point to (whatever content it is duplicating), but this is also an option.
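For options 1 and 2, a minimal sketch of what the markup might look like. The surrounding HTML here is illustrative only, not taken from the actual Joomla template or the com_redshop component:

```html
<!-- In the <head> of the send-to-friend popup template:
     tell crawlers not to index this page or follow its links -->
<meta name="robots" content="noindex, nofollow">

<!-- On the link that opens the popup:
     rel="nofollow" tells crawlers not to pass value through it -->
<a href="/index.php?option=com_redshop&amp;view=send_friend&amp;pid=39&amp;tmpl=component&amp;Itemid=167"
   rel="nofollow">Send to a friend</a>
```

Note that a meta "noindex" only works if crawlers are allowed to fetch the page; if robots.txt blocks the URL entirely, the crawler never sees the tag.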
-
I just tried to add
Disallow: /view=send_friend
I removed the last /
however, a crawl gave me the duplicate content problem again.
Is my syntax wrong?
-
The second one, "Disallow: /*view=send_friend", will prevent Googlebot from crawling any URL with that string in it. So that should take care of your problem.
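Putting the pieces together, a minimal robots.txt along these lines (assuming it sits at the site root, e.g. http://mywebshop.dk/robots.txt) would cover every send-to-friend URL regardless of the pid or Itemid values:

```text
User-agent: *
Disallow: /*view=send_friend
```

One caveat: the * wildcard is supported by Google, Bing, and most major crawlers, but it is not part of the original robots.txt standard, so older bots may treat the pattern as a literal path prefix. Testing the rule in Google Webmaster Tools, as suggested above, is the safest check.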
-
So my link example would look like this in robots.txt?
Disallow: /index.php?option=com_redshop&view=send_friend&pid=&tmpl=component&Itemid=
Or
Disallow: /view=send_friend/
-
You're right. I would disallow via robots.txt with a wildcard (*) wherever a unique item ID could be generated.
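For anyone who wants to sanity-check the wildcard pattern without waiting for a recrawl, here is a small Python sketch of Google-style robots.txt pattern matching. It rolls its own matcher because Python's built-in urllib.robotparser does not reliably handle * wildcards; the function name and structure are mine, not from any library:

```python
import re
from urllib.parse import urlsplit

def rule_matches(pattern: str, url: str) -> bool:
    """Check whether a robots.txt Disallow pattern (with Google-style
    '*' wildcards and an optional trailing '$' anchor) matches a URL's
    path + query string."""
    parts = urlsplit(url)
    target = parts.path + ("?" + parts.query if parts.query else "")
    # Honor a trailing '$' end-anchor, then escape each literal piece
    # and rejoin with '.*' so that '*' matches any run of characters.
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(piece) for piece in core.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, target) is not None

url = ("http://mywebshop.dk/index.php?option=com_redshop"
       "&view=send_friend&pid=39&tmpl=component&Itemid=167")
print(rule_matches("/*view=send_friend", url))  # True: wildcard rule blocks it
print(rule_matches("/view=send_friend", url))   # False: plain prefix never matches
```

This makes the earlier point concrete: "Disallow: /view=send_friend" without the * fails because robots.txt rules match from the start of the path, and these URLs all start with /index.php.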