Robots.txt Question
-
For our company website, faithology.com, we are trying to block any URLs that contain a question mark (?) to keep Google from treating some pages as duplicates.
Our robots.txt is as follows:
User-Agent: *
Disallow: /*?

User-agent: rogerbot
Disallow: /community/

Is the above correct? We want crawlers to skip any URL with a "?" in it, but we don't want to harm our SEO. Thanks for your help!
-
You can use wildcards, in theory, but I haven't tested "?", and that could be a little risky. I'd just make sure it doesn't over-match.
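One way to check for over-matching before going live is to run the rule against a handful of your real URLs. The Python sketch below assumes Google/Bing-style matching ("*" matches any run of characters, a trailing "$" anchors the end of the URL, and every other character, including "?", is literal). The helper names and sample paths are hypothetical, not part of any robots.txt tool.

```python
import re


def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    Assumes Google/Bing-style wildcards: '*' matches any run of characters,
    a trailing '$' anchors the end of the URL, and every other character,
    including '?', is matched literally.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything, then turn each escaped '*' back into '.*'.
    body = re.escape(pattern).replace(r"\*", ".*")
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(pattern: str, path: str) -> bool:
    # A rule matches from the start of the path (a prefix match), so use .match().
    return robots_pattern_to_regex(pattern).match(path) is not None


# Hypothetical sample paths; swap in real URLs from your own site.
sample_paths = [
    "/",
    "/articles/faith",
    "/articles/faith?page=2",
    "/search?q=prayer",
    "/community/topic?id=7",
]

for path in sample_paths:
    status = "BLOCKED" if is_blocked("/*?", path) else "allowed"
    print(f"{status:8} {path}")
```

Under these assumptions, every path containing "?" is reported as blocked, including paginated pages such as /articles/faith?page=2, which is exactly the kind of over-match worth spotting before deploying the rule.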
Honestly, though, Robots.txt isn't as reliable as I'd like. It can be good for preventing content from being indexed, but once that content has been crawled, it's not great for removing it from the index. You might be better off with META NOINDEX or using the rel=canonical tag.
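If you go the rel=canonical route, each parameterized page points to one preferred URL. Below is a minimal Python sketch, assuming you know which parameters do not change page content; the IGNORED_PARAMS set and the example.com URLs are hypothetical placeholders, not recommendations for any particular site.

```python
import html
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical: parameters assumed NOT to change page content (sorting,
# session IDs, tracking). Pagination or filter parameters that DO change
# content should stay out of this set.
IGNORED_PARAMS = {"sort", "sessionid", "utm_source", "utm_medium", "utm_campaign"}


def canonical_url(url: str) -> str:
    """Return the URL with ignored query parameters (and any fragment) removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_link_tag(url: str) -> str:
    """Build the <link rel="canonical"> element for the page's <head>."""
    return f'<link rel="canonical" href="{html.escape(canonical_url(url))}">'


# The META NOINDEX alternative is a fixed tag in the page's <head>.
NOINDEX_TAG = '<meta name="robots" content="noindex">'

print(canonical_link_tag("https://example.com/articles/faith?sort=new&page=2"))
```

Stripping only named parameters, rather than everything after "?", keeps pages whose parameters genuinely change the content from being folded into one URL.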
It depends a lot on what parameters you're trying to control, what value these pages have, whether they have links, etc. A wholesale block of everything with "?" seems really dangerous to me.
If you want to give a few example URLs, maybe we could give you more specific advice.
-
If I were you, I would want to be 100% sure I got it right; the way you have rogerbot set up, he may be blocked.
Why not use a free tool from a very reputable company to make your robots.txt perfect? This one has never let me down:
http://www.internetmarketingninjas.com/seo-tools/robots-txt-generator/
http://www.searchenginepromotionhelp.com/m/robots-text-tester/
Lastly, to make sure everything is perfect, I recommend one of my favorite tools. It is free for up to 500 pages, as many times as you want, and the paid version costs, I believe, $70 a year:
http://www.screamingfrog.co.uk/seo-spider/
It's one of the best tools on the planet.
While you're on the Internet Marketing Ninjas website, look at their other tools; they have loads of excellent tools that are recommended here.
Sincerely,
Thomas
-
Yes, you can.
Robots.txt Wildcard Matching
Google and Microsoft's Bing allow the use of wildcards in robots.txt files.
To block access to all URLs that include a question mark (?), you could use the following entry:
User-agent: *
Disallow: /*?
You can use the $ character to match the end of the URL. For instance, to block any URLs that end with .asp, you could use the following entry:
User-agent: Googlebot
Disallow: /*.asp$
More background on wildcards is available from Google and Yahoo! Search.
More
http://tools.seobook.com/robots-txt/
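The effect of the $ anchor is easy to misread, so here is a small Python sketch comparing /*.asp$ with /*.asp. It assumes Google-style matching as described above (without $, a rule is a prefix match); robots_rule_matches and the sample paths are hypothetical, not part of any search engine's tooling.

```python
import re


def robots_rule_matches(rule: str, path: str) -> bool:
    """Google-style rule matching: '*' = any characters, trailing '$' = end of URL.

    Without '$' a rule is a prefix match, so it also covers longer URLs.
    """
    anchored = rule.endswith("$")
    core = rule[:-1] if anchored else rule
    # Escape the literal pieces between wildcards, then join them with '.*'.
    regex = ".*".join(re.escape(piece) for piece in core.split("*"))
    matcher = re.fullmatch if anchored else re.match
    return matcher(regex, path) is not None


for path in ["/default.asp", "/default.aspx", "/shop/cart.asp?id=3"]:
    print(path,
          "| /*.asp$:", robots_rule_matches("/*.asp$", path),
          "| /*.asp:", robots_rule_matches("/*.asp", path))
```

Only the $ version limits the block to URLs that actually end in .asp; the plain /*.asp rule also catches .aspx pages and .asp URLs that carry a query string.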
Hope I was of help,
Tom
Related Questions
-
Asking a natural question in H tags?
Hello, I read that in H tags it is more natural to write the question a user would ask. Does it really have any benefit in terms of SEO? For example, instead of "Tour map", writing "What are the villages you visit?", or instead of "Activity level", writing "What is the level like?" Does it help in any way? Thank you,
Intermediate & Advanced SEO | seoanalytics
-
Will disallowing URLs in the robots.txt file stop those URLs being indexed by Google?
I found a lot of duplicate title tags showing in Google Webmaster Tools. When I visited the URLs these duplicates belonged to, I found that they were just images from a gallery that we didn't particularly want Google to index. There is no benefit to the end user in these image pages being indexed in Google. Our developer has told us that these URLs are created by a module and are not "real" pages in the CMS. They would like to add the following to our robots.txt file:
Disallow: /catalog/product/gallery/
QUESTION: If these pages are already indexed by Google, will this adjustment to the robots.txt file help to remove them from the index? We don't want these pages to be found.
Intermediate & Advanced SEO | andyheath
-
Disallow URLs ENDING with certain values in robots.txt?
Is there any way to disallow URLs ending in a certain value? For example, if I have the following product page URL: http://website.com/category/product1, and I want to disallow /category/product1/review, /category/product2/review, etc. without disallowing the product pages themselves, is there any shortcut to do this, or must I disallow each gallery page individually?
Intermediate & Advanced SEO | jmorehouse
-
Questions Regarding WordPress Blog Format, Categories and Tag Pages...
I'm looking to make some optimizations to a website I'm working on but wanted more input before I get started. Currently, when blog posts are published on the website, the URL for each post looks like this: www.mywebsite.com/blogpost. I've heard that, for whatever reason, the best practice is to make sure blog posts are published to a blog subdirectory, like so: www.mywebsite.com/blog/blogpost. If I were to make this change, I'm assuming I would have to 301 redirect all of the existing posts to their new locations. Is this change worth doing, and are there any other considerations I should take into account? Also, I'm aware that there are certain schools of thought that category and tag pages should be noindexed to avoid duplicate content issues. Can anyone shed some light on this from first-hand experience? Thanks in advance!
Intermediate & Advanced SEO | goldbergweismancairo
-
Another footer question
Hi all, maybe this question has already been answered (in that case, sorry), but I didn't find it. With the latest changes, is it still useful to have an "SEO footer"? It seems it can cause more problems than benefits. In my case, the only purpose of the footer is to obtain more traffic. With this in mind, am I right in thinking it is better not to write anything there? Thanks in advance
Intermediate & Advanced SEO | nsimpson
-
Domain and Sitemap Question
Hi, I am hoping you can help me with an issue we are currently trying to solve. We are hosting our mobile site's content on a different domain from the site's URL, though both are owned by the same company. In Google Webmaster Tools we have the mobile sitemap under "sitemaps.xyz.com", but the URL of the site is "m.xyz.com". We have submitted 60MM pages in the mobile sitemap, but only 1MM pages have been indexed. Do you think this setup confuses the bots? Does it affect the crawlability of the site? Any thoughts would be greatly appreciated. Thank you!
Intermediate & Advanced SEO | ladylana Eva
-
Complicated Question: Removing Spam Backlinks that were Not Requested
I'm new and seeking help with the following scenario:
1. Main site: domain.com is an established, authority-type site.
2. Second site: domain.org (has a robots.txt set to noindex), but someone, obviously not the site owner, has run a negative SEO campaign against the .org domain and built spammy links to it. In fact, those links are all that exist for this second domain, because it is only used for development right now. No one would normally link to it, as it is just a secondary domain used to protect the trademark and for development.
When searching for it by domain name, it does not appear on the first page of search results. Checking the link profile, the only links that show for domain.org are spam links. I have contacted the site(s) where the spam links were placed (no answer). The main site domain.com and domain.org have the same WHOIS and are hosted on the same server, as they are owned by the same company. domain.com still appears first for its name but has lost some rankings. I am working to fix some technical issues (e.g., duplicate URLs from the CMS), but would like to find out what to do about domain.org, which has clearly been targeted with spammy, unrequested backlinks. domain.com has a Google Webmaster Tools account, with no messages about unnatural linking in those reports.
I'm not sure whether I should add domain.org to GWT to see if an unnatural link penalty has been applied, or whether this would further connect the two domains through association. If I could get some feedback or suggestions on my options for making sure domain.org has a clean profile, that would be most appreciated. The site owner would also like to begin using domain.org in the future for some unique content, but as things stand cannot, because the domain has been targeted with poor backlinks. Has anyone else run into a situation where the .org or .net versions were targeted by spammy backlinks even though the domains were not actively used? What's the safest way to proceed?
a) I'm concerned about a possible co-penalty between domain.com and domain.org.
b) How can I clear up the problems with domain.org so that the owner can use it in the future?
Many thanks for your thoughts and help with this one. I appreciate any help or feedback.
Intermediate & Advanced SEO | web023
-
New server update + wrong robots.txt = lost SERP rankings
Over the weekend, we moved our store to a new server. Before the switch, we had a robots.txt file on the new server that disallowed its contents from being indexed (we didn't want duplicate pages from both the old and new servers). When we finally made the switch, we somehow forgot to remove that robots.txt file, so the new pages weren't indexed. We quickly put our correct robots.txt in place and submitted a request for a re-crawl of the site. The problem is that many of our search rankings have changed. We were ranking #2 for some keywords, and now we're not showing up at all. Is there anything we can do? Google Webmaster Tools says the next crawl could take weeks! Any suggestions will be much appreciated.
Intermediate & Advanced SEO | 9Studios