Robots.txt help
-
Hi,
We have a blog that is killing our SEO.
We need to disallow:
Disallow: /Blog/?tag*
Disallow: /Blog/?page*
Disallow: /Blog/category/*
Disallow: /Blog/author/*
Disallow: /Blog/archive/*
Disallow: /Blog/Account/.
Disallow: /Blog/search*
Disallow: /Blog/search.aspx
Disallow: /Blog/error404.aspx
Disallow: /Blog/archive*
Disallow: /Blog/archive.aspx
Disallow: /Blog/sitemap.axd
Disallow: /Blog/post.aspx
But allow everything below /Blog/Post.
The disallow list seems to keep growing as we find issues. So rather than adding every area to disallow to our robots.txt, is there a way to simply say Allow /Blog/Post and disallow everything else? How do we do that in robots.txt?
Thanks
-
These: http://screencast.com/t/p120RbUhCT
They appear on every page I looked at, and they take up the entire area "above the fold", pushing the content "below the fold".
-Dan
-
Thanks Dan, but what grey areas? What URL are you looking at?
-
Ahh, I see. You just need to "noindex" the pages you don't want in the index. As for how to do that with BlogEngine, I am not sure, as I have never used it before.
But I think a bigger issue is the giant box areas at the top of every page. They are pushing your content way down, which is definitely hurting UX and making the site a little confusing. I'd suggest improving that as well.
-Dan
-
Hi Dan, Yes sorry that's the one!
-
Hi There... that address does not seem to work for me. Should it be .net? http://www.dotnetblogengine.net/
-Dan
-
Hi
The blog is www.dotnetblogengine.com
The content is only on the blog once; it's just that it can be accessed in lots of different ways.
-
Andrew
I doubt that one thing made your rankings drop so much. Also, what type of CMS are you on? Duplicate content like that should be controlled through indexation for the most part, but I don't recognize that URL structure as belonging to any particular CMS.
Are just the title tags duplicate, or the entire page content? Essentially, I would either change the content of the pages so they are not duplicates, or, if that doesn't make sense, just "noindex" them.
-Dan
-
Hi Dan,
I am getting duplicate content errors in WMT. This is because ?tag=ABC and ?Page=1 are both different ways to get to www.mysite.com/Blog/Post/My-Blog-Post.aspx.
To fix this I have removed the URLs www.mysite.com/Blog/?tag=ABC and www.mysite.com/Blog/?Page=1 from GWMT, and by setting robots.txt up like:
User-agent: *
Disallow: /Blog/
Allow: /Blog/post
Allow: /Blog/Post
I hope to solve the duplicate content issue and stop it happening again.
Since doing this my SERPs have dropped massively. Is what I have done wrong or bad? How would I fix it?
Hope this makes sense. Thanks for your help on this; it's appreciated.
Andrew
-
Hi There
Where are they appearing in WMT? In crawl errors?
You can also control crawling of parameters within Webmaster Tools - but I am still not quite sure whether you are trying to remove these from the index, just prevent crawling (and if so, for what reason?), or both?
-Dan
-
Hi Dan,
The issue is my blog had tagging switched on, and it caused canonicalization mayhem.
I switched it off, but the tags still appear in Google Webmaster Tools (GWMT). I removed the URLs via GWMT, but they are still appearing. This has also caused me to plummet down the SERPs! At least, I am hoping that is why my SERPs dropped. I am now trying to get to a point where Google just sees my blog posts and not ?tag or ?author or any other parameter that is going to cause me canonicalization pain. In the meantime I am waiting for Google to bring me back up the SERPs when things settle down, but it has been two weeks now, so maybe something else is up?
-
I'm wondering why you want to block crawling of these URLs - I think what you're going for is to keep them out of the index, yes? If you block them from being crawled, they can remain in the index. I would suggest considering robots meta noindex tags instead - unless you can describe the issue in a little more detail?
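For reference, the robots meta tag is a single line in the <head> of each page you want dropped from the index - "follow" lets link equity keep flowing through the page. Where exactly BlogEngine.NET lets you add it, I can't say:

```html
<!-- Add to the <head> of the duplicate pages (?tag, ?page, ?author, etc.).
     The pages must remain crawlable, or the tag will never be seen. -->
<meta name="robots" content="noindex, follow">
```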
-Dan
-
Ok then you should be all set if your tests on GWMT did not indicate any errors.
-
Thanks it goes straight to www.mysite.com/Blog
-
Yup, I understand that you want to see your main site. This is why I recommended blocking only /Blog and not / (your root domain).
However, many blogs have a landing page. Does yours? In other words, when you click on your blog link, does it take you straight to Blog/posts or is there another page in between, eg /Blog/welcome?
If it does not go straight into Blog/posts you would want to also allow the landing page.
Does that make sense?
-
The structure is:
www.mysite.com - want to see everything at this level and below it
www.mysite.com/Blog - want to BLOCK everything at this level
www.mysite.com/Blog/posts - want to see everything at this level and below it
-
Well what Martijn (sorry, I spelled his name wrong before) and I were saying was not to forget to allow the landing page of your blog - otherwise this will not be indexed as you are disallowing the main blog directory.
Do you have a specific landing page for your blog or does it go straight into the /posts directory?
I'd say there's nothing wrong with allowing both Blog/Post and Blog/post just to be on the safe side...honestly not sure about case sensitivity in this instance.
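If you want to check the case question without touching the live site, here's a quick sketch using Python's built-in robots.txt parser, with hypothetical mysite.com URLs. One caveat: the stdlib parser applies rules in file order (first match wins), while Google lets the most specific rule win, so the Allow lines are listed first here; the robots.txt tester in GWMT is still the authoritative check.

```python
from urllib.robotparser import RobotFileParser

# The proposed rules with both spellings allowed. The Allow lines come
# first because Python's stdlib parser uses first-match ordering,
# unlike Google's longest-match rule.
rules = """\
User-agent: *
Allow: /Blog/post
Allow: /Blog/Post
Disallow: /Blog/
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

for url in (
    "http://www.mysite.com/Blog/post/my-post.aspx",  # lowercase spelling
    "http://www.mysite.com/Blog/Post/My-Post.aspx",  # uppercase spelling
    "http://www.mysite.com/Blog/?tag=ABC",           # parameter URL
):
    print(url, "->", parser.can_fetch("*", url))
```

Both spellings come back crawlable and the ?tag URL comes back blocked. Delete either Allow line and the matching spelling flips to blocked: path matching in robots.txt is case-sensitive, and there is no case-insensitive syntax, so listing both spellings is the workaround.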
-
"We're getting closer David, but after reading the question again I think we both miss an essential point ;-)" - what was the essential point you missed? Sorry, I don't understand. I don't want to make a mistake in my robots.txt, so I would like to be 100% sure about what you are saying.
-
Thanks guys so I have
User-agent: *
Disallow: /Blog/
Allow: /Blog/post
Allow: /Blog/Post
That works, and my home page also works. Is there anything wrong with including both uppercase "Post" and lowercase "post"? It is lowercase on the site, but I want the uppercase "P" in there just in case. Is there a way to make the entry case-insensitive?
Thanks
-
Correct, Martijn. Good catch!
-
There was a reason that I said he should test this!
We're getting closer David, but after reading the question again I think we both missed an essential point ;-). As it stands, we also exclude the robots from crawling the 'homepage' of the blog. If you have such a homepage, don't forget to also Allow it.
-
Well, there's no point in a blog that hurts your SEO.
I respectfully disagree with Martijn; I believe what you want to do is disallow the Blog directory itself, not the whole site. If you Disallow: / and Allow: /Blog/Post, you are telling search engines not to crawl anything on your site except /Blog/Post.
I'd recommend:
User-agent: *
Disallow: /Blog/
Allow: /Blog/Post
This should block off the entire Blog directory except for your post subdirectory. As Martijn said, always test before you make real changes to your robots.txt.
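One way to test offline first is Python's built-in robots.txt parser. The sketch below (using hypothetical mysite.com URLs) compares the two suggested files; note that the stdlib parser evaluates rules in file order rather than Google's most-specific-rule-wins, so the Allow line is placed first, and GWMT's tester remains the authoritative check.

```python
from urllib.robotparser import RobotFileParser

def crawlable(rules: str, url: str) -> bool:
    """Return True if the given robots.txt rules let '*' fetch url."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch("*", url)

# Allow placed first because the stdlib parser uses first-match ordering.
block_site = "User-agent: *\nAllow: /Blog/Post\nDisallow: /\n"
block_blog = "User-agent: *\nAllow: /Blog/Post\nDisallow: /Blog/\n"

home = "http://www.mysite.com/"
post = "http://www.mysite.com/Blog/Post/My-Blog-Post.aspx"
tag = "http://www.mysite.com/Blog/?tag=ABC"

# Disallow: / blocks the homepage along with everything else...
print(crawlable(block_site, home))  # False
# ...while Disallow: /Blog/ leaves the rest of the site alone.
print(crawlable(block_blog, home))  # True
print(crawlable(block_blog, post))  # True
print(crawlable(block_blog, tag))   # False
```

The first file knocks out the homepage along with everything outside /Blog/Post, which is exactly the problem with disallowing the root; the second blocks only the Blog directory.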
-
That would be something like this. Please check it, or test it within Google Webmaster Tools, because I don't want to screw up your whole site. What this does is disallow your complete site and allow just the /Blog/Post URLs.
User-agent: *
Disallow: /
Allow: /Blog/Post