Robots.txt help
-
Hi,
We have a blog that is killing our SEO.
We need to disallow:
Disallow: /Blog/?tag*
Disallow: /Blog/?page*
Disallow: /Blog/category/*
Disallow: /Blog/author/*
Disallow: /Blog/archive/*
Disallow: /Blog/Account/.
Disallow: /Blog/search*
Disallow: /Blog/search.aspx
Disallow: /Blog/error404.aspx
Disallow: /Blog/archive*
Disallow: /Blog/archive.aspx
Disallow: /Blog/sitemap.axd
Disallow: /Blog/post.aspx
But allow everything below /Blog/Post.
The disallow list seems to keep growing as we find issues. So rather than adding every area to disallow to our robots.txt, is there an easy way to just say Allow /Blog/Post and ignore the rest? How do we do that in robots.txt?
Thanks
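A side note on the list above: robots.txt rules match by prefix, so a trailing * adds nothing, and a few of the entries are already covered by shorter ones. Here is a minimal Python sketch that spots them, assuming the rules only use trailing wildcards (no * in the middle of a path and no $ anchors). The function name is made up for illustration.

```python
def find_redundant(paths):
    """Return (rule, covering_rule) pairs for rules already covered by a shorter prefix rule.

    Assumes every rule is a plain prefix, optionally ending in '*'.
    """
    # A trailing '*' is redundant because rules already match by prefix.
    prefixes = [p.rstrip("*") for p in paths]
    redundant = []
    for i, p in enumerate(prefixes):
        for j, q in enumerate(prefixes):
            if i != j and len(q) < len(p) and p.startswith(q):
                redundant.append((paths[i], paths[j]))
                break
    return redundant


disallow = [
    "/Blog/?tag*",
    "/Blog/?page*",
    "/Blog/category/*",
    "/Blog/author/*",
    "/Blog/archive/*",
    "/Blog/Account/.",
    "/Blog/search*",
    "/Blog/search.aspx",
    "/Blog/error404.aspx",
    "/Blog/archive*",
    "/Blog/archive.aspx",
    "/Blog/sitemap.axd",
    "/Blog/post.aspx",
]

for rule, covered_by in find_redundant(disallow):
    print(f"{rule} is already covered by {covered_by}")
```

On this list it reports /Blog/archive/*, /Blog/search.aspx and /Blog/archive.aspx as covered by the shorter /Blog/archive* and /Blog/search* entries.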
-
These: http://screencast.com/t/p120RbUhCT
They appear on every page I looked at and take up the entire area "above the fold", while the content is "below the fold".
-Dan
-
Thanks Dan, but what grey areas, what url are you looking at?
-
Ahh. I see. You just need to "noindex" the pages you don't want in the index. As far as how to do that with blogengine, I am not sure, as I have never used it before.
But I think a bigger issue is the giant box areas at the top of every page. They are pushing your content way down. That's definitely hurting UX and making the site a little confusing. I'd suggest improving that as well.
-Dan
-
Hi Dan, Yes sorry that's the one!
-
Hi There... that address does not seem to work for me. Should it be .net? http://www.dotnetblogengine.net/
-Dan
-
Hi
The blog is www.dotnetblogengine.com
The content is only on the blog once; it is just that it can be accessed in lots of different ways.
-
Andrew
I doubt that one thing made your rankings drop so much. Also, what type of CMS are you on? Duplicate content like that should be controlled through indexation for the most part, but I am not recognizing that type of URL structure as any particular CMS?
Are just the title tags duplicate or the entire page content? Essentially, I would either change the content of the pages so they are not duplicate, or if that doesn't make sense I would just "noindex" them.
-Dan
-
Hi Dan,
I am getting duplicate content errors in WMT like
This is because tag=ABC and page=1 are both different ways to get to www.mysite.com/Blog/Post/My-Blog-Post.aspx
To fix this I have removed the URLs www.mysite.com/Blog/?tag=ABC and www.mysite.com/Blog/?Page=1 from GWMT, and by setting robots.txt up like
User-agent: *
Disallow: /Blog/
Allow: /Blog/post
Allow: /Blog/Post
I hope to solve the duplicate content issue and stop it happening again.
Since doing this my SERPs have dropped massively. Is what I have done wrong or bad? How would I fix it?
Hope this makes sense. Thanks for your help on this, it's appreciated.
Andrew
-
Hi There
Where are they appearing in WMT? In crawl errors?
You can also control crawling of parameters within webmaster tools - but I am still not quite sure if you are trying to remove these from the index or just prevent crawling (and if preventing crawling, for what reason?) or both?
-Dan
-
Hi Dan,
The issue is my blog had tagging switched on, and it caused canonicalization mayhem.
I switched it off, but the tags still appear in Google Webmaster Tools (GWMT). I removed the URLs via GWMT but they are still appearing. This has also caused me to plummet down the SERPs! I am hoping this is why my SERPs had dropped anyway! I am now trying to get to a point where Google just sees my blog posts and not the ?Tag or ?Author or any other parameter that is going to cause me canonicalization pain. In the meantime I am sat waiting for Google to bring me back up the SERPs when things settle down, but it has been 2 weeks now so maybe something else is up?
-
I'm wondering why you want to block crawling of these URLs - I think what you're going for is to not index them, yes? If you block them from being crawled, they'll remain in the index. I would suggest considering robots meta noindex tags - unless you can describe in a little more detail what the issue is?
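To check which pages actually carry a robots meta noindex, something like the Python sketch below could be run against the page source. It only uses the standard library. The sample markup is hypothetical, not taken from the blog in question. Keep in mind that a crawler has to be able to fetch a page to see its noindex tag, so those URLs must not also be disallowed in robots.txt.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # content looks like "noindex, follow"; split into single directives
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def has_noindex(html):
    """Return True if the HTML contains a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical markup for a tag listing such as /Blog/?tag=ABC and for a post
tag_page = '<html><head><meta name="robots" content="noindex, follow"></head><body></body></html>'
post_page = "<html><head><title>My Blog Post</title></head><body></body></html>"

print(has_noindex(tag_page))   # True
print(has_noindex(post_page))  # False
```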
-Dan
-
Ok then you should be all set if your tests on GWMT did not indicate any errors.
-
Thanks it goes straight to www.mysite.com/Blog
-
Yup, I understand that you want to see your main site. This is why I recommended blocking only /Blog and not / (your root domain).
However, many blogs have a landing page. Does yours? In other words, when you click on your blog link, does it take you straight to Blog/posts or is there another page in between, eg /Blog/welcome?
If it does not go straight into Blog/posts you would want to also allow the landing page.
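If the landing page is /Blog/ itself, one option is an end-anchored rule such as Allow: /Blog/$, which matches that exact URL and nothing below it. Google documents the $ anchor and RFC 9309 describes it, but not every crawler is guaranteed to honour it. Under longest-match precedence, Allow: /Blog/$ would also outrank the shorter Disallow: /Blog/. Below is a rough Python sketch of how such a pattern matches. The helper name is an assumption for illustration, not part of any library.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end of the URL.
    Without '$' the pattern is a plain prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


landing_only = pattern_to_regex("/Blog/$")
whole_dir = pattern_to_regex("/Blog/")

for path in ["/Blog/", "/Blog/?tag=ABC", "/Blog/welcome"]:
    # re.match anchors at the start of the path, like a robots.txt rule
    print(path, bool(landing_only.match(path)), bool(whole_dir.match(path)))
```

Only /Blog/ matches the anchored pattern, while all three paths match the plain /Blog/ prefix.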
Does that make sense?
-
The structure is:
www.mysite.com - want to see everything at this level and below it
www.mysite.com/Blog - want to BLOCK everything at this level
www.mysite.com/Blog/posts - want to see everything at this level and below it
-
Well what Martijn (sorry, I spelled his name wrong before) and I were saying was not to forget to allow the landing page of your blog - otherwise this will not be indexed as you are disallowing the main blog directory.
Do you have a specific landing page for your blog or does it go straight into the /posts directory?
I'd say there's nothing wrong with allowing both Blog/Post and Blog/post just to be on the safe side...honestly not sure about case sensitivity in this instance.
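On the case question: as far as I know, the path part of a robots.txt rule is matched case-sensitively (Google's documentation and RFC 9309 both describe it that way), so /Blog/Post and /Blog/post are two different rules and listing both is harmless. A tiny Python sketch of plain prefix matching shows the effect. The example URL is hypothetical.

```python
def rule_matches(rule_path, url_path):
    """Plain robots.txt prefix match; the comparison is case-sensitive."""
    return url_path.startswith(rule_path)


url = "/Blog/post/My-Blog-Post.aspx"

print(rule_matches("/Blog/Post", url))  # False: capital P does not match
print(rule_matches("/Blog/post", url))  # True
```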
-
"We're getting closer David, but after reading the question again I think we both miss an essential point ;-)" What was the essential point you missed? Sorry, I don't understand. I don't want to make a mistake in my robots.txt, so I would like to be 100% sure of what you are saying.
-
Thanks guys so I have
User-agent: *
Disallow: /Blog/
Allow: /Blog/post
Allow: /Blog/Post
That works. My home page also works. Is there anything wrong with including both uppercase "Post" and lowercase "post"? It is lowercase on the site but I want the uppercase "P" just in case. Is there a way to make the entry non case sensitive?
Thanks
-
Correct, Martijin. Good catch!
-
There was a reason that I said he should test this!
We're getting closer David, but after reading the question again I think we both miss an essential point ;-). We now also exclude the robots from crawling the 'homepage' of the blog. If you have such a homepage, don't forget to also Allow it.
-
Well, no point in a blog that hurts your seo
I respectfully disagree with Martijin; I believe what you would want to do is disallow the Blog directory itself, not the whole site. It would seem that if you use Disallow: / and Allow: /Blog/Post, you are telling SEs not to index anything on your site except for /Blog/Post.
I'd recommend:
User-agent: *
Disallow: /Blog/
Allow: /Blog/Post
This should block off the entire Blog directory except for your post subdirectory. As Maritijin stated, always test before you make real changes to your robots.txt.
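To see how the two rule sets differ, here is a small Python sketch. It assumes the longest-match precedence that Google documents and RFC 9309 specifies: the most specific matching rule wins, and Allow wins a tie. Other crawlers may resolve conflicts differently, so this is no substitute for testing in Webmaster Tools. The function and the /products.aspx path are made up for illustration.

```python
def is_allowed(rules, path):
    """Decide whether `path` may be crawled under a list of (directive, prefix) rules.

    Longest matching prefix wins; on a tie, Allow wins; no match means allowed.
    Wildcards (* and $) are deliberately not handled in this sketch.
    """
    best_len = -1
    allowed = True
    for directive, prefix in rules:
        if not path.startswith(prefix):
            continue
        is_allow = directive.lower() == "allow"
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed


block_whole_site = [("Disallow", "/"), ("Allow", "/Blog/Post")]
block_blog_only = [("Disallow", "/Blog/"), ("Allow", "/Blog/Post")]

paths = ["/", "/products.aspx", "/Blog/?tag=ABC", "/Blog/Post/My-Blog-Post.aspx"]

for path in paths:
    print(path, is_allowed(block_whole_site, path), is_allowed(block_blog_only, path))
```

With Disallow: / the home page and /products.aspx come out blocked, while with Disallow: /Blog/ they stay crawlable. Both versions block /Blog/?tag=ABC and allow the post URL.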
-
That would be something like this. Please check or test it within Google Webmaster Tools to see if it works, because I don't want to screw up your whole site. What this does is disallow your complete site and allow just the /Blog/Post URLs.
User-agent: *
Disallow: /
Allow: /Blog/Post