Robot.txt help
-
Hi,
We have a blog that is killing our SEO.
We need to
Disallow
Disallow: /Blog/?tag*
Disallow: /Blog/?page*
Disallow: /Blog/category/*
Disallow: /Blog/author/*
Disallow: /Blog/archive/*
Disallow: /Blog/Account/.
Disallow: /Blog/search*
Disallow: /Blog/search.aspx
Disallow: /Blog/error404.aspx
Disallow: /Blog/archive*
Disallow: /Blog/archive.aspx
Disallow: /Blog/sitemap.axd
Disallow: /Blog/post.aspxBut Allow everything below /Blog/Post
The disallow list seems to keep growing as we find issues. So rather than adding in to our Robot.txt all the areas to disallow. Is there a way to easily just say Allow /Blog/Post and ignore the rest. How do we do that in Robot.txt
Thanks
-
These: http://screencast.com/t/p120RbUhCT
They appear on every page I looked at, and take up the entire area "above the fold" and the content is "below the fold"
-Dan
-
Thanks Dan, but what grey areas, what url are you looking at?
-
Ahh. I see. You just need to "noindex" the pages you don't want in the index. As far as how to do that with blogengine, I am not sure, as I have never used it before.
But I think a bigger issue is like the giant box areas at the top of every page. They are pushing your content way down. That's definitely hurting UX and making the site a little confusing. I'd suggest improving that as well
-Dan
-
Hi Dan, Yes sorry that's the one!
-
Hi There... that address does not seem to work for me. Should it be .net? http://www.dotnetblogengine.net/
-Dan
-
Hi
The blog is www.dotnetblogengine.com
The content is only on the blog once it is just it can be accessed lots of different ways
-
Andrew
I doubt that one thing made your rankings drop so much. Also, what type of CMS are you on? Duplicate content like that should be controlled through indexation for the most part, but I am not recognizing that type of URL structure as any particular CMS?
Are just the title tags duplicate or the entire page content? Essentially, I would either change the content of the pages so they are not duplicate, or if that doesn't make sense I would just "noindex" them.
-Dan
-
Hi Dan,
I am getting duplicate content errors in WMT like
This is because tag=ABC and page=1 are both different ways to get to www.mysite.com/Blog/Post/My-Blog-Post.aspx
To fix this I have remove the URL's www.mysite.com/Blog/?tag=ABC and www.mysite.com/Blog/?Page=1from GWMT and by setting robot.txt up like
User-agent: *
Disallow: /Blog/
Allow: /Blog/post
Allow: /Blog/PostI hope to solve the duplicate content issue to stop it happening again.
Since doing this my SERP's have dropped massively. Is what I have done wrong or bad? How would I fix?
Hope this makes sense thanks for you help on this its appreciated.
Andrew
-
Hi There
Where are they appearing in WMT? In crawl errors?
You can also control crawling of parameters within webmaster tools - but I am still not quite sure if you are trying to remove these from the index or just prevent crawling (and if preventing crawling, for what reason?) or both?
-Dan
-
Hi Dan,
The issue is my blog had tagging switched on, it cause canonicalization mayhem.
I switched it off, but the tags still appears in Google Webmaster Tools (GWMT). I Remove URL via GWMT but they are still appearing. This has also caused me to plummet down the SERPs! I am hoping this is why my SERPs had dropped anyway! I am now trying to get to a point where google just sees my blog posts and not the ?Tag or ?Author or any other parameter that is going to cause me canoncilization pain. In the meantime I am sat waiting for google to bring me back up the SERPs when things settle down but it has been 2 weeks now so maybe something else is up?
-
I'm wondering why you want to block crawling of these URLs - I think what you're going for is to not index them, yes? If you block them from being crawled, they'll remain in the index. I would suggest considering robots meta noindex tags - unless you can describe in a little more detail what the issue is?
-Dan
-
Ok then you should be all set if your tests on GWMT did not indicate any errors.
-
Thanks it goes straight to www.mysite.com/Blog
-
Yup, I understand that you want to see your main site. This is why I recommended blocking only /Blog and not / (your root domain).
However, many blogs have a landing page. Does yours? In other words, when you click on your blog link, does it take you straight to Blog/posts or is there another page in between, eg /Blog/welcome?
If it does not go straight into Blog/posts you would want to also allow the landing page.
Does that make sense?
-
The structure is:
www.mysite.com - want to see everything at this level and below it
www.mysite.com/Blog - want to BLOCK everything at this level
www.mysite.com/Blog/posts - want to see everything at this level and below it
-
Well what Martijn (sorry, I spelled his name wrong before) and I were saying was not to forget to allow the landing page of your blog - otherwise this will not be indexed as you are disallowing the main blog directory.
Do you have a specific landing page for your blog or does it go straight into the /posts directory?
I'd say there's nothing wrong with allowing both Blog/Post and Blog/post just to be on the safe side...honestly not sure about case sensitivity in this instance.
-
"We're getting closer David, but after reading the question again I think we both miss an essential point ;-)" What was the essential point you missed. sorry I don't understand. I don;t want to make a mistake in my Robot.txt so would like to be 100% sure on what you are saying
-
Thanks guys so I have
User-agent: *
Disallow: /Blog/
Allow: /Blog/post
Allow: /Blog/Postthat works. My Home page also works. I there anything wrong with including both uppercase "Post" and lowercase "post". It is lowercase on the site but want uppercase "P" just incase. Is there a way to make the entry non case sensitive?
Thanks
-
Correct, Martijin. Good catch!
-
There was a reason that I said he should test this!
We're getting closer David, but after reading the question again I think we both miss an essential point ;-). As we know also exclude the robots from crawling the 'homepage' of the blog. If you have this homepage don't forget to also Allow it.
-
Well, no point in a blog that hurts your seo
I respectfully disagree with Martijin; I believe what you would want to do is disallow the Blog directory itself, not the whole site. It would seem if you Disallow: / and _Allow:/Blog/Post _ that you are telling SEs not to index anything on your site except for /Blog/Post.
I'd recommend:
User-agent: *
Disallow: /Blog/
Allow: /Blog/PostThis should block off the entire Blog directory except for your post subdirectory. As Maritijin stated; always test before you make real changes to your robots.txt.
-
That would be something like this, please check this or test this within Google Webmaster Tools if it works because I don't want to screw up your whole site. What this does is disallowing your complete site and just allows the /Blog/Post urls.
User-agent: *
Disallow: /
Allow: /Blog/Post
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
PLEASE HELP - Old query string URL causing problems
For a long time, we were ranking 1st/2nd for the term "Manual handling training". That was until about 5 days ago when I realised that Google had started to index not only a query stringed URL, but also an old version of the URL. What was even weirder was that when you clicked on the result it 301 redirected to the page that it was meant to display... The wrong URL that Google had started to index was: www.ihasco.co.uk/courses/detail/manual-handling?channel=retail The correct URL that it should have been indexing is: https://www.ihasco.co.uk/courses/detail/manual-handling-training I can't get my head around why it has done this as a 301 was in place already and we use rel canonical tags which point to the main parent pages. Anyway, we slapped a noindex tag in our robots.txt file to stop that page from being indexed, which worked but now I can't get the correct page to be indexed, even after a Google fetch. After inspecting the correct URL in the new search console I discovered that Google has ignored the rel canonical on the page (Which points to itself) and has selected the wrong, query stringed URL as the canonical. Why? and how do I rectify this?
Intermediate & Advanced SEO | | iHasco1 -
Rankings drop after https migration [Need urgent help]
hey hi guys, i have lost all my organic traffic and in need of urgent help. Plz i need all your SEO expertise. My website is: makemoneyadultcontent.com (before judging me, let me tell you that i have been in adult industry for almost 7 years now and hence this blog was a way to pass my knowledge to everyone who is looking for it) Recently i saw that moving to HTTPS will benefit my site. I did not knew much about technical details of moving to https. My hosting was on siteground, and they have a button which you can press and your website will then be served through https. So i did that. Nothing else was done. this was done on 13th or 14th of june For the next few weeks here is my analytics (image attached) : As you can see the traffic fluctuated and finally i lost everything on 29th june. When i researched more, i found out that you need to follow many steps in order to move to HTTPS. So i followed this guide and completed all the steps: https://www.keycdn.com/blog/http-to-https/ (also checked many other guides and tried to complete all the steps mentioned) Everyone said that the rankings will recover in one week, but now it has been more than 17 days and still the rankings have not improved. My articles are indexed as i can see it using site:makemoneyadultcontent.com but none of them are ranking for the keywords that i was ranking earlier. i have submitted the sitemap in google webmaster and around 166 (out of 218) pages are indexed. Here are few more images to help my case: plz plz help me what i can do to get my website traffic back. i have spent a lot of time and effort in building this site, cant see it die a slow death like this one. 8Wa8c [http ahref](http ahref) D42xw sAd9E
Intermediate & Advanced SEO | | akki910 -
Google robots.txt test - not picking up syntax errors?
I just ran a robots.txt file through "Google robots.txt Tester" as there was some unusual syntax in the file that didn't make any sense to me... e.g. /url/?*
Intermediate & Advanced SEO | | McTaggart
/url/?
/url/* and so on. I would use ? and not ? for example and what is ? for! - etc. Yet "Google robots.txt Tester" did not highlight the issues... I then fed the sitemap through http://www.searchenginepromotionhelp.com/m/robots-text-tester/robots-checker.php and that tool actually picked up my concerns. Can anybody explain why Google didn't - or perhaps it isn't supposed to pick up such errors? Thanks, Luke0 -
Question about robots file on mobile devices
Hi We have a robots.txt file, but do I need to create a separate file for the m.site or can I just add the line into my normal robots file. Ive just read the Google Guidelines (what a great read it was) and couldn't find my answer. Thanks in Advance Andy
Intermediate & Advanced SEO | | Andy-Halliday0 -
Consistent Ranking Jumps Page 1 to Page 5 for months - help needed
Hi guys and gals, I have a really tricky client who I just can't seem to gain consistency with in their SERP results. The keywords are competitive but what the main issue I have is the big page jumps that happen pretty much on a weekly basis. We go up and down 40 positions and this behaviour has been going on for nearly 6 months.
Intermediate & Advanced SEO | | Jon_bangonline
I felt it would resolve itself in time but it has not. The website is a large ecommerce website. Their link profile is OK in regards to several high quality newspaper publication links, majority brand related anchor texts and the link building we have engaged in has all been very good i.e. content relevant / high quality places. See below for some potential causes I think could be the reason: The on page SEO is good however the way their ecommerce website is setup they have formed a substantial amount of duplicate title tags. So in my opinion this is a potential cause. The previous web developer set-up 301 redirects all to their home page for any 404 errors. I know best practice is to go to the most relevant pages, however could this be a potential issue? We had some server connectivity issues show up in webmasters tools but that was for 1 day about 4 months ago. Since then no issues. they have quite a few 'blocked URLs' in their robots.txt file, e.g. Disallow: /login, Disallow: /checkout/ but to me these seem normal and not a big issue. We have seen a decrease over the last 12 months in Webmasters Tools of 'total indexed web pages' from 5000 to 2000 which is quite an odd statistic. Summary So all in all I am a tad stumped. We have some duplicate content issues in title tags, perhaps not following best practice in the 301 redirects but other than that I dont see any major on page issues, unless I am missing something in the seriousness of what I have listed.
Finally we have also do a bit of a cull of poor quality links, requesting links to be removed and also submitting a 'disavow' of some really bad links. We do not have a manual penalty though. Thoughts, feedback or comments VERY welcome.0 -
Homepage is losing rankings...need help!
Hi folks, Our company has been doing SEO for www.3dincites.com since October. Since the website hadn't been optimized at all, one of the first things we did was to add keyword-optimized page title and meta description to the homepage. The person who implemented didn't do it right the first time so we needed to change the page title again after a week or so. We also changed the preferred domain in GWT. Ever since those changes were implemented, the homepage has been decreasing in rankings for the the keywords we optimized it for. This is very surprising as the website has high-quality unique content and pretty solid backlinks. Moreover, the search traffic to other pages is through the roof (2K increase since October) so the client is happy, however, this decrease in rankings for the homepage (from 3d to 6th page for "3d IC technology" which is the main keyword) doesn't let me sleep well at nights. Any ideas why this is happening?
Intermediate & Advanced SEO | | zoomzoom0 -
I have two sitemaps which partly duplicate - one is blocked by robots.txt but can't figure out why!
Hi, I've just found two sitemaps - one of them is .php and represents part of the site structure on the website. The second is a .txt file which lists every page on the website. The .txt file is blocked via robots exclusion protocol (which doesn't appear to be very logical as it's the only full sitemap). Any ideas why a developer might have done that?
Intermediate & Advanced SEO | | McTaggart0 -
Whats Next, noobie needs some help :)
Hey Everyone, I have been a member of SEOMOZ for about 4 months and love it, it really has helped me out. my first website I am trying to get well ranked in the new zealand market is www.shopezy.co.nz I have chosen: sell online free ecommerce website ecommerce website builder Are paid links the way forward for me? should I aim for more keywords? should I pay for help? Just looking for a lttle direction, if someone could help. thanks.
Intermediate & Advanced SEO | | bonmaklad0