Robots.txt help
-
Hi,
We have a blog that is killing our SEO.
We need to disallow:
Disallow: /Blog/?tag*
Disallow: /Blog/?page*
Disallow: /Blog/category/*
Disallow: /Blog/author/*
Disallow: /Blog/archive/*
Disallow: /Blog/Account/.
Disallow: /Blog/search*
Disallow: /Blog/search.aspx
Disallow: /Blog/error404.aspx
Disallow: /Blog/archive*
Disallow: /Blog/archive.aspx
Disallow: /Blog/sitemap.axd
Disallow: /Blog/post.aspx
But allow everything below /Blog/Post.
The disallow list seems to keep growing as we find issues. So rather than adding every area to disallow to our robots.txt, is there a way to simply say Allow: /Blog/Post and ignore the rest? How do we do that in robots.txt?
Thanks
-
These: http://screencast.com/t/p120RbUhCT
They appear on every page I looked at, and take up the entire area "above the fold", pushing the content "below the fold".
-Dan
-
Thanks Dan, but what grey areas? What URL are you looking at?
-
Ahh. I see. You just need to "noindex" the pages you don't want in the index. As far as how to do that with blogengine, I am not sure, as I have never used it before.
But I think a bigger issue is the giant box areas at the top of every page. They are pushing your content way down. That's definitely hurting UX and making the site a little confusing. I'd suggest improving that as well.
-Dan
-
Hi Dan, Yes sorry that's the one!
-
Hi There... that address does not seem to work for me. Should it be .net? http://www.dotnetblogengine.net/
-Dan
-
Hi
The blog is www.dotnetblogengine.com
The content only exists on the blog once; it's just that it can be accessed lots of different ways.
-
Andrew
I doubt that one thing made your rankings drop so much. Also, what type of CMS are you on? Duplicate content like that should be controlled through indexation for the most part, but I don't recognize that type of URL structure as belonging to any particular CMS.
Are just the title tags duplicate or the entire page content? Essentially, I would either change the content of the pages so they are not duplicate, or if that doesn't make sense I would just "noindex" them.
-Dan
-
Hi Dan,
I am getting duplicate content errors in WMT like
This is because tag=ABC and page=1 are both different ways to get to www.mysite.com/Blog/Post/My-Blog-Post.aspx
To fix this I have removed the URLs www.mysite.com/Blog/?tag=ABC and www.mysite.com/Blog/?Page=1 from GWMT, and set robots.txt up like:
User-agent: *
Disallow: /Blog/
Allow: /Blog/post
Allow: /Blog/Post
I hope to solve the duplicate content issue and stop it happening again.
Since doing this my SERPs have dropped massively. Is what I have done wrong or bad? How would I fix it?
Hope this makes sense. Thanks for your help on this, it's appreciated.
Andrew
-
Hi There
Where are they appearing in WMT? In crawl errors?
You can also control crawling of parameters within webmaster tools - but I am still not quite sure if you are trying to remove these from the index or just prevent crawling (and if preventing crawling, for what reason?) or both?
-Dan
-
Hi Dan,
The issue is my blog had tagging switched on, and it caused canonicalization mayhem.
I switched it off, but the tags still appear in Google Webmaster Tools (GWMT). I removed the URLs via GWMT but they are still appearing. This has also caused me to plummet down the SERPs! At least, I am hoping this is why my SERPs dropped. I am now trying to get to a point where Google just sees my blog posts and not the ?Tag or ?Author or any other parameter that is going to cause me canonicalization pain. In the meantime I am waiting for Google to bring me back up the SERPs when things settle down, but it has been 2 weeks now, so maybe something else is up?
-
I'm wondering why you want to block crawling of these URLs - I think what you're going for is to not index them, yes? If you block them from being crawled, they'll remain in the index. I would suggest considering robots meta noindex tags - unless you can describe in a little more detail what the issue is?
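For reference, a robots meta noindex is just a tag in each page's head; a minimal sketch is below (how you would inject it across all tag/author/archive templates depends on your blog platform, which I haven't used). And to repeat the caveat above: it only works if the page can still be crawled, since a URL blocked in robots.txt never gets its noindex seen.

```html
<!-- In the <head> of each tag/author/archive page you want out of the index.
     "noindex, follow" removes the page from the index but still lets
     crawlers follow its links. -->
<meta name="robots" content="noindex, follow">
```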
-Dan
-
Ok then you should be all set if your tests on GWMT did not indicate any errors.
-
Thanks, it goes straight to www.mysite.com/Blog
-
Yup, I understand that you want to see your main site. This is why I recommended blocking only /Blog and not / (your root domain).
However, many blogs have a landing page. Does yours? In other words, when you click on your blog link, does it take you straight to Blog/posts or is there another page in between, eg /Blog/welcome?
If it does not go straight into Blog/posts you would want to also allow the landing page.
Does that make sense?
-
The structure is:
www.mysite.com - want to see everything at this level and below it
www.mysite.com/Blog - want to BLOCK everything at this level
www.mysite.com/Blog/posts - want to see everything at this level and below it
-
Well what Martijn (sorry, I spelled his name wrong before) and I were saying was not to forget to allow the landing page of your blog - otherwise this will not be indexed as you are disallowing the main blog directory.
Do you have a specific landing page for your blog or does it go straight into the /posts directory?
I'd say there's nothing wrong with allowing both Blog/Post and Blog/post just to be on the safe side...honestly not sure about case sensitivity in this instance.
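For what it's worth, robots.txt path matching is case-sensitive, so listing both casings is the safe play. Here's a rough local sanity check using Python's urllib.robotparser as a stand-in for a crawler. Two caveats: Python applies rules first-match in file order, whereas Google honours the most specific matching rule regardless of order, so the Allow lines are placed first in this sketch; and mysite.com is just the placeholder domain from this thread.

```python
import urllib.robotparser

# The rules from this thread, with both casings of /Blog/Post allowed.
# Allow lines go first because Python's parser stops at the first
# matching rule (Google instead picks the most specific rule).
robots_txt = """\
User-agent: *
Allow: /Blog/post
Allow: /Blog/Post
Disallow: /Blog/
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(robots_txt.splitlines())

base = "http://www.mysite.com"  # placeholder domain from the thread
print(parser.can_fetch("Googlebot", base + "/Blog/post/my-post.aspx"))  # True
print(parser.can_fetch("Googlebot", base + "/Blog/Post/my-post.aspx"))  # True
# Any other casing is still blocked -- there is no case-insensitive flag:
print(parser.can_fetch("Googlebot", base + "/Blog/POST/my-post.aspx"))  # False
print(parser.can_fetch("Googlebot", base + "/Blog/?tag=ABC"))           # False
```

So both Allow entries do useful work, and a mixed-case URL like /Blog/POST/ would still be blocked.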
-
"We're getting closer David, but after reading the question again I think we both miss an essential point ;-)" What was the essential point you missed? Sorry, I don't understand. I don't want to make a mistake in my robots.txt, so I would like to be 100% sure on what you are saying.
-
Thanks guys so I have
User-agent: *
Disallow: /Blog/
Allow: /Blog/post
Allow: /Blog/Post
That works. My home page also works. Is there anything wrong with including both uppercase "Post" and lowercase "post"? It is lowercase on the site but I want uppercase "P" just in case. Is there a way to make the entry non case sensitive?
Thanks
-
Correct, Martijin. Good catch!
-
There was a reason that I said he should test this!
We're getting closer David, but after reading the question again I think we both miss an essential point ;-). As written, we also exclude the robots from crawling the 'homepage' of the blog. If you have such a homepage, don't forget to also Allow it.
-
Well, there's no point in a blog that hurts your SEO.
I respectfully disagree with Martijin; I believe what you would want to do is disallow the Blog directory itself, not the whole site. It would seem that if you Disallow: / and Allow: /Blog/Post, you are telling SEs not to index anything on your site except for /Blog/Post.
I'd recommend:
User-agent: *
Disallow: /Blog/
Allow: /Blog/Post
This should block off the entire Blog directory except for your post subdirectory. As Maritijin stated, always test before you make real changes to your robots.txt.
-
That would be something like this. Please check or test this within Google Webmaster Tools to see if it works, because I don't want to screw up your whole site. What this does is disallow your complete site and allow just the /Blog/Post URLs.
User-agent: *
Disallow: /
Allow: /Blog/Post
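To illustrate what this rule set does before touching the live file, here is a hedged local sketch using Python's urllib.robotparser as a rough stand-in for a crawler (caveats: Python matches rules first-match in file order while Google matches by specificity, so the Allow line goes first here, and mysite.com is the thread's placeholder domain). It shows that everything outside /Blog/Post, including the homepage, would be blocked:

```python
import urllib.robotparser

# Disallow the whole site, allow only /Blog/Post (Allow listed first
# because Python's robotparser stops at the first matching rule).
robots_txt = """\
User-agent: *
Allow: /Blog/Post
Disallow: /
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(robots_txt.splitlines())

base = "http://www.mysite.com"  # placeholder domain from the thread
print(parser.can_fetch("Googlebot", base + "/Blog/Post/my-post.aspx"))  # True
# Everything else, including the homepage, is blocked:
print(parser.can_fetch("Googlebot", base + "/"))                        # False
print(parser.can_fetch("Googlebot", base + "/About.aspx"))              # False
```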