Robots.txt help
-
Hi,
We have a blog that is killing our SEO.
We need to disallow the following:
Disallow: /Blog/?tag*
Disallow: /Blog/?page*
Disallow: /Blog/category/*
Disallow: /Blog/author/*
Disallow: /Blog/archive/*
Disallow: /Blog/Account/.
Disallow: /Blog/search*
Disallow: /Blog/search.aspx
Disallow: /Blog/error404.aspx
Disallow: /Blog/archive*
Disallow: /Blog/archive.aspx
Disallow: /Blog/sitemap.axd
Disallow: /Blog/post.aspx

But allow everything below /Blog/Post.
The disallow list seems to keep growing as we find issues. So rather than adding every problem area to our robots.txt, is there a way to simply say Allow: /Blog/Post and ignore the rest? How do we do that in robots.txt?
Thanks
-
These: http://screencast.com/t/p120RbUhCT
They appear on every page I looked at, and they take up the entire area "above the fold", pushing the content "below the fold".
-Dan
-
Thanks Dan, but what grey areas, what url are you looking at?
-
Ahh. I see. You just need to "noindex" the pages you don't want in the index. As far as how to do that with blogengine, I am not sure, as I have never used it before.
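For reference, the "noindex" mentioned here is a robots meta tag in each page's `<head>`. Assuming BlogEngine lets you edit the templates for the tag/author/archive pages (an assumption, since neither of us has used it), it would look something like:

```html
<!-- On the duplicate listing pages (tag, author, archive, search, etc.): -->
<meta name="robots" content="noindex, follow">
```

The `follow` part lets crawlers keep following links from those pages even though the pages themselves are dropped from the index.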
But I think a bigger issue is the giant box areas at the top of every page. They are pushing your content way down, which is definitely hurting UX and making the site a little confusing. I'd suggest improving that as well.
-Dan
-
Hi Dan, Yes sorry that's the one!
-
Hi There... that address does not seem to work for me. Should it be .net? http://www.dotnetblogengine.net/
-Dan
-
Hi
The blog is www.dotnetblogengine.com
The content only exists on the blog once; it's just that it can be accessed lots of different ways.
-
Andrew
I doubt that one thing made your rankings drop so much. Also, what type of CMS are you on? Duplicate content like that should be controlled through indexation for the most part, but I don't recognize that URL structure as belonging to any particular CMS.
Are just the title tags duplicate or the entire page content? Essentially, I would either change the content of the pages so they are not duplicate, or if that doesn't make sense I would just "noindex" them.
-Dan
-
Hi Dan,
I am getting duplicate content errors in WMT like
This is because tag=ABC and page=1 are both different ways to get to www.mysite.com/Blog/Post/My-Blog-Post.aspx
To fix this I have removed the URLs www.mysite.com/Blog/?tag=ABC and www.mysite.com/Blog/?Page=1 from GWMT, and I am setting robots.txt up like this:
User-agent: *
Disallow: /Blog/
Allow: /Blog/post
Allow: /Blog/Post

I hope this solves the duplicate content issue and stops it happening again.
Since doing this my SERPs have dropped massively. Is what I have done wrong or bad? How would I fix it?
Hope this makes sense. Thanks for your help on this, it's appreciated.
Andrew
-
Hi There
Where are they appearing in WMT? In crawl errors?
You can also control crawling of parameters within webmaster tools - but I am still not quite sure if you are trying to remove these from the index or just prevent crawling (and if preventing crawling, for what reason?) or both?
-Dan
-
Hi Dan,
The issue is my blog had tagging switched on, and it caused canonicalization mayhem.
I switched it off, but the tag URLs still appear in Google Webmaster Tools (GWMT). I removed them via GWMT's Remove URLs tool, but they are still appearing. This has also caused me to plummet down the SERPs, or at least I am hoping that is why my SERPs dropped! I am now trying to get to a point where Google just sees my blog posts and not the ?Tag or ?Author or any other parameter that is going to cause me canonicalization pain. In the meantime I am waiting for Google to bring me back up the SERPs once things settle down, but it has been two weeks now, so maybe something else is up?
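For parameter duplicates like ?Tag= or ?Author=, another common fix alongside noindex is a canonical link on each variant pointing at the clean post URL, so the signals consolidate onto one page. Assuming you can edit the BlogEngine post template (not verified), it would go in the page `<head>` something like this, with a hypothetical example URL:

```html
<!-- Every URL variant of this post should point at the single clean version: -->
<link rel="canonical" href="http://www.mysite.com/Blog/Post/My-Blog-Post.aspx" />
```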
-
I'm wondering why you want to block crawling of these URLs - I think what you're going for is to not index them, yes? If you block them from being crawled, they'll remain in the index. I would suggest considering robots meta noindex tags - unless you can describe in a little more detail what the issue is?
-Dan
-
Ok then you should be all set if your tests on GWMT did not indicate any errors.
-
Thanks, it goes straight to www.mysite.com/Blog
-
Yup, I understand that you want to see your main site. This is why I recommended blocking only /Blog and not / (your root domain).
However, many blogs have a landing page. Does yours? In other words, when you click on your blog link, does it take you straight to Blog/posts or is there another page in between, eg /Blog/welcome?
If it does not go straight into Blog/posts you would want to also allow the landing page.
Does that make sense?
-
The structure is:
www.mysite.com - want to see everything at this level and below it
www.mysite.com/Blog - want to BLOCK everything at this level
www.mysite.com/Blog/posts - want to see everything at this level and below it
-
Well what Martijn (sorry, I spelled his name wrong before) and I were saying was not to forget to allow the landing page of your blog - otherwise this will not be indexed as you are disallowing the main blog directory.
Do you have a specific landing page for your blog or does it go straight into the /posts directory?
I'd say there's nothing wrong with allowing both Blog/Post and Blog/post just to be on the safe side...honestly not sure about case sensitivity in this instance.
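On the case-sensitivity point: robots.txt paths are case-sensitive prefix matches, and Google resolves conflicting rules by picking the longest matching one. Here's a minimal sketch of that matching logic (an illustration only, not Google's actual implementation) showing why listing both casings is harmless:

```python
# Sketch of Google-style robots.txt matching: rules are case-sensitive
# prefix matches, and the longest matching rule wins.
RULES = [
    ("disallow", "/Blog/"),
    ("allow", "/Blog/post"),
    ("allow", "/Blog/Post"),  # paths are case-sensitive, so cover both
]

def is_allowed(path: str) -> bool:
    """Return the verdict of the longest matching rule (allowed if none match)."""
    best = None  # (prefix_length, allowed?)
    for verdict, prefix in RULES:
        if path.startswith(prefix):
            if best is None or len(prefix) > best[0]:
                best = (len(prefix), verdict == "allow")
    return True if best is None else best[1]

print(is_allowed("/Blog/post/my-first-post.aspx"))  # True  (longest match: allow)
print(is_allowed("/Blog/Post/my-first-post.aspx"))  # True  (uppercase rule matches)
print(is_allowed("/Blog/?tag=ABC"))                 # False (only /Blog/ matches)
print(is_allowed("/about"))                         # True  (no rule matches)
```

So there is no single directive to make an entry case-insensitive; adding both spellings, as above, is the practical workaround.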
-
"We're getting closer David, but after reading the question again I think we both miss an essential point ;-)" What was the essential point you missed? Sorry, I don't understand. I don't want to make a mistake in my robots.txt, so I would like to be 100% sure on what you are saying.
-
Thanks guys so I have
User-agent: *
Disallow: /Blog/
Allow: /Blog/post
Allow: /Blog/Post

That works, and my home page also works. Is there anything wrong with including both uppercase "Post" and lowercase "post"? It is lowercase on the site, but I want the uppercase "P" just in case. Is there a way to make the entry non-case-sensitive?
Thanks
-
Correct, Martijn. Good catch!
-
There was a reason that I said he should test this!
We're getting closer David, but after reading the question again I think we both miss an essential point ;-). With this setup we also exclude the robots from crawling the 'homepage' of the blog. If you have this homepage, don't forget to also Allow it.
-
Well, there's no point in a blog that hurts your SEO.
I respectfully disagree with Martijn; I believe what you would want to do is disallow the Blog directory itself, not the whole site. If you Disallow: / and Allow: /Blog/Post, you are telling search engines not to index anything on your site except for /Blog/Post.
I'd recommend:
User-agent: *
Disallow: /Blog/
Allow: /Blog/Post

This should block off the entire Blog directory except for your post subdirectory. As Martijn stated, always test before you make real changes to your robots.txt.
-
That would be something like this; please check or test it within Google Webmaster Tools before going live, because I don't want to screw up your whole site. What this does is disallow your complete site and allow just the /Blog/Post URLs.
User-agent: *
Disallow: /
Allow: /Blog/Post