I want to block search bots from crawling all of my website's pages except for the homepage. Is this rule correct?
-
User-agent: *
Disallow: /*
-
Thanks Matt! I will surely test this one.
-
Thanks David! Will try this one.
-
Use this:
User-agent: Googlebot
Noindex: /

User-agent: Googlebot
Disallow: /

User-agent: *
Disallow: /

This is what I use to block our dev sites from being indexed and we've had no issues.
-
Actually, there are two regex-style characters that robots.txt can handle: the asterisk (*) and the dollar sign ($).
You should test this one. I think it will work (about 95% sure - I tested it quickly in WMT):
User-agent: *
Disallow: /
Allow: /$
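If it helps to see why that combination leaves only the homepage crawlable, here's a rough, self-contained Python sketch of the Google-style matching logic (the longest matching pattern wins, and Allow wins a tie). The helper functions are purely for illustration - they aren't from any library:

```python
import re

def rule_matches(pattern, path):
    # Translate a robots.txt pattern ('*' wildcard, optional trailing '$' anchor)
    # into a regex and test it against the URL path.
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = "".join(".*" if ch == "*" else re.escape(ch) for ch in pattern)
    return re.match(body + ("$" if anchored else ""), path) is not None

def is_allowed(path, rules):
    # rules: (directive, pattern) pairs from the matching user-agent group.
    # The longest matching pattern wins; if Allow and Disallow tie, Allow wins.
    best_directive, best_length = "Allow", -1  # no matching rule means allowed
    for directive, pattern in rules:
        if rule_matches(pattern, path):
            length = len(pattern)
            if length > best_length or (length == best_length and directive == "Allow"):
                best_directive, best_length = directive, length
    return best_directive == "Allow"

rules = [("Disallow", "/"), ("Allow", "/$")]
for path in ["/", "/about", "/blog/post.html"]:
    print(path, "->", "allowed" if is_allowed(path, rules) else "blocked")
# Expected: "/" is allowed, everything else is blocked.
```
-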
I don't think that will work. Robots.txt doesn't handle full regular expressions. You will have to explicitly list all of the folders (and files, to be extra safe) so that nothing gets indexed unless you want it to be found.
This is kind of an odd question. I haven't thought about something like this in a while. I usually want everything but a couple of folders indexed. :) I found something that may be a little more helpful. Try reading this.
If you're working with file extensions, you can use **Disallow: /*.html$** (or .php, or whatever you have). That may get you closer to a solution.
Definitely test this with a crawler that obeys robots.txt; a quick way to sanity-check it locally is sketched below.
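For that local check, one option (assuming you're happy to install a third-party package) is Protego, the robots.txt parser Scrapy uses; as far as I know the standard library's urllib.robotparser doesn't understand the * and $ extensions, so it can't validate rules like these. Roughly, with example.com standing in for your own domain:

```python
# pip install protego
from protego import Protego

robots_txt = """
User-agent: *
Disallow: /
Allow: /$
"""

rp = Protego.parse(robots_txt)

# Only the bare homepage should come back as fetchable.
for url in ("https://example.com/",
            "https://example.com/about",
            "https://example.com/blog/post.html"):
    print(url, "->", rp.can_fetch(url, "Googlebot"))
```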