Blocked by meta-robots but there is no robots file
-
OK, I'm a little frustrated here. I waited a week for the next weekly crawl after changing the privacy setting on a WordPress website so that Google can index it, but I still have the same problem: blocked by meta-robots, noindex, nofollow. But I don't see a robots file anywhere, and the privacy setting on this WordPress site is set to allow search engines to index the site. The website is www.marketalert.ca
What am I missing here? Why can't I get the rest of the website indexed, and is there a faster way to test this than waiting another week just to find out it didn't work again?
-
The .htaccess file is in place, redirecting www to non-www, so I don't see what else I could do with that. It looks like the .htaccess should be reversed so that non-www points to www, which has more value. I forgot to mention that the website was recently overhauled by someone else, and they are having me help with the SEO. Not sure if that has anything to do with it.
-
The issue might be the redirect from www.yourdomain.ca to yourdomain.ca.
Look at http://www.opensiteexplorer.org/pages?site=marketalert.ca%2F
and at http://www.opensiteexplorer.org/pages?site=www.marketalert.ca%2F
Some pages are indexed with www and others without. This is your main issue.
Recommendation:
- Revisit the .htaccess file, or wherever else the redirect has been set (for example, at the DNS level).
- Choose one version, with www or without, and stick to it.
- Revisit your external links and update them to the version you chose.
- Create a new sitemap and resubmit it to the search engines.
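To spot the mixed www / non-www situation described above, something like the following could be run over a list of URLs (for example, from a sitemap or a crawl export). This is only a sketch: the function names `host_variants` and `canonicalize` are made up for illustration, and it assumes plain URLs without ports or credentials.

```python
from collections import Counter
from urllib.parse import urlsplit


def _bare(host):
    """Strip a leading 'www.' so both host forms compare equal."""
    return host[4:] if host.startswith("www.") else host


def host_variants(urls):
    """Count how many URLs use each hostname (www vs. non-www)."""
    return Counter(urlsplit(u).hostname for u in urls)


def canonicalize(url, preferred_host):
    """Rewrite a URL to the preferred host if it belongs to the same site."""
    parts = urlsplit(url)
    if parts.hostname and _bare(parts.hostname) == _bare(preferred_host):
        # Assumption: no port or user info in the URL, so netloc == hostname.
        return parts._replace(netloc=preferred_host).geturl()
    return url  # a different site: leave untouched


urls = [
    "http://marketalert.ca/",
    "http://www.marketalert.ca/about/",
]
print(host_variants(urls))
print([canonicalize(u, "marketalert.ca") for u in urls])
```

If `host_variants` reports more than one hostname for the same site, the links are split across both versions and should be rewritten to whichever one the redirect points at.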
-
I ran the SEO web crawler and it finished already. Successfully crawled all pages. I still have to wait for another week to get the main campaign updated and see results there, but I believe it may work too now.
I guess I solved my own problem after being directed to robots.txt by Jim. I found that the WordPress XML sitemap plugin was the problem: it created a virtual robots.txt file, which sent me on a wild goose chase looking for a robots.txt file that didn't exist on disk. Creating a real robots.txt file that allows all seems to be the solution, in case anyone else has this same problem.
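For anyone who wants to check what an allow-all robots.txt actually permits before the next crawl, here is a minimal sketch using Python's standard `urllib.robotparser`. The helper name `is_allowed` and the two sample files are illustrative assumptions, not the contents of the file on this site.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, url, agent="Googlebot"):
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# An empty Disallow value means nothing is blocked.
ALLOW_ALL = "User-agent: *\nDisallow:\n"
# "Disallow: /" blocks the whole site.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

print(is_allowed(ALLOW_ALL, "http://marketalert.ca/about/"))
print(is_allowed(BLOCK_ALL, "http://marketalert.ca/about/"))
```

Note that this only reflects how Python's parser reads the rules; real crawlers may differ in edge cases, so treat it as a quick sanity check rather than proof.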
-
If you can, follow up either way - happy to help you get it debugged!
-
I was able to update my sitemap.xml with Google webmaster tools no problem. I'm not 100% confident though that means the entire site is searchable by the spiders. I guess I'll know for sure in a few days tops.
-
I agree with Jim. Update your sitemap.xml files with Google Webmaster Tools. That will also help you identify problems you might be missing.
-
I've done some more looking into it, and it seems to be a problem that occurs when WordPress uses the XML sitemap generator plugin. It creates a virtual robots.txt file, which is why I couldn't find a robots.txt file on the server. Apparently the only fix is to replace it with an actual robots.txt file that explicitly allows all.
I just replaced the robots.txt file with a real one allowing all. SEOmoz estimates a few days to test site crawl and it's another 7 days before the next scheduled crawl. I'd kinda like to find out sooner if it's not going to work. There must be a faster test. I don't need a detailed test, just a basic test that says, YEP, we can see this many pages or something like that.
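One rough way to get a faster answer, sketched under assumptions: take the HTML of a page and look for a robots meta tag yourself. The class name `MetaRobotsFinder` and the helpers below are hypothetical, and the check covers only the meta tag, not robots.txt or an `X-Robots-Tag` HTTP header.

```python
from html.parser import HTMLParser


class MetaRobotsFinder(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            content = attr.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",") if d.strip()]


def robots_directives(html):
    """Return every meta-robots directive found in the HTML text."""
    finder = MetaRobotsFinder()
    finder.feed(html)
    return finder.directives


def is_noindexed(html):
    """True if the page asks search engines not to index it."""
    found = robots_directives(html)
    return "noindex" in found or "none" in found


sample = '<html><head><meta name="robots" content="noindex,nofollow"></head></html>'
print(robots_directives(sample), is_noindexed(sample))
```

To try it on a live page, the HTML could be fetched with `urllib.request.urlopen(url).read().decode("utf-8", "replace")` and passed to `is_noindexed`; running that over a handful of URLs gives a basic yes/no without waiting for a scheduled crawl.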
-
Hi,
Your robots.txt file is located at http://marketalert.ca/robots.txt, which is the root of your website directory.
Your sitemap file is located at http://marketalert.ca/sitemap.xml. Does Google Webmaster Tools report that the sitemap file could not be found?
If the site has changed, you might need to resubmit the updated sitemap file.
Hope this helps.
Related Questions
-
LSI keywords logic - enter in meta and bold in text?
Hello, For lack of good info about this on the Internet, let me try here. I know that it is a good idea to put LSI keywords in natural flow in the body text of the article. But shall I also put LSI keywords in the meta tags, in the same manner as with non-LSI keywords? Or shall I reserve the meta tags for non-LSI keywords only? In the body text, shall I emphasize LSI keywords in bold, as I already do with non-LSI keywords? This is a bit confusing, as I don't want LSI keywords to steal the show from my long-tail (phrase) keyword. I would appreciate it if someone could shed a bit of light on this. Thanks in advance!
Technical SEO | SEOisSEO
-
Robots.txt and joomla
Hello, I use Joomla for my website, and all of these directories are blocked automatically. Is that good or bad? Should I remove anything, and if so, why?
User-agent: *
Disallow: /administrator/
Disallow: /cache/
Disallow: /components/
Disallow: /images/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /libraries/
Disallow: /media/
Disallow: /modules/
Disallow: /plugins/
Disallow: /templates/
Disallow: /tmp/
Disallow: /xmlrpc/
I also added to my robots.txt file my email address (is that useful? I am afraid Google passes PR to the email address), a javascript: void (0) because I have tabs on my web page (is that useful?), as well as a .pdf (is that also useful?). Any comments? Does anything need to be changed, or is it OK? Thank you,
Technical SEO | seoanalytics
-
Oh no googlebot can not access my robots.txt file
I just received an error message from Google Webmaster Tools. I wonder whether it has something to do with the Yoast plugin. Could somebody help me with troubleshooting this? Here's the original message:
Over the last 24 hours, Googlebot encountered 189 errors while attempting to access your robots.txt. To ensure that we didn't crawl any pages listed in that file, we postponed our crawl. Your site's overall robots.txt error rate is 100.0%.
Recommended action
If the site error rate is 100%:
- Using a web browser, attempt to access http://www.soobumimphotography.com//robots.txt. If you are able to access it from your browser, then your site may be configured to deny access to googlebot. Check the configuration of your firewall and site to ensure that you are not denying access to googlebot.
- If your robots.txt is a static page, verify that your web service has proper permissions to access the file.
- If your robots.txt is dynamically generated, verify that the scripts that generate the robots.txt are properly configured and have permission to run. Check the logs for your website to see if your scripts are failing, and if so attempt to diagnose the cause of the failure.
If the site error rate is less than 100%:
- Using Webmaster Tools, find a day with a high error rate and examine the logs for your web server for that day. Look for errors accessing robots.txt in the logs for that day and fix the causes of those errors.
- The most likely explanation is that your site is overloaded. Contact your hosting provider and discuss reconfiguring your web server or adding more resources to your website.
After you think you've fixed the problem, use Fetch as Google to fetch http://www.soobumimphotography.com//robots.txt to verify that Googlebot can properly access your site.
Technical SEO | BistosAmerica
-
Block or remove pages using a robots.txt
I want to use robots.txt to prevent Googlebot from accessing a specific folder on the server. Please tell me if the syntax below is correct:
User-Agent: Googlebot
Disallow: /folder/
I also want to use robots.txt to prevent Google Images from indexing the images on my website. Please tell me if the syntax below is correct:
User-agent: Googlebot-Image
Disallow: /
Technical SEO | semer
-
Block Baidu crawler?
Hello! One of our websites receives a large amount of traffic from the Baidu crawler. We do not have any Chinese content or do any business with China, since our market is the UK. Is it a good idea to block the Baidu crawler in robots.txt, or could that have any adverse effects on the SEO of our site? What do you suggest?
Technical SEO | AJPro
-
Help needed please with 301 redirects in htaccess file.
In summary, we're currently having issues with our htaccess file. 301 redirects are going through to the new described URL, but in addition the new URL is followed by a ? and the old URL. How can we get rid of the ? and previous URL so they don't appear as an ending? None of the examples we've found re this issue online appear to work. Can anyone please offer some advice? Can we use a RewriteRule to stop this happening? Here's a summary of the htaccess file:
REDIRECT CODE BEGINS HERE
LONG LIST OF REDIRECTS, which appear to be set up perfectly fine.
REDIRECT CODE ENDS
DirectoryIndex index.php
<IfModule mod_rewrite.c>
RewriteEngine On
Options +FollowSymLinks
DirectoryIndex index.php
RewriteEngine On
RewriteCond $1 !^(images|system|themes|pdf|favicon.ico|robots.txt|index.php) [NC]
RewriteRule ^.htaccess$ - [F]
RewriteRule ^favicon.ico - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ /index.php?/$1 [L]
</IfModule>
DirectoryIndex index.php
Technical SEO | petersommertravels
-
Robots.txt questions...
All, My site is rather complicated, but I will try to break down my question as simply as possible. I have a robots.txt document in the root level of my site to disallow robot access to /_system/, my CMS. This looks like this:
# /robots.txt file for http://webcrawler.com/
# mail webmaster@webcrawler.com for constructive criticism
User-agent: *
Disallow: /_system/
I have another robots.txt file one level down, in my holiday database (www.mysite.com/holiday-database/). This is to disallow access to /holiday-database/ControlPanel/, my database CMS. This looks like this:
User-agent: *
Disallow: /ControlPanel/
Am I correct in thinking that this file must also be in the root level, and not in the /holiday-database/ level? If so, should my new robots.txt file look like this:
# /robots.txt file for http://webcrawler.com/
# mail webmaster@webcrawler.com for constructive criticism
User-agent: *
Disallow: /_system/
Disallow: /holiday-database/ControlPanel/
Or, like this:
# /robots.txt file for http://webcrawler.com/
# mail webmaster@webcrawler.com for constructive criticism
User-agent: *
Disallow: /_system/
Disallow: /ControlPanel/
Thanks in advance. Matt
Technical SEO | Horizon
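
On the two candidate files in this last question: crawlers read robots.txt only from the root of a host, and Disallow paths are matched from the root of the site, so the two versions behave differently. A minimal sketch with Python's standard `urllib.robotparser` illustrates this; the helper `is_allowed` is hypothetical, www.mysite.com is the placeholder host from the question, and `login.php` is an invented page name used only for the demonstration.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, url, agent="*"):
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


FULL_PATH = """User-agent: *
Disallow: /_system/
Disallow: /holiday-database/ControlPanel/
"""

SHORT_PATH = """User-agent: *
Disallow: /_system/
Disallow: /ControlPanel/
"""

# Hypothetical page inside the database control panel.
target = "http://www.mysite.com/holiday-database/ControlPanel/login.php"

print(is_allowed(FULL_PATH, target))   # rule written from the site root
print(is_allowed(SHORT_PATH, target))  # rule that omits /holiday-database/
```

Under this parser, only the version with the full `/holiday-database/ControlPanel/` path blocks the control panel; the shorter `/ControlPanel/` rule would apply to a directory of that name directly under the root.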