Blocked by meta-robots but there is no robots file
-
OK, I'm a little frustred here. I've waited a week for the next weekly index to take place after changing the privacy setting in a wordpress website so Google can index, but I still got the same problem. Blocked by meta-robots, no index, no follow. But I do not see a robot file anywhere and the privacy setting in this Wordpress site is set to allow search engines to index this site. Website is www.marketalert.ca
What am I missing here? Why can't I index the rest of the website and is there a faster way to test this rather than wait another week just to find out it didn't work again?
-
The .htaccess file is in placing directing www to non www, so I don't see what else I could do with that. I forgot to mention the website was recently overhauled by someone else, and they are having me help with SEO. Not sure if that has anything to do with it. It looks like the .htaccess should be reversed so the non www points to the www which has more value. Someone else designed this site and they are having me do the SEO on it for them.
-
The issue might be the forwarding from www.yourdomain.ca to yourdomain.ca
look at http://www.opensiteexplorer.org/pages?site=marketalert.ca%2F
and here http://www.opensiteexplorer.org/pages?site=www.marketalert.ca%2F
..some are indexed on with www and other without www. , this is your main issue.
recommendation:
- revisit the htaccess file or where the redirect has been set DNS..
- choose one with www or without and stick to it.
- revicit your external links and make the changes to your links
- create new sitemap and resubmit to SearchEngines
-
I ran the SEO web crawler and it finished already. Successfully crawled all pages. I still have to wait for another week to get the main campaign updated and see results there, but I believe it may work too now.
I guess I solved my own problem after being directed to robots.txt by Jim. I found that the Wordpress plugin for SEO xml sitemap creator was the problem because it created a virtual robots.txt file which sent me on a wild goose chase looking for a robots.txt file which didn't exist. Creating a robots.txt file allowing all seems to be the solultion, incase anyone else has this same problem.
-
If you can, follow up either way - happy to help you get it debugged!
-
I was able to update my sitemap.xml with Google webmaster tools no problem. I'm not 100% confident though that means the entire site is searchable by the spiders. I guess I'll know for sure in a few days tops.
-
I agree with Jim. Update your sitemap.xml files with Google Webmaster Tools. That will also help you identify problems you might be missing.
-
I've done some more looking into it and seems to be a problem when Wordpress uses the XML site generator plugin. It creates a virtual robot.txt file, which is why I couldn't find the robot.txt file. Apparently the only fix is to replace it with an actual robot.txt file forcing it to allow all.
I just replaced the robots.txt file with a real one allowing all. SEOmoz estimates a few days to test site crawl and it's another 7 days before the next scheduled crawl. I'd kinda like to find out sooner if it's not going to work. There must be a faster test. I don't need a detailed test, just a basic test that says, YEP, we can see this many pages or something like that.
-
hi
your robots.txt file is located here http://marketalert.ca/robots.txt, which is the root of your website directory.
this is the actual location of your sitemap file (http://marketalert.ca/sitemap.xml), does the Google WT show any issues about the sitemap file could not be found?
You might need to resubmit the sitemap file, if there are any changes, of course with the updated version of your site.
hope this helps.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Easy Question: regarding no index meta tag vs robot.txt
This seems like a dumb question, but I'm not sure what the answer is. I have an ecommerce client who has a couple of subdirectories "gallery" and "blog". Neither directory gets a lot of traffic or really turns into much conversions, so I want to remove the pages so they don't drain my page rank from more important pages. Does this sound like a good idea? I was thinking of either disallowing the folders via robot.txt file or add a "no index" tag or 301redirect or delete them. Can you help me determine which is best. **DEINDEX: **As I understand it, the no index meta tag is going to allow the robots to still crawl the pages, but they won't be indexed. The supposed good news is that it still allows link juice to be passed through. This seems like a bad thing to me because I don't want to waste my link juice passing to these pages. The idea is to keep my page rank from being dilluted on these pages. Kind of similar question, if page rank is finite, does google still treat these pages as part of the site even if it's not indexing them? If I do deindex these pages, I think there are quite a few internal links to these pages. Even those these pages are deindexed, they still exist, so it's not as if the site would return a 404 right? ROBOTS.TXT As I understand it, this will keep the robots from crawling the page, so it won't be indexed and the link juice won't pass. I don't want to waste page rank which links to these pages, so is this a bad option? **301 redirect: **What if I just 301 redirect all these pages back to the homepage? Is this an easy answer? Part of the problem with this solution is that I'm not sure if it's permanent, but even more importantly is that currently 80% of the site is made up of blog and gallery pages and I think it would be strange to have the vast majority of the site 301 redirecting to the home page. What do you think? DELETE PAGES: Maybe I could just delete all the pages. This will keep the pages from taking link juice and will deindex, but I think there's quite a few internal links to these pages. How would you find all the internal links that point to these pages. There's hundreds of them.
Technical SEO | | Santaur0 -
Google is not respecting the meta title
We're experiencing a peculiar situation with Google not respecting our meta <title>.</p> <p>As you can see in the first image (search result), the title <a href="http://open.iebschool.com/profesores/startups/">for the page</a> is a part of the content. This is relatevely normal for the description, but we never heard of Google doing this before.</p> <p>In the code, the <title> and meta description are correctly implemented.</p> <blockquote style="background-color: #f7f7f7; padding-top: 5px; margin-left: 0px; padding-left: 2px; padding-bottom: 5px; white-space: nowrap; overflow-y: auto; font-family: monospace; background-position: initial initial; background-repeat: initial initial;"> <p><meta name="description" content="Profesores, tutores, autores y docentes 2.0 de Open IEBS. Conoce su Biografía, experiencia, reputación, conexiones sociales y las valoraciones de alumnos."/><br /><title>Conoce los profesores, tutores, autores y docentes de Open IEBS.</title> In a further research, we discovered that the title which is using is an in anwith the following code (cleaned and simplified for the question): <hgroup> Pilar Soro
Technical SEO | | ofuente
0 Seguidor
Para poder seguir al Profesor, debes de registrarte aquí. Profesora y experta en redes sociales. Formadora de docentes, [...]
</hgroup> Note: we're correcting the code since this is quite messy, but it's the one we have now The point is that google has considered that this particular is more important than the title itself. This would make sense if we were looking for that name, but the search was simply "site:domain.com". Two things for which this is even more strange are the following: while all the /profesor/%category%/ has the same code, this only happens in some search results and not in all of them; why is it appearing in some pages, but respecting my title in others? the previous code is not the only one in the page, there are about 10 others and some are placed before and some are placed after; so, why this one and not the first or the last? What is more strange is why this article in particular and not any other of the 10 on the page since some of them are placed before and some of them are placed after. Provided this situation, we would like to know: is this a common situation? Is it happening to more people? why is it happening? Is it somehow related to , <hgroup>and ? why that piece of code and not any other article? and why is it only happening in some pages? more important, can it be corrected or can we take advantage of it somehow? Thank you in advance. Any light you can shed on this will be well received! AJ2CUSe.png?1?8232 </hgroup>0 -
Block bad crawlers
Hi! how are you? I've been working on some of my sites, and noticed that i'm getting lots of crawls by search engines that i'm not intereted in ranking well. My question is the following: do you have a list of 'bad behaved' search engines that take lots of bandwidth and don´t send much/good traffic? If so, do you know how to block them using robots.txt? Thanks for the help! Best wishes, Ariel
Technical SEO | | arielbortz0 -
Links under Meta Description when performing a search
Doing research for clients, I have came across seeing sites displaying hyperlinks underneath their own meta description. keywords that I have googled that result with hyperlinks displaying under meta descriptions: Google'd: iacquire (brand) bmw wheels (Beyern Wheels, position 1) aftermarket bmw wheels (MMR Wheels, position 2) These companys have hyperlinks underneath their descriptions. Anyone have any ideas why this happens or how it happens?
Technical SEO | | frnprz0 -
"Extremely high number of URLs" warning for robots.txt blocked pages
I have a section of my site that is exclusively for tracking redirects for paid ads. All URLs under this path do a 302 redirect through our ad tracking system: http://www.mysite.com/trackingredirect/blue-widgets?ad_id=1234567 --302--> http://www.mysite.com/blue-widgets This path of the site is blocked by our robots.txt, and none of the pages show up for a site: search. User-agent: * Disallow: /trackingredirect However, I keep receiving messages in Google Webmaster Tools about an "extremely high number of URLs", and the URLs listed are in my redirect directory, which is ostensibly not indexed. If not by robots.txt, how can I keep Googlebot from wasting crawl time on these millions of /trackingredirect/ links?
Technical SEO | | EhrenReilly0 -
Allow or Disallow First in Robots.txt
If I want to override a Disallow directive in robots.txt with an Allow command, do I have the Allow command before or after the Disallow command? example: Allow: /models/ford///page* Disallow: /models////page
Technical SEO | | irvingw0 -
What does it mean by 'blocked by Meta Robot'? How do I fix this?
When i get my crawl diagnostics, I am getting a blocked by Meta Robot, which means that my page is not being indexed in the search engines... obviously this is a major issue for organic traffic!!! What does it actually mean, and how can i fix it?
Technical SEO | | rolls1230 -
File from godaddy.com
Hi, One of our client has received a file from godaddy.com where his site is hosted. Here is the message from the client- "i submitted my site for Search Engine Visibility,but they got some issue on the site need to be fixed. i tried myself could not fix it" The site in question is - http://allkindofessays.com/ Is there any problem with the site ? Contents of the file - bplist00Ó k 0_ WebSubframeArchives_ WebSubresources_ WebMainResource L x Ï Ö Ý ] ¨ ¯ ¼ Û 6 SÓ @ F¡ Ó / :¡ Ó )¡ Ò ¡ Ô _ WebResourceResponse_ WebResourceData_ WebResourceMIMEType^WebResourceURLO cbplist00Ô Z[X$versionX$objectsY$archiverT$top † ¯ "()0 12DEFGHIJKLMNOPTUU$nullÝ !R$6S$10R$2R$7R$3S$11R$8V$classR$4R$9R$0R$5R$1€ € € € € € € € Ó #$%& [NS.relativeWNS.base€ € € _ ¢http://tags.bluekai.com/site/2748?redir=http%3A%2F%2Fsegment-pixel.invitemedia.com%2Fset_partner_uid%3FpartnerID%3D84%26partnerUID%3D%24_BK_UUID%26sscs_active%3D1Ò*+,-Z$classnameX$classesUNSURL¢./UNSURLXNSObject#A´ þ¹ –5 ÈÓ 3456=WNS.keysZNS.objects€ ¦789:;<€ €€ € €€ ¦>?@ABC€ € € € € € \Content-TypeSP3PVServerTDate^Content-LengthYBK-ServerYimage/gif_ nCP="NOI DSP COR CUR ADMo DEVo PSAo PSDo OUR SAMo BUS UNI NAV", policyref="http://tags.bluekai.com/w3c/p3p.xml"_ Apache/2.2.3 (CentOS)_ Sat, 10 Sep 2011 20:23:21 GMTR62T87dfÒ*+QR_ NSMutableDictionary£QS/\NSDictionary >Ò*+VW_ NSHTTPURLResponse£XY/_ NSHTTPURLResponse]NSURLResponse_ NSKeyedArchiverÑ]_ WebResourceResponse€ # - 2 7 R X s v z } € ƒ ‡ Š ‘ ” — š ¢ ¤ ¦ ¨ ª ¬ ® ° ² ´ ¶ ¸ ¿ Ë Ó Õ × Ù ~ ƒ Ž — ¦ ¯ ¸ º Á É Ô Ö Ý ß á ã å ç é ð ò ô ö ø ú ü ( 2 < Å å è í ò 4 8 L Z l o … ^ ‡O >GIF89a ÿÿÿ!ÿ NETSCAPE2.0 !ù , L ;Yimage/gif_ ¢http://tags.bluekai.com/site/2748?redir=http%3A%2F%2Fsegment-pixel.invitemedia.com%2Fset_partner_uid%3FpartnerID%3D84%26partnerUID%3D%24_BK_UUID%26sscs_active%3D1Õ _ WebResourceTextEncodingName_ WebResourceFrameNameO 6
Technical SEO | | seoug_20050