Crawl Diagnostics 403 on home page...
-
In the crawl diagnostics it says oursite.com/ has a 403. doesn't say what's causing it but mentions no robots.txt. There is a robots.txt and I see no problems. How can I find out more information about this error?
-
Hi Dana,
Thanks for writing in. The robots.txt file would not cause a 403 error. That type of error is actually related to the way the server responds to our crawler. Basically, this means the server for the site is telling our crawler that we are not allowed to access the site. Here is a resource that explains the 403 http status code pretty thoroughly: http://pcsupport.about.com/od/findbyerrormessage/a/403error.htm
I looked at both of the campaigns on your account and I am not seeing a 403 error for either site, though I do see a couple of 404 page not found errors on one of the campaigns, which is a different issue.
If you are still seeing the 403 error message on one of your crawls, you would just need to have the webmaster update the server to allow rogerbot to access the site.
I hope this helps. Please let me know if you have any other questions.
-Chiaryn
-
Okay, so I couldn't find this thread and started a new one. Sorry...
... The problem persists.
RECAP
I have two blocks in my htaccess both are for amazonaws.com.
I have gone over our server block logs and see only amazon addresses and bot names.
I did a fetch as google with our WM Tools and fetch it did. Success!
Why isn't thiscrawler able to access? Many other bots are crawling right now.
Why can I use the seomoz on-page feature to crawl a single page but the automatic crawler wont access the site? Just took a break from typing this to try the on-page on our robots.txt, worked fine. Use the keyword "Disallow" and it gave me a C. =0)
... now if we could just crawl the rest of the site...
any help on this would be greatly appreciated.
-
I think I do. I just (a few minutes ago) went through a 403 problem being reported by another site trying access an html file for verification. Apparently they are connecting with an ip that's blocked by our htaccess. I removed the blocks told them to try again and it worked no problem. I see that SEOMoz has only crawled 1 page. Off to see if I can trigger a re-crawl now...
-
hmmm... not sure why this is happening. maybe add this line to the top of your robots.txt and see if it fixes by next week. it certainly won't hurt anything:
User-agent: * Allow: /
-
No problem. Looking at my Google WM Tools , crawl stats don't show any errors.
Thanks
User-Agent: *
Disallow: /*?zenid=
Disallow: /editors/
Disallow: /email/
Disallow: /googlecheckout/
Disallow: /includes/
Disallow: /js/
Disallow: /manuals/ -
OH this is only in SEOmoz's crawl diagnostics that you're seeing this error. That explains why robots.txt could be affecting it. I misread this earlier and thought you were finding the 403 on your own in-browser.
Can you paste the robots.txt file into here so we can see it? I would imagine that has everything to do with it now that I've correctly read your post --my apologies
-
apache
-
a 403 is a Forbidden code usually pertaining to Security and Permissions.
Are you running your server in an Apache or IIS environment? Robots.txt shouldn't affect a site's visibility to the public it only talks to site crawlers.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Why my site not crawl?
hi all me allow rogerbot in robots.txt but rogerbot can't crawl my site my site: toska-co.ir
Moz Pro | | jahanidawodi0 -
Why would someone go to same 404 page over and over?
Good morning, I've been using the redirection plugin on my wordpress site and noticed i have multiple IP addresses going to the same folder on my site - like "mydomain.com/folder-name/". The "folder-name" is obviously not anything remotely like any folder or file name I have on my domain - so it's obviously spammy in nature. And, there are multiple IP addresses going to this same URL address every 3 hours on the dot, so it's appears automated. Is this something to be concerned about? Should I "do" anything? Thanks in advance for reading and replying!
Moz Pro | | mlm120 -
Pages to Optimise Issues
HI, Can someone tell me why 2 A pages are showing up in red in the F Grade page ? I see Page title missing from the output and just curious if I need to addrss some issue. xpOW4tY.jpg
Moz Pro | | ecrmeuro0 -
404 : Errors in crawl report - all pages are listed with index.html on a WordPress site
Hi Mozers, I have recently submitted a website using moz, which has pulled up a second version of every page on the WordPress site as a 404 error with index.html at the end of the URL. e.g Live page URL - http://www.autostemtechnology.com/applications/civil-blasting/ Report page URL - http://www.autostemtechnology.com/applications/civil-blasting/index.html The permalink structure is set as /%postname%/ For some reason the report has listed every page with index.html at the end of the page URL. I have tried a number of redirects in the .htaccess file but doesn't seem to work. Any suggestions will be strongly appreciated. Thanks
Moz Pro | | AmanziDigital0 -
Help with URL parameters in the SEOmoz crawl diagnostics Error report
The crawl diagnostics error report is showing tons of duplicate page titles for my pages that have filtering parameters. These parameters are blocked inside Google and Bing webmaster tools. I do I block them within the SEOmoz crawl diagnostics report?
Moz Pro | | SunshineNYC0 -
Crawl Diagnostics Update
I have corrected some errors in my SEOMoz Crawl Diagnostics, however the errors are still showing. It says a crawl has happen since. Any idea's why?
Moz Pro | | petewinter0 -
Twitter Page Authority Score?
I've been doing some competitive research in Open Site Explorer and many of our competitors have Twitter accounts very similar to ours. Their Twitter pages are usually one of the pages with linking to their website with the most Page Authority. The incoming links from Twitter are a "no follow" as you would guess. This has been the case for a large number of well ranking sites I have looked at. www.dremed.com also has a Twitter account at: https://twitter.com/#!/DREmed . However, Open Site Explorer does not list the Twitter link as an incoming link at all ( or if it does it has no Page Authority ). The Twitter account page seems very similar in nature to other competing Twitter pages. I'm not sure why it does not ALSO pull a high Page Authority score??? Do you know why this might be? Best, Justin
Moz Pro | | justinjeffries0