Unsolved Crawling error emails
Recently we started getting random error emails about crawl issues:
2024-08-30 edweek:Ok
2024-08-29 marketbrief:Err, advertise:Err, edweek:Err, topschooljobs:Ok
2024-08-23 edweek:Ok
2024-08-22 marketbrief:Err, advertise:Err, edweek:Err
2024-08-21 topschooljobs:Ok, edweek:Ok
2024-08-15 marketbrief:Ok, advertise:Ok
2024-08-13 edweek:Ok
2024-08-12 marketbrief:Ok
2024-08-08 marketbrief:Ok, advertise:Ok
2024-08-03 edweek:Ok, topschooljobs:Ok
Everything in 2024-07 was Ok.
Yesterday I set up 2 more crawls for the same sites (edweek and marketbrief). This morning I got an email saying the original edweek crawl is ok (the site still has some problems, but the crawl ran and all is fine), but for the test crawl of the same site, "EW Test", I just got an error email.
Also, I have suppressed ALL email communications, so frankly I'm surprised to get this email at all.
Can you please check what is wrong with the crawler or the stat collection, or whatever is producing these issues?
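For anyone who wants to tally a report like the one above, here is a minimal sketch in Python. It assumes the status lines are pasted as plain text in the `date site:status` form shown in the post; the names `REPORT`, `PAIR`, and `failures_by_date` are made up for illustration and are not part of any Moz tool or API.

```python
import re
from collections import defaultdict

# Status lines copied from the post (a date, then site:status pairs).
REPORT = """\
2024-08-30 edweek:Ok
2024-08-29 marketbrief:Err. advertise: Err, edweek:Err, topschooljobs:Ok
2024-08-23 edweek:Ok
2024-08-22 marketbrief:Err. advertise: Err, edweek:Err
2024-08-21 topschooljobs:Ok, edweek:Ok
2024-08-15 marketbrief:Ok. advertise:OK
2024-08-13 edweek:Ok
2024-08-12 marketbrief:Ok
2024-08-08 marketbrief:Ok, advertise:Ok
2024-08-03 edweek:Ok, topschooljobs:Ok
"""

# Matches "site:Ok" / "site: Err" regardless of spacing or letter case.
PAIR = re.compile(r"(\w+)\s*:\s*(ok|err)", re.IGNORECASE)


def failures_by_date(report: str) -> dict[str, list[str]]:
    """Map each date to the list of sites that reported Err on that date."""
    failed = defaultdict(list)
    for line in report.splitlines():
        # The first token is the date; the rest holds the site:status pairs.
        date, _, rest = line.partition(" ")
        for site, status in PAIR.findall(rest):
            if status.lower() == "err":
                failed[date].append(site)
    return dict(failed)


if __name__ == "__main__":
    for date, sites in failures_by_date(REPORT).items():
        print(date, ", ".join(sites))
```

Grouping by date makes it easier to see whether the errors hit several campaigns on the same day, which would point at something shared rather than at one site.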
Hi,
I had a similar issue with my site and resolved it by first reviewing the specific error messages to understand the problem. I then verified that the new crawl configurations were correct and matched the settings of the working ones. Ensuring that the test environment was properly set up and permissions were correctly configured was also crucial. Additionally, I compared logs to identify any discrepancies and made sure the crawler software and libraries were up to date. I suggest trying these steps to address your issue. Let me know if you need further assistance!
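The "compare the new crawl configuration against the working one" step can be sketched roughly as below. This is only an illustration: it assumes you have written down each campaign's settings by hand as a Python dict, and every key and value shown (`start_url`, `user_agent`, `email_alerts`, the example.com URLs) is hypothetical, not an actual Moz setting or export format.

```python
def diff_settings(working: dict, failing: dict) -> dict:
    """Return {setting: (working_value, failing_value)} for settings that differ.

    A setting present in only one of the two dicts shows up with None
    on the side where it is missing.
    """
    keys = working.keys() | failing.keys()
    return {
        key: (working.get(key), failing.get(key))
        for key in sorted(keys)
        if working.get(key) != failing.get(key)
    }


if __name__ == "__main__":
    # Hypothetical, hand-copied settings for the original and the test crawl.
    original_crawl = {
        "start_url": "https://www.example.com/",
        "user_agent": "rogerbot",
        "email_alerts": False,
    }
    test_crawl = {
        "start_url": "https://www.example.com",
        "user_agent": "rogerbot",
        "email_alerts": True,
    }
    for key, (good, bad) in diff_settings(original_crawl, test_crawl).items():
        print(f"{key}: working={good!r} failing={bad!r}")
```

Only the settings that differ are printed, so a small mismatch such as a trailing slash or an alert toggle stands out quickly.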