Roger bot taking a long time to crawl site
-
Hi all, I've noticed Roger bot is taking a long time to crawl my new site. It started on the 28th Feb 2013 and is still going. There aren't many pages at the moment. Any ideas please?
thanks a lot, Mark.
-
Hi Peter
thanks for your reply. The crawl has now completed and given me some more areas to work on, it's a great tool.
I was so preoccupied with 'hiding' the site over the last couple of months with the easy code:
User-agent: * Disallow: /
I hadn't thought beyond this.
I've noticed Google has now recognised the new robots.txt which has allowed the sitemap to be accepted..
I'll look at your notes, thank you, and work out my next move. I'll let you know how I get on too.
I know (well think) I have to get noindex, follow for 'sorted' category pages...
all the best, Mark.
-
Hi Mike
The crawl has now completed, thank you. I think the results will keep me occupied
all the best, Mark.
-
Hi Mark,
Sorry it's taking a while to crawl your new site.
While I'm not exactly sure what the delay is, one of the possible reasons is through your robots.txt. Here's what I see in a short snippet from your robots.txt:
# Crawlers Setup User-agent: * Crawl-delay: 30 # Allowable Index Allow: /*?p= Allow: /index.php/blog/ Allow: /catalog/seo_sitemap/category/ Allow: /catalogsearch/result/ Allow: /media/ # Directories Disallow: /404/ Disallow: /app/ Disallow: /cgi-bin/ Disallow: /downloader/ Disallow: /errors/ Disallow: /includes/ Disallow: /js/ Disallow: /lib/ Disallow: /magento/ Disallow: /pkginfo/ Disallow: /report/ From here, the formatting looks a little awkward. What's going on is that you're telling Roger bot to only look at these:
Allowable Index
Allow: /*?p=
Allow: /index.php/blog/
Allow: /catalog/seo_sitemap/category/
Allow: /catalogsearch/result/
Allow: /media/While the syntax is OK, not every crawler out there will follow the allow directive. Here's an example something you can use.
# Crawlers Setup User-agent: * Crawl-delay: 30 Disallow: / Disallow: /404/ Disallow: /app/ Disallow: /cgi-bin/ Disallow: /downloader/ Disallow: /errors/ Disallow: /includes/ Disallow: /js/ From here you're telling the crawler to disallow nothing except these directories. Please let us know once you implement this method is that will actually fix the crawl. Thanks for reaching out! Best, Peter Li SEOmoz Help Team ```
-
Hi Mark,
This sounds like a bug or issue with the SEOmoz software.
Contact help@seomoz.org and ask one of the help associates to look into this for you.
If you do not have many pages, it definitely shouldn't take that long.
The help team responds extremely quickly!
Good luck.
Mike
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Site Crawling with Firewall Plugin
Just wondering if anyone has any experience with the WordPress Simple Firewall plugin. I have a client who is concerned about security as they've had issues in that realm in the past and they've since installed this plugin: https://wordpress.org/support/view/plugin-reviews/wp-simple-firewall?filter=4 Problem is, even with a proper robots file and appropriate settings within the firewall, I still cannot crawl the site with site crawler tools. Google seems to be accessing the site fine, but I still wonder if it is in anyway potentially hindering search spiders.
Technical SEO | | BrandishJay0 -
Moving site between hosts over short time scale
Hi Does anyone know if moving your website between hosts & server location 3 times in a relatively short time frame (say 2 months) can have a negative impact from an seo perspective ? Cheers
Technical SEO | | Dan-Lawrence
Dan0 -
Site being indexed by Google before it has launched
We are currently coming towards the end of migrating one of our retail sites over to magento. To our horror, we find out today that some pages are already being indexed by Google, and we have started receiving orders through new site. Do you have any suggestions for what may have caused this? Or similarly, what the best solution would be to de-index ourselves? We most recently excluded anything with a certain parameter from robots.txt - could this being implemented incorrectly have caused this issue? Thanks
Technical SEO | | Sayers0 -
Why does it take so long for my TITLE tag it show ??
Hi Guys, I am frustrated here! I have changed my title tag so that I get a much better CTR. But I changed it a week ago and it is still howing the old title tag in the SERPS. Can anyone please tell me how long Google can take to show the new title tag in the search results?? Can it take weeks?? Thanks guys Gareth
Technical SEO | | GAZ090 -
Do we need to manually submit a sitemap every time, or can we host it on our site as /sitemap and Google will see & crawl it?
I realized we don't have a sitemap in place, so we're going to get one built. Once we do, I'll submit it manually to Google via Webmaster tools. However, we have a very dynamic site with content constantly being added. Will I need to keep manually re-submitting the sitemap to Google? Or could we have the continually updating sitemap live on our site at /sitemap and the crawlers will just pick it up from there? I noticed this is what SEOmoz does at http://www.seomoz.org/sitemap.
Technical SEO | | askotzko0 -
Seo on a dk site
hi my client has asked if we can seo their dk site , my question is does all link building and article submission have to be in danish
Technical SEO | | Westernoriental0 -
Reading Crawl Diagnostics and Taking Action on results
My site crawl diagnostics are showing a high number of duplicate page titles and content. When i look at the flagged pages, many errors are simply listed from multiple pages of product category search results. This looks pretty normal to me and I am at a loss for understanding how to fix this situation. Can I talk with someone? thanks, Gary
Technical SEO | | GaryQ0