Roger bot taking a long time to crawl site

caterfor

Hi all, I've noticed Roger bot is taking a long time to crawl my new site. It started on the 28th Feb 2013 and is still going. There aren't many pages at the moment. Any ideas please?

thanks a lot, Mark.

caterfor

Hi Peter

Thanks for your reply. The crawl has now completed and given me some more areas to work on. It's a great tool.

I was so preoccupied with 'hiding' the site over the last couple of months with the easy code:

User-agent: *
Disallow: /

I hadn't thought beyond this.

I've noticed Google has now recognised the new robots.txt, which has allowed the sitemap to be accepted.

I'll look at your notes, thank you, and work out my next move. I'll let you know how I get on too.

I know (well think) I have to get noindex, follow for 'sorted' category pages...

all the best, Mark.

caterfor

Hi Mike

The crawl has now completed, thank you. I think the results will keep me occupied.

all the best, Mark.

Peterli

Hi Mark,

Sorry it's taking a while to crawl your new site.

While I'm not exactly sure what is causing the delay, one possible reason is your robots.txt. Here's a short snippet of what I see in it:

# Crawlers Setup
User-agent: *
Crawl-delay: 30
# Allowable Index
Allow: /*?p=
Allow: /index.php/blog/
Allow: /catalog/seo_sitemap/category/
Allow: /catalogsearch/result/
Allow: /media/
# Directories
Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /errors/
Disallow: /includes/
Disallow: /js/
Disallow: /lib/
Disallow: /magento/
Disallow: /pkginfo/
Disallow: /report/

The formatting here looks a little awkward. In effect, you're telling Roger bot to look only at these:

# Allowable Index

Allow: /*?p=
Allow: /index.php/blog/
Allow: /catalog/seo_sitemap/category/
Allow: /catalogsearch/result/
Allow: /media/

While the syntax is OK, not every crawler out there will follow the Allow directive. Here's an example of something you can use.

# Crawlers Setup
User-agent: *
Crawl-delay: 30
Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /errors/
Disallow: /includes/
Disallow: /js/

Here you're telling the crawler to disallow nothing except these directories. Once you've implemented this, please let us know whether it actually fixes the crawl.
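
If you'd like to sanity-check rules like these before uploading them, here is a minimal sketch using Python's standard `urllib.robotparser`. It compares directory-only Disallow rules with a blanket `Disallow: /`, which blocks every path. The helper name `allowed` and the sample paths are just illustrations, and Python's parser is not Roger bot's parser, so treat the results as a rough guide only.

```python
from urllib.robotparser import RobotFileParser

# Directory-only rules: nothing is blocked except the listed folders.
DIRECTORY_RULES = """\
User-agent: *
Crawl-delay: 30
Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /errors/
Disallow: /includes/
Disallow: /js/
"""

# A blanket rule, as used to hide a site completely, for comparison.
HIDE_ALL = """\
User-agent: *
Disallow: /
"""


def allowed(robots_txt: str, path: str, agent: str = "rogerbot") -> bool:
    """Return True if `agent` may fetch `path` under the given rules.

    Hypothetical helper for illustration; it uses Python's own
    prefix-matching parser, which may differ from a real crawler's.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # Sample paths are illustrative, not taken from the real site.
    for path in ("/", "/index.php/blog/", "/app/etc/", "/js/lib.js"):
        print(path, allowed(DIRECTORY_RULES, path), allowed(HIDE_ALL, path))
```

With the directory-only rules, the home page and blog path come back as fetchable while `/app/` and `/js/` do not. With the blanket rule, nothing is fetchable.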

Thanks for reaching out!

Best,

Peter Li
SEOmoz Help Team

Mike.Goracke

Hi Mark,

This sounds like a bug or issue with the SEOmoz software.

Contact help@seomoz.org and ask one of the help associates to look into this for you.

If you do not have many pages, it definitely shouldn't take that long.
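
As a rough check on that, the robots.txt quoted in this thread sets `Crawl-delay: 30`. Assuming the crawler honors that directive and fetches one page per delay interval (an assumption on my part, not something confirmed here), the minimum crawl time grows linearly with the number of pages. A minimal sketch, where `min_crawl_hours` and the page counts are hypothetical:

```python
from urllib.robotparser import RobotFileParser

# Only the lines relevant to crawl speed, taken from the quoted robots.txt.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 30
"""


def min_crawl_hours(robots_txt: str, page_count: int, agent: str = "rogerbot") -> float:
    """Lower bound on crawl time in hours, assuming one fetch per delay interval.

    Hypothetical helper: real crawlers also spend time on fetching,
    queueing and processing, so actual crawls will take longer.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay_seconds = parser.crawl_delay(agent) or 0  # None if no Crawl-delay is set
    return page_count * delay_seconds / 3600


if __name__ == "__main__":
    # Page counts are made up for illustration.
    for pages in (120, 1200, 12000):
        print(pages, "pages:", min_crawl_hours(ROBOTS_TXT, pages), "hours minimum")
```

So a 30-second delay alone would not explain a crawl lasting many days on a small site, but it does add up quickly as the page count grows.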

The help team responds extremely quickly!

Good luck.

Mike
