Roger bot taking a long time to crawl site
-
Hi all, I've noticed Roger bot is taking a long time to crawl my new site. It started on the 28th Feb 2013 and is still going. There aren't many pages at the moment. Any ideas please?
thanks a lot, Mark.
-
Hi Peter
Thanks for your reply. The crawl has now completed and given me some more areas to work on. It's a great tool.
I was so preoccupied with 'hiding' the site over the last couple of months with the easy code:
User-agent: *
Disallow: /
I hadn't thought beyond this.
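For anyone wanting to confirm what that rule does, here is a minimal sketch using Python's standard `urllib.robotparser`. The `rogerbot` user-agent string and the sample paths are assumptions for illustration, and other parsers may differ in edge cases.

```python
from urllib.robotparser import RobotFileParser

# The blanket "hide everything" rule quoted above.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if robots_txt lets user_agent fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # Sample paths are hypothetical, chosen only to illustrate the check.
    for path in ("/", "/index.php/blog/", "/sitemap.xml"):
        print(path, is_allowed(BLOCK_ALL, "rogerbot", path))
```

Every path is reported as blocked, which is why a sitemap on the same site can be rejected until the rule is relaxed.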
I've noticed Google has now recognised the new robots.txt, which has allowed the sitemap to be accepted.
I'll look at your notes, thank you, and work out my next move. I'll let you know how I get on too.
I know (well, I think) I need noindex, follow on the 'sorted' category pages...
all the best, Mark.
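The noindex, follow idea for sorted category pages could be sketched roughly as below. This is only an illustration: the `SORT_PARAMS` names (`dir`, `order`) are an assumption about how the store builds its sorted URLs, `robots_meta_for` is a hypothetical helper, and the example URLs are made up.

```python
from urllib.parse import parse_qs, urlparse

# Assumed sort parameters; check the store's real URLs before relying on these.
SORT_PARAMS = {"dir", "order"}


def robots_meta_for(url: str) -> str:
    """Return a robots meta tag: noindex, follow for sorted views, else index, follow."""
    query = parse_qs(urlparse(url).query, keep_blank_values=True)
    if SORT_PARAMS.intersection(query):
        return '<meta name="robots" content="noindex, follow">'
    return '<meta name="robots" content="index, follow">'


if __name__ == "__main__":
    # Hypothetical URLs for illustration only.
    print(robots_meta_for("https://example.com/rings.html?dir=asc&order=price"))
    print(robots_meta_for("https://example.com/rings.html?p=2"))
```

Keeping "follow" lets crawlers continue through the links on a sorted page while the page itself stays out of the index.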
-
Hi Mike
The crawl has now completed, thank you. I think the results will keep me occupied.
all the best, Mark.
-
Hi Mark,
Sorry it's taking a while to crawl your new site.
While I'm not exactly sure what the delay is, one possible cause is your robots.txt. Here's a short snippet of what I see in it:
# Crawlers Setup
User-agent: *
Crawl-delay: 30
# Allowable Index
Allow: /*?p=
Allow: /index.php/blog/
Allow: /catalog/seo_sitemap/category/
Allow: /catalogsearch/result/
Allow: /media/
# Directories
Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /errors/
Disallow: /includes/
Disallow: /js/
Disallow: /lib/
Disallow: /magento/
Disallow: /pkginfo/
Disallow: /report/
From here, the formatting looks a little awkward. What's going on is that you're telling Roger bot to only look at these:
# Allowable Index
Allow: /*?p=
Allow: /index.php/blog/
Allow: /catalog/seo_sitemap/category/
Allow: /catalogsearch/result/
Allow: /media/
While the syntax is OK, not every crawler out there will follow the Allow directive. Here's an example of something you can use.
# Crawlers Setup
User-agent: *
Crawl-delay: 30
Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /errors/
Disallow: /includes/
Disallow: /js/
From here you're telling the crawler to disallow nothing except these directories. Once you've implemented this, please let us know whether it actually fixes the crawl.
Thanks for reaching out!
Best,
Peter Li
SEOmoz Help Team
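As a way to test a robots.txt like the one suggested in this reply before uploading it, here is a minimal sketch using Python's standard `urllib.robotparser`. It follows the stated intent of blocking only the listed directories. The `rogerbot` user-agent string, the sample paths, the 500-page figure, and the helper names are all assumptions for illustration. The time estimate also assumes the crawler honours `Crawl-delay` and ignores fetch time.

```python
from urllib.robotparser import RobotFileParser

# Directives modelled on the suggestion above: a crawl delay plus the listed
# Disallow lines, and no blanket "Disallow: /".
ROBOTS_TXT = """\
# Crawlers Setup
User-agent: *
Crawl-delay: 30
Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /errors/
Disallow: /includes/
Disallow: /js/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def estimated_crawl_seconds(page_count: int, delay_seconds: int) -> int:
    """Rough lower bound: one delay per page, ignoring fetch time."""
    return page_count * delay_seconds


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    delay = parser.crawl_delay("rogerbot")  # crawl_delay needs Python 3.6+
    print("crawl delay (s):", delay)
    # Hypothetical paths: two under blocked directories, one that is not.
    for path in ("/app/etc/", "/js/", "/index.php/blog/"):
        print(path, parser.can_fetch("rogerbot", path))
    pages = 500  # hypothetical page count
    print("estimated hours:", estimated_crawl_seconds(pages, delay) / 3600)
```

With a 30-second delay, 500 pages works out to 15,000 seconds, a little over four hours, so the delay by itself can stretch a crawl even on a small site.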
-
Hi Mark,
This sounds like a bug or issue with the SEOmoz software.
Contact help@seomoz.org and ask one of the help associates to look into this for you.
If you do not have many pages, it definitely shouldn't take that long.
The help team responds extremely quickly!
Good luck.
Mike