Roger bot taking a long time to crawl site
-
Hi all, I've noticed Roger bot is taking a long time to crawl my new site. It started on the 28th Feb 2013 and is still going. There aren't many pages at the moment. Any ideas please?
thanks a lot, Mark.
-
Hi Peter
thanks for your reply. The crawl has now completed and given me some more areas to work on, it's a great tool.
I was so preoccupied with 'hiding' the site over the last couple of months with the easy code:
User-agent: * Disallow: /
I hadn't thought beyond this.
I've noticed Google has now recognised the new robots.txt which has allowed the sitemap to be accepted..
I'll look at your notes, thank you, and work out my next move. I'll let you know how I get on too.
I know (well think) I have to get noindex, follow for 'sorted' category pages...
all the best, Mark.
-
Hi Mike
The crawl has now completed, thank you. I think the results will keep me occupied
all the best, Mark.
-
Hi Mark,
Sorry it's taking a while to crawl your new site.
While I'm not exactly sure what the delay is, one of the possible reasons is through your robots.txt. Here's what I see in a short snippet from your robots.txt:
# Crawlers Setup User-agent: * Crawl-delay: 30 # Allowable Index Allow: /*?p= Allow: /index.php/blog/ Allow: /catalog/seo_sitemap/category/ Allow: /catalogsearch/result/ Allow: /media/ # Directories Disallow: /404/ Disallow: /app/ Disallow: /cgi-bin/ Disallow: /downloader/ Disallow: /errors/ Disallow: /includes/ Disallow: /js/ Disallow: /lib/ Disallow: /magento/ Disallow: /pkginfo/ Disallow: /report/ From here, the formatting looks a little awkward. What's going on is that you're telling Roger bot to only look at these:
Allowable Index
Allow: /*?p=
Allow: /index.php/blog/
Allow: /catalog/seo_sitemap/category/
Allow: /catalogsearch/result/
Allow: /media/While the syntax is OK, not every crawler out there will follow the allow directive. Here's an example something you can use.
# Crawlers Setup User-agent: * Crawl-delay: 30 Disallow: / Disallow: /404/ Disallow: /app/ Disallow: /cgi-bin/ Disallow: /downloader/ Disallow: /errors/ Disallow: /includes/ Disallow: /js/ From here you're telling the crawler to disallow nothing except these directories. Please let us know once you implement this method is that will actually fix the crawl. Thanks for reaching out! Best, Peter Li SEOmoz Help Team ```
-
Hi Mark,
This sounds like a bug or issue with the SEOmoz software.
Contact help@seomoz.org and ask one of the help associates to look into this for you.
If you do not have many pages, it definitely shouldn't take that long.
The help team responds extremely quickly!
Good luck.
Mike
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Content from Another Site
Hi there - I have a client that says they'll be "serving content by retrieving it from another URL using loadHTMLFile, performing some manipulations on it, and then pushing the result to the page using saveHTML()." Just wondering what the SEO implications of this will be. Will search engines be able to crawl the retrieved content? Is there a downside (I'm assuming we'll have some duplicate content issues)? Thanks for the help!!
Technical SEO | | NetStrategies1 -
Strange Crawl Report
Hey Moz Squad, So I have kind of strange case. My website locksmithplusinc.com has been around for a couple years. I have had all sorts of pages and blogs that have maybe ranked for a certain location a longtime ago and got deleted so I could speed up the site and consolidate my efforts. I said that because I think that might be part of the problem. When I was crawl reporting my site just three weeks ago on moz I had over 23 crawl report issues. Duplicate pages, missing meta tags the regular stuff. But now all of a sudden when I crawl report on MOZ it comes up with Zero issues. So I did another crawl On google analytic and this is what came up. SO im very confused because none of these url's are even url's on my site. So maybe people are searching for this stuff and clicking on broken links that are still indexed and getting this 404 error? What do you guys think? Thank you guys so much for taking a shot at this one. siS44ug
Technical SEO | | Meier0 -
Redirecting HTTP to HTTPS - How long does it take Google to re-index the site?
hello Moz We know that this year, Moz changed its domain to moz.com from www.seomoz.org
Technical SEO | | joony
however, when you type "site:seomoz.org" you still can find old urls indexed on Google (on page 7 and above) We also changed our site from http://www.example.com to https://www.example.com
And Google is indexing both sites even though we did proper 301 redirection via htaccess. How long would it take Google to refresh the index? We just don't worry about it? Say we redirected our entire site. What is going to happen to those websites that copied and pasted our content? We have already DMCAed their webpages, but making our site https would mean that their website is now more original than our site? Thus, Google assumes that we have copied their site? (Google is very slow on responding to our DMCA complaint) Thank you in advance for your reply.0 -
Roger has detected a problem:
_ Roger has detected a problem:_ We have detected that the root domain livefit.co.uk does not respond to web requests. Using this domain, we will be unable to crawl your site or present accurate SERP information. Can anyone see why? Cheers Stephen
Technical SEO | | firstconversion0 -
How long does it take for Google to index a new site and has anyone experienced serious fluctuations in SERP within 2 weeks after launch?
Hi guys, I have recently launched my ecommerce jewellery site - www.luxuryfinejewellery.com - and noticed some serious swings in SERP over the last couple of weeks. From ranking No 2, 3 and 4 for the keyword 'luxury fine jewellery' on Google.com, the homepage periodically disappears from the Top 50 altogether. I thought it was the Sandbox, as I recently purchased the domain name, within the last 6 weeks, however the fact that it does rank on the 1st page some of the time is a mystery. Has anyone also experienced this? Could you provide some advice on what to expect until the the rankings settle. Thanks in advance, Satbir
Technical SEO | | deluxebydesign0 -
See your sites Architecture
Does anybody know a problem where you can see how your internal linkings look to the search engines?
Technical SEO | | ScottBaxterWW0 -
Which is more accurate? site: or GWT?
when viewing urls in google's index, is it more accurate to refer to site:www.domain.com or google webmaster tools (urls in web index)?
Technical SEO | | nicole.healthline0