Mozbot Can Not Crawl Entire Domain
-
I'm trying to crawl Redken.com in Moz Analytics and the Search Diagnostics is only crawling 4 pages. The domain uses a "select your country" the first time you visit, and it seems as though the bot is not getting beyond that (aka, not clicking on "USA") and is therefore not crawling the rest of the domain. There is no country specific URL other than redken.com.
I've tried entering both "redken.com" and "www.redken.com" as the URL, but no luck.
Any tips?
-
It's caused by the way you have build your site. If you click on redken.com - you get the choice of language. If you select "USA" you're redirected with 302 to redken.com/USA - then with 302 to redken.com/?country=USA then with 302 to redken.com I guess for browsers you store this somewhere (cookie?) - however for a simple bot (like Moz - but I have the same with Screaming Frog) - you just go back where you started = redken.com which again will start the same loop.
So - only 4 url's can be crawled. The other countries are on different url's so will not be included in the crawl.
Google bot is smarter and acts more like a real browser so will crawl the site - but Mozbot can't do that.
rgds
Dirk
Update - I actually forgot one redirect - redken.com first is redirected with 302 to redken.com/international
PS The site is horribly slow as well - and the redirect chain is certainly not helping.
-
Well, I just noticed that website is in flash! I believe non of crawl bots are able to crawl flash websites.
It seems that if I try to access redken.com it redirects me to flash version (/international).
Actually, now I can't recreate that. Super weird. Is there something "special" going on with automatic redirects? Look into that.
-
Thanks for the response!
These are the pages it crawled.
<colgroup><col width="420"></colgroup>
| http://redken.com |
| http://www.redken.com/ |
| http://www.redken.com/international/ |
| http://www.redken.com/USA |
| http://www.redken.com/?country=USA |Robots.txt looks clean, nothing that should have stopped it from crawling more.
-
Hi there.
Which pages are those 4 pages? Is your robots.txt blocking it for some reason maybe?
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
I’d like to set up a Moz campaign that crawls just the primary website, not subdomains
Is this something you could help with? (it either bombs or crawls everything, so I assume I’m missing something in the campaign settings, or it’s just not possible.
Getting Started | | blueprintatl0 -
Moz Site Crawl can't index WIX sites
We've been attempting to work on some SEO for a new potential client however they are using a WIX site. We've noticed that Moz SEO tools will not index any WIX sites. e.g. https://www.sharonradisch.com/ (which is one of their case studies). Anyone seen this that can offer any advice? Thanks,
Getting Started | | monkeex
Mark2 -
Why do ignored crawl issues still count as issues?
I use Cloudflare, so I can't avoid the Crawl Error for "Pages with no Meta Noindex" because of the way Cloudflare protects email addresses from harvesting (it creates a new page that has no meta noindex values). I marked this issue as "ignore" because there's nothing I can do about it, and it doesn't really affect my site's performance from an SEO standpoint. But even marked as ignore, it is still included in my site crawl issues count. Of course, I want to see that issues count drop to zero, but that can't happen if the ignored issues are counted. I don't want mark it fixed, because technically it's not fixed. KwPld
Getting Started | | troy.brophy0 -
Can't track my site, keep getting "Ooops. Our crawlers are unable to access that URL"
Hello, So i keep getting this message and I went to hurl.it and I get 200 response. But it appears its not my actual homepage bc it says the body is empty and in the title it says "COMING SOON" which is not what my actual homepage says. Does anyone know what this means?? Thank you in advance! Rena
Getting Started | | Palila-Studio0 -
Where can I find the list of all the question I've asked here?
In my profile I see only the comments, I'd like to keep track also of my questions
Getting Started | | 2mlab0 -
Can you set up a custom report for keywords with a label attached
I'm looking to set up a custom ranking report, but I just want the report to show up keywords that I have attached a certain label to. I've attached a custom label to a number of my keywords. Ideally I would like a report which just showed me the performance of these keywords. Does anybody know of a way to do this? Thanks in advance.
Getting Started | | Fergus_Macdonald0 -
Can not add another campaign
Im using the trial for moz however I can not add another campain. how can I add more campaign?
Getting Started | | deeplysense0 -
What are the solutions for Crawl Diagnostics?
Hi Mozers, I am pretty new to SEO and wanted to know what are the solutions for the various errors reported in the crawl diagnostics and if this question has been asked, please guide me in the right directions. Following are queries specific to my site just need help with these 2 only: 1. Error 404: (About 60 errors) : These are for all the PA 1 links and are no longer in the server, what do i do with these? 2. Duplicate Page Content and Title ( About 5000) : Most of these are automatic URL;s that are generated when someone fills any info on our website. What do I do with these URL;s. they are for example: _www.abc.fr/signup.php?_id=001 and then www.abc.fr/signup.php?id=002 and so on. What do I need to do and how? Plzz. Any help would be highly appreciated. I have read a lot on the forums about duplicate content but dont know how to implement this in my case, please advise. Thanks in advance. CY
Getting Started | | Abhi81870