Tools that crawl 2-million-page sites
-
Our site is about 2 million pages deep, 50% of which is stale content. Yes, I know - OMG #unhygienic. Even if we get approval to get rid of half of it, SEOMoz Pro Elite only crawls 20k pages deep. What can I do to crawl and diagnose the whole site? Are there any tools anyone can suggest? SEOMoz??
-
That's good to know. It sounds like that's probably the best way. I also use Screaming Frog (http://www.screamingfrog.co.uk/seo-spider/) to crawl sites, and with a dedicated 2 GB of RAM it's able to crawl around 50k pages. If your site is structured in sub-folders, you might be able to break it into parts and crawl each part separately. If not, SEOmoz Enterprise looks like the way to go.
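To make the sub-folder idea concrete, here is a rough Python sketch of how you might plan the split before crawling. It assumes you already have a list of the site's URLs (for example from your XML sitemaps), and the function names, the example.com URLs and the 50,000-page limit are just illustrative assumptions based on the figure above, not anything built into Screaming Frog or SEOmoz.

```python
from collections import defaultdict
from urllib.parse import urlparse


def group_by_top_folder(urls):
    """Bucket URLs by their first path segment ('' for root-level pages)."""
    groups = defaultdict(list)
    for url in urls:
        path = urlparse(url).path
        segments = [s for s in path.split("/") if s]
        # /blog/post and /blog/ belong to "blog"; /about.html has no folder,
        # so it goes into the root bucket "".
        if len(segments) > 1 or (segments and path.endswith("/")):
            top = segments[0]
        else:
            top = ""
        groups[top].append(url)
    return dict(groups)


def plan_crawls(groups, limit=50_000):
    """Split each folder's URLs into batches no larger than the assumed crawler limit."""
    plan = []
    for folder, urls in sorted(groups.items()):
        for start in range(0, len(urls), limit):
            plan.append((folder, urls[start:start + limit]))
    return plan


if __name__ == "__main__":
    # Hypothetical URLs, for illustration only.
    sample = [
        "http://example.com/",
        "http://example.com/about.html",
        "http://example.com/blog/",
        "http://example.com/blog/post-1",
        "http://example.com/shop/a/b",
    ]
    for folder, batch in plan_crawls(group_by_top_folder(sample), limit=2):
        print(folder or "(root)", len(batch))
```

Each batch could then be fed to the crawler as its own job. If one folder holds far more pages than the limit, you would probably want to split on the second path segment instead of slicing the list arbitrarily.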
-
There is an enterprise version of SEOmoz which will do 1 million pages a month and up to 30k keywords, which is well worth looking into if you have an enormous web property.
Related Questions
-
Compare sites?
I'm frustrated, so I want to ask a stupid question. My site, www.seadwellers.com, outranks my biggest competitor, www.rainbowreef.us, in most Moz categories EXCEPT Facebook likes (he has a ton). And yet rainbowreef.us outranks me for most keywords on Google?! I know it's not simple, but can anyone take a quick peek and give me any insight as to why? For example, for the "Dive Key Largo" keyword he is #1 and I am #5, which is typical for the most important keywords!
Moz Pro | | sdwellers0 -
Duplicate Pages
Hello, we have an issue which I'm hoping someone can help with. Our Moz system is saying that this page, http://www.indigolittle.com/fees/, is a duplicate page. We use this page purely for mobiles and we have added code to say so. This has been in place for over a month now; however, Moz is still picking the page up as a High Priority Issue.
Moz Pro | | popcreativeltd0 -
1 page crawled ... and other errors
1. Why is only one (1) page crawled every second time you crawl my site?
2. Why does your bot not obey the rules specified in the robots.txt?
3. Why does your site constantly lose connection to my Facebook account/page? This means that whenever I want to compare performance I need to re-authorize, and therefore cannot see any data until next time. Next time I also need to re-authorize ...
4. Why can't I add a competitor Twitter account? Whatever I type I get an "uh oh account cannot be tracked" - and if I randomly succeed, the account added never shows up with any data. It has been like this for ages.
I have reported these issues over and over again. We are part of a large Scandinavian company represented by Denmark, Sweden, Norway and Finland. The companies are also part of a larger worldwide company spreading across England, Ireland, Continental Europe and Northern Europe. I count at least 10 accounts on Seomoz.org. We, the Northern Europe (4 accounts), are now reconsidering our membership at seomoz.org. We have recently expanded our efforts and established an SEO community in the larger-scale business spanning all our countries. In this community, too, we are now discussing the quality of your services. We'll be meeting next on the 27-28th of June in London. I hope I can bring some answers that clarify the problems we have seen here on seomoz.org. As I have written before: I love your setup and your tools - when they work. Regrettably, that is only occasionally the case!
Moz Pro | | alsvik1 -
Open Site Explorer results vs. Google Webmaster Tools results
I've been comparing the links to my domain that OSE and GWT show, and GWT shows many more links than OSE. Can anyone explain the difference? Does Google report nofollow links that OSE does not?
Moz Pro | | cartersnipes0 -
Websites First Crawl - Over 2 Hour Suggested Wait
Hello SEOMoz! We recently signed up for a free trial, and on the Pro dashboard it states the following: "To get you started quickly Roger is crawling up to 250 pages on your site. You should see these results within two hours. The full crawl will complete within 7 days." It's been nearly 24 hours and we see no results under Crawl Diagnostics; however, we do under Rankings. Is this normal? Thanks
Moz Pro | | hostsurfuk0 -
Duplicate page title
I own a store, www.mzube.co.uk, and the scan always says that I have duplicate page titles or duplicate pages. What happens is that I may have, for example, www.mzube.co.uk/allproducts/page1. If I have 20 pages, all that changes from page to page is the number at the end; the rest of the page name is the same, but the pages really show different products. So the scan thinks I have 20 identical pages, but I haven't. Is this a concern? I don't think I can avoid it. Hope you can answer.
Moz Pro | | mzube0 -
Crawl Diagnostics finding pages that don't exist. Will Rel Canon Help?
I have recently set up a campaign for www.completeoffice.co.uk. I'm the in-house developer there. When the crawl diagnostics completed, I went to check the results, and to my surprise, it had well over 100 missing or empty title tags. I then clicked through to see which pages, and nearly all the pages it says have missing or empty title tags DO NOT EXIST. This has really confused me and I need help figuring out how to solve it. Can anyone help? The attached image is a screenshot of some of the links it showed me in crawl diagnostics; nearly all of these do not exist. Will the rel canonical tag in the head section of the actual pages help? For example, the actual page that exists is www.completeoffice.co.uk/Products.php, whereas the crawl showed www.completeoffice.co.uk/Products/Products.php. Will having the rel canonical tag in the header of the real Products.php solve this?
Moz Pro | | CompleteOffice0 -
Why are these pages considered duplicate page content?
A recent crawl diagnostic for a client's website had several new duplicate page content errors. The problem is, I'm not sure where the error comes from, since the content on each of the pages is different. Here are the pages that SEOMOZ reported to have duplicate page content errors: http://www.imaginet.com.ph/wireless-internet-service-providers-term http://www.imaginet.com.ph/antivirus-term http://www.imaginet.com.ph/berkeley-internet-name-domain http://www.imaginet.com.ph/customer-premises-equipment-term The only thing similar that I see is the headline, which says "Glossary Terms Used in this Site" - I hope that the one sentence is the reason for the error. Any input is appreciated, as I want to find the best solution for my client's website errors. Thanks!
Moz Pro | | TheNorthernOffice790