Tools that crawl 2 million page sites
-
Our site is about 2million pages deep, 50% of which is stale content. Yes, I know - OMG #unhygienic. Even if we get approval to get rid of half of it. SEOMoz Pro Elite only crawls 20k deep - what can i do to crawl and diagnose the whole site. Are there any tools anyone can suggest. SEOMoz??
-
That's good to know. It sounds like that's probably the best way. I also use Screaming Frog (http://www.screamingfrog.co.uk/seo-spider/) to try and crawl sites and with dedicated 2Gigs of ram, it's able to crawl around 50k pages. If your site is structured in sub-folders, you might be able to break it into parts and then crawl. But then if not, the SEOMOZ Enterprise looks like the way to go.
-
There is an enterprise version of SEOmoz which will do 1 million pages a month and up to 30k keywords which is well worth looking into if you have a enormous web property.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Duplicate Page
I just Check Crawl the status error with Duplicate Page Content. As Mentioned Below. Songs.pk | Download free mp3, Hindi Music, Indian Mp3 Songs http://www.getmp3songspk.com Songs.pk | Download free mp3, Hindi Music, Indian Mp3 Songs http://getmp3songspk.com and then i added these lines to my htaccess file RewriteBase /
Moz Pro | | Getmp3songspk
RewriteCond %{HTTP_HOST} !^www.getmp3songspk.com$ [NC]
RewriteRule ^(.*)$ http://www.getmp3songspk.com/$1 [L,R=301] But Still See that error again when i crawl a new test.0 -
What to do with a site of >50,000 pages vs. crawl limit?
What happens if you have a site in your Moz Pro campaign that has more than 50,000 pages? Would it be better to choose a sub-folder of the site to get a thorough look at that sub-folder? I have a few different large government websites that I'm tracking to see how they are fairing in rankings and SEO. They are not my own websites. I want to see how these agencies are doing compared to what the public searches for on technical topics and social issues that the agencies manage. I'm an academic looking at science communication. I am in the process of re-setting up my campaigns to get better data than I have been getting -- I am a newbie to SEO and the campaigns I slapped together a few months ago need to be set up better, such as all on the same day, making sure I've set it to include www or not for what ranks, refining my keywords, etc. I am stumped on what to do about the agency websites being really huge, and what all the options are to get good data in light of the 50,000 page crawl limit. Here is an example of what I mean: To see how EPA is doing in searches related to air quality, ideally I'd track all of EPA's web presence. www.epa.gov has 560,000 pages -- if I put in www.epa.gov for a campaign, what happens with the site having so many more pages than the 50,000 crawl limit? What do I miss out on? Can I "trust" what I get? www.epa.gov/air has only 1450 pages, so if I choose this for what I track in a campaign, the crawl will cover that subfolder completely, and I am getting a complete picture of this air-focused sub-folder ... but (1) I'll miss out on air-related pages in other sub-folders of www.epa.gov, and (2) it seems like I have so much of the 50,000-page crawl limit that I'm not using and could be using. (However, maybe that's not quite true - I'd also be tracking other sites as competitors - e.g. non-profits that advocate in air quality, industry air quality sites - and maybe those competitors count towards the 50,000-page crawl limit and would get me up to the limit? How do the competitors you choose figure into the crawl limit?) Any opinions on which I should do in general on this kind of situation? The small sub-folder vs. the full humongous site vs. is there some other way to go here that I'm not thinking of?
Moz Pro | | scienceisrad0 -
Your site's pages may be using techniques that are outside Google's Webmaster Guidelines
Hi All The message below I received from google webmaster, please tell me how I solve this problem Dear site owner or webmaster of http://testedfatburners.com/, We've detected that some of your site's pages may be using techniques that are outside Google's Webmaster Guidelines. If you have any questions about how to resolve this issue, please see ourWebmaster Help Forum for support. Sincerely, Google Search Quality Team
Moz Pro | | mkm1040 -
Crawl Diagnostics
My site was crawled last night and found 10,000 errors due to a Robot.txt change implemented last week in between Moz crawls. This is obviously very bad so we have corrected it this morning. We do not want to wait until next Monday (6 days) to see if the fix has worked. How do we force a Moz crawl now? Thanks
Moz Pro | | Studio330 -
Opens Site Explorer sees two pages that are one
www.mysite.com/ and www.mysite.com/default.asp are both showing up in open site explorer. How do I make sure that these are not being interpreted as two pages when they are the same page? (Both in search engines and OSE.)
Moz Pro | | EugeneF0 -
Crawl Diagnostics
Hello, I would appreciate your help on the following issue. During Crawl procedure of e-maximos.com (WP installation) I get a lot of errors of the below mentioned categories: Title Missing or Empty & Missing Meta Description Tag for the URLs: http://e-maximos.com/?like_it=xxxx (i.e. xxxx=1033) Any idea of the reason and possible solution. Thank you in advance George
Moz Pro | | gpapatheodorou0 -
Bulk OSE Open Site Explorer Tool?
I am trying to do some spring cleaning for a client and hoping to prune any unnecessary domains. Is there a tool that will check, in bulk, these domains through Open Site Explorer? I've looked through all the different Excel spread sheet apps and google doc apps but they are incredibly buggy if they work at all since SEOmoz changed their data limits. Maybe a new tool has been updated in the last few months that I am not aware of. Thanks!
Moz Pro | | kerplow0 -
Crawl Diagnostics Summary
Is there a way to view the charts in the crawl diagnostics summary on a monthly view (or export the monthly figures)?
Moz Pro | | RikkiD220