What is the best tool to crawl a site with millions of pages?
-
I want to crawl a site that has so many pages that Xenu and Screaming Frog keep crashing at some point after 200,000 pages.
What tools will allow me to crawl a site with millions of pages without crashing?
-
Don't forget to exclude pages that don't contain the information you are looking for: query-parameter variants that only produce duplicate content, system files, and so on. That may bring the page count down considerably.
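A minimal Python sketch of that filtering normalizes each discovered URL before it is queued. It drops query parameters known to produce duplicates, strips fragments, and skips file types that are not HTML pages. The parameter names in `DROP_PARAMS` and the extensions in `SKIP_EXTENSIONS` are hypothetical placeholders. Replace them with whatever your own site actually uses.

```python
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical examples: replace with the parameters and file types your site uses.
DROP_PARAMS = {"sort", "order", "sessionid", "utm_source", "utm_medium", "utm_campaign"}
SKIP_EXTENSIONS = (".pdf", ".jpg", ".jpeg", ".png", ".gif", ".css", ".js", ".zip", ".xml")


def normalize_url(url: str) -> Optional[str]:
    """Return a canonical form of url, or None if it should not be crawled at all."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return None  # mailto:, javascript:, ftp:, etc.
    path = parts.path or "/"
    if path.lower().endswith(SKIP_EXTENSIONS):
        return None  # static/system files rather than HTML pages
    # Drop duplicate-producing parameters; sort the rest so ?a=1&b=2 == ?b=2&a=1
    params = sorted((k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
                    if k.lower() not in DROP_PARAMS)
    # Lower-case the host (hostnames are case-insensitive) and drop the #fragment
    return urlunsplit((parts.scheme, parts.netloc.lower(), path, urlencode(params), ""))
```

The crawler's seen-set stores the normalized URL. As a result, variants such as `?sort=price` and `?sort=name` on the same path collapse into a single entry, and that is where the reduction in page count comes from.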
-
Only basic stuff: URL, Title, Description, and a few HTML elements.
I am aware that building a crawler would be fairly easy, but is there one out there that already does it without consuming too many resources?
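For a field list this short, the extraction step needs only Python's standard-library `html.parser`. The sketch below shows one possible approach and is not a feature of any particular tool. It pulls the title, the meta description, and every `<h1>` from one page's HTML, with `<h1>` standing in for "a few HTML elements". The `extract` and `PageInfoParser` names are illustrative.

```python
from html.parser import HTMLParser


class PageInfoParser(HTMLParser):
    """Collects the <title>, the meta description and all <h1> texts of one page."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.title = ""
        self.description = ""
        self.h1: list[str] = []
        self._in = None  # tag whose text is currently being captured ("title" or "h1")
        self._buf: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "h1") and self._in is None:
            self._in, self._buf = tag, []
        elif tag == "meta":
            a = dict(attrs)
            if (a.get("name") or "").lower() == "description":
                self.description = (a.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == self._in:
            text = " ".join("".join(self._buf).split())  # collapse whitespace
            if tag == "title":
                if not self.title:  # keep the first <title> only
                    self.title = text
            else:
                self.h1.append(text)
            self._in = None

    def handle_data(self, data):
        if self._in:
            self._buf.append(data)  # includes text inside nested tags like <b>


def extract(url: str, html: str) -> dict:
    """Return the basic fields for one crawled page."""
    p = PageInfoParser()
    p.feed(html)
    p.close()
    return {"url": url, "title": p.title, "description": p.description, "h1": p.h1}
```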
-
For what purpose do you want to crawl the site?
A web crawler isn't really hard to write; you can probably code one in about 100 lines. The real question, of course, is what you want out of the crawl.
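To give a sense of scale, here is one way such a crawler could look in Python using only the standard library. It is a sketch, not a drop-in replacement for Xenu or Screaming Frog. Its main design assumption is that the crawl state lives in an SQLite table on disk rather than in memory. That state covers the queue of URLs still to visit, the URLs already seen, and the results. Memory use should therefore stay roughly flat whether the site has 200,000 pages or several million. The `fetch` helper, the `example-crawler/0.1` user agent, and the `crawl.db` file name are illustrative assumptions.

```python
import sqlite3
from collections.abc import Callable
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urldefrag, urljoin, urlsplit
from urllib.request import Request, urlopen


class LinkParser(HTMLParser):
    """Collects <a href> values and the <title> text of one page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def fetch(url: str) -> Optional[str]:
    """Return the page's HTML, or None on errors and non-HTML responses."""
    try:
        req = Request(url, headers={"User-Agent": "example-crawler/0.1"})  # hypothetical UA
        with urlopen(req, timeout=10) as resp:
            if "text/html" not in resp.headers.get("Content-Type", ""):
                return None
            return resp.read().decode("utf-8", errors="replace")
    except Exception:  # timeouts, HTTP errors, dropped connections: skip the page
        return None


def crawl(start_url: str, db_path: str = "crawl.db", max_pages: int = 1_000_000,
          fetch_fn: Callable[[str], Optional[str]] = fetch) -> int:
    """Breadth-first crawl of one host; all crawl state is kept in SQLite on disk."""
    host = urlsplit(start_url).netloc
    db = sqlite3.connect(db_path)
    db.execute("CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, "
               "status TEXT NOT NULL DEFAULT 'queued', title TEXT)")
    db.execute("CREATE INDEX IF NOT EXISTS pages_status ON pages (status)")
    db.execute("INSERT OR IGNORE INTO pages (url) VALUES (?)", (start_url,))
    db.commit()
    crawled = 0  # counts successfully fetched pages only
    while crawled < max_pages:
        # Oldest queued URL first (FIFO), which gives breadth-first order
        row = db.execute("SELECT rowid, url FROM pages WHERE status = 'queued' "
                         "ORDER BY rowid LIMIT 1").fetchone()
        if row is None:
            break  # nothing left to visit
        rowid, url = row
        html = fetch_fn(url)
        if html is None:
            db.execute("UPDATE pages SET status = 'failed' WHERE rowid = ?", (rowid,))
        else:
            parser = LinkParser()
            parser.feed(html)
            parser.close()
            db.execute("UPDATE pages SET status = 'done', title = ? WHERE rowid = ?",
                       (" ".join(parser.title.split()), rowid))
            for href in parser.links:
                try:
                    link, _ = urldefrag(urljoin(url, href))
                    same_host = urlsplit(link).netloc == host
                except ValueError:  # malformed href, e.g. a broken IPv6 literal
                    continue
                if same_host:  # the primary key doubles as the "seen" set
                    db.execute("INSERT OR IGNORE INTO pages (url) VALUES (?)", (link,))
            crawled += 1
        db.commit()
    db.close()
    return crawled
```

Each page's status is committed as it is processed. An interrupted run can therefore be restarted with the same `db_path`, and it will continue from the remaining `queued` rows. Committing after every page is slow at millions of pages, so batching commits would be an obvious improvement. The sketch also does not check robots.txt or throttle requests, and you would want both before pointing it at a real site.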
Related Questions
-
Will Google be able to crawl all of the pages, given that the pages displayed (or the information on a page) vary according to the user's city?
The website I am working on asks for a location before displaying the product pages. There are two cities with multiple warehouses. Based on the user's location, only the product pages available in the warehouse serving that area are shown. If the user skips the location prompt, the product pages for a default warehouse are shown. The APIs are all location-based.
Intermediate & Advanced SEO | | Airlift0 -
What is the best strategy for linking to sub category pages?
My site is set up like this (I have six categories, all similar): Home Page > Category > Subcategory > four detail pages. My category page provides a summary/introduction of the subject, my subcategory page is the "money page" with the ability to quote and buy, and my detail pages provide supporting material. What is the best internal linking strategy between these pages? (In addition, one category has six subcategories, but only one of them is a "money page". Should I be linking all of these pages back to the money page?) Thanks, Ash
Intermediate & Advanced SEO | | AshShep10 -
How should I remove a product page from an ecommerce site?
What is the best practice for removing a product page from an ecommerce site? If a 301 is not available and the page has already been crawled by the search engine, should I: A. block it in robots.txt, or B. let it 404?
Intermediate & Advanced SEO | | Bryan_Loconto0 -
Migrating from a standalone site to a subdirectory of a large .gov.uk site
The scenario
We’ve been asked by a client, a Non-Government Organisation that is being absorbed by a larger government ministry, for help with the SEO of their site. They will be going from a reasonably large standalone site to a small sub-directory on a high-authority government site, and they want some input on how best to maintain their rankings. They will be going from the number 1 ranked site in their niche (current site domainRank 59) to being a sub-directory on a domainRank 100 site. The current site will remain, but as a members-only resource behind a paywall.
I’ve been checking the impact this had on a related site, but that one has put a catch-all 302 redirect on its pages, so it is losing the benefit of its historical authority.
My thoughts
A robust 301 redirect setup to pass as much benefit as possible to the new pages.
Focus on rewriting content to promote the most effective keywords. I would suggest testing titles, meta descriptions, etc., but I'm not sure how often they will be able to edit the new site.
‘We have moved’ messaging sent to webmasters of existing linking sites, to encourage as many of them as possible to update their links.
Development of link bait to try to get the new pages seen.
Am I going about this the right way? Thanks in advance. Phil
Intermediate & Advanced SEO | | smrs-digital0 -
How to increase the ranking for a keyword across an entire site
Sorry for my bad English. Is there any way to increase the ranking for a keyword for the entire site? I know that SEO is done on a per-page basis, but my site contains thousands of posts and I can't get backlinks for each and every post. So I picked four keywords that are most often used when searching for my products. Is there any method to increase my ranking for those keywords, such as increasing domain authority? Example: if I want to increase my ranking for "buy laptop", then when any user searches Google for "buy laptop", I want my site, or whichever of its pages best matches the user's query, to show up at the top.
Intermediate & Advanced SEO | | prakash.moturu0 -
Best practice for changing the URLs of all my site pages
Hi, I need to change the URLs of all my site pages as a result of moving the site to another CMS platform that has its own URL structure.
Currently the site ranks highly for all the relevant KWs I am targeting.
All pages have backlinks.
Content and metadata should remain exactly the same.
The domain should stay the same.
The plan is as follows:
Set up the new site using a temporary domain name.
Copy over all content and metadata.
Set up all redirects (301).
Update the domain name and point the live domain to the new site.
Watch closely for 404 errors and add any missing redirects.
Questions:
Any comments on the plan?
Is there a way (the above plan or any other) to make sure rankings will not be hurt?
What entries should I add to sitemap.xml: new pages only, or both the new pages and the pages from the old site?
Thanks, Guy.
Intermediate & Advanced SEO | | jid1 -
Should I use the main keyword in the title tag for the site on all category pages?
I am pretty excited about changing my title tags (for my 7 most important pages), since I have seen my rankings jump in the SERPs just by adding my website's main keyword to the title tag. To make this easier, I will explain my business: I run an online jewelry shop, so the keyword I want to use is "Jewelry online", and for the main categories, "Necklace", "Rings" and "Bracelets". What I am unsure about is whether to use all the keywords in the homepage title tag or just the main keyword, "Jewelry online". Of course, I don't want to create competition between my own pages.
Jewelry Online - Trendy Fashion Jewelry | Homepage
or
Jewelry Online - Necklace, Rings, Bracelets | Homepage
The same question applies to the main categories: should I include "jewelry online" or not? For example:
Bracelets - Fashion Jewelry Online | Homepage
or
Bracelets - Trendy, Bangles, and Arm Cuffs | Homepage
Any suggestions on best practice for the title tags on the homepage and the main categories? Thanks
Intermediate & Advanced SEO | | ikomorin0