How to extract URLs from a site (without bringing the server down!)
-
Hi everybody.
One of my clients is migrating to a new ecommerce platform, and we need to get a list of URLs from the existing site to start mapping out the 301 redirects. Usually, I'd use a tool like Xenu or Integrity to crawl the site and output a list.
However, the database and server setup is so bad that it can't handle the requests from these tools and it sends the site down. This, unsurprisingly, is one of the reasons for the migration.
Does anybody know of a way to get a full list of URLs without having to make a bunch of HTTP requests that would kill the site? Any advice would be much appreciated!
-
Just a follow-up to my endorsement. It looks like Screaming Frog will let you control the number of pages crawled per second, but to do a full crawl you'll need to get the paid version (the free version only crawls 500 URLs):
http://www.screamingfrog.co.uk/seo-spider/
It's a good tool, and nice to have around, IMO.
-
Copy the site, set it up on a staging server and run http://www.xml-sitemaps.com/ on it?
-
Why not find the links pointing to the site? You will only need to 301 the URLs that have external links; let the rest 404. I use Bing WMT, as it has the most complete collection, IMO. It also exports to CSV.
-
Thanks Yannick, I don't know why I didn't think of using a scraper! Can you recommend any good code (PHP perhaps)?
-
Scrape Google?
-
Make your own scraper and keep the requests per second really low?
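A minimal sketch of that idea, in Python rather than the PHP asked about above (the approach carries over). It fetches one page at a time and pauses between requests, so the server never sees more than one request every few seconds. The names `crawl`, `fetch_html` and `LinkParser`, the 5-second default delay and the 1000-page cap are all assumptions for illustration, not something from this thread; tune the delay to whatever the server can tolerate.

```python
import time
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_html(url):
    """Default fetcher (hypothetical helper): one plain GET, body as text."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, delay_seconds=5.0, max_pages=1000,
          fetch=fetch_html, sleep=time.sleep):
    """Breadth-first crawl of a single host, one request at a time."""
    host = urlparse(start_url).netloc
    queue = [start_url]
    seen = {start_url}
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.pop(0)
        if fetched:
            sleep(delay_seconds)  # pause before every request after the first
        fetched += 1
        try:
            html = fetch(url)
        except OSError:
            continue  # network or HTTP error: skip the page, do not retry
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return sorted(seen)
```

Calling `crawl("http://www.example.com/", delay_seconds=10)` would return the sorted list of same-host URLs it found. It only follows plain `<a href>` links, so anything reachable only through forms or JavaScript will be missed.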
-
Maybe the site has an automated sitemap somewhere?
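If there is one, a single request gets you the whole list. A rough Python sketch for pulling the URLs out of it, assuming the file follows the standard sitemaps.org XML layout (`<urlset>` with `<url><loc>` entries, or a `<sitemapindex>` with `<sitemap><loc>` entries). The function name `locs_from_sitemap` is made up for this example, and where the sitemap lives (often `/sitemap.xml`, or listed in `robots.txt`) is something you would have to check on the site itself.

```python
import xml.etree.ElementTree as ET


def locs_from_sitemap(xml_text):
    """Return the <loc> value of each top-level entry in a sitemap document."""
    root = ET.fromstring(xml_text)
    locs = []
    for entry in root:  # each <url> or <sitemap> element
        for child in entry:
            # Tags arrive as "{namespace}loc"; compare the local name only.
            if child.tag.rsplit("}", 1)[-1] == "loc" and child.text:
                locs.append(child.text.strip())
    return locs
```

For a sitemap index, the values returned are the addresses of further sitemap files, so each of those would need to be downloaded and passed through the same function.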
-
Google Webmaster Tools -> download the "internal links" table
-