Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
How to extract URLs from a site (without bringing the server down!)
-
Hi everybody.
One of my clients is migrating to a new ecommerce platform, and we need to get a list of urls from the existing site to start mapping out the 301 redirects. Usually, I'd use a tool like Xenu or Integrity to crawl and output a list.
However, the database and server setup is so bad that it can't handle the requests from these tools and it sends the site down. This, unsurprisingly, is one of the reasons for the migration.
Does anybody know of a way to get a full list of urls without having to make a bunch of http requests which will kill the site? Any advice would be much appreciated!
-
Just a follow-up to my endorsement. It looks like Screaming Frog will let you control the number of pages crawled per second, but to do a full crawl you'll need to get the paid version (the free version only crawls 500 URLs):
http://www.screamingfrog.co.uk/seo-spider/
It's a good tool, and nice to have around, IMO.
-
Copy the site, set it up on a staging server and run http://www.xml-sitemaps.com/ on it?
-
why not find the links to the site, becauase you will only need to 301 the urls with extenal links. let teh rest 404. i use Bing WMT as it has a most complete collection IMO. they also export to a csv
-
Thanks Yannick, I don't know why I didn't think of using a scraper! Can you recommend any good code (PHP perhaps)?
-
-
Scrape Google?
-
Make your own scraper and keep the requests per second really low ?
-
Maybe the site has an automated sitemap somewhere ?
-
Google webmaster tools -> download "internal links" table
-
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Does anyone know the linking of hashtags on Wix sites does it negatively or postively impact SEO. It is coming up as an error in site crawls 'Pages with 404 errors' Anyone got any experience please?
Does anyone know the linking of hashtags on Wix sites does it negatively or positively impact SEO. It is coming up as an error in site crawls 'Pages with 404 errors' Anyone got any experience please? For example at the bottom of this blog post https://www.poppyandperle.com/post/face-painting-a-global-language the hashtags are linked, but they don't go to a page, they go to search results of all other blogs using that hashtag. Seems a bit of a strange approach to me.
Technical SEO | | Mediaholix0 -
Help Setting Up 301 Redirects from Coldfusion Site to Wordpress Site.
I have created a new website and need to redirect all of the previous pages to the new one. The old website was built in coldfusion and the new site is built in wordpress. One of the pages I'm trying to redirect is www.norriseal.com/products.cfm to http://norrisealwellmark.com/products/. This is what I have in my .htaccess file <ifmodule mod_rewrite.c="">Options +FollowSymlinks
Technical SEO | | MarketHubb
RewriteEngine On
RewriteBase /
Redirect 301 /products.cfm http://norrisealwellmark.com/products/</ifmodule> The result of this redirect is http://norrisealwellmark.com/products.cfm How do I prevent the .cfm from appending to the destination URL?1 -
Tools/Software that can crawl all image URLs in a site
Excluding Screaming Frog, what other tools/software to use in order to crawl all image URLs in a site? Because in Screaming Frog, they don't crawl image URLs which are not under the site domain. Example of an image URL outside the client site: http://cdn.shopify.com/images/this-is-just-a-sample.png If the client is: http://www.example.com, Screaming Frog only crawls images under it like, http://www.example.com/images/this-is-just-a-sample.png
Technical SEO | | jayoliverwright0 -
Special characters in URL
Will registered trademark symbol within a URL be bad? I know some special characters are unsafe (#, >, etc.) but can not find anything that mentions registered trademark. Thanks!
Technical SEO | | bonnierSEO0 -
Links from the same server has value or not
Hi Guys, Sometime ago one of the SEO experts said to me if I get links from the same IP address, Google doesn't count them as with much value. For an example, I am a web devleoper and I host all my clients websites on one server and link them back to me. Im wondering whether those links have any value when it comes to seo or should I consider getting different hosting providers? Regards Uds
Technical SEO | | Uds0 -
Old URL redirect to New URL
Alright I did something dumb a year a go and I'm still paying for it. I changed my hyphenated URL to the non-hyphenated version when I redesigned my website. I say it was dumb because I lost most of my link juice even though I did 301 redirects (via the htaccess file) for almost all of the pages I could find in Google's index. Here's my problem. My new site took a huge hit in traffic (down 60%) when I made the change and even though I've done thousands of redirects my old site is still showing up in the SERPS and send much if not most of my traffic. I don't want to take the old site down in fear it will kill all of my traffic. What should I do? Is there a better method I should explore then 301 redirects? Could the other site be affecting my current rank since it's still there? (FYI...both sites are built on the WP platform). Any help or ideas are greatly appreciated. Thank you! Joe
Technical SEO | | kaje0 -
Drupal URL Aliases vs 301 Redirects + Do URL Aliases create duplicates?
Hi all! I have just begun work on a Drupal site which heavily uses the URL Aliases feature. I fear that it is creating duplicate links. For example:: we have http://www.URL.com/index.php and http://www.URL.com/ In addition we are about to switch a lot of links and want to keep the search engine benefit. Am I right in thinking URL aliases change the URL, while leaving the old URL live and without creating search engine friendly redirects such as 301s? Thanks for any help! Christian
Technical SEO | | ChristianMKTG0 -
What are the pros and cons of moving one site onto a subdomain of another site?
Two sites. One has weaker sales. What would the benefits and problems for SEO of moving the weak site from its own domain to a subdomain of the stronger site?
Technical SEO | | GriffinHansen0