Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
How to extract URLs from a site (without bringing the server down!)
-
Hi everybody.
One of my clients is migrating to a new ecommerce platform, and we need to get a list of urls from the existing site to start mapping out the 301 redirects. Usually, I'd use a tool like Xenu or Integrity to crawl and output a list.
However, the database and server setup is so bad that it can't handle the requests from these tools and it sends the site down. This, unsurprisingly, is one of the reasons for the migration.
Does anybody know of a way to get a full list of urls without having to make a bunch of http requests which will kill the site? Any advice would be much appreciated!
-
Just a follow-up to my endorsement. It looks like Screaming Frog will let you control the number of pages crawled per second, but to do a full crawl you'll need to get the paid version (the free version only crawls 500 URLs):
http://www.screamingfrog.co.uk/seo-spider/
It's a good tool, and nice to have around, IMO.
-
Copy the site, set it up on a staging server and run http://www.xml-sitemaps.com/ on it?
-
why not find the links to the site, becauase you will only need to 301 the urls with extenal links. let teh rest 404. i use Bing WMT as it has a most complete collection IMO. they also export to a csv
-
Thanks Yannick, I don't know why I didn't think of using a scraper! Can you recommend any good code (PHP perhaps)?
-
-
Scrape Google?
-
Make your own scraper and keep the requests per second really low ?
-
Maybe the site has an automated sitemap somewhere ?
-
Google webmaster tools -> download "internal links" table
-
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Category URL Pagination where URLs don't change between pages
Hello, I am working on an e-commerce site where there are categories with multiple pages. In order to avoid pagination issues I was thinking of using rel=next and rel=prev and cannonical tags. I noticed a site where the URL doesn't change between pages, so whether you're on page 1,2, or 3 of the same category, the URL doesn't change. Would this be a cleaner way of dealing with pagination?
Technical SEO | | whiteonlySEO0 -
Inurl: search shows results without keyword in URL
Hi there, While doing some research on the indexation status of a client I ran into something unexpected. I have my hypothesis on what might be happing, but would like a second opinion on this. The query 'site:example.org inurl:index.php' returns about 18.000 results. However, when I hover my mouse of these results, no index.php shows up in the URL. So, Google seems to think these (then duplicate content) URLs still exist, but a 301 has changed the actual goal URL? A similar things happens for inurl:page. In fact, all the 'index.php' and 'page' parameters were removed over a year back, so there in fact shouldn't be any of those left in the index by now. The dates next to the search results are 2005, 2008, etc. (i.e. far before 2013). These dates accurately reflect the times these forums topic were created. Long story short: are these ~30.000 'phantom URLs' in the index out of total of ~100.000 indexed pages hurting the search rankings in some way? What do you suggest to get them out? Submitting a 100% coverage sitemap (just a few days back) doesn't seem to have any effect on these phantom results (yet).
Technical SEO | | Theo-NL0 -
Seo For Forum Sites
I have forum site.I've opened it 2 months ago.But there is a problem.Therefore my content is unique , my site's keyword ranking constantly changing..Sometimes my site's ranking drops from first 500.After came to 70s. I didn't make any off page seo to my site.What is the problem ?
Technical SEO | | tutarmi0 -
Mobile site ranking instead of/as well as desktop site in desktop SERPS
I have just noticed that the mobile version of my site is sometimes ranking in the desktop serps either instead of as well as the desktop site. It is not something that I have noticed in the past as it doesn't happen with the keywords that I track, which are highly competitive. It is happening for results that include our brand name, e.g '[brand name][search term]'. The mobile site is served with mobile optimised content from another URL. e.g wwww.domain.com/productpage redirects to m.domain.com/productpage for mobile. Sometimes I am only seen the mobile URL in the desktop SERPS, other times I am seeing both the desktop and mobile URL for the same product. My understanding is that the mobile URL should not be ranking at all in desktop SERPS, could we be being penalised for either bad redirects or duplicate content? Any ideas as to how I could further diagnose and solve the problem if you do believe that it could be harming rankings?
Technical SEO | | pugh0 -
Changing images on site without losing ranking
A number of images on my site rank very well under google image search but need to be replaced with updated versions. If I keep the file name and pixel dimensions identical will switching the image effect my rankings? Thanks!
Technical SEO | | Justin450 -
Special characters in URL
Hi There, We're in the process of changing our URL structure to be more SEO friendly. Right now I'm struggling to find a good way to handle slashes that are part of a targeted keyword. For example, if I have a product page and my product title is "1/2 ct Diamond Earrings in 14K Gold" which of the following URLs is the right way to go if I'm targeting the product title as the search keyword? example.com/jewelry/1-2-ct-diamond-earrings-in-14k-gold example.com/jewelry/12-ct-diamond-earrings-in-14k-gold example.com/jewelry/1_2-ct-diamond-earrings-in-14k-gold example.com/jewelry/1%2F2-ct-diamond-earrings-in-14k-gold Thanks!
Technical SEO | | Richline_Digital0 -
Links from the same server has value or not
Hi Guys, Sometime ago one of the SEO experts said to me if I get links from the same IP address, Google doesn't count them as with much value. For an example, I am a web devleoper and I host all my clients websites on one server and link them back to me. Im wondering whether those links have any value when it comes to seo or should I consider getting different hosting providers? Regards Uds
Technical SEO | | Uds0