Tool to Generate All the URLs on a Domain
-
Hi all,
I've been using xml-sitemaps.com for a while to generate a list of all the URLs that exist on a domain. However, this tool only works for websites with under 500 URLs on a domain. The paid tool doesn't offer what we are looking for either. I'm hoping someone can help with a recommendation.
We're looking for a tool that can:
- Crawl, and list, all the indexed URLs on a domain, including .pdf and .doc files (ideally in a .xls or .txt file)
- Crawl multiple domains with unlimited URLs (we have 5 websites with 500+ URLs on them)
Seems pretty simple, but we haven't been able to find something that isn't tailored toward management of a single domain or that can crawl a huge volume of content.
-
@PatrickDelehanty The tool mentioned in the statement not only excels in the two areas mentioned earlier but also offers a wide range of additional capabilities. I recommend that you explore it for yourself! Best of luck!
-
@PatrickDelehanty The tool mentioned in the statement not only excels in the two areas ```
mentioned -
It seems to crawl all the wordpress folders and media files.
Is there not a tool that will tell you just your live website URLs, I'm after creating a site map and a mass re-organising content exercise, so want a list in excel of URLs.Any tips welcome
Thanks
Sarah
-
2nd Vote for Screaming Frog. Tried a lot of tools to pull info on all the URL's and this tool is by far the best one for the job.
-
Hi Felicia
Try ScreamingFrog - they crawl the entire site (you can configure how you want it to crawl your site) and have ways of creating a XML Sitemap for you.
The tool goes above and beyond those two areas as well and can do so much. I suggest you check it out! Good luck!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
301 Domain Redirect from old domain with HTTPS
My domain was indexed with HTTPS://WWW. now that we redirected it the certificate has been removed and if you try to visit the old site with https it throws an obvious error that this sites not secure and the 301 does not happen. My question is will googles bot have this issue. Right now the domain has been in redirection status to the new domain for a couple months and the old site is still indexed, while the new one is not ranking well for half its terms. If that is not causing the problem can anyone tell me why would the 301 take such a long time. Ive double and quadruple checked the 301's and all settings to ensure its being redirected properly. Yet it still hasn't fully redirected. Something is wrong and my clients ready to ditch the old domain we worked on for a good amount of time. backgorund:About 30 days ago we found some redirect loops .. well not loop but it was redirecting from old domain to the new domain several times without error. I removed the plugins causing the multi redirects and now we have just one redirect from any page on the old domain to the new https version. Any suggestions? This is really frustrating me and I just can't figure it out. My only answer at this point is wait it out because others have had this issue where it takes up to 2 months to redirect the domain. My only issue is that this is the first domain redirect out of many that have ever taken more than a week or three.
Technical SEO | | waqid0 -
We switched the domain from www.blog.domain.com to domain.com/blog.
We switched the domain from www.blog.domain.com to domain.com/blog. This was done with the purpose of gaining backlinks to our main website as well along with to our blog. This set us very low in organic traffic and not to mention, lost the backlinks. For anything, they are being redirected to 301 code. Kindly suggest changes to bring back all the traffic.
Technical SEO | | arun.negi0 -
Will this URL structure: "domain.com/s/content-title" cause problems?
Hey all, We have a new in-house built too for building content. The problem is it inserts a letter directly after the domain automatically. The content we build with these pages aren't all related, so we could end up with a bunch of urls like this: domain.com/s/some-calculator
Technical SEO | | joshuaboyd
domain.com/s/some-infographic
domain.com/s/some-long-form-blog-post
domain.com/s/some-product-page Could this cause any significant issues down the line?0 -
Old domain still being crawled despite 301s to new domain
Hi there, We switched from the domain X.com to Y.com in late 2013 and for the most part, the transition was successful. We were able to 301 most of our content over without too much trouble. But when when I do a site:X.com in Google, I still see about 6240 URLs of X listed. But if you click on a link, you get 301d to Y. Maybe Google has not re-crawled those X pages to know of the 301 to Y, right? The home page of X.com is shown in the site:X.com results. But if I look at the cached version, the cached description will say :This is Google's cache of Y.com. It is a snapshot of the page as it appeared on July 31, 2014." So, Google has freshly crawled the page. It does know of the 301 to Y and is showing that page's content. But the X.com home page still shows up on site:X.com. How is the domain for X showing rather than Y when even Google's cache is showing the page content and URL for Y? There are some other similar examples. For instance, you would see a deep URL for X, but just looking at the <title>in the SERP, you can see it has crawled the Y equivalent. Clicking on the link gives you a 301 to the Y equivalent. The cached version of the deep URL to X also shows the content of Y.</p> <p>Any suggestions on how to fix this or if it's a problem. I'm concerned that some SEO equity is still being sequestered in the old domain.</p> <p>Thanks,</p> <p>Stephen</p></title>
Technical SEO | | fernandoRiveraZ1 -
What is the advantage of using sub domains instead of pages on the root domain?
Have a look at this example http://bannerad.designcrowd.com/ For each category of design, they have a landing page on the sub domain. Wouldn't it be better to have them as part of the same domain? What is the strategy behind using sub domains?
Technical SEO | | designquotes0 -
Drupal URL Aliases vs 301 Redirects + Do URL Aliases create duplicates?
Hi all! I have just begun work on a Drupal site which heavily uses the URL Aliases feature. I fear that it is creating duplicate links. For example:: we have http://www.URL.com/index.php and http://www.URL.com/ In addition we are about to switch a lot of links and want to keep the search engine benefit. Am I right in thinking URL aliases change the URL, while leaving the old URL live and without creating search engine friendly redirects such as 301s? Thanks for any help! Christian
Technical SEO | | ChristianMKTG0 -
What is the best method to block a sub-domain, e.g. staging.domain.com/ from getting indexed?
Now that Google considers subdomains as part of the TLD I'm a little leery of testing robots.txt with something like: staging.domain.com
Technical SEO | | fthead9
User-agent: *
Disallow: / in fear it might get the www.domain.com blocked as well. Has anyone had any success using robots.txt to block sub-domains? I know I could add a meta robots tag to the staging.domain.com pages but that would require a lot more work.0 -
Domain Redirect Issues
Hi, I have a domain that is 10 years old, this is the old domain that used to be the website for the company. The company approximately 7 years ago was bought by another and purchased a new domain that is 7 years old. The company did not do a 301 redirect as they were not aware of the SEO implications. They continued building web applications on the old domain while using the new domain for all marketing and for business partner links. They just put in a server level redirect on the folders themselves to point to the new root. I am on Tomcat, I do not have the option of a 301 redirect as the web applications are all hard coded links (non-relative) (hundreds of thousands of dollars to recode) After beginning SEO; Google is seeing them as the same domain, and has replaced all results in Google with the old domain instead of the new one..... My questions is.... Is it better to take the hit and just put a robots.txt to disallow all robots on the old domain Or... Will that hurt my new domain as well since Google is seeing them as the same? Or.... Has Google already made the switch without a redirect to see these as the same and i should just continue on? (even the cache for the new site shows the old domain address) Old Domain= www.floridahealthcares.com New = www.fhcp.com *****Update after writing this I began changing index.htm to all non relative links so all links on the old domain homepage would point to fhcp.com fixing the issue of the entire site being replicated under the old domain. I think this might "Patch" my issue, but i would still love to get the opinion of others Thanks Shane
Technical SEO | | Jinx146780