Site Spider/ Crawler/ Scraper Software
-
Short of coding up your own web crawler, does anyone know of, or have any experience with, a good piece of software that runs through all the pages on a single domain?
(And potentially on linked domains 1 hop away...)
This could be either server or desktop based.
Useful capabilities would include:
- Scraping (XPath parameters)
- Number of clicks from homepage (site architecture)
- HTTP headers
- Multi-threading
- Use of proxies
- Robots.txt compliance option
- CSV output
- Anything else you can think of...
Perhaps an opportunity for an additional SEOmoz tool here, since they do it already!
Cheers!
Note:
I've had a look at:
- Nutch: http://nutch.apache.org/
- Heritrix: https://webarchive.jira.com/wiki/display/Heritrix/Heritrix
- Scrapy: http://doc.scrapy.org/en/latest/intro/overview.html
- Mozenda (does scraping but doesn't appear extensible)
Any experience/ preferences with these or others?
-
Hey Alex,
Screaming Frog is hands down the best desktop crawling software and it has most of what you are looking for.
-Mike
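For anyone who does end up coding part of this themselves, here is a rough sketch of three items from the list in the question: clicks from the homepage, a robots.txt compliance option, and CSV output. It uses only the Python standard library and is not how Screaming Frog or any of the tools named above works. The names `crawl_depths`, `depths_to_csv`, and the `fetch` callback are made up for illustration, and the crawl is single-threaded with no proxy support.

```python
import csv
import io
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.robotparser import RobotFileParser


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag fed to it."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl_depths(start_url, fetch, robots=None, max_pages=1000):
    """Breadth-first crawl of a single domain.

    fetch(url) is a caller-supplied (hypothetical) function that returns
    the page HTML as a string, or None if the page could not be fetched.
    robots is an optional RobotFileParser; pass None to ignore robots.txt.
    Returns a dict mapping each discovered URL to its click depth.
    """
    domain = urlparse(start_url).netloc
    depths = {start_url: 0}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment.
            link, _fragment = urldefrag(urljoin(url, href))
            if urlparse(link).netloc != domain:
                continue  # stay on the starting domain
            if robots is not None and not robots.can_fetch("*", link):
                continue  # robots.txt compliance option
            if link not in depths and len(depths) < max_pages:
                # BFS guarantees the first visit is the shortest click path.
                depths[link] = depths[url] + 1
                queue.append(link)
    return depths


def depths_to_csv(depths):
    """Render the crawl result as CSV text, shallowest pages first."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["url", "clicks_from_home"])
    for url, depth in sorted(depths.items(), key=lambda item: (item[1], item[0])):
        writer.writerow([url, depth])
    return out.getvalue()
```

To try it without touching the network, `fetch` can be the `.get` method of a dict that maps URLs to HTML strings. For a live site it would need to be replaced with a real HTTP client, and XPath scraping, header capture, threading, and proxies would all have to be added on top.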