Any tools for scraping blogroll URLs from sites?
-
This question is entirely in the whitehat realm...
Let's say you've encountered a great blog - with a strong blogroll of 40 sites.
The 40-site blogroll is interesting to you for any number of reasons, from link building targets to simply subscribing in your feedreader. Right now, it's tedious to extract the URLs from the site. There are some "save all links" tools, but they are also messy.
Are there any good tools that will
a) allow you to grab the blogroll (only) of any site into a list of URLs (yeah, ok, it might not be perfect since some sites call it "sites I like" etc.)
b) same, but export as OPML so you can subscribe.
Thanks!
Scott
-
Not at all. I guess my feeling here is that there is a sort of untapped social graph defined by blogrolls. If it were simple to harvest them upon visiting a blog (e.g. this blogger recommends...) one could do a stumble-on-steroids approach to a niche.
-
I thought you might be able to use the outbound link scraper to grab the outbound links on the page. Pop in the URLs of the pages you want to scrape and it will spit out a list of those domains and URLs. You can take those URLs and put them into the contact finder, and it will return the contact details for those sites. Combine the two spreadsheets for an epic list of blogs to contact for your outreach.
This is obviously for link building rather than subscribing - sorry if I have misunderstood what you were trying to do.
-
Hi Keri,
That is a very cool tool, but it is overkill for this. It takes far too many steps to accomplish only part of the desired goal: grabbing all blogroll URLs (within the blogroll DIV tag) and exporting the list to a valid OPML file or URL list.
Thanks!
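For anyone who wants to throw something together, here is a minimal sketch of that goal in Python, using only the standard library. It rests on assumptions: the blogroll sits in a container whose `id` or `class` contains the word "blogroll" (sites that call it "sites I like" would need a different `marker`), the markup is reasonably well-formed, and the names `BlogrollParser`, `extract_blogroll`, and `to_opml` are made up for illustration rather than taken from any existing tool.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class BlogrollParser(HTMLParser):
    """Collect (title, url) pairs for links inside a container whose id or
    class contains `marker` (an assumed naming convention)."""

    def __init__(self, base_url, marker="blogroll"):
        super().__init__()
        self.base_url = base_url
        self.marker = marker
        self.depth = 0      # nesting depth inside the blogroll container
        self.links = []     # collected (title, url) pairs
        self._href = None   # href of the <a> currently being read
        self._text = []     # text fragments of that <a>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.depth:
            if tag not in VOID_TAGS:
                self.depth += 1
            if tag == "a" and attrs.get("href"):
                # Resolve relative links against the page address.
                self._href = urljoin(self.base_url, attrs["href"])
                self._text = []
        else:
            ident = f"{attrs.get('id') or ''} {attrs.get('class') or ''}"
            if self.marker in ident.lower() and tag not in VOID_TAGS:
                self.depth = 1  # entered the blogroll container

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if not self.depth:
            return
        if tag == "a" and self._href is not None:
            title = "".join(self._text).strip() or self._href
            self.links.append((title, self._href))
            self._href = None
        if tag not in VOID_TAGS:
            self.depth -= 1  # reaches 0 when the container closes


def extract_blogroll(html, base_url, marker="blogroll"):
    """Return the (title, url) pairs found inside the blogroll container."""
    parser = BlogrollParser(base_url, marker)
    parser.feed(html)
    parser.close()
    return parser.links


def to_opml(links, title="Blogroll"):
    """Serialize (title, url) pairs as an OPML 2.0 document of link outlines."""
    opml = ET.Element("opml", version="2.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    for text, url in links:
        # type="link" records the site address only. Subscribing in a feed
        # reader needs type="rss" with an xmlUrl (the feed address), which
        # this sketch does not try to discover.
        ET.SubElement(body, "outline", text=text, type="link", url=url)
    return ET.tostring(opml, encoding="unicode")
```

Calling `extract_blogroll(page_html, page_url)` on a page you have already downloaded gives the URL list, and `to_opml(...)` turns that list into an OPML string. The OPML here holds site links only, so a feed reader would still need each blog's feed address before it could subscribe.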
-
Nothing I saw there would do this. It looks like it could manage to list all external links, and I suppose you could manually pick the blogroll out of that list.
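The "list all external links, then pick out the blogroll by hand" fallback can be sketched roughly like this in Python. This is only an illustration under assumptions: `LinkCollector` and `external_links` are hypothetical names, "external" is taken to mean any http(s) link whose host differs from the page's own host (so `www.` and non-`www.` variants count as different hosts), and only the first link per external host is kept.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the raw href value of every <a> tag on the page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def external_links(html, page_url):
    """Return one http(s) URL per external host, in page order."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()

    own_host = urlparse(page_url).netloc.lower()
    seen_hosts = set()
    result = []
    for href in collector.hrefs:
        url = urljoin(page_url, href)        # resolve relative links
        parts = urlparse(url)
        if parts.scheme not in ("http", "https"):
            continue                         # skip mailto:, javascript:, etc.
        host = parts.netloc.lower()
        if host == own_host or host in seen_hosts:
            continue                         # skip internal and repeated hosts
        seen_hosts.add(host)
        result.append(url)
    return result
```

The output still mixes blogroll entries with every other outbound link on the page, so the manual picking step remains.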
-
Hi there,
Well, Keri's response reminded me of this question and the fact that I found a tool for scraping these kinds of lists.
Here it is (with some other cool tools), have fun:
-
Hi Scott,
I'm going through older questions. Did you ever find a tool to do what you wanted to do here?
-
One thing to look at is Outwit Hub for Firefox. It might be able to help with that. It can scrape data from a page and do a lot with it. http://www.outwit.com/products/hub/. Don't know that it meets all of your needs, but I also haven't seen a response with anything better at the moment.
-
Hey Scott,
What a great question and, sigh, I don't have the answer. I am going to check back to find out what people come up with here. Surely there is someone who lurks in these parts who can throw something together?