Any tools for scraping blogroll URLs from sites?
-
This question is entirely in the whitehat realm...
Let's say you've come across a great blog with a strong blogroll of 40 sites.
The 40-site blogroll is interesting to you for any number of reasons, from link-building targets to simply subscribing in your feed reader. Right now, it's tedious to extract the URLs from the site. There are some "save all links" tools, but their output is messy, too.
Are there any good tools that will
a) allow you to grab the blogroll (only) of any site into a list of URLs (yeah, ok, it might not be perfect since some sites call it "sites I like" etc.)
b) same, but export as OPML so you can subscribe.
Thanks!
Scott
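For what it's worth, requirement (a) can be roughed out in Python with only the standard library. This is a sketch under one loud assumption: the blogroll sits inside an element whose `id` or `class` contains the text "blogroll" (the `marker` parameter). As noted above, sites that call it "sites I like" would need a different marker. `BlogrollParser`, `extract_blogroll`, and `fetch_blogroll` are hypothetical names, not an existing tool.

```python
import sys
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class BlogrollParser(HTMLParser):
    """Collect link targets found inside "blogroll" container elements.

    Assumption: a container is any element whose id or class attribute
    contains `marker` (case-insensitive).
    """

    def __init__(self, base_url, marker="blogroll"):
        super().__init__()
        self.base_url = base_url
        self.marker = marker.lower()
        self.container_tag = None  # tag name of the container we are inside
        self.depth = 0             # nesting level of that tag name
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.container_tag is None:
            # Not inside a blogroll yet: does this element start one?
            ident = " ".join(filter(None, (attrs.get("id"), attrs.get("class"))))
            if self.marker in ident.lower():
                self.container_tag, self.depth = tag, 1
            return
        if tag == self.container_tag:
            self.depth += 1  # nested element with the same tag name
        elif tag == "a" and attrs.get("href"):
            url = urljoin(self.base_url, attrs["href"])  # resolve relative links
            if url not in self.urls:
                self.urls.append(url)

    def handle_endtag(self, tag):
        if self.container_tag is not None and tag == self.container_tag:
            self.depth -= 1
            if self.depth == 0:
                self.container_tag = None  # left the blogroll


def extract_blogroll(html, base_url, marker="blogroll"):
    """Return the deduplicated blogroll URLs found in an HTML string."""
    parser = BlogrollParser(base_url, marker)
    parser.feed(html)
    parser.close()
    return parser.urls


def fetch_blogroll(page_url, marker="blogroll"):
    """Download a page and extract its blogroll URLs."""
    with urlopen(page_url) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        html = resp.read().decode(charset, errors="replace")
    return extract_blogroll(html, page_url, marker)


if __name__ == "__main__" and len(sys.argv) > 1:
    for url in fetch_blogroll(sys.argv[1]):
        print(url)  # one URL per line, ready to paste into a list
```

Saved as, say, `blogroll.py` (a hypothetical file name), `python blogroll.py https://some-blog.example/` prints one URL per line. A blogroll with no telltale `id` or `class` will simply be missed, which is the imperfection anticipated in (a).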
-
Not at all. I guess my feeling here is that there's a sort of untapped social graph defined by blogrolls. If it were simple to harvest them when visiting a blog (e.g., "this blogger recommends..."), one could take a stumble-on-steroids approach to a niche.
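The "untapped social graph" idea can be sketched as a breadth-first walk: start at one blog, read its blogroll, then read the blogrolls of the sites it lists, counting how many blogrolls recommend each site. This is a hypothetical Python sketch. `get_blogroll` is a caller-supplied function (for example, one that fetches a page and extracts its blogroll links), and `max_depth`/`max_sites` are illustrative limits that keep the crawl bounded.

```python
from collections import Counter, deque
from typing import Callable, Iterable


def crawl_blogroll_graph(
    seed: str,
    get_blogroll: Callable[[str], Iterable[str]],
    max_depth: int = 2,
    max_sites: int = 200,
) -> Counter:
    """Breadth-first walk over "blog -> blogroll entry" edges.

    Returns a Counter of how many crawled blogrolls list each site.
    Blogrolls are read for sites at depth < max_depth (the seed is depth 0),
    and at most max_sites sites (including the seed) are ever read.
    """
    recommended = Counter()
    seen = {seed}
    queue = deque([(seed, 0)])
    while queue:
        site, depth = queue.popleft()
        try:
            links = set(get_blogroll(site))  # dedupe within one blogroll
        except OSError:
            continue  # network failure for this site: skip it
        for link in links:
            if link == site:
                continue  # ignore self-links
            recommended[link] += 1
            if link not in seen and depth + 1 < max_depth and len(seen) < max_sites:
                seen.add(link)
                queue.append((link, depth + 1))
    return recommended


if __name__ == "__main__":
    # Toy graph standing in for real blogrolls.
    toy = {"a": ["b", "c"], "b": ["c", "d"], "c": ["d", "a"]}
    counts = crawl_blogroll_graph("a", lambda s: toy.get(s, []))
    print(counts.most_common())
```

Ranking by in-degree is only one possible heuristic: a site listed by many blogrolls in the niche surfaces near the top of `most_common()`.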
-
I thought you might be able to use the outbound link scraper to grab the outbound links on the page. Pop in the URLs of the pages you want to scrape and it will spit out a list of those domains and URLs. You can then put those URLs into the contact finder, and it will return the contact details for those sites. Combine the two spreadsheets for an epic list of blogs to contact for your outreach.
This is obviously for link building rather than subscribing; sorry if I've misunderstood what you were trying to do.
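As a stand-in for the outbound-link step (not a description of how that tool works), here is a small Python sketch of the same idea: collect every link on a page that points to a different host, group the links by domain, and write a two-column CSV that opens in a spreadsheet. `outbound_links` and `write_csv` are hypothetical names. Note that `www.example.org` and `example.org` count as different hosts in this sketch.

```python
import csv
import io
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Record the href of every <a> tag on the page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def outbound_links(html, page_url):
    """Map each external host to the (deduplicated) URLs pointing at it."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    own_host = urlparse(page_url).hostname
    by_domain = defaultdict(list)
    for href in collector.hrefs:
        url = urljoin(page_url, href)
        parts = urlparse(url)
        # Keep only web links that leave the page's own host.
        if parts.scheme in ("http", "https") and parts.hostname and parts.hostname != own_host:
            if url not in by_domain[parts.hostname]:
                by_domain[parts.hostname].append(url)
    return dict(by_domain)


def write_csv(by_domain, out):
    """Write domain,url rows to a file-like object, sorted by domain."""
    writer = csv.writer(out)
    writer.writerow(["domain", "url"])
    for domain in sorted(by_domain):
        for url in by_domain[domain]:
            writer.writerow([domain, url])


if __name__ == "__main__":
    sample = '<a href="/about">About</a> <a href="https://example.org/">Friend</a>'
    buf = io.StringIO()
    write_csv(outbound_links(sample, "https://myblog.example/"), buf)
    print(buf.getvalue())
```

Picking the blogroll out of this list is still a manual step, since every external link on the page ends up in it.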
-
Hi Keri,
That is a very cool tool, but it's overkill for this. It takes far too many steps, and even then it accomplishes only part of the desired goal of grabbing all blogroll URLs (within the blogroll DIV tag) and exporting the list to a valid OPML file or URL list.
Thanks!
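For the OPML half, a minimal Python sketch is below. A feed reader subscribes through an outline's `xmlUrl`, which has to be the feed address rather than the site address, so the sketch also tries to discover a feed from the `<link rel="alternate">` tag that many blogs put in their page head. `find_feed_url` and `urls_to_opml` are hypothetical helper names. Sites with no advertised feed are written with only `htmlUrl`, and a reader may skip those entries on import.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# MIME types commonly used to advertise RSS and Atom feeds.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkParser(HTMLParser):
    """Remember the first <link rel="alternate"> that points at a feed."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feed_url = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.feed_url is not None:
            return
        attrs = dict(attrs)
        rels = (attrs.get("rel") or "").lower().split()
        mime = (attrs.get("type") or "").lower()
        if "alternate" in rels and mime in FEED_TYPES and attrs.get("href"):
            self.feed_url = urljoin(self.base_url, attrs["href"])


def find_feed_url(html, base_url):
    """Return the advertised feed URL of a page, or None if there isn't one."""
    parser = FeedLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.feed_url


def urls_to_opml(sites, title="Blogroll"):
    """Build an OPML 2.0 string from (site_url, feed_url_or_None) pairs."""
    opml = ET.Element("opml", version="2.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    for site_url, feed_url in sites:
        # OPML requires a text attribute on every outline; use the host name.
        attrs = {"text": urlparse(site_url).netloc or site_url, "htmlUrl": site_url}
        if feed_url:
            # type="rss" plus xmlUrl is what feed readers subscribe to.
            attrs.update(type="rss", xmlUrl=feed_url)
        ET.SubElement(body, "outline", attrs)
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(opml, encoding="unicode")


if __name__ == "__main__":
    head_html = '<link rel="alternate" type="application/rss+xml" href="/feed/">'
    feed = find_feed_url(head_html, "https://example.org/")
    print(urls_to_opml([("https://example.org/", feed), ("http://example.net/blog", None)]))
```

Saving the printed string to a `.opml` file gives something most feed readers can import; the plain URL list is just the `site_url` column.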
-
Nothing I saw there would do this. It looks like it could list all external links, and I suppose you could manually pick the blogroll out of that list.
-
Hi there,
Well, Keri's response reminded me of this question, and of the fact that I found a tool for scraping these kinds of lists:
Here it is (along with some other cool tools); have fun:
-
Hi Scott,
I'm going through older questions. Did you ever find a tool to do what you wanted to do here?
-
One thing to look at is OutWit Hub for Firefox (http://www.outwit.com/products/hub/). It might be able to help with that: it can scrape data from a page and do a lot with it. I don't know whether it meets all of your needs, but I also haven't seen a response with anything better so far.
-
Hey Scott,
What a great question, and <sigh>I don't have the answer. I'm going to come back to see what people come up with here. Surely someone who lurks in these parts can throw something together?</sigh>