Any tools for scraping blogroll URLs from sites?
-
This question is entirely in the whitehat realm...
Let's say you've encountered a great blog - with a strong blogroll of 40 sites.
The 40-site blogroll is interesting to you for any number of reasons, from link building targets to simply subscribing in your feedreader. Right now, it's tedious to extract the URLs from the site. There are some "save all links" tools, but they are also messy.
Are there any good tools that will
a) allow you to grab the blogroll (only) of any site into a list of URLs (yeah, ok, it might not be perfect since some sites call it "sites I like" etc.)
b) same, but export as OPML so you can subscribe.
Thanks!
Scott
-
Not at all. I guess my feeling here is that there is a sort of untapped social graph defined by blogrolls. If it were simple to harvest them upon visiting a blog (e.g. this blogger recommends...) one could do a stumble-on-steroids approach to a niche.
-
I thought you might be able to use the outbound link scraper to grab the outbound links on the page. Pop in the URLs of the pages you want to scrape and it will spit out a list of those domains and URLs. You can take those URLs and put them into the contact finder, and it will return the contact details for those sites. Combine the two spreadsheets for an epic list of blogs to contact for your outreach.
This is obviously for link building rather than subscribing. Sorry if I have misunderstood what you were trying to do.
-
Hi Keri,
That is a very cool tool, but it is overkill for this. It takes far too many steps to accomplish only part of the desired goal: grabbing all blogroll URLs (within the blogroll DIV tag) and exporting the list to a valid OPML file or URL list.
thanks!
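For anyone who wants to script the goal described above, here is a minimal sketch using only the Python standard library. It is not an existing tool. It assumes the blogroll sits in an element whose id or class contains the word "blogroll" (many themes use other names, as the question notes), and that the markup is well nested (unclosed tags such as a bare <li> will throw off the depth tracking). BlogrollParser, to_opml and the sample HTML are names made up for this illustration.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Void elements have no closing tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class BlogrollParser(HTMLParser):
    """Collect [title, url] pairs for links inside a 'blogroll' container."""

    def __init__(self, marker="blogroll"):
        super().__init__()
        self.marker = marker    # assumed substring of the container's id/class
        self.depth = 0          # nesting depth inside the container; 0 = outside
        self.links = []
        self._in_link = False

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        if self.depth:
            self.depth += 1
            href = attrs.get("href")
            if tag == "a" and href:
                self.links.append(["", href])
                self._in_link = True
        else:
            ident = f"{attrs.get('id') or ''} {attrs.get('class') or ''}".lower()
            if self.marker in ident:
                self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth -= 1
        if tag == "a":
            self._in_link = False

    def handle_data(self, data):
        # Accumulate the visible link text as the outline title.
        if self._in_link:
            self.links[-1][0] += data.strip()


def to_opml(links, title="Blogroll"):
    """Serialize [title, url] pairs as an OPML 2.0 list of link outlines."""
    opml = ET.Element("opml", version="2.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    for text, url in links:
        ET.SubElement(body, "outline", type="link", text=text or url, url=url)
    return ET.tostring(opml, encoding="unicode")


# Hypothetical sample page; real use would feed in the fetched HTML of a blog.
sample_html = """
<div id="sidebar">
  <ul class="blogroll">
    <li><a href="http://a.example/">Blog A</a></li>
    <li><a href="http://b.example/">Blog B</a></li>
  </ul>
</div>
"""
parser = BlogrollParser()
parser.feed(sample_html)
for link_title, link_url in parser.links:
    print(link_url)
print(to_opml(parser.links))
```

Note that this writes the blog URLs as plain link outlines. Feed readers generally expect outlines of type "rss" with an xmlUrl attribute, so a subscribable file would also need each site's feed address, which this sketch does not try to discover.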
-
Nothing I saw there would do this. It looks like it could manage to list all external links, and I suppose you could manually pick the blogroll out of that list.
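As a rough illustration of the "list all external links" fallback, the sketch below (Python standard library only) collects every link on a page and keeps the ones pointing at a different host. It is an assumption-laden example, not how any of the tools mentioned here work: external_links and LinkCollector are hypothetical names, and the host comparison is a plain string match, so www.example.com and example.com count as different sites.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the raw href of every <a> tag on the page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def external_links(html, page_url):
    """Return de-duplicated absolute http(s) links that leave page_url's host."""
    collector = LinkCollector()
    collector.feed(html)
    own_host = urlparse(page_url).netloc.lower()
    seen, result = set(), []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)   # resolve relative links
        parts = urlparse(absolute)
        if parts.scheme not in ("http", "https"):
            continue                         # skip mailto:, javascript:, etc.
        if parts.netloc.lower() == own_host:
            continue                         # internal link
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result
```

The output still mixes blogroll entries with every other outbound link (ads, citations, social profiles), so picking out the blogroll remains a manual step.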
-
Hi there,
Well, Keri's response reminded me of this question and the fact that I found a tool for scraping these kinds of lists.
Here it is (with some other cool tools), have fun:
-
Hi Scott,
I'm going through older questions. Did you ever find a tool to do what you wanted to do here?
-
One thing to look at is Outwit Hub for Firefox. It might be able to help with that. It can scrape data from a page and do a lot with it. http://www.outwit.com/products/hub/. Don't know that it meets all of your needs, but I also haven't seen a response with anything better at the moment.
-
Hey Scott,
What a great question and (sigh) I don't have the answer. I am going to check back to find out what people come up with here. Surely there is someone who lurks in these parts who can throw something together?