Any tools for scraping blogroll URLs from sites?
-
This question is entirely in the whitehat realm...
Let's say you've come across a great blog with a strong blogroll of 40 sites.
The 40-site blogroll is interesting to you for any number of reasons, from link building targets to simply subscribing in your feedreader. Right now, it's tedious to extract the URLs from the site. There are some "save all links" tools, but they are also messy.
Are there any good tools that will
a) allow you to grab the blogroll (only) of any site into a list of URLs (yeah, ok, it might not be perfect since some sites call it "sites I like" etc.)
b) same, but export as OPML so you can subscribe.
Thanks!
Scott
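For what it's worth, the "grab only the blogroll" part can be approximated in a few lines of Python. This is a minimal sketch, not an existing tool. It assumes the blogroll sits in an element whose id or class contains "blogroll". DEFAULT_HINTS is a hypothetical heuristic, and you would pass your own hints for sites that call it "sites I like". It also assumes reasonably well-formed markup where non-void tags are explicitly closed. Only the standard library's html.parser is used.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical heuristic: an element whose id or class contains one of these
# substrings is treated as the blogroll container. Themes vary a lot, so pass
# your own hints (e.g. "sites-i-like") for sites that label it differently.
DEFAULT_HINTS = ("blogroll", "blog-roll")

# Elements that never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class BlogrollParser(HTMLParser):
    """Collects absolute URLs of <a href> links nested inside a blogroll container."""

    def __init__(self, base_url, hints=DEFAULT_HINTS):
        super().__init__()
        self.base_url = base_url
        self.hints = tuple(h.lower() for h in hints)
        self.depth = 0  # nesting depth inside the current container; 0 means outside
        self.urls = []

    def _looks_like_blogroll(self, attrs):
        marker = " ".join(v or "" for k, v in attrs if k in ("id", "class")).lower()
        return any(h in marker for h in self.hints)

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif self._looks_like_blogroll(attrs):
            self.depth = 1
        if self.depth and tag == "a":
            href = dict(attrs).get("href")
            if href:
                url = urljoin(self.base_url, href)  # resolve relative links
                if url not in self.urls:            # keep first occurrence only
                    self.urls.append(url)

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1


def extract_blogroll(html, base_url, hints=DEFAULT_HINTS):
    """Return the blogroll link URLs found in an HTML string, in page order."""
    parser = BlogrollParser(base_url, hints)
    parser.feed(html)
    parser.close()
    return parser.urls
```

To run it against a live page you would fetch the HTML first, for example with urllib.request.urlopen(url).read().decode("utf-8", "replace"), and pass the page URL as base_url so relative links resolve correctly.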
-
Not at all. I guess my feeling here is that there is a sort of untapped social graph defined by blogrolls. If it were simple to harvest them upon visiting a blog (e.g. "this blogger recommends..."), one could take a stumble-on-steroids approach to a niche.
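The "stumble on steroids" idea is essentially a breadth-first walk over that blogroll graph. Here is a rough sketch. It assumes you already have some get_blogroll(url) function (hypothetical; it could wrap a page fetch plus a blogroll extractor) that returns the sites a blog recommends. Counting how often each site is recommended gives a crude ranking of who matters in the niche. The max_sites cap keeps the crawl bounded and polite.

```python
from collections import Counter, deque
from typing import Callable, Dict, List, Tuple


def crawl_blogroll_graph(seed: str,
                         get_blogroll: Callable[[str], List[str]],
                         max_sites: int = 50) -> Dict[str, List[str]]:
    """Breadth-first walk of the 'who recommends whom' graph, starting at seed.

    get_blogroll is supplied by the caller, so the traversal logic stays
    independent of HTTP fetching. Returns an adjacency map: site -> blogroll.
    """
    graph: Dict[str, List[str]] = {}
    queue = deque([seed])
    seen = {seed}
    while queue and len(graph) < max_sites:
        site = queue.popleft()
        links = get_blogroll(site)
        graph[site] = links
        for link in links:
            if link not in seen:  # enqueue each site at most once
                seen.add(link)
                queue.append(link)
    return graph


def most_recommended(graph: Dict[str, List[str]], top: int = 10) -> List[Tuple[str, int]]:
    """Rank sites by how many crawled blogrolls link to them (in-degree)."""
    counts = Counter(link for links in graph.values() for link in links)
    return counts.most_common(top)
```

With a toy graph where a.example lists b and c, b lists c and d, and c lists a, crawling from a.example visits all four sites and c.example comes out as the most recommended.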
-
I thought you might be able to use the outbound link scraper to grab the outbound links on the page. Pop in the URLs of the pages you want to scrape and it will spit out a list of those domains and URLs. You can then put those URLs into the contact finder, and it will return the contact details for those sites. Combine the two spreadsheets for an epic list of blogs to contact for your outreach.
This is obviously for link building rather than subscribing; sorry if I have misunderstood what you were trying to do.
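Roughly what an outbound link scraper does can be sketched with the standard library. The sketch collects every <a href> and resolves it against the page URL. It then drops links to the page's own host and non-HTTP schemes such as mailto:, and groups what is left by domain. This is a simplified illustration, not how the Moz tool works internally. Note that www.example.com and example.com count as different hosts here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Gathers the raw href value of every <a> tag on the page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def outbound_links(html, page_url):
    """Return {domain: [urls]} for http(s) links pointing off page_url's host."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    own_host = urlparse(page_url).netloc.lower()
    by_domain = {}
    for href in collector.hrefs:
        url = urljoin(page_url, href)  # make relative links absolute
        parts = urlparse(url)
        host = parts.netloc.lower()
        if parts.scheme in ("http", "https") and host and host != own_host:
            urls = by_domain.setdefault(host, [])
            if url not in urls:  # de-duplicate repeated links
                urls.append(url)
    return by_domain
```

The keys of the returned dict are the domain list and the values are the URL list. Unlike a blogroll-only extractor, this picks up every external link on the page.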
-
Hi Keri,
That is a very cool tool, but it is overkill for this. It takes far too many steps to accomplish only part of the goal: grabbing all blogroll URLs (within the blogroll DIV tag) and exporting the list to a valid OPML file or URL list.
thanks!
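For the OPML half, here is a minimal sketch using xml.etree.ElementTree. It assumes you already have each site's name, homepage URL and (where known) feed URL. Blogrolls usually link to homepages rather than feeds, so the feed URL would have to be found separately, for example from the page's <link rel="alternate" type="application/rss+xml"> autodiscovery tag. Entries without a feed are written as plain outlines. The type="rss", xmlUrl and htmlUrl attributes follow the OPML 2.0 subscription-list convention.

```python
import xml.etree.ElementTree as ET


def to_opml(sites, title="Blogroll"):
    """Build an OPML 2.0 document from (text, html_url, feed_url) tuples.

    feed_url may be None when no feed is known; such entries are kept as plain
    outlines with htmlUrl only, which feed readers will typically not subscribe to.
    """
    opml = ET.Element("opml", version="2.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    for text, html_url, feed_url in sites:
        attrs = {"text": text, "htmlUrl": html_url}
        if feed_url:
            # Subscription-list entries carry type="rss" and the feed's xmlUrl.
            attrs["type"] = "rss"
            attrs["xmlUrl"] = feed_url
        ET.SubElement(body, "outline", attrs)
    # ElementTree escapes &, <, > and quotes in attribute values for us.
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(opml, encoding="unicode")
```

Writing the returned string to a .opml file (opened with encoding="utf-8") should give something a feed reader can import.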
-
Nothing I saw there would do this. It looks like it could list all external links, and I suppose you could manually pick the blogroll out of that list.
-
Hi there,
Well, Keri's response reminded me of this question and of the fact that I found a tool for scraping these kinds of lists:
Here it is (with some other cool tools), have fun:
-
Hi Scott,
I'm going through older questions. Did you ever find a tool to do what you wanted to do here?
-
One thing to look at is OutWit Hub for Firefox. It might be able to help with that: it can scrape data from a page and do a lot with it. http://www.outwit.com/products/hub/. I don't know that it meets all of your needs, but I also haven't seen a response with anything better at the moment.
-
Hey Scott,
What a great question, and <sigh>I don't have the answer. I am going to come back to find out what people come up with here. Surely someone who lurks these parts can throw something together?</sigh>