Any tools for scraping blogroll URLs from sites?
-
This question is entirely in the whitehat realm...
Let's say you've encountered a great blog - with a strong blogroll of 40 sites.
The 40-site blogroll is interesting to you for any number of reasons, from link building targets to simply subscribing in your feedreader. Right now, it's tedious to extract the URLs from the site. There are some "save all links" tools, but they are also messy.
Are there any good tools that will
a) allow you to grab the blogroll (only) of any site into a list of URLs (yeah, ok, it might not be perfect since some sites call it "sites I like" etc.)
b) same, but export as OPML so you can subscribe.
Thanks!
Scott
-
Not at all. I guess my feeling here is that there is a sort of untapped social graph defined by blogrolls. If it were simple to harvest them upon visiting a blog (e.g. this blogger recommends...) one could do a stumble-on-steroids approach to a niche.
-
I thought you might be able to use the outbound link scraper to grab the outbound links on the page. Pop in the URLs of the pages you want to scrape and it will spit out a list of those domains and URLs. You can take those URLs and put them into the contact finder, and it will return the contact details for those sites. Combine the two spreadsheets for an epic list of blogs to contact for your outreach.
This is obviously for link building rather than subscribing. Sorry if I have misunderstood what you were trying to do.
-
Hi Keri,
That is a very cool tool, but it is overkill for this. It takes far too many steps to accomplish only part of the desired goal: grabbing all blogroll URLs (within the blogroll DIV tag) and exporting the list to a valid OPML file or URL list.
Thanks!
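For what it's worth, the goal described above (links inside the blogroll container only, exported as a URL list or OPML) can be roughly sketched with the Python standard library. This is only a sketch under assumptions, not a finished tool: it assumes the blogroll sits in an element whose `id` or `class` contains the word "blogroll", and that the markup inside it is reasonably well-formed. The names `BlogrollParser`, `extract_blogroll`, and `to_opml` are made up for illustration. It also writes plain link outlines; most feed readers expect a feed address (`xmlUrl`) on each outline, which would need a separate feed-discovery step that is not shown here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class BlogrollParser(HTMLParser):
    """Collect (title, url) pairs for links inside a 'blogroll' element.

    Assumption: the container's id or class contains the marker text.
    """

    def __init__(self, base_url, marker="blogroll"):
        super().__init__()
        self.base_url = base_url
        self.marker = marker
        self.depth = 0        # nesting depth inside the container; 0 = outside
        self.current = None   # [absolute href, text parts] of the open link
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        if self.depth:
            self.depth += 1
        else:
            label = (attrs.get("id") or "") + " " + (attrs.get("class") or "")
            if self.marker in label.lower():
                self.depth = 1
        if self.depth and tag == "a" and attrs.get("href"):
            self.current = [urljoin(self.base_url, attrs["href"]), []]

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        if tag == "a" and self.current:
            href, parts = self.current
            # Fall back to the URL when the link has no visible text.
            self.links.append(("".join(parts).strip() or href, href))
            self.current = None
        self.depth -= 1

    def handle_data(self, data):
        if self.current:
            self.current[1].append(data)


def extract_blogroll(html, base_url):
    """Return a list of (title, url) tuples found inside the blogroll."""
    parser = BlogrollParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def to_opml(links, title="Blogroll"):
    """Serialize (title, url) pairs as an OPML 2.0 document of link outlines."""
    opml = ET.Element("opml", version="2.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    for text, url in links:
        ET.SubElement(body, "outline", text=text, type="link", url=url)
    return ET.tostring(opml, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical page fragment, for illustration only.
    sample = ('<ul class="blogroll">'
              '<li><a href="http://a.example/">Blog A</a></li>'
              '<li><a href="/b">Blog B</a></li></ul>')
    found = extract_blogroll(sample, "http://example.com/")
    print("\n".join(url for _, url in found))  # plain URL list
    print(to_opml(found))                      # OPML export
```

Sites that label the section "sites I like" or similar would need a different `marker`, and pages with unclosed tags inside the blogroll could throw off the depth counting, so a more forgiving HTML library might be a better fit in practice.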
-
Nothing I saw there would do this. It looks like it could manage to list all external links, and I suppose you could manually pick the blogroll out of that list.
-
Hi there,
Well, Keri's response reminded me of this question and the fact that I found a tool for scraping these kinds of lists.
Here it is (with some other cool tools), have fun:
-
Hi Scott,
I'm going through older questions. Did you ever find a tool to do what you wanted to do here?
-
One thing to look at is Outwit Hub for Firefox. It might be able to help with that. It can scrape data from a page and do a lot with it. http://www.outwit.com/products/hub/. Don't know that it meets all of your needs, but I also haven't seen a response with anything better at the moment.
-
Hey Scott,
What a great question and (sigh) I don't have the answer. I am going to check back to find out what people come up with here. Surely there is someone who lurks in these parts who can throw something together?