Any tools for scraping blogroll URLs from sites?
-
This question is entirely in the whitehat realm...
Let's say you've encountered a great blog - with a strong blogroll of 40 sites.
The 40-site blogroll is interesting to you for any number of reasons, from link building targets to simply subscribing in your feedreader. Right now, it's tedious to extract the URLs from the site. There are some "save all links" tools, but they are also messy.
Are there any good tools that will
a) allow you to grab the blogroll (only) of any site into a list of URLs (yeah, ok, it might not be perfect since some sites call it "sites I like" etc.)
b) same, but export as OPML so you can subscribe.
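To make it concrete, here's roughly what I imagine such a tool doing — a minimal sketch, assuming the blogroll sits in a `<div class="blogroll">` (many themes use "blogroll", "links", or "sites I like", so the class name is a guess, not a standard):

```python
# Hypothetical sketch: pull links out of a blogroll block.
# Assumes the blogroll lives in <div class="blogroll">; the class name
# varies by theme, so this is an illustrative guess.
from html.parser import HTMLParser

class BlogrollParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.depth = 0          # >0 while inside the blogroll div
        self.links = []         # (url, anchor text) pairs
        self._current = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self.depth > 0:
                self.depth += 1          # nested div inside the blogroll
            elif "blogroll" in attrs.get("class", ""):
                self.depth = 1           # entered the blogroll block
        elif tag == "a" and self.depth > 0 and "href" in attrs:
            self._current = [attrs["href"], ""]

    def handle_data(self, data):
        if self._current:
            self._current[1] += data     # accumulate anchor text

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1
        elif tag == "a" and self._current:
            self.links.append(tuple(self._current))
            self._current = None

# Stand-in for a fetched page; a real tool would download the HTML first.
html = """
<div class="sidebar">
  <div class="blogroll">
    <a href="http://example.com/a">Blog A</a>
    <a href="http://example.com/b">Blog B</a>
  </div>
  <a href="http://example.com/not-blogroll">Other</a>
</div>
"""
p = BlogrollParser()
p.feed(html)
for url, title in p.links:
    print(url)
```

Links outside the blogroll div are ignored, which is exactly the filtering the "save all links" tools don't do.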
Thanks!
Scott
-
Not at all. I guess my feeling here is that there is a sort of untapped social graph defined by blogrolls. If it were simple to harvest them upon visiting a blog (e.g. this blogger recommends...) one could do a stumble-on-steroids approach to a niche.
-
I thought you might be able to use the outbound link scraper to grab the outbound links on the page. Pop in the URLs of the pages you want to scrape and it will spit out a list of those domains and URLs. You can take those URLs and put them into the contact finder and it will return the contact details for those sites. Combine the two spreadsheets for an epic list of blogs to contact for your outreach.
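The "combine the two spreadsheets" step can also be scripted rather than done by hand — a rough sketch, assuming both tools export CSVs keyed on a domain column (the column names here are invented for illustration):

```python
# Join an outbound-links CSV with a contact-details CSV on the domain.
# Column names ("domain", "url", "contact_email") are invented for
# illustration -- match them to whatever the tools actually export.
import csv, io

# Stand-ins for the two exported spreadsheets.
links_csv = "domain,url\nexample.com,http://example.com/post\n"
contacts_csv = "domain,contact_email\nexample.com,editor@example.com\n"

# Index contacts by domain for the lookup.
contacts = {row["domain"]: row
            for row in csv.DictReader(io.StringIO(contacts_csv))}

# Attach contact details to each scraped link.
combined = []
for row in csv.DictReader(io.StringIO(links_csv)):
    contact = contacts.get(row["domain"], {})
    combined.append({**row, "contact_email": contact.get("contact_email", "")})

for row in combined:
    print(row["domain"], row["url"], row["contact_email"])
```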
This is obviously for link building rather than subscribing - sorry if I have misunderstood what you were trying to do
-
Hi Keri,
That is a very cool tool, but it is overkill for this. It takes far too many steps to accomplish only part of the desired goal of grabbing all blogroll URLs (within the blogroll DIV tag) and exporting the list to a valid OPML file or URL list.
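The OPML side is actually the easy half once you have the URL list — a minimal sketch with Python's standard library (note that `xmlUrl` should point at the feed, which a real tool would find via autodiscovery; here the site URL is reused as a placeholder):

```python
# Minimal OPML 1.0 writer for a list of site URLs (example data is invented).
import xml.etree.ElementTree as ET

def urls_to_opml(urls, title="Blogroll"):
    opml = ET.Element("opml", version="1.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    for url in urls:
        # xmlUrl should be the feed URL; without feed autodiscovery we
        # can only guess, so the site URL stands in as a placeholder.
        ET.SubElement(body, "outline", type="rss", text=url,
                      xmlUrl=url, htmlUrl=url)
    return ET.tostring(opml, encoding="unicode")

print(urls_to_opml(["http://example.com/a", "http://example.com/b"]))
```

Any feed reader that imports OPML should accept the result once the `xmlUrl` values point at real feeds.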
thanks!
-
Nothing I saw there would do this. It looks like it could manage to list all external links, and I suppose you could manually pick the blogroll out of that.
-
Hi there,
Well, Keri's response reminded me of this question and the fact that I found a tool for scraping these kinds of lists.
Here it is (with some other cool tools), have fun:
-
Hi Scott,
I'm going through older questions. Did you ever find a tool to do what you wanted to do here?
-
One thing to look at is Outwit Hub for Firefox. It might be able to help with that. It can scrape data from a page and do a lot with it. http://www.outwit.com/products/hub/. Don't know that it meets all of your needs, but I also haven't seen a response with anything better at the moment.
-
Hey Scott,
What a great question and <sigh>I don't have the answer. I am going to come back to find out what people come up with here. Surely there is someone who lurks these parts that can throw something together?</sigh>