Any tools for scraping blogroll URLs from sites?
-
This question is entirely in the whitehat realm...
Let's say you've encountered a great blog - with a strong blogroll of 40 sites.
The 40-site blogroll is interesting to you for any number of reasons, from link building targets to simply subscribing in your feedreader. Right now, it's tedious to extract the URLs from the site. There are some "save all links" tools, but they are also messy.
Are there any good tools that will
a) allow you to grab the blogroll (only) of any site into a list of URLs (yeah, ok, it might not be perfect since some sites call it "sites I like" etc.)
b) same, but export as OPML so you can subscribe.
Thanks!
Scott
-
Not at all. I guess my feeling here is that there is a sort of untapped social graph defined by blogrolls. If it were simple to harvest them upon visiting a blog (e.g. this blogger recommends...) one could do a stumble-on-steroids approach to a niche.
-
I thought you might be able to use the outbound link scraper to grab the outbound links on the page. Pop in the URLs of the pages you want to scrape and it will spit out a list of those domains and URLs. You can take those URLs and put them into the contact finder, and it will return the contact details for those sites. Combine the two spreadsheets for an epic list of blogs to contact for your outreach.
This is obviously for link building rather than subscribing. Sorry if I have misunderstood what you were trying to do.
-
Hi Keri,
That is a very cool tool, but it is overkill for this. It takes far too many steps to accomplish only part of the desired goal: grabbing all blogroll URLs (within the blogroll DIV tag) and exporting the list to a valid OPML file or URL list.
thanks!
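For anyone who wants to script this, here is a rough sketch of the idea described above: collect only the links inside a container whose id or class mentions "blogroll", then write them out as OPML. It uses only the Python standard library. The names `BlogrollParser` and `to_opml` are made up for illustration, the "blogroll" marker is an assumption (many sites label it differently), and the depth tracking assumes reasonably well-formed HTML with closing tags present.

```python
from html.parser import HTMLParser
from xml.etree import ElementTree as ET

# Elements that never have a closing tag, so they must not affect depth tracking.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class BlogrollParser(HTMLParser):
    """Hypothetical helper: collect (title, url) pairs inside a 'blogroll' container."""

    def __init__(self, markers=("blogroll",)):
        super().__init__()
        self.markers = markers   # substrings to look for in id/class (an assumption)
        self.depth = 0           # nesting depth inside the container; 0 means outside
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.depth:
            if tag not in VOID_TAGS:
                self.depth += 1
            if tag == "a" and attrs.get("href"):
                self._href, self._text = attrs["href"], []
        else:
            label = f"{attrs.get('id') or ''} {attrs.get('class') or ''}".lower()
            if tag not in VOID_TAGS and any(m in label for m in self.markers):
                self.depth = 1

    def handle_endtag(self, tag):
        if not self.depth or tag in VOID_TAGS:
            return
        if tag == "a" and self._href:
            title = "".join(self._text).strip() or self._href
            self.links.append((title, self._href))
            self._href = None
        self.depth -= 1

    def handle_data(self, data):
        if self._href:
            self._text.append(data)


def to_opml(links, title="Blogroll"):
    """Serialize (title, url) pairs as OPML 2.0 outlines of type 'link'."""
    opml = ET.Element("opml", version="2.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    for text, url in links:
        ET.SubElement(body, "outline", type="link", text=text, url=url)
    return ET.tostring(opml, encoding="unicode")
```

Feed the page HTML to `BlogrollParser().feed(...)`, read `.links` for the plain URL list, and pass it to `to_opml` for the export. Note that this writes the site URLs only. A feed reader normally wants each outline to carry the feed address (`xmlUrl`), and finding that would need an extra feed-discovery step per site that this sketch does not attempt.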
-
Nothing I saw there would do this. It looks like it could manage to list all external links, and I suppose you could manually pick the blogroll out of that list.
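As a fallback along those lines, listing every external link on a page is easy to script, and you can then pick the blogroll entries out by hand. This is only a sketch using the Python standard library. `LinkCollector` and `external_links` are illustrative names, and "external" here simply means the host differs from the page's own host, so `www.` and non-`www.` variants count as different sites.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Hypothetical helper: gather the href of every <a> tag on the page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def external_links(html, page_url):
    """Return unique http(s) links that point to a different host than page_url."""
    parser = LinkCollector()
    parser.feed(html)
    own_host = urlparse(page_url).netloc.lower()
    seen, result = set(), []
    for href in parser.hrefs:
        absolute = urljoin(page_url, href)   # resolve relative links against the page
        parts = urlparse(absolute)
        if parts.scheme not in ("http", "https"):
            continue                         # skip mailto:, javascript:, etc.
        if parts.netloc.lower() == own_host or absolute in seen:
            continue                         # skip internal links and duplicates
        seen.add(absolute)
        result.append(absolute)
    return result
```

You would still need to fetch the page yourself and pass in its HTML together with its URL.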
-
Hi there,
Well, Keri's response reminded me of this question and the fact that I found a tool for scraping these kinds of lists.
Here it is (with some other cool tools), have fun:
-
Hi Scott,
I'm going through older questions. Did you ever find a tool to do what you wanted to do here?
-
One thing to look at is Outwit Hub for Firefox. It might be able to help with that. It can scrape data from a page and do a lot with it. http://www.outwit.com/products/hub/. Don't know that it meets all of your needs, but I also haven't seen a response with anything better at the moment.
-
Hey Scott,
What a great question and, sigh, I don't have the answer. I am going to check back to find out what people come up with here. Surely there is someone who lurks in these parts who can throw something together?