Randoom a Michael Friis production

Posted
18 March 2012 @ 4am

Categories
Scraping

You're reading Randoom, a Michael Friis production

Raw updated data on Danish business leader groups

Last summer, I published data on the members of Danish business leader groups, obtained with code written while I was still at Ekstra Bladet. I’ve cleaned up the code and removed the parts that fetched celebrities from various other obscure sources. You can fork the project on Github.

The code is fairly straightforward. The scraper itself is less than 150 loc. The scraper is configured to be run in a background worker on AppHarbor and will conduct a scrape once a month (I don’t know how often the VL-people update their website, but monthly updates seems sufficient to keep track of coming and goings). The resulting data can be fetched using a simple JSON API. You can find a list of scraped member-batches here (there’s just one at the time of writing). Hitting http://vlgroups.apphb.com/Member will always net you the latest batch.

I was motivated to revisit the code after this week’s dethroning of Anders Eldrup from his position as CEO of Dong Energy. Anders Eldrup sits in VL-gruppe 1, the most prestigious one. Let’s see if he’s still there next time the scraper looks. 14 other Dong Energy executives are members of other groups, although interestingly, Jakob Baruël Poulsen (Eldrup’s handsomely rewarded sidekick) is nowhere to be found. I think data like this in an important piece of the puzzle to figure out what relations exist between business leaders in Denmark and the Anders Eldrup debacle demonstrates why keeping track is important.


1 Comment

Posted by
Screen scraping with WatiN | Randoom
8 May 2012 @ 9pm

[…] scraping websites can range in difficulty from the very easy to extremely hard. When encountering hard-to-scrape sites, the typical cause of difficulty is just […]


Leave a Comment