Michael Friis' Blog


Knight Foundation Scholarship Essay

(This summer I applied for the “Coders Wanted” Knight Foundation Scholarship at the Medill School of Journalism. In case anyone’s interested, I’m uploading the essays I wrote for my application.)

Question: How do journalism and technology relate to one another in the digital age?

Technology relates to journalism in two different ways: It is a topic of coverage (“science journalism”) and a driver of change. The subject of science and tech journalism is an interesting one, but this essay will focus on technology as an enabler and driver of change in the practice of journalism.

Ever since the invention of movable type, technological progress has gradually decreased the amount of money and time required to distribute information. The advent of digital technology has lowered the cost to (almost) zero and made distribution instantaneous. As Chris Anderson argues in his recent book “FREE”, this final drop to zero marks a discontinuity and it has some profound implications.

The speed and ease of digital publishing now makes it possible for everyone to write news reports, shoot photos and record video of news events — endeavours that used to be the exclusive privilege of journalists and photographers. The Internet has also greatly increased the scope for reader feedback and debate on stories created by traditional journalists. Taken together, this has led to an interesting integration of newsgathering where professional and so-called “citizen” journalists collaborate and compete to dig up, investigate and publish news.

An extreme example of this is The Guardian’s (a British newspaper) recent attempt at making sense of UK parliament members’ expense claims. The expense records were released under a freedom of information request as more than 2 million scanned documents. To investigate these, the newspaper enlisted its readers (and the Internet at large) to wade through the documents, sift out the interesting claims, and determine the amounts and exactly what items were claimed.

The Internet has led to the development of a range of interesting platforms, similar to the one mentioned, where journalism-related activities are taking place even outside of the confines of traditional media organizations. The author, for example, has created a web site called Folkets Ting (“People’s Parliament”) which, in the tradition of sites like OpenCongress (US) and The Public Whip (UK), makes legislation, votes and debates from the Danish parliament available for public scrutiny and debate. It used to be the responsibility of journalists to hold elected politicians to account, but tools like these enable interested citizens to join in. It is the author’s hope that such sites will increase the scope of debate beyond the often narrow attention span of traditional media and lead to a greater breadth of opinion being voiced (even if the result is also likely to be a lot messier).

Unfortunately, digital technology and the Internet have also seriously undermined the business model of many traditional media companies. The decline of newspapers is a particular worry, partly because theirs has been such a rapid fall (several renowned American newspapers have already shut down and more are teetering on the brink of bankruptcy), partly because they seem to play an outsize role in digging up and investigating agenda-setting stories that other types of media then pick up.

The traditional newspaper business model was based on the fact that printing technology was expensive and building a subscriber-base required time and large investments. After these had been secured, however, the newspaper could make a mint on classifieds and other ads and the revenue then subsidized newsroom activities. The Internet rudely killed off this model because there is now nothing stopping sites like Craigslist and eBay from just publishing classifieds (and auctions) to large audiences and not donating the proceeds to deserving journalists.

Publishers have variously called on readers, governments and Google to do something, “do something” usually meaning “give us more money” in some shape or form. News has become a commodity that readers in most cases are unwilling to pay for. A large decline in journalism may represent a failure of the market warranting government intervention, but it is a path fraught with danger. Demanding money be redistributed from a successful part of the value chain looks like zero-sum thinking and reveals an unwillingness to reconsider one’s own business. It is the opinion of this aspiring journalist (and of Chris Anderson) that the old business model, or something like it, is unlikely to return.

What, then, of journalism? Some forms (business coverage most prominently) are prospering in spite of the Internet. Other forms may shrink somewhat or find themselves augmented or supplanted by enthusiastic citizen journalists using technology and global connectivity to their advantage. An area such as public oversight of politicians and institutions could expand greatly if good tools for improving transparency and reporting are developed.

The author believes that journalism in the digital age is more exciting than ever. To be sure, there are challenges to overcome, but the advantages are many: Journalists can reach wider audiences, both faster and cheaper, and they can involve, solicit feedback from and collaborate with more people than at any time before. The author can’t wait to develop the platforms and systems that will form the foundations of new kinds of digital journalism, and hopes, with the help of the Knight Foundation, to get a chance to do so at Medill.

Exchange Rate data

As part of our ongoing efforts at making sense of the Tenders Electronic Daily procurement contracts, I had to get hold of historical exchange rates to convert the values of all the contracts into a comparable form. Professor Werner Antweiler at The University of British Columbia maintains a very impressive, free database of exactly this data. He doesn’t let you export it in (great) bulk unfortunately. I wrote a small script to get the monthly data for the currencies I wanted; the important parts (in C#) are included below. Note that the site may throttle you. Also, please don’t use this to try to scrape all the data and republish it, or in other ways make a fool of yourself.

string url = "http://fx.sauder.ubc.ca/cgi/fxdata";
// the currency to fetch rates for (also used in the error message below)
string curr = "YOURCURRENCY";
// this uses Euros as the base currency
string requeststring =
	string.Format(
	"b=EUR&c={0}&rd=&fd=1&fm=1&fy=2003&ld=31&lm=12&ly=2008&y=monthly&q=volume&f=csv&o=",
	curr);

HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);

req.ContentType = "application/x-www-form-urlencoded";
req.Expect = null;
req.Method = "POST";

byte[] reqData = Encoding.UTF8.GetBytes(requeststring);
req.ContentLength = reqData.Length;
Stream reqStream = req.GetRequestStream();
reqStream.Write(reqData, 0, reqData.Length);
reqStream.Close();

// dispose the response and reader once the body has been read
string res;
using (HttpWebResponse WebResp = (HttpWebResponse)req.GetResponse())
using (StreamReader answer = new StreamReader(WebResp.GetResponseStream()))
{
	res = answer.ReadToEnd();
}

if (res.Contains("Error"))
{
	throw new Exception(string.Format("Bad currency: {0}", curr));
}

if (res.Contains("Access"))
{
	// You're being throttled
}

var lines = res.Split(new char[] { '\n' });

// ignore the first two lines and the last two ones
for (int i = 2; i < lines.Length - 2 ; i++)
{
	var line = lines[i];
	var vals = line.Split(new char[] { ',' });

	// parse the vals
	var month = GetMonth(vals[0]);
	var year = GetYear(vals[0]);

	var rate = decimal.Parse(vals[1], CultureInfo.InvariantCulture);
}

// Util Methods
private static int GetMonth(string s)
{
	var month = s.Substring(1, 3);
	switch (month)
	{
		case "Jan": return 1;
		case "Feb": return 2;
		case "Mar": return 3;
		case "Apr": return 4;
		case "May": return 5;
		case "Jun": return 6;
		case "Jul": return 7;
		case "Aug": return 8;
		case "Sep": return 9;
		case "Oct": return 10;
		case "Nov": return 11;
		case "Dec": return 12;
		default: throw new FormatException("Unknown month: " + month);
	}
}

private static int GetYear(string s)
{
	var year = s.Substring(5, 4);
	return int.Parse(year);
}
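
Once the month, year and rate have been parsed, they still have to be applied to the contract values. A minimal sketch of that step follows. The RateTable class and its method names are hypothetical and not part of the script above, and it assumes each rate is quoted as units of the foreign currency per one Euro; if the data is quoted the other way round, multiply instead of dividing.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: keeps monthly rates and converts amounts to EUR.
public class RateTable
{
    // key: year * 100 + month, e.g. 200301 for January 2003
    private readonly Dictionary<int, decimal> rates =
        new Dictionary<int, decimal>();

    public void Add(int year, int month, decimal rate)
    {
        rates[year * 100 + month] = rate;
    }

    // Assumption: rate = units of foreign currency per one Euro.
    public decimal ToEur(decimal amount, int year, int month)
    {
        decimal rate;
        if (!rates.TryGetValue(year * 100 + month, out rate))
        {
            throw new KeyNotFoundException(
                string.Format("No rate for {0}-{1:00}", year, month));
        }
        return amount / rate;
    }
}
```

Inside the parsing loop one would call Add(year, month, rate) for every line, and later ToEur(contractValue, year, month) for each contract. With a made-up rate of 7.43, for example, an amount of 743 converts to 100 Euros.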

Folkets Ting beta launched

I’ve created a new web site on Danish politics in the tradition of The Public Whip and OpenCongress (although it’s not yet nearly as good as those guys). It’s called Folkets Ting and comes with a complimentary blog (both in Danish). Go check it out.

Transatlantic Facebook application performance woes

Someone I follow on Twitter reported having problems getting a Facebook application to perform. I don’t know what they are doing so this post is just guessing at their problem, but the fact is that — if you’re not paying attention — you can easily shoot yourself in the foot when building and deploying Facebook apps. The diagram below depicts a random fbml Facebook app deployed to a server located in Denmark being used by a user also situated in Denmark. Note that Facebook doesn’t yet have a datacenter in Europe (they have one on each coast in the US).

[Diagram: fbservers]

The following exchange takes place:

  1. User requests some page related to the application from Facebook
  2. Facebook realizes that serving this request requires querying the application and sends a request for fbml to the app
  3. The app gets the request and decides that in order to respond, it has to query the Facebook API for further info
  4. The Facebook API responds to the query
  5. The application uses the query results and the original request to create an fbml response that is sent to Facebook
  6. Facebook gets the fbml, validates it and macroexpands various fbml tags
  7. Facebook sends the complete page to the user

… so that adds up to 6 transatlantic requests per page requested by the user. Assuming a 250ms ping time from the Danish app-server to the Facebook datacenter, this is a whopping 1.5s latency on top of whatever processing time your server needs AND the time taken by Facebook to process your API request and validate your fbml.

So what do you do? Usually steps 3 and 4 can be eliminated through careful use of fbml and taking advantage of the fact that Facebook includes the ids of all the requesting user’s friends. Going for an iframe app is also helpful because it eliminates one transatlantic roundtrip and spares Facebook from having to validate any fbml. A very effective measure, if you insist on fbml, is simply getting a server stateside, preferably someplace with low ping times to Facebook datacenters. There are plenty of cheap hosting options around; Joyent will even do it for free (I’m not affiliated in any way).
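
The latency arithmetic, and the effect of two of the measures, can be sketched in a few lines of C#. The class and member names are illustrative, the 250ms figure is the assumed one from the text, and the hop counts for the two improved scenarios are one reading of the steps listed earlier: dropping steps 3 and 4 removes two crossings, and a stateside server leaves only steps 1 and 7 crossing the Atlantic.

```csharp
using System;

// Illustrative only: hop counts are a reading of the steps above,
// not measurements.
public static class FacebookLatency
{
    // Assumed cost of one transatlantic hop, in milliseconds.
    public const int HopMs = 250;

    // Latency added by network hops alone, ignoring all processing time.
    public static int NetworkLatencyMs(int transatlanticHops)
    {
        return transatlanticHops * HopMs;
    }

    // Steps 1, 2, 3, 4, 5 and 7 each cross the Atlantic.
    public static readonly int FbmlWithApiCall = NetworkLatencyMs(6);

    // Steps 3 and 4 (the API query) eliminated.
    public static readonly int FbmlWithoutApiCall = NetworkLatencyMs(4);

    // App server stateside: only steps 1 and 7 cross the Atlantic.
    public static readonly int StatesideServer = NetworkLatencyMs(2);
}
```

Under these assumptions the three scenarios come to 1500ms, 1000ms and 500ms of network latency respectively, before any processing time is added.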

Webcam face detection in C# using Emgu CV

Some time ago I wrote a post on how to do face detection in C# using OpenCV. I’ve since begun using the Emgu CV wrapper instead of opencvdotnet. Emgu CV is much better, in active development and it even runs on Mono. Three gotchas:

  1. You don’t have to install OpenCV, but instead have to copy the relevant dlls (included with the Emgu CV download) to the folder where your code executes.
  2. OpenCV and X64 are not friends. If you’re running X64 Windows (and unless you are up to recompiling OpenCV) you have to make sure your app is compiled to X86, instead of the usual “Any CPU”.
  3. Remember to add PictureBox as per the original tutorial.

Here’s sample code:

using System;
using System.Windows.Forms;
using System.Drawing;
using Emgu.CV;
using Emgu.Util;
using Emgu.CV.Structure;
using Emgu.CV.CvEnum;

namespace opencvtut
{
    public partial class Form1 : Form
    {
        private Capture cap;
        private HaarCascade haar;

        public Form1()
        {
            InitializeComponent();
        }

        private void timer1_Tick(object sender, EventArgs e)
        {
            using (Image<Bgr, byte> nextFrame = cap.QueryFrame())
            {
                if (nextFrame != null)
                {
                    // there's only one channel (greyscale), hence the zero index
                    //var faces = nextFrame.DetectHaarCascade(haar)[0];
                    Image<Gray, byte> grayframe = nextFrame.Convert<Gray, byte>();
                    var faces =
                        grayframe.DetectHaarCascade(
                            haar, 1.4, 4,
                            HAAR_DETECTION_TYPE.DO_CANNY_PRUNING,
                            new Size(nextFrame.Width/8, nextFrame.Height/8)
                            )[0];

                    foreach (var face in faces)
                    {
                        nextFrame.Draw(face.rect, new Bgr(0,double.MaxValue,0), 3);
                    }
                    pictureBox1.Image = nextFrame.ToBitmap();
                }
            }
        }

        private void Form1_Load(object sender, EventArgs e)
        {
            // passing 0 gets zeroth webcam
            cap = new Capture(0);
            // adjust path to find your xml
            haar = new HaarCascade(
                "..\\..\\..\\..\\lib\\haarcascade_frontalface_alt2.xml");
        }
    }
}

LinqtoCRM and updating entities

There are some pitfalls when retrieving CRM entities with LinqtoCRM and trying to update them through the CRM web service. The most intuitive (but wrong) approach would be this:

var res = from c in p.Linq()
		  select c;

foreach (var con in res)
{
	con.address1_line1 = "foo";
	service.Update(con);
}

This fails unfortunately. I think someone at Netcompany (my former employer) worked out why this was at some point, but I’ve forgotten.

Instead, what you want to do is new up the entities yourself, setting the relevant id attribute, and then set the attributes you want to change before updating:

var res = from c in p.Linq()
		  select new contact() { contactid = c.contactid };

foreach (var con in res)
{
	con.address1_line1 = "foo";
	service.Update(con);
}

LinqtoCRM competitor and new version

A few days ago, a former colleague alerted me to xRM LINQ, a new commercial query provider for Microsoft CRM. I’ve downloaded the trial, of course, and it looks pretty good. xRM LINQ decided not to use the usual web service classes and instead provide their own class generator/entity mapper (LinqtoCRM has one too, but only for generating many-to-many classes). This means you can’t mix and match Linq with traditional web service calls, and they had to implement their own create/update functionality. It’s a less gradual and more comprehensive approach than LinqtoCRM but it may give a smoother experience for the programmer. At any rate, I welcome xRM LINQ onto the CRM query provider stage and wish them the best of luck :-).

A less welcome addition is a company called Softpedia, a Romanian outfit. I won’t link to them, to avoid giving them any more Google Juice, but you can find them by googling LinqtoCRM. They seem to be screen-scraping CodePlex and similar sites for projects with permissive licenses and then putting up copy-cat pages with downloads for these projects on their own site. While not illegal, it’s not very useful for project owners or users either. They’ve been caught inflating their Wikipedia article and many users report trojans and similar on siteadvisor (to be fair, this seems to happen for other popular download sites too).

In other news, a new version of LinqtoCRM is out. It fixes some bugs that have surfaced over the last few months. I’ve also reorganised the wiki, hopefully making it easier for people to find what they’re looking for.

Found your start-up in Hong Kong?

I’ve just returned from a trip to Hong Kong. While there, I toured several startup parks and incubators and talked to a lot of entrepreneurs and some government officials. I think it just may be a pretty cool place to found your tech startup. Read on for reasons why.

In the fall (of 2008) I won a trip to Hong Kong by submitting a business idea on the back of a napkin to a competition run by the Øresund Entrepreneurship Academy. You can read more about the competition and my winning it here (including a picture of me holding a bouquet of flowers, a rare and uncommon sight). While I agree that sending more-or-less random people halfway around the world is a rather dubious use of taxpayer money, I was hard pressed to complain and dutifully went along.

I’ll start off with an interesting fact: Hong Kong has been an administrative region of the People’s Republic of China since 1997, yet for the past 15 consecutive years it has been named the freest economy in the world by the Heritage Foundation. How do you like that, an area under the nominal thumb of communist China is on the top of a list published by a conservative American think-tank? And it’s not the one of countries to invade next. I think it’s great!

The explanation for this wonderful paradox is that Hong Kong is administered under the “one country, two systems” regime. So while the People’s Liberation Army diligently liberated Hong Kong after the British left, they limited themselves to doing just that, and have been holed up in their barracks ever since. Hong Kong is thus still governed under the principle of “Positive non-interventionism” formalized under John James Cowperthwaite, the colony’s financial secretary in the ‘60s. Some consequences of interest to entrepreneurs are:

  • Taxes are very low, with corporate tax at 16.5% and income taxes capped at 15% (most pay much less).
  • There is no value-added tax or sales tax.
  • There are no tariffs or customs on any imports, including wine and spirits.
  • Registering a limited liability company is easy and costs about 300 USD.
  • There are no controls on capital flows so you are free to bring investment in and take profits out.
  • The local currency is tied to the US Dollar so you run no currency risks if you are from that country.
  • There’s a strong and independent common-law based judiciary which strictly enforces IP rights.

Old Milton Friedman was a big fan of these policies (which brought the Hong Kong per capita GDP from 28% to 137% of Britain’s between 1960 and 1996) and wrote a great article for National Review in 1997.

China has promised that Hong Kong can shape its own policies for at least fifty years after the takeover, leaving another 37 years of laissez faire. The current Hong Kong political system has some democratic traits, but business interests are generally much more prominent than in Western-style democracies (I’m not saying that’s a good thing, just stating a fact). Freedom of Speech is respected and the government officials we talked to (from the Hong Kong Trade Development Council and Invest in Hong Kong) were very forthright and mostly positive in their estimates of Mainland Chinese intentions. The foreign officials we met (the Danish and Swedish consuls general) were slightly more cautious, but the general consensus seems to be that China is unlikely to mess with Hong Kong, if for no other reason than because the city is such an important conduit of goods and services to and from the mainland (Port of Hong Kong is the third largest in the world by container throughput). Hong Kong is also useful as a demonstration to Taiwan that it is now safe to return to The Motherland. A good example of the two systems at work is the recent cancellation of Oasis concerts in Beijing and Shanghai, apparently because Noel Gallagher played at a Free Tibet event in 1997. The concert in Hong Kong is still on.

Our itinerary included visits to two incubators operated by Hong Kong Science & Technology Parks. The Science Park is particularly impressive, newly built and stretching over 22 hectares of seaside property with shared IC labs and wet-labs should you need them. The other one is the InnoCentre, which focuses more on design startups. The programs at both incubators feature heavily subsidized rent: it’s free for the first year and then ramps up until the program ends in two to three years. Programs include financial aid packages to the tune of about 100,000 USD which can be used to cover non-recurring operational costs. The admission criteria are not onerous, other than your business idea having to pass several panels judging soundness and profitability. In particular, the incubators are open to foreign nationals registering their companies in Hong Kong, as long as they plan to hire local staff. We met a Swede and a Brit who had set up shop in Hong Kong and looked pretty chuffed. Whether you like government meddling with start-ups or not, these incubators just seemed very no-nonsense and well thought out.

Hong Kong has a young, well-educated and tech-savvy population with most people using at least two mobile phones for work and private use respectively. In the MTR (Subway/Metro/Underground), which has excellent connectivity, you’ll see everyone punching away at iPhones and Blackberrys. There’s a more-or-less citywide wifi provided by either telcos or freely by the government and broadband is widely available. The transport infrastructure is ruthlessly efficient: The MTR will take you most places you want to go in air conditioned, escalated comfort and to top that off there’s a profusion of busses, trams, ferries and escalators. The airport has frequent flights to most places in Asia and abroad. Most Hong Kongers are immigrants or refugees (or descendants thereof) who have fled the excesses of various mainland governments. They’re self-reliant, industrious and hardworking. English knowledge is still widespread and many schools teach English as the first language.

While it lacks a good venture capital and business angel community, Hong Kong has excellent financial institutions. The Hong Kong Stock Exchange is the second biggest in the world in terms of IPO value. Asian banks have lower exposure to the global financial crisis because they buttoned down somewhat after the Asian financial crisis in the late ‘90s. Asia is traditionally a saving economy where people tend to save up money before they go and buy stuff, as opposed to taking out a mortgage straight away. I’m not implying mortgages are bad, this is just to say that there’s a lot of money hidden away in bank accounts and mattresses in Hong Kong and the rest of Asia. Chinese banks, indeed, are lending freely now, as this Economist article details.

Other than rather steep housing costs, Hong Kong is a pretty cheap place to live. Transport is cheap and a good meal, with drinks, can be had for less than 10 USD. A live-in maid working six days a week is less than 1000 USD a month. For fun, you can go to the horse races or take the boat to Macau to gamble or just look at the lights. Somewhat surprisingly, Hong Kong has pretty good hiking, including the 100km MacLehose Trail. There are also lots of swimmable beaches dotted around the islands. Pollution can be bad in the built up areas, but I found it to be no worse than Manhattan, say. Hong Kong is extremely safe, with a crime-rate that is lower than in most large cities.

Is Hong Kong a good place to found your tech start-up? I’m certainly contemplating it: Taxes are low, it’s very livable and there’s robust government support for high-tech entrepreneurs.

Here is some recent related discussion:

Here are the startups we visited while in Hong Kong (thanks for having us!):

Randoom on the move

Right – after a few years on ITU servers, I’ve moved my blog to a separate domain hosted by Netplads. This was mostly for SEO reasons, so that I could build Google Juice on my own and not have my page rank muddled with whatever ITU does. The new host also allows .htaccess modifications so that I can get nice URLs. Netplads is a cheap and cheerful Danish hoster – the only fault I’ve found so far is a lack of mod_gzip support.

The blog theme has been modified quite a bit, but is still based on the venerable depo-clean theme by Derek Powazek. It has been cleaned up some more and now supports tags (as opposed to just categories). The theme relies on Smart Archives Reloaded to build the archives and features a ShareThis button. If you want, you can download my version of the theme.

On my old blog, the Redirection plugin does 301 redirects to the one you’re currently reading (doing rewrites in .htaccess would have been easier but was unsupported). In fact, it’s so good at it that I can no longer access my old blog in any way. Good riddance.

The other plugins enabled are:

… and with that, I’m off to Hong Kong.

Rent vs. Buy (or EC2 vs. building your own iron)

Over the past months Jeff Atwood (of Coding Horror fame) has been chronicling Stack Overflow’s quest for new hardware, starting with “Server Hosting – Rent vs. Buy?” and ending with some glamour shots. I’ve recently (along with others) built a setup for a .Net website in the same “too big for shared or low-end VPS hosting and (much) too small to have dedicated sysadmin staff” segment. We ended up going for Amazon EC2 so I thought I’d share our reasoning by comparing with the Stack Overflow setup.

UPDATE1: Atwood just gave another reason as to why EC2 may be attractive.

UPDATE2: Some of the gloomy projections in this post actually came true (for Stack Overflow, not for us): Tuesday Outage: It’s RAID-tastic!

First some notes on pricing: Mr. Atwood’s three servers cost him a total of $6,000, on top of which come rack space rent, bandwidth and licenses (where he gets off very cheaply by taking advantage of Microsoft’s BizSpark program). We rent two large EC2 instances, one of them with a SQL Server Standard license, for $1.60 per hour giving a total of $14,000 per year (on top of which come bandwidth and Elastic Block Store usage). Mr. Atwood could buy all his gear (minus rack space) more than two times over for that money. And except for one important parameter, which I shall expand on later, his machines are much faster: The Database server has eight cores and 24GB of memory, while the Web servers have four cores and 8GB of memory. Our EC2 instances have to get by with just two cores and 7.5GB. An interesting aside is that exactly half the $1.60 goes to licenses (compared with getting non-windows large instances), most of it for SQL Server Standard.
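
The yearly figure follows directly from the hourly rate. A small sketch of the arithmetic, assuming the instances run around the clock for a 365-day year (the HostingCost class is only illustrative):

```csharp
using System;

// Illustrative helper for the rent-vs-buy arithmetic above.
public static class HostingCost
{
    // Assumes the instances are never switched off.
    public const decimal HoursPerYear = 24m * 365m;

    // Yearly cost of renting at a given hourly rate.
    public static decimal YearlyCost(decimal dollarsPerHour)
    {
        return dollarsPerHour * HoursPerYear;
    }
}
```

HostingCost.YearlyCost(1.6m) comes to 14,016, which is the roughly $14,000 quoted above and about 2.3 times the $6,000 spent on the three servers. The license half of the hourly rate, YearlyCost(0.8m), accounts for 7,008 of that.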

Several commenters had some beefs with the disks in the new Stack Overflow database server and I agree they look rather dinky. The server has six 7200 RPM SATA drives in RAID1 and RAID10 arrays for OS/logs and data files respectively. While the drives are “Enterprise” branded, I hazard the guess that they are pretty much the same as desktop ones, except for a slightly higher MTBF promise and better warranty from the manufacturer. At any rate, 7200 RPM drives can only sustain about 125 random IOs per second, and because of the RAIDing, the IO-rate of the array is not six times that. On EC2 we have access to formidable Elastic Block Storage volumes, which are capable of sustaining upwards of 1000 IOPS. Should we need more oomph, the volumes can be soft-raided together until the 1GBps link from the EC2 instance to the EBS volume runs out of steam. (For completeness, I should note that the sequential IO performance of EBS volumes is not very good. That is irrelevant for most database workloads however, since users generally don’t have the good manners to request data in the order it is placed on disk). Mr. Atwood mentions that query execution time decreases nicely with CPU speed. This is obviously an important parameter when building a responsive web site, but I’d venture that query throughput volume is mostly related to disk performance and that we would have an edge here.

Another potential problem is the reliability of the drives, the longer warranty-period notwithstanding. Let’s assume for a second (and I admit this is a pretty improbable scenario), that one of the drives in the Stack Overflow RAID10 array (holding SQL Server data files) copped it and went to the great disk-array in the sky. Mr. Atwood would probably get a notification of this, and immediately initiate a backup-operation to the good array (also holding OS and logs). Let’s also assume that at that very moment, the God of the datacenter decides to invoke Murphy’s law on the other disk in the mirror-set, killing it and taking the array and the database with it. Stack Overflow stops flowing, blog posts are written (I shall magnanimously refrain), F# buffs recurse indefinitely trying to post a question about why Stack Overflow is down but finding that Stack Overflow is down. Reddit and Slashdot are notified, further swamping the exasperated web servers, unable as they are to get anything out of the database. Mr. Atwood, in the meantime, is cheering on SQL Server Management Studio to restore the latest backup as quickly as possible to the still-good array. He manages to bring the site up within a quarter of an hour, minus all activity since the last backup and running at a somewhat slower clip than usual. Having wiped the sweat from his brow, he still has to drive to the datacenter and swap the two bad drives (unless he trusts the datacenter dudes to do so), getting the usual datacenter tinnitus and a sniffle in the process.

If, on the other hand, the EBS volume holding our database were to die (an even more unlikely event), we would merely create a new volume, attach it to our database instance and restore from backup (conveniently located in nearby S3). Reaction time and data loss would be similar, but performance will not be degraded for any period. Also, I don’t have to plod out to some datacenter and fuss around with a server. Instead I can concentrate on adding new features to the site.

Some people stress the “Elastic” part of EC2, claiming that it is mostly relevant if your hardware requirements are extremely variable or you expect them to increase very rapidly. I think the flexibility it affords is relevant in more modest scenarios too though. Some examples: Need more IPs? Click of a button. Need a test server to try out a new version of your site? Click of a button. Need to increase the size of your database drive? Grab a snapshot and use it to create a bigger volume. Plus all the other features such as a CDN, secure backup in S3 and redundant datacenters that Amazon offers without large upfront costs.

EC2 is no panacea for sure and I agree with Mr. Atwood that poring over specs and reviews and putting together your own gear on the cheap is extremely rewarding. If you value your time and need flexibility though, it might be worth it to limit yourself to building desktop systems and use something like EC2 for hosting.
