Michael Friis' Blog

4K TV as PC monitor

I use a 49″ 4K TV as my computer monitor both at home and at work. TVs are generally much cheaper than computer monitors of the same size. For my mix of general productivity, programming and very occasional gaming, a mid-size 4K TV beats getting a couple of regular monitors on both price and result.

How to take advantage of a 49″ monitor? Windows 10 has simple window tiling functionality (“Snap Assist”), and you can use Win + arrow keys to quickly arrange windows in a 2×2 equally-sized grid. On a 49″ screen, each of those 4 grid elements is the size of a ~25″ screen, which is generous for browser windows or 2-pane text editors. This 4-window setup is what I mostly use when I’m being lazy.

If I’m working on the same thing for a while, I use WindowGrid to divvy up my screen to fit more apps. I really wish Windows had a way to customize the snap grid pattern, but WindowGrid is a good alternative. 3 horizontal by 2 vertical is perfect because you can have two centered apps, the three columns are plenty wide and it’s still quick to shuffle windows around a 6-pane grid.

Home: Samsung UN49KS8500 Curved 49″

I first got this 2016 curved Samsung 49″ TV to use with my home computer. I bought it used for $766. It’s the best monitor setup I’ve ever had (I’ve previously used and enjoyed dual 23″ IPS panels, 27″ Apple Cinema Display and 27″ Dell 4k monitor). I use it on a deep (32″) floating desk that I built in my office. The depth of the desk combined with the slight screen curve makes 49″ the perfect size. I don’t have to move my head much to look from one side of the screen to the other, and the screen fills up my field of view. The bezel is almost non-existent and the stand is attractive and doesn’t take up much desk-space. Text renders crisply, the colors are beautiful and the picture is calm at 60Hz.

Samsung TV in my home office

Office: TCL 49S405 49″

When I re-joined Salesforce a couple of months ago, I asked to get a 27″ 4K monitor because I had liked that while working at Docker. Unfortunately the IT department decided I was not approved for such extravagance (that’s not a dig at Salesforce—in fact, you should come work with me). Since I was so happy with my home setup, I went ahead and approved myself for a $320 2017-vintage 49″ non-curved 4K TV which was more-or-less the cheapest 4K TV on Amazon at the time.

Receiving the giant box at the office was a surreal experience: It seems improbable that this much plastic, LCD, electronics, polystyrene and cardboard can be put together and schlepped to a 3rd floor San Francisco office so cheaply. I suspect TCL is an Elon-Musk-like organization, only instead of obsessively working backwards from the cost of kerosene and aircraft-grade aluminium to determine how cheaply one ton of satellite can be put into orbit, they do the same for plastic and liquid crystals to get cheap TVs into people’s living rooms.

With the TV assembled on my work desk, I realized that this setup was not going to be as awesome as my home configuration:

  • My work desk is narrower, so I sit closer to the screen
  • With no curve (and sitting so close) the sides of the screen begin to suffer from fading because of the oblique viewing angle
  • While not bad, the TV screen panel is just not as good as the Samsung unit

I’ve partly addressed the first two problems by building and mounting desk extenders, and after much calibration and fiddling, I managed to get an OK picture out of the TCL TV. Even so, I’d definitely recommend limiting yourself to 43″ unless you have a big desk and/or you can get a curved TV. I had in fact planned to get the 43″ TCL model, but it’s only $20 cheaper so I made the mistake of springing for 49″.

Desk extenders for the corp office desk

Summary

Overall I’d heartily recommend getting a 4K TV for use as a monitor if you have the space. I can’t think of a setup I’d prefer over my current home office TV: dual 27″ screens have a bezel-seam down the middle and I have way more screen real-estate than any of the 34″ wide-screen monitors available. 4K has so many pixels that, even when spread out over 49″ and sitting just 2-3′ away, graininess is not really an issue. Another advantage is that TVs come with passable speakers—I listen to music piped over HDMI from my computer at home, and it’s fine (if not exactly amazing).

49″ is the largest TV that makes sense in my experience, and depending on your desk and ergonomics, 43″ is probably a better choice. When choosing a TV model, be sure to get one that supports chroma 4:4:4 (my $320 one does). Otherwise the TV will sub-sample the image and text will look smudged.

One final word of caution: If you sit in an open office (like me), expect to spend on average 15 minutes every day explaining to random passersby why there’s a TV on your desk.

ASP.NET 5 Docker language stack with Kestrel

This blog post presents a Docker Language Stack for creating and running ASP.NET 5 (née vNext) apps. It’s based on my work last week to run ASP.NET 5 on Google Container Engine.

In the interim, the ASP.NET team has released their own Docker image. It’s not really up to spec for being a Docker language stack though, so I forked it, added what was missing and published it on Docker Hub.

Other people have already sent PRs to add onbuild support to the ASP.NET repo, but there’s apparently some uncertainty about how ASP.NET 5 apps are going to get built, so they’re holding off on merging. I hope that eventually the work presented here will get folded into the official repo, just as happened with the Mono stack I created a month ago. That’s the base for what’s now the official Mono Docker language stack, which, incidentally, is what the ASP.NET Docker image derives from!
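
For context, an onbuild variant mostly just layers deferred ONBUILD instructions on top of the runtime image. Here’s a rough sketch of what that can look like (illustrative only, not the actual Dockerfile from my fork; it assumes the base image puts kpm and k on the PATH and that the app’s project.json defines a kestrel command, as the HelloWeb sample does):

FROM friism/aspnet:1.0.0-beta1

# Deferred instructions that run when an app image is built FROM this one:
# copy the app source into the image and restore its NuGet packages
ONBUILD COPY . /app
ONBUILD WORKDIR /app
ONBUILD RUN ["kpm", "restore"]

# Default command: start the app's "kestrel" command defined in project.json
CMD ["k", "kestrel"]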

How to use

Using the onbuild image is pretty simple. To run the HelloWeb sample, clone that repo and add this Dockerfile in the HelloWeb dir, next to the project.json:

FROM friism/aspnet:1.0.0-beta1-onbuild
EXPOSE 5004

Now build the image:

docker build -t my-app .

And finally run the app, exposing the site on port 80 on your local machine:

docker run -t -p 80:5004 my-app

Note that the -t option is currently required when invoking docker run. This is because there’s some sort of bug in Kestrel that requires the process to have a functional tty to write to – without a tty, Kestrel hangs on start.

Google Container Engine for Dummies

Last week, Google launched an alpha version of a new product called Google Container Engine (GKE). It’s a service that runs pre-packaged Docker images for you: You tell GKE about images you want to run (typically ones you’ve put in the Docker Registry, although there’s also a hack to run private images) and how many instances you need. GKE will spin them up and make sure the right number is running at any given time.

The GKE Getting Started guide is long and complicated and has more JSON than you can shake a stick at. I suspect that’s because the product is still alpha, and I hope the Google guys will improve both the CLI and web UIs. Anyway, below is a simpler guide showing how to stand up a stateless web site with just one Docker image type. I’m also including some analysis at the end of this post.

I’m using a Mono/ASP.NET vNext Docker image, but all you need to know is that it’s an image that exposes port 5004 and serves HTTP requests on that port. There’s nothing significant about port 5004 – if you want to try with an image that uses a different port, simply substitute as appropriate.

In the interest of brevity, the description below skips over many details. If you want more depth, then remember that GKE is Kubernetes-as-a-Service and check out the Kubernetes documentation and design docs.

Setup

  1. Go to the Google Developer Console and create a new project
  2. For that project, head into the “APIs” panel and make sure you have the “Google Container Engine API” enabled
  3. In the “Compute” menu section, select “Container Engine” and create yourself a new “Cluster”. A cluster size of 1 and a small instance is fine for testing. This guide assumes cluster name “test-cluster” and region “us-central1-a”.
  4. Install the CLI and run gcloud config set project PROJECT_ID (PROJECT_ID is from step 1)

Running raw Pod

The simplest (and not recommended) way to get something up and running is to start a Pod and connect to it directly with HTTP. This is roughly equivalent to starting an AWS EC2 instance and connecting to its external IP.

First step is to create a JSON-file somewhere on your system, let’s call it pod.json:

{
  "id": "web",
  "kind": "Pod",
  "apiVersion": "v1beta1",
  "desiredState": {
    "manifest": {
      "version": "v1beta2",
      "containers": [
        {
          "name": "web",
          "image": "friism/aspnet-web-sample-web",
          "ports": [
            { "containerPort": 5004, "hostPort": 80 }
          ]
        }
      ]
    }
  },
  "labels": {
    "name": "web"
  }
}

What you should care about is the Docker image/repository getting run (friism/aspnet-web-sample-web) and the port mapping (the equivalent of docker run -p 80:5004). With that, we can tell GKE to start a pod for us:

$ gcloud preview container pods --cluster-name test-cluster --zone us-central1-a \
    create web --config-file=/path/to/pod.json
...
ID                  Image(s)                       Host                Labels              Status
----------          ----------                     ----------          ----------          ----------
web                 friism/aspnet-web-sample-web   <unassigned>        name=web            Waiting

All the stuff before “create” is boilerplate and the rest is saying that we’re requesting a pod named “web” as specified in the JSON file.

Pods take a while to get going, probably because the Docker image has to be downloaded from Docker Hub. While it’s starting (and after), you can SSH into the instance that’s running your pod to see how it’s doing, eg. by running sudo docker ps. This is the SSH incantation:

$ gcloud compute ssh --zone us-central1-a k8s-test-cluster-node-1

The instances are named k8s-<cluster-name>-node-1 and you can see them listed in the Web UI or with gcloud compute instances list. Wait for the pod to change status to “Running”:

$ gcloud preview container pods --cluster-name test-cluster --zone us-central1-a list
ID                  Image(s)                       Host                              Labels              Status
----------          ----------                     ----------                        ----------          ----------
web                 friism/aspnet-web-sample-web   k8s-<..>.internal/146.148.66.67   name=web            Running

The final step is to open up HTTP traffic to the Pod. This setting is available in the Web UI for the instance (eg. k8s-test-cluster-node-1). Also check that the network settings for the instance allow TCP traffic on port 80.
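
If you prefer the CLI over the Web UI, a firewall rule like the following should do it (a sketch: it opens port 80 on the project’s default network for all instances, which is fine for a throwaway test project but too broad for anything real):

$ gcloud compute firewall-rules create k8s-allow-http --allow tcp:80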


And with that, your site should be responding on the external ephemeral IP address of the host running the pod.

As mentioned in the introduction, this is not a production setup. The Kubernetes service running the pod will do process management and restart Docker containers that die for any reason (to test this, try ssh’ing into your instance and docker-kill the container that’s running your site – a new one will quickly pop up). But your site will still go down if, for example, there’s a problem with the pod itself. Read on for details on how to extend the setup to cover that failure mode.
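
Here’s what that experiment looks like, using the SSH incantation from earlier (the container ID will obviously differ on your instance):

$ gcloud compute ssh --zone us-central1-a k8s-test-cluster-node-1
$ sudo docker ps                   # note the ID of the container serving your site
$ sudo docker kill <container-id>  # kill it
$ sudo docker ps                   # a replacement container shows up shortly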

Adding Replication Controller and Service

In this section, we’re going to get rid of the pod-only setup above and replace it with a replication controller and a service fronted by a loadbalancer. If you’ve been following along, delete the pod created above to start with a clean slate (you can also start with a fresh cluster).

First step is to create a replication controller. You tell a replication controller what and how many pods you want running, and the controller then tries to make sure the correct formation is running at any given time. Here’s controller.json for our simple use case:

{
  "id": "web",
  "kind": "ReplicationController",
  "apiVersion": "v1beta1",
  "desiredState": {
    "replicas": 1,
    "replicaSelector": {"name": "web"},
    "podTemplate": {
      "desiredState": {
        "manifest": {
          "version": "v1beta1",
          "id": "frontendController",
          "containers": [{
            "name": "web",
            "image": "friism/aspnet-web-sample-mvc",
            "ports": [{"containerPort": 5004, "hostPort": 80 }]
          }]
        }
      },
      "labels": { "name": "web" }
    }
  },
  "labels": {"name": "web"}
}

Notice how it’s similar to the pod configuration, except we’re specifying how many pod replicas the controller should try to have running. Create the controller:

$ gcloud preview container replicationcontrollers --cluster-name test-cluster \
    create --zone us-central1-a --config-file /path/to/controller.json
...
ID                  Image(s)                       Selector            Replicas
----------          ----------                     ----------          ----------
web                 friism/aspnet-web-sample-mvc   name=web            1

You can now query and see the controller spinning up the pods you requested. As above, this might take a while.
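
The list commands follow the same pattern as for pods, so you can watch progress with something like this (column output may vary):

$ gcloud preview container replicationcontrollers --cluster-name test-cluster \
    --zone us-central1-a list
$ gcloud preview container pods --cluster-name test-cluster --zone us-central1-a list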

Now, let’s get a GKE service going. While individual pods come and go, services are permanent and define how pods of a specific kind can be accessed. Here’s service.json that’ll define how to access the pods that our controller is running:

{
  "id": "myapp",
  "selector": {
    "app": "web"
  },
  "containerPort": 80,
  "protocol": "TCP",
  "port": 80,
  "createExternalLoadBalancer": true
}

The important parts are selector which specifies that this service is about the pods labelled web above, and createExternalLoadBalancer which gets us a loadbalancer that we can use to access our site (instead of accessing the raw ephemeral node IP). Create the service:

$ gcloud preview container services --cluster-name test-cluster --zone us-central1-a create --config-file=/path/to/service.json
...
ID                  Labels              Selector            Port
----------          ----------          ----------          ----------
myapp                                   app=web             80

At this point, you can go find your loadbalancer IP in the Web UI; it’s under Compute Engine -> Network load balancing. To actually see my site, I still had to tick the “Enable HTTP traffic” boxes for the Compute Engine node running the pod – I’m unsure whether that’s a bug or me being impatient. The loadbalancer IP is permanent and you can safely create DNS records and such pointing to it.
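
You can also dig the loadbalancer IP out with the CLI and poke the site with curl (the forwarding rule is what Compute Engine calls the loadbalancer frontend; substitute the IP it lists):

$ gcloud compute forwarding-rules list
$ curl -i http://<loadbalancer-ip>/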

That’s it! Our stateless web app is now running on Google Container Engine. I don’t think the default Bootstrap ASP.NET MVC template has ever been such a welcome sight.

Analysis

Google Container Engine is still in alpha, so one shouldn’t draw any conclusions about the end-product yet (also note that I work for Heroku and we’re in the same space). Below are a few notes though.

Google Container Engine is “Kubernetes-as-a-Service”, and Kubernetes is currently exposed without any filter. Kubernetes is designed based on Google’s experience running containers at scale, and it may be that Kubernetes is (or is going to be) the best way to do that. It also has a huge mental model however – just look at all the stuff we had to do to launch and run a simple stateless web app. And while the abstractions (pods, replication controllers, services) may make sense for the operator of a fleet of containers, I don’t think they map well to the mental model of a developer just wanting to run code or Docker containers.

Also, even with all the work we did above, we’re not actually left with a managed and resilient capital-S Service. What Google did for us when the cluster was created, was simply to spin up a set of machines running Kubernetes. It’s still on you to make sure Kubernetes is running smoothly on those machines. As an example, a GKE cluster currently only has one Master node. This is the Kubernetes control plane node that accepts API input and schedules pods on the GCE instances that are Kubernetes minions. As far as I can determine, if that node dies, then pods will no longer get scheduled and re-started on your cluster. I suspect Google will add options for more fault-tolerant setups in the future, but it’s going to be interesting to see what operator-responsibility the consumer of GKE will have to take on vs. what Google will operate for you as a Service.

Mono Docker language stack

A couple of weeks ago, Docker announced official pre-built Docker images for a bunch of popular programming languages. Each stack generally consists of two Dockerfiles: a base Dockerfile that installs system dependencies required for that language to run, and an onbuild Dockerfile that uses ONBUILD instructions to transform app source code into a runnable Docker image. As an example of the latter, the Ruby onbuild Dockerfile runs bundle install to install libraries specified in an app’s Gemfile.

Managing system dependencies and composing apps from source code is very similar to what we do with Stacks and Buildpacks at Heroku. To better understand the Docker approach, I created a language stack for Mono, the open source implementation of Microsoft’s .NET Framework.

UPDATE: There’s now a proper official Docker/Mono language stack; I recommend using that.

How to use

A working Docker installation is required for this section.

To turn a .NET app into a runnable Docker image, first add a Dockerfile to your app source root. The sample below assumes a simple console app with an output executable name of TestingConsoleApp.exe:

FROM friism/mono:3.10.0-onbuild
CMD [ "mono", "./TestingConsoleApp.exe" ]

Now build the image:

docker build -t my-app .

The friism/mono images are available in the public Docker Registry and your Docker client will fetch them from there. Docker will then execute the onbuild instructions to restore NuGet packages required by the app and use xbuild (the Mono equivalent of msbuild) to compile source code into executables and libraries.
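
The onbuild Dockerfile behind this is essentially just those deferred steps spelled out. Here’s a sketch of what it amounts to (illustrative; the working directory and the exact restore/build invocations are assumptions on my part – the real file is in the GitHub repo):

FROM friism/mono:3.10.0

RUN mkdir -p /usr/src/app
WORKDIR /usr/src/app

# Deferred steps that run when an app image is built FROM this one:
# copy in the app source, restore NuGet packages and compile with xbuild
ONBUILD COPY . /usr/src/app
ONBUILD RUN nuget restore
ONBUILD RUN xbuild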

The Docker image with your app is now ready to run:

docker run my-app

If you don’t have an app to test with, you can experiment with this console test app.

Notes

The way Docker language stacks are split into a base image (that declares system dependencies) and an onbuild Dockerfile (that composes the actual app to be run) is perfect. It allows each language to get just the system libraries and dependencies needed. In contrast, Heroku has only one stack image (in several versions, reflecting underlying Linux distribution versions) that all language buildpacks share. That stack is at once both too thick and too thin: It includes a broad assortment of libraries to make supported languages work, but most buildpack maintainers still have to hand-build dependencies and vendor in the binaries when apps are built.

Docker has no notion of a cache for ONBUILD commands whereas the Heroku buildpack API has a cache interface. No caching makes the Docker stack maintainer’s life easier, but it also makes builds much slower than what’s possible on Heroku. For example, Heroku buildpacks can cache the result of running bundle install (in the case of Ruby) or nuget restore (for Mono), greatly speeding up builds after the first one.

Versioning is another interesting difference. Heroku buildpacks bake support for all supported language versions into a single monolithic release. What language version to use is generally specified by the app being built in a requirements.txt (or similar) file and the buildpack uses that to install the correct packages.

Docker language stacks, on the other hand, support versioning with version tags. The app chooses what stack version to use with the FROM instruction in the Dockerfile that’s added to the app. Stack versions map to versions of the underlying language or framework (eg. FROM python:3-onbuild gets you Python 3). This approach lets the Python stack, for example, compile Python 2 and 3 apps in different ways without having a bunch of branching logic in the onbuild Dockerfile. On the other hand, pushing an update to all Python stack versions becomes more work because the tags have to be updated individually. There are tradeoffs in both the Docker and Heroku buildpack approaches, I don’t know which is best.

Docker maintains a free, automated build service that churns out hosted Docker images for everyone to use. For my Mono stack, Docker Hub pulls updates from the GitHub repo with the Dockerfiles and builds the relevant tags into images. This is very convenient for stack maintainers. Heroku has no hosted service for building buildpack binaries, although I have documented a (Docker-based) approach to scripting this work.

(Note that, while Heroku buildpacks are wildly successful, it’s an older standard that predates Docker by many years. If it seems like Docker has gotten more things right, it’s probably because that project was informed by Heroku’s experience and by the passage of time).

Finally, and unrelated to Docker and Heroku, the Mono Project now has an APT package repository. This is pretty awesome, and I sincerely hope that the days of having to compile Mono from source are behind us. I don’t know if the repo is quite stable yet (I had to download a key without using SSL, the mono-devel package is versioned 3.10.0-0xamarin1 and the package fails to declare a dependency on udev), but it made the Mono Docker stack image a lot simpler. Check out the diff going from 3.8.0 (compiled from source) to 3.10.0 (installed from APT repo).

WebP, content negotiation and CloudFront

AWS recently launched improvements to the CloudFront CDN service. The most important change is the option to specify more HTTP headers be part of the cache key for cached content. This lets you run CloudFront in front of apps that do sophisticated Content Negotiation. In this post, we demonstrate how to use this new power to selectively serve WebP or JPEG encoded images through CloudFront depending on client support. It is a follow-up to my previous post on Serving WebP images with ASP.NET. Combining WebP and CloudFront CDN makes your app faster because users with supported browsers get highly compressed WebP images from AWS’ fast and geo-distributed CDN.

The objective of WebP/JPEG content negotiation is to send WebP encoded images to newer browsers that support it, while falling back to JPEG for older browser without WebP support. Browsers communicate WebP support through the Accept HTTP header. If the Accept header value for a request includes image/webp, we can send WebP encoded images in responses.

Prior to the CloudFront improvements, doing WebP/JPEG content negotiation for images fronted with CloudFront CDN was not possible. That’s because CloudFront would only vary response content based on the Accept-Encoding header (and some other unrelated properties). With the new improvements, we can configure arbitrary headers for CloudFront to cache on. For WebP/JPEG content negotiation, we specify Accept in the list of headers to whitelist in the AWS CloudFront console.

Specifying Accept header in AWS console
CloudFront configuration

Once the rest of the origin and behavior configuration is complete, we can use cURL to test that CloudFront correctly caches and serves responses based on the Accept request header:

curl -v http://abcd1234.cloudfront.net/myimageurl > /dev/null
...
< HTTP/1.1 200 OK
< Content-Type: image/jpeg
...

And simulating a client that supports WebP:

curl -v -H "Accept: image/webp" http://abcd1234.cloudfront.net/myimageurl > /dev/null
...
< HTTP/1.1 200 OK
< Content-Type: image/webp
...

Your origin server should set the Vary header to let any caches in the response-path know how they can cache, although I don’t believe it’s required to make CloudFront work. In our case, the correct Vary header value is Accept,Accept-Encoding.
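
If your origin is ASP.NET, that’s a one-liner in the action serving the image (this is the same line used in the full caching setup in the Serving WebP images with ASP.NET MVC post below):

Response.Headers.Set("Vary",
	string.Join(",", new string[] { "Accept", "Accept-Encoding" }));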

That’s it! Site visitors will now be receiving compact WebP images (if their browsers support them), served by AWS CloudFront CDN.

Building Heroku buildpack binaries with Docker

This post covers how binaries are created for the Heroku Mono buildpack, in particular the Mono runtime and XSP/fastcgi-mono-server components that are vendored into app slugs. Binaries were previously built using a custom buildpack with the build running in a Heroku dyno. That took a while though and was error-prone and hard to debug, so I have migrated the build scripts to run inside containers set up with Docker. Some advantages of this approach are:

  • It’s fast, builds can run on my powerful laptop
  • It’s easy to get a clean Ubuntu Lucid (10.04) image for reproducible clean-slate builds
  • Debugging is easy too, using an interactive Docker session

The two projects used to build Mono and XSP are on GitHub. Building is a two-step process: First, we do a one-time Docker build to create an image with the bare minimum (eg. curl, gcc, make) required to perform Mono/XSP builds. s3gof3r is also added – it’s used at the end of the second step to upload the finished binaries to S3 where buildpacks can access them.
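
A sketch of what that first-step Dockerfile boils down to (the package list is just the bare-minimum examples from above, the s3gof3r download URL is a placeholder, and the real Dockerfiles are in the two GitHub repos):

FROM ubuntu:10.04

# bare minimum needed to download, configure and compile Mono/XSP
RUN apt-get update && apt-get install -y curl gcc make

# s3gof3r is used at the end of the second step to push finished binaries to S3
RUN curl -L -o /usr/local/bin/gof3r <s3gof3r-release-url> && \
    chmod +x /usr/local/bin/gof3r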

The finished Docker images have build scripts and can now be used to run builds at will. Since changes are not persisted to the Docker images, each new build happens from a clean slate. This greatly improves consistency and reproducibility. The general phases in the second build step are:

  1. Get source code to build
  2. Run configure or autogen.sh
  3. Run make and make install
  4. Package up result and upload to S3

Notice that the build scripts specify the /app filesystem location as the installation location. This is the filesystem location where slugs are mounted in Heroku dynos (you can check yourself by running heroku run pwd). This may or may not be a problem for your buildpack, but Mono framework builds are not path-relative, and expect to eventually be run from where they were configured to be installed. So during the build, we have to use a configure setting consistent with how Heroku runs slugs.
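
Concretely, the configure/make phases from the list above end up looking roughly like this inside the container (a sketch; the real build scripts are in the repos linked above):

# after downloading and unpacking the Mono source (phase 1)
./configure --prefix=/app    # or a path under /app; it must match where the slug is mounted at runtime
make
make install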

The build scripts are parametrized to take AWS credentials (for the S3 upload) and the version to build. Building Mono 3.2.8 is done like this:

$ docker run -v ${PWD}/cache:/var/cache -e AWS_ACCESS_KEY_ID=key -e AWS_SECRET_ACCESS_KEY=secret -e VERSION=3.2.8 friism/mono-builder

It’s now trivial to quickly create and upload to S3 binaries for all the versions of Mono I want to support. It’s very easy to experiment with different settings and get binaries that are small (so slugs end up being smaller), fast and free of bugs.

There’s one trick in the docker run command: -v ${PWD}/cache:/var/cache. This mounts the cache folder from the current host directory in the container. The build scripts use that location to store and cache the different versions of downloaded source code. With the source code cache, turnaround times when tweaking build settings are even shorter because I don’t have to wait for the source to re-download on each run.

Running .NET apps on Docker

This blog post covers running simple .NET apps in Docker lightweight containers using Mono. I run Docker in a Vagrant/VirtualBox VM on Windows. This works great and is fast. Installation instructions are available on the Docker site.

Building the base image

First order of business is to create a Docker image that has Mono installed. We will use this as the base image for containers that actually run apps. To get the most recent Mono version (3.2.6 at the time of writing) I use packages created by Timotheus Pokorra installed on an Ubuntu 12.04 LTS Docker image. Here’s the Dockerfile for that:

FROM ubuntu:12.04
MAINTAINER friism

RUN apt-get -y -q install wget
RUN wget -q http://download.opensuse.org/repositories/home:tpokorra:mono/xUbuntu_12.04/Release.key -O- | apt-key add -
RUN apt-get remove -y --auto-remove wget
RUN sh -c "echo 'deb http://download.opensuse.org/repositories/home:/tpokorra:/mono/xUbuntu_12.04/ /' >> /etc/apt/sources.list.d/mono-opt.list"
RUN apt-get -q update
RUN apt-get -y -q install mono-opt

Here’s what’s going on:

  1. Install wget
  2. Add repository key to apt-get
  3. Remove wget
  4. Add openSUSE repository to sources list
  5. Install Mono from there

At first I did all of the above in one command since these steps represent the single logical step of installing Mono and it seems like they should be just one commit. Nobody will be interested in the commit after wget was installed, for example. I ended up splitting it up into separate RUN commands since that’s what other people seem to do.

With that Dockerfile, we can build an image:

$ docker build -t friism/mono .

We can then run a container using the generated image and check our Mono installation:

vagrant@precise64:~/mono$ docker run -i -t friism/mono bash
root@0bdca65e6e8e:/# /opt/mono/bin/mono --version
Mono JIT compiler version 3.2.6 (tarball Sat Jan 18 16:48:05 UTC 2014)

Note that Mono is installed in /opt and that it works!

Running a console app

First, we’ll deploy a very simple console app:

using System;

namespace HelloWorld
{
	public class Program
	{
		static void Main(string[] args)
		{
			Console.WriteLine("Hello World");
		}
	}
}

For the purpose of this example, we’ll pre-build apps using the VS command prompt and msbuild and then add the output to the container:

msbuild /property:OutDir=C:\tmp\helloworld HelloWorld.sln

Alternatively, we could have invoked xbuild or gmcs from within the container.

The Dockerfile for the container to run the app is extremely simple:

FROM friism/mono
MAINTAINER friism

ADD app/ .
CMD /opt/mono/bin/mono `ls *.exe | head -1`

Note that it relies on the friism/mono image created above. It also expects the compiled app to be in the /app folder, so:

$ ls app
HelloWorld.exe

The CMD will simply use mono to run the first executable found in the build output. Let’s build and run it:

$ docker build -t friism/helloworld-container .
...
$ docker run friism/helloworld-container
Hello World

It worked!

Web app

Running a self-hosting OWIN web app is only slightly more work. For this example I used the sample code from the OWIN/Katana-on-Heroku post.

$ ls app/
HelloWorldWeb.exe  Microsoft.Owin.Diagnostics.dll  Microsoft.Owin.dll  Microsoft.Owin.Host.HttpListener.dll  Microsoft.Owin.Hosting.dll  Owin.dll

The Dockerfile for this exposes port 5000 from the container, and CMD is used to start the web app and specify port 5000 to listen on:

FROM friism/mono
MAINTAINER friism

ADD app/ .
EXPOSE 5000
CMD ["/opt/mono/bin/mono", "HelloWorldWeb.exe", "5000"]

We start the container and map port 5000 to port 80 on the machine running Docker:

$ docker run -p 80:5000 -t friism/mono-hello-world-web

And with that we can visit the OWIN sample site on http://localhost/.

If you’re a .NET developer this post will hopefully have helped you place Docker in the context of your everyday work. If you’re interested in more ideas on how to use Docker to deploy apps, check out this Automated deployment with Docker – lessons learnt post.

Serving WebP images with ASP.NET MVC

Speed is a feature and one thing that can slow down web apps is clients waiting to download images. To speed up image downloads and conserve bandwidth, the good folks of Google have come up with a new image format called WebP (“weppy”). WebP images are around 25% smaller in size than equivalent images encoded with JPEG and PNG (WebP supports both lossy and lossless compression) with no worse perceived quality.

This blog post shows how to dynamically serve WebP-encoded images from ASP.NET to clients that support the new format.

One not-so-great way of doing this is to serve different HTML depending on whether clients support WebP or not, as described in this blog post. As an example, clients supporting WebP would get HTML with <img src="image.webp"/> while other clients would get <img src="image.jpeg"/>. The reason this sucks is that the same HTML cannot be served to all clients, making caching harder. It will also tend to pollute your view code with concerns about what image formats are supported by the browser we’re rendering for right now.

Instead, images in our solution will only ever have one url and the content-type of responses depend on the capabilities of the client sending the request: Browsers that support WebP get image/webp and the rest get image/jpeg.

I’ll first go through creating WebP-encoded images in C#, then tackle the challenge of detecting browser image support and round out the post by discussing implications for CDN use.

Serving WebP with ASP.NET MVC

For the purposes of this article, we’ll assume that we want to serve images from a URI like /images/:id where :id is some unique id of the image requested. The id can be used to fetch a canonical encoding of the image, either from a file system, a database or some other backing store. In the code I wrote to use this, the images are stored in a database. Once fetched from the backing store, the image is re-sized as desired, re-encoded and served to the client.
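
Wiring up such a route in ASP.NET MVC is straightforward; here’s a sketch, assuming the Show action shown later lives on a hypothetical ImagesController:

using System.Web.Mvc;
using System.Web.Routing;

public class RouteConfig
{
	public static void RegisterRoutes(RouteCollection routes)
	{
		// maps /images/123 to ImagesController.Show(imageId: 123)
		routes.MapRoute(
			name: "Image",
			url: "images/{imageId}",
			defaults: new { controller = "Images", action = "Show" });
	}
}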

At this point, some readers are probably in uproar: “Doing on-the-fly image fetching and manipulation is wasteful and slow” they scream. That’s not really the case though, and even if it were, the results can be cached on first request and then served quickly.

Assume we have an Image class and method GetImage(int id) to retrieve images:

private class Image
{
	public int Id { get; set; }
	public DateTime UpdatedAt { get; set; }
	public byte[] ImageBytes { get; set; }
}

We’ll now use the managed API from ImageResizer to resize the image to the desired size and then re-encode the result to WebP using Noesis.Drawing.Imaging.WebP.dll (no NuGet package, unfortunately).

public ActionResult Show(int imageId)
{
	var image = GetImage(imageId);

	var resizedImageStream = new MemoryStream();
	ImageBuilder.Current.Build(image.ImageBytes, resizedImageStream, new ResizeSettings
	{
		Width = 500,
		Height = 500,
		Mode = FitMode.Crop,
		Anchor = System.Drawing.ContentAlignment.MiddleCenter,
		Scale = ScaleMode.Both,
	});

	var resultStream = new MemoryStream();
	WebPFormat.SaveToStream(resultStream, new SD.Bitmap(resizedImageStream));
	resultStream.Seek(0, SeekOrigin.Begin);

	return new FileContentResult(resultStream.ToArray(), "image/webp");
}

System.Drawing is referenced using using SD = System.Drawing;. The controller action above is fully functional and can serve up sparkling new WebP-formatted images.

Browser support

Next up is figuring out whether the browser requesting an image actually supports WebP, and if it doesn’t, respond with JPEG. Luckily, this doesn’t involve going back to the bad old days of user-agent sniffing. Modern browsers that support WebP (such as Chrome and Opera) send image/webp in the Accept header to indicate support. Ironically given that Google came up with WebP, the Chrome developers took a lot of convincing to set that header in requests, fearing request size bloat. Even now, Chrome only advertises WebP support for requests that it thinks are for images. In fact, this is another reason the “different-HTML” approach mentioned in the intro won’t work: Chrome doesn’t advertise WebP support for requests for HTML.

To determine what content encoding to use, we inspect Request.AcceptTypes. The resizing code is unchanged, while the response is generated like this:

	var resultStream = new MemoryStream();
	var webPSupported = Request.AcceptTypes.Contains("image/webp");
	if (webPSupported)
	{
		WebPFormat.SaveToStream(resultStream, new SD.Bitmap(resizedImageStream));
	}
	else
	{
		new SD.Bitmap(resizedImageStream).Save(resultStream, ImageFormat.Jpeg);
	}

	resultStream.Seek(0, SeekOrigin.Begin);
	return new FileContentResult(resultStream.ToArray(), webPSupported ? "image/webp" : "image/jpeg");

That’s it! We now have a functional controller action that responds correctly depending on the request Accept header. You gotta love HTTP. You can read more about content negotiation and WebP on Ilya Grigorik’s blog.

Client Caching and CDNs

Since it does take a little while to perform the resizing and encoding, I recommend storing the output of the transformation in HttpRuntime.Cache and fetching from there in subsequent requests. The details are trivial and omitted from this post.
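
For completeness, here’s a minimal sketch of what that caching can look like inside the action (the cache key scheme and the 24-hour expiration are arbitrary choices, not requirements):

	var cacheKey = string.Format("image-{0}-{1}", imageId, webPSupported ? "webp" : "jpeg");
	var cachedBytes = HttpRuntime.Cache[cacheKey] as byte[];
	if (cachedBytes == null)
	{
		// resize and encode as shown earlier, then cache the resulting bytes
		cachedBytes = resultStream.ToArray();
		HttpRuntime.Cache.Insert(cacheKey, cachedBytes, null,
			DateTime.UtcNow.AddHours(24), System.Web.Caching.Cache.NoSlidingExpiration);
	}
	return new FileContentResult(cachedBytes,
		webPSupported ? "image/webp" : "image/jpeg");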

There is also a bunch of ASP.NET cache configuration we should do to let clients cache images locally:

	Response.Cache.SetExpires(DateTime.Now.AddDays(365));
	Response.Cache.SetCacheability(HttpCacheability.Public);
	Response.Cache.SetMaxAge(TimeSpan.FromDays(365));
	Response.Cache.SetSlidingExpiration(true);
	Response.Cache.SetOmitVaryStar(true);
	Response.Headers.Set("Vary",
		string.Join(",", new string[] { "Accept", "Accept-Encoding" } ));
	Response.Cache.SetLastModified(image.UpdatedAt.ToLocalTime());

Notice that we set the Vary header value to include “Accept” (as well as “Accept-Encoding”). This tells CDNs and other intermediary caching proxies that if they try to cache this response, they must vary the cached value based on the value of the “Accept” header of the request. This works for “Accept-Encoding”, so that different values can be cached based on whether the response is compressed with gzip, deflate or not at all, and all major CDNs support it. Unfortunately, the mainstream CDNs I experimented with (CloudFront and Azure CDN) don’t support other Vary values than “Accept-Encoding”. This is really frustrating, but also somewhat understandable from the standpoint of the CDN folks: If all Vary values are honored, the number of artifacts they have to cache would increase at a combinatorial rate as browsers and servers make use of cleverer caching. Unless you find a CDN that specifically supports non-Accept-Encoding Vary values, don’t use a CDN when doing this kind of content negotiation.

That’s it! I hope this post will help you build ASP.NET web apps that serve up WebP images really quickly.

Full 2012 Danish company taxes

Two weeks ago I put out a preliminary release of the 2012 taxes. The full dataset with 245,836 companies is now available in this Google Fusion Table. I haven’t done any of the analysis I did last year. Other than a list of top payers, I’ll leave the rest up to you guys. One area of interest might be to compare the new data with the data from last year: What new companies have shown up that have lots of profit or pay lots of tax? What high-paying companies went away or stopped paying anything? Whose taxes and profits changed the most? You can find last year’s data via the blog post on the 2011 data.

Here are the top 2012 corporate tax payers:

  1. Novo A/S: kr. 3,747,186,913.00
  2. KIRKBI A/S: kr. 2,044,585,056.00
  3. NORDEA BANK DANMARK A/S: kr. 1,669,087,041.00
  4. TDC A/S: kr. 1,587,748,550.00
  5. Coloplast A/S: kr. 607,238,513.00

Ping me on Twitter if you build something cool with this data. The tax guys were kind enough to say that they’re looking into my finding that the 2011 data had been changed, and to address my critique of the fact that the 2011 data is no longer available. I’m still hoping for a response.

How to read the 2012 data

Each company record exposes up to 9 values. I’ve given them English column names in the Fusion Table. As an example of what goes where, check out A/S Dansk Shell/Eksportvirksomhed:

  • Profit (“Skattepligtig indkomst for 2012”): kr. 528.440.304
  • Losses (“Underskud, der er trukket fra indkomsten”): kr. 225.655.303
  • Tax (“Selskabsskatten for 2012”): kr. 132.110.075
  • FossilProfit (“Skattepligtig kulbrinteindkomst for 2012”): kr. 9.757.362.931
  • FossilLosses (“Underskud, der er trukket fra kulbrinteindkomsten”): kr. 0
  • FossilTax (“Kulbrinteskatten for 2012”): kr. 5.073.828.724
  • FossilCorporateProfit (“Skattepligtig selskabsindkomst, jf. kulbrinteskatteloven for 2012”): kr. 14.438.589.854
  • FossilCorporateLosses (“Underskud, der er trukket fra kulbrinteindkomsten”): no value
  • FossilCorporateTax (“Selskabsskat 2012 af kulbrinteindkomst”): kr. 3.609.647.450

A better way to release tax data

Computerworld has an interesting article with details from the process of making the 2011 data available. The article is mostly about the fact that various Danish business lobbies managed to convince the tax authority to bundle fossil extraction taxes and taxes on profit into one number. This had the effect of cleverly obfuscating the fact that Maersk doesn’t seem to pay very much tax on profits earned in the parts of their business that aren’t about extracting oil.

More interesting for us, the article also mentions that the tax authority spent about kr. 1 mio to develop the site that makes the data available. That seems generous for a single web page that can look up values in a database, and does so slowly. But it’s probably par for the course in public procurement.

The frustrating part is that the data could be released in a much simpler and more useful way: Simply dump it out in an Excel spreadsheet (and a .csv for people that don’t want to use Excel) and make it available for download. I have created these files from the data I’ve scraped, and they’re about 27MB (5MB compressed). It would be trivial and cheap to make the data set available in this format.

Right now, there’s probably a good number of programmers and journalists (me included) that have built scrapers and are hammering the tax website to get all the data. With downloadable spreadsheets, getting all the data would be fast and easy, and it would save a lot of wasted effort. I’m sure employees at the Tax Authority create spreadsheets like this all the time anyway, to do internal analysis and reporting.

Depending on whether data changes (as seen with last year’s data) are a fluke or normal, it might be necessary to release new spreadsheets when the data is updated. This would be great too though, because it would make tracking changes easier and more transparent.

Preliminary 2012 Danish Company Tax Records

The Danish tax authority released the 2012 company tax records yesterday. I’ve scraped a preliminary data set by just getting all the CVR-ids from last year’s scrape. This leaves out any new companies that cropped up since then, but there are still 228,976 companies in the set, including subsidiaries.

The rest are being scraped as I write this and should be ready in at most a few days. I’ll publish the full data as soon as I have it, including some commentary on the fact that the tax people spent kr. 1 mio [dk] to release the data in this half-assed way.

In the meantime, get the preliminary data in this Google Fusion Table.
