Getting ready for 2012 Danish company taxes


This is a follow-up to last year’s “Tax records for Danish companies” post which covered how I screen-scraped and analyzed 2011 tax records for all Danish companies.

I revisited the scraper source code today because the Tax Authority has made it known[dk] that they will be releasing the 2012 data set next week. As I did last year, I want to preface the critique that follows by saying that it’s awesome that this information is made public, and that I hope the government will continue to publish it and work on making it more useful.

First some notes regarding the article:

  • It states that, for the first time, it’s possible to determine what part of a company’s taxes is due to profits on oil and gas extraction and what part is due to normal profits. That’s strange, since this was also evident from last year’s data. Maybe they’re trying to say (but the journalist was too dim to understand) that they have solved the problem in the 2011 data that caused oil and gas corporations to be duplicated, as evidenced by the two entries for Maersk: A.P.Møller – Mærsk A/S/ Oil & Gas Activity and A.P. MØLLER – MÆRSK A/S. Note that the two entries have the same CVR identifier.
  • It’s frustrating that announcements like this (that new data is coming next week) are not communicated publicly on Twitter or the web sites of either the Tax Authority or the Ministry of Taxation. Instead, one has to stumble upon the news on some newspaper web site. Maybe it was mentioned in a newsletter I’m not subscribed to – who knows.

Anyway, these are nuisances. Now on to the actual problems.

2011 data is going away

The webpage says it beautifully:

De offentliggjorte skatteoplysninger er for indkomståret 2011 og kan ses, indtil oplysningerne for 2012 bliver offentliggjort i slutningen af 2013.

Translated:

The published tax information is for the year 2011 and is available until the 2012 information is published at the end of 2013.

Removing all the 2011 data to make room for the 2012 stuff is very wrong. First off, it’s a defect that historical information is not available. Of course, I scraped the information and put it in a Fusion Table for posterity (or at least for as long as Google maintains that product). Even so, it’s wrong of the tax authority to not also publish and maintain historical records.

Second, I suspect that the new 2012 data will be published using the same URI scheme as the 2011 data, i.e.: http://skat.dk/SKAT.aspx?oId=skattelister&x={cvr-id}. So when the new data goes live some time next week, a URI that pointed to the 2011 tax records of the company FORLAGET SOHN ApS will all of a sudden point to the 2012 tax records of that company. That means that all the links I included in last year’s blog post, and thought would point to 2011 data in perpetuity, will point to 2012 data instead. This is likely going to be confusing, both to readers of my post and to other people following those links from all over the Internet. The semantics of these URIs are arguably coherent if they’re defined to be “the latest tax records for company X”. This is not a very satisfying paradigm though, and it would be much better if /company-tax-records/{year}/{cvr-id} URIs were made available, or if records from all years were available at /SKAT.aspx?oId=skattelister&x={cvr-id} as they became available.
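To make the difference concrete, here is a minimal Python sketch of the two schemes. The first function builds the URI pattern quoted above; the second builds the year-scoped path I’m wishing for, which is hypothetical and not something the Tax Authority offers. The function names and the CVR id 12345678 are made up for illustration.

```python
from urllib.parse import urlencode

BASE = "http://skat.dk/SKAT.aspx"


def current_uri(cvr_id: str) -> str:
    """URI in the scheme used for the 2011 data: there is no year component."""
    query = urlencode({"oId": "skattelister", "x": cvr_id})
    return f"{BASE}?{query}"


def year_scoped_uri(year: int, cvr_id: str) -> str:
    """Hypothetical year-scoped path, as proposed in the text above."""
    return f"/company-tax-records/{year}/{cvr_id}"


# "12345678" is a placeholder, not a real company's CVR id.
print(current_uri("12345678"))
print(year_scoped_uri(2011, "12345678"))
print(year_scoped_uri(2012, "12345678"))
```

With the first scheme there is only one URI per company, so whatever it returns changes from year to year. With the second, the 2011 and 2012 records get distinct, stable addresses.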

The 2011 data was changed

I discovered this by chance when dusting off last year’s code. It has a set of integration tests, and the one for Saxo Bank refused to pass. That turns out to be because the reported numbers have changed. When I first scraped the data, Saxo Bank paid kr. 25.426.135 in taxes on profits of kr. 257.969.357. The current numbers are kr. 25.142.333 in taxes on kr. 260.131.946 of profits. So it looks like the bank made a cool couple of million extra in 2011 and managed to get their tax bill bumped down a bit.
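For the record, the size of the revision is simple arithmetic on the four figures quoted above. A small Python sketch (the variable names are mine, and the Danish thousands separators are dropped):

```python
# Figures quoted above, in kroner.
first_scrape = {"profit": 257_969_357, "tax": 25_426_135}
current = {"profit": 260_131_946, "tax": 25_142_333}

profit_delta = current["profit"] - first_scrape["profit"]
tax_delta = current["tax"] - first_scrape["tax"]

print(f"profit change: {profit_delta:+,} kr.")  # +2,162,589 kr.
print(f"tax change:    {tax_delta:+,} kr.")     # -283,802 kr.
```

So the reported profit went up by roughly kr. 2.2 million while the tax went down by roughly kr. 284 thousand.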

Some takeaways:

  1. Even though this information is posted almost a full year after the end of 2011, the numbers are not accurate and have to be corrected. This is obviously not only the tax authority’s fault: Companies are given ample time to gather and submit records and even then, they may provide erroneous data.
  2. It’d be interesting to know what caused the correction. Did Saxo Bank not submit everything? Did the tax people miss something? Was Saxo Bank audited?
  3. It’d be nice if these revisions themselves were published in an organised fashion by the tax authorities. Given the ramshackle way they go about publishing the other data, I’m not holding my breath for this to happen.
  4. I have no idea if there are adjustments to other companies and if so, how many. I could try to re-run the scraper on all 243,711 companies to find changes before the 2012 release obliterates the 2011 data, but I frankly can’t be bothered. Maybe some journalist can go ask.
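If someone did want to do the re-run mentioned in the last point, the comparison itself is the easy part. Below is a rough Python sketch, assuming each scrape is loaded into a dictionary keyed by CVR id with a (profit, tax) pair as the value. That record shape, the function name and the keys in the example are invented for illustration; this is not how my scraper stores its data.

```python
from typing import Dict, Tuple

# Assumed record shape: (profit, tax) in kroner.
Record = Tuple[int, int]


def changed_records(
    old: Dict[str, Record], new: Dict[str, Record]
) -> Dict[str, Tuple[Record, Record]]:
    """Map each company present in both snapshots whose figures differ
    to its (old_record, new_record) pair."""
    return {
        cvr: (old[cvr], new[cvr])
        for cvr in old.keys() & new.keys()
        if old[cvr] != new[cvr]
    }


# The keys are placeholders, not real CVR ids. The "saxo" figures are the
# ones quoted above; the "unchanged" company is made up.
first_scrape = {"saxo": (257_969_357, 25_426_135), "unchanged": (1_000, 250)}
rescrape = {"saxo": (260_131_946, 25_142_333), "unchanged": (1_000, 250)}

for cvr, (before, after) in changed_records(first_scrape, rescrape).items():
    print(cvr, before, "->", after)
```

Companies that appear in only one of the two snapshots are ignored here; a real comparison would probably want to report those as well.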

That’s it! Provided the tax people don’t change the web interface, the scraper is ready for next week’s 2012 data release. I’ll start running it as soon as the 2012 numbers show up and publish the raw data when I have it.