{"id":28270,"date":"2016-08-22T14:30:03","date_gmt":"2016-08-22T04:30:03","guid":{"rendered":"http:\/\/www.aspistrategist.ru\/?p=28270"},"modified":"2016-08-22T11:50:41","modified_gmt":"2016-08-22T01:50:41","slug":"geeks-week-trawling-defence-contracts-database","status":"publish","type":"post","link":"https:\/\/www.aspistrategist.ru\/geeks-week-trawling-defence-contracts-database\/","title":{"rendered":"Geek(s) of the week: trawling the Defence contracts database"},"content":{"rendered":"
Way back when, we started our professional lives in theoretical physics. And sometimes it shows; we\u2019ve never met a data set we didn\u2019t like, and from time to time there\u2019s an outbreak of ‘let\u2019s see what we can do with this’.<\/p>\n
So it was with the database of more than a quarter of a million contracts let by Defence or the then Defence Materiel Organisation (DMO) between 2007 and 2014. As well as being a useful source of information about expenditure on particular items or projects (thus offsetting, to some extent, the lamentable lack of transparency that otherwise characterises Defence), the database contains enough records to support some macro analysis.<\/p>\n
With such a large data set, we can check that the overall statistical properties are what we\u2019d expect. If we were to find any anomalies, we\u2019d have an interesting line of investigation regarding Defence\u2019s contracting practices. As it happens, we didn\u2019t turn up anything untoward, which is reassuring, but we did manage to demonstrate a couple of interesting mathematical properties of the data set. (You can take the boys out of the physics department\u2026)<\/p>\n
The first observation concerns the distribution of contracts by value. It\u2019s not surprising that there are fewer contracts at higher values, but it is surprising just how predictable the numbers are. The graph below plots the number of DMO contracts between $10 million and $100 million (in bands of $10 million) over the period 2007 to 2014. All nine data points are remarkably well fitted by a simple power law<\/a> function with only two parameters. That means knowing any two values on the curve is enough to predict any other. For example, if we know how many contracts were let between $10\u201320 million and $20\u201330 million, we can predict the number between $70\u201380 million with high confidence. (For the statistically minded, the regression value is 0.9824.)<\/p>\n
<\/a><\/p>\n
The obvious question is: why? It turns out that this is a general property of large data sets with a couple of not-so-unusual characteristics, and power law distributions are common in science and economics. In the late 19th and early 20th centuries, power law distributions were observed in the distribution of incomes (Pareto<\/a>), word frequencies (Zipf\u2019s law<\/a>) and the distribution of population size in cities (another Zipf\u2019s law<\/a>, explained in a technical paper here<\/a> (PDF)). Journal citations, book sales, earthquake magnitudes, company size, stock market movements, web hits, individual net worth, executive remuneration and even the diameters of craters on the Moon<\/a> (PDF) follow the same pattern.<\/p>\n
It\u2019s not entirely clear (at least to us) what aspects of Defence contracting cause it to fit the power law model, though it\u2019s possible that the distribution of contract sizes mirrors the size distribution of the firms that are counterparties to the contracts, which is known to follow a power law<\/a> (PDF). 
(You can find a more thorough discussion in chapter nine of this year\u2019s ASPI budget brief<\/a>.)<\/p>\n
Regardless of the reason, the distribution does what it ought to, and it also mirrors the distribution of contracts let by the US government. The graph below presents the data in a slightly different way, showing the value of the top 100 contracts let by the US government and by DMO, ranked by contract size. The scales differ (the largest American contract is US$36.3 billion to Lockheed Martin, while Australia\u2019s largest is a little over AU$4 billion as part of the Air Warfare Destroyer project), but both follow a power law with more than 97% correlation.<\/p>\n
<\/a><\/p>\n
One other property we can check is a quirky aspect of many data sets known as Benford\u2019s Law<\/a>, which concerns the leading digits of entries in the data. The law says that the leading digit is more likely to be a \u20181\u2019 than any other digit, that \u20182\u2019 is the next most likely, and so on, in a predictable relative frequency: the expected share of entries with leading digit d is log10(1 + 1/d), which is about 30.1% for \u20181\u2019, falling to about 4.6% for \u20189\u2019. It\u2019s not intuitively obvious, but the pattern is common enough that it can even be used for forensic accounting<\/a> (recommended video): people fiddling the books tend to distribute their dodgy numbers in a non-Benford pattern. (And if the first digit doesn\u2019t betray them, the second or third will.)<\/p>\n
The graph below shows how DMO\u2019s contracts stack up against Benford\u2019s Law. It\u2019s not an exact match, but the relative frequency falls off progressively by digit as expected. The discrepancies are probably due to threshold effects in the contracting and approvals system: a glance at the raw data shows a sudden leap in the number of contracts at $10,000, for example.<\/p>\n
<\/a><\/p>\n
Source: ASPI analysis of the AusTender database.<\/p>\n
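The two-parameter power-law fit described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not ASPI's actual method: it fits y = a * x**k by ordinary least squares on the logarithms of both variables, reports R-squared in log-log space, and shows how two points alone fix the curve. The function names are hypothetical, each $10 million band is represented by its midpoint (an assumption; the article does not say how bands were placed on the x-axis), and the counts in the usage lines are synthetic, not AusTender figures.

```python
import math


def fit_power_law(xs, ys):
    # Least-squares fit of y = a * x**k via a straight line through (log x, log y).
    # Returns (a, k, r_squared); R-squared is measured in log-log space.
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    k = sxy / sxx                   # slope in log-log space is the exponent
    log_a = mean_y - k * mean_x     # intercept is the log of the prefactor
    ss_res = sum((v - (log_a + k * u)) ** 2 for u, v in zip(lx, ly))
    ss_tot = sum((v - mean_y) ** 2 for v in ly)
    return math.exp(log_a), k, 1 - ss_res / ss_tot


def power_law_through(p1, p2):
    # Two points fix both parameters of y = a * x**k exactly.
    (x1, y1), (x2, y2) = p1, p2
    k = math.log(y2 / y1) / math.log(x2 / x1)
    a = y1 / x1 ** k
    return a, k


def predict(a, k, x):
    # Evaluate the power law y = a * x**k at x.
    return a * x ** k


# Synthetic illustration only (NOT AusTender data): hypothetical band
# midpoints in $ million, with counts generated exactly from y = 2000 * x**-1.5.
mids = [15, 25, 35, 45, 55, 65, 75, 85, 95]
counts = [2000 * m ** -1.5 for m in mids]

a, k, r2 = fit_power_law(mids, counts)
# Use only the $10-20m and $20-30m bands to predict the $70-80m band.
a2, k2 = power_law_through((mids[0], counts[0]), (mids[1], counts[1]))
print(a, k, r2)
print(predict(a2, k2, 75))
```

Applied to (rank, value) pairs instead, the same fit_power_law function gives one way to reproduce the rank-size comparison of the top 100 US and DMO contracts; whether the article's 97% figure was computed like this is not stated.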
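A Benford's Law check can be sketched in Python as follows. This minimal version assumes the contract values are available as a plain list of positive dollar amounts (a hypothetical input format, not the AusTender schema) and uses hypothetical function names. It computes the expected first-digit shares log10(1 + 1/d), the observed shares, and a Pearson chi-square statistic as a one-number summary of fit; the chi-square test is a swapped-in choice, since the article compares the distributions graphically. The usage lines run on synthetic data: powers of two, a textbook Benford-like sequence, and the integers 100 to 999, whose leading digits are uniform.

```python
import math
from collections import Counter

# Benford's law: expected share of values whose leading digit is d.
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(value):
    # First significant digit of a positive number, read from its
    # scientific-notation string (36.3e9 -> '3.63...e+10' -> 3).
    if value <= 0:
        raise ValueError('value must be positive')
    return int(format(value, '.14e')[0])


def digit_counts(values):
    # Count leading digits 1..9 among the positive entries; others are skipped.
    return Counter(leading_digit(v) for v in values if v > 0)


def first_digit_shares(values):
    # Observed share of each leading digit 1..9.
    counts = digit_counts(values)
    n = sum(counts.values())
    return {d: counts[d] / n for d in range(1, 10)}


def chi_square_vs_benford(values):
    # Pearson chi-square of observed vs Benford-expected counts
    # (8 degrees of freedom; larger means a worse match).
    counts = digit_counts(values)
    n = sum(counts.values())
    return sum((counts[d] - n * p) ** 2 / (n * p) for d, p in BENFORD.items())


# Synthetic illustration only (NOT AusTender data).
benford_like = [2 ** n for n in range(1, 1001)]
uniform_like = list(range(100, 1000))

for d, share in first_digit_shares(benford_like).items():
    print(d, round(share, 3), round(BENFORD[d], 3))
print(chi_square_vs_benford(benford_like), chi_square_vs_benford(uniform_like))
```

A threshold effect of the kind described above would show up directly in this comparison: a cluster of contracts at or just above $10,000 all carry the leading digit '1', inflating that bin relative to the Benford expectation.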