Speculation is growing that the estimated 7m Child Benefit records were contained in nothing more sophisticated than CSV (comma-separated values) files.
The known facts about the data all point the same way: it was not encrypted, it had only basic password protection, and its entire contents fitted onto just two CDs (or was it DVD-Rs? There is a big capacity difference, after all). All of this indicates that the data was exported from another database into a word-processing or spreadsheet file.
Worse, the admission that the complete dataset was exported and copied, rather than just the NI records the NAO actually required, because it would have cost too much to extract the specific fields from the original database, suggests that the data may be kept natively in an unstructured format within which it is difficult to run accurate searches.
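In a properly structured store, limiting an export to the fields actually requested is a one-line query rather than a costly extraction job. A minimal sketch of the idea, using SQLite and an invented, much-simplified schema (the table and field names are illustrative, not HMRC's):

```python
import sqlite3

# Hypothetical schema standing in for the Child Benefit records.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE claimants ("
    "  name TEXT, address TEXT, dob TEXT,"
    "  ni_number TEXT, bank_account TEXT)"
)
conn.executemany(
    "INSERT INTO claimants VALUES (?, ?, ?, ?, ?)",
    [
        ("A. Smith", "1 High St", "1970-01-01", "QQ123456A", "12-34-56 00012345"),
        ("B. Jones", "2 Low Rd", "1982-05-09", "QQ654321B", "65-43-21 00054321"),
    ],
)

# Export only the field the auditors asked for, leaving names,
# addresses and bank details where they belong.
ni_numbers = [row[0] for row in conn.execute("SELECT ni_number FROM claimants")]
print(ni_numbers)
```

The point is not the technology but the query: selecting one column costs nothing extra, whereas an unstructured flat file forces you to copy every record wholesale.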
To be fair, many companies still keep a lot of information in unstructured databases, but on a much smaller scale, and certainly not the names, addresses, dates of birth, NI numbers and bank details of up to 7 million people!
Whether this kind of shambolic, irresponsible database management policy is exclusive to HMRC is hard to tell, but I'd like to stick up for at least one other government department.
Many years ago I worked for the Home Office Immigration and Nationality Department (IND), where the records of non-EU citizens residing in or visiting the country were kept in a fairly ancient but huge mainframe database that at least had the benefit of being structured.
So, whenever some government minister or MP wanted to look clever by quoting relevant statistics in parliament to support whatever argument they were trying to promote or undermine (how many Croats, Bosnians or Serbs were admitted on tourist visas during 1992 and never went home, for example; a fair few, considering the war in Yugoslavia at the time), I was given the job of whipping up a quick 2G program. It went into that vast information repository and came out with more or less exactly the results they wanted, without exporting any of the data contained in the fields I was not interested in.
The process wasn't especially quick and the programming language in use (some strange derivative of Cobol) was hardly intuitive, but old programs could be swiftly amended and compiled, then left running quietly in the background over a couple of hours whilst I did something else.
Even then, the actual results were provided strictly on a need-to-know basis, even though they were not especially sensitive and were, in fact, published every three months.
At the time, I was naïve enough to think the entire civil service kept and searched on its records in a similar way – either I was wrong, or government IT in some respects has actually changed for the worse, rather than the better, over the past decade ...