v3-labs


Child Benefit records kept in CSV format!!?


Speculation is growing that the estimated 7m Child Benefit records were held in nothing more sophisticated than CSV (comma-separated values) files.

The known facts about the data all point that way: it was not encrypted, it had only basic password protection, and the entire output fitted onto just two CDs (or were they DVD-Rs? There is a big capacity difference, after all). Together they indicate that the data was exported from another database into a word processing or spreadsheet file.

Worse is the admission that the complete dataset was exported and copied, rather than just the NI records the NAO actually required, because extracting the specific fields from the original database would have cost too much. That suggests the data may be natively kept in an unstructured format in which it is difficult to run accurate searches.
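For what it's worth, pulling a single field out of even a plain CSV export need not be expensive. Here is a minimal sketch in Python, assuming the file has a header row. The column names and sample records are entirely made up for illustration, and `extract_columns` is my own hypothetical helper, not anything HMRC is known to use.

```python
import csv
import io


def extract_columns(source, wanted):
    """Yield rows containing only the wanted columns from a CSV stream."""
    reader = csv.DictReader(source)
    # Fail early if a requested column is not in the header row.
    missing = [name for name in wanted if name not in (reader.fieldnames or [])]
    if missing:
        raise KeyError(f"columns not found: {missing}")
    for row in reader:
        yield {name: row[name] for name in wanted}


# Made-up sample data; the column names are assumptions for illustration.
SAMPLE = """name,address,date_of_birth,ni_number,bank_account
A. Example,1 Sample Street,1970-01-01,QQ123456C,00000000
B. Example,2 Sample Street,1980-02-02,QQ654321A,11111111
"""

# Keep only the NI numbers; every other field is left behind.
rows = list(extract_columns(io.StringIO(SAMPLE), ["ni_number"]))
print(rows)
```

Reading row by row means the full file never has to sit in memory, so the same approach should scale to millions of records, although a real export would need checking for quoting and encoding quirks first.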

To be fair, many companies still keep a lot of information in unstructured databases, but on a much smaller scale, and certainly not the names, addresses, dates of birth, NI numbers and bank details of up to 7m people!

Whether this kind of shambolic, irresponsible database management policy is exclusive to HMRC is hard to tell, but I'd like to stick up for at least one other government department.

Many years ago I worked for the Home Office Immigration and Nationality Department (IND), where the records of non-EU citizens residing in or visiting the country were kept in a fairly ancient but huge mainframe database that at least had the benefit of being structured.

So, whenever some government minister or MP wanted to look clever by spouting relevant statistics in parliament to support whatever argument they were trying to promote or undermine, I was given the job. A typical request was how many Croats, Bosnians or Serbs were admitted on tourist visas during 1992 and never went home (a fair few, considering the war in Yugoslavia at the time). I would whip up a quick 2G program that went into that vast information repository and came out with more or less exactly the results they wanted, without exporting any of the data contained in the fields I was not interested in.
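To show the principle rather than the original mainframe code, here is a rough modern equivalent using Python's built-in `sqlite3` module. The `admissions` table, its columns and the sample rows are all hypothetical; the point is only that a structured store lets you ask for a count and get back nothing else.

```python
import sqlite3

# In-memory database with a hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE admissions ("
    " person_id INTEGER PRIMARY KEY,"
    " name TEXT, nationality TEXT, visa_type TEXT,"
    " admitted_year INTEGER, departed INTEGER)"
)
conn.executemany(
    "INSERT INTO admissions"
    " (name, nationality, visa_type, admitted_year, departed)"
    " VALUES (?, ?, ?, ?, ?)",
    [
        ("Person A", "X", "tourist", 1992, 0),
        ("Person B", "X", "tourist", 1992, 1),
        ("Person C", "Y", "tourist", 1992, 0),
        ("Person D", "Y", "student", 1992, 0),
        ("Person E", "X", "tourist", 1991, 0),
    ],
)


def overstayers_by_nationality(conn, year, visa_type):
    """Count people admitted in a given year on a given visa who never left."""
    cur = conn.execute(
        "SELECT nationality, COUNT(*) FROM admissions"
        " WHERE admitted_year = ? AND visa_type = ? AND departed = 0"
        " GROUP BY nationality ORDER BY nationality",
        (year, visa_type),
    )
    return cur.fetchall()


# Only aggregate counts leave the database; names never appear in the result.
counts = overstayers_by_nationality(conn, 1992, "tourist")
print(counts)
```

The query returns one row per nationality and no personal details at all, which is exactly the property the CSV export lacked.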

The process wasn't especially quick and the programming language in use (some strange derivative of Cobol) was hardly intuitive, but old programs could be swiftly amended and compiled, then left running quietly in the background over a couple of hours whilst I did something else.

Even then, the actual results were provided very much on a need-to-know basis, even though they were not especially sensitive and were, in fact, published every three months.

At the time, I was naïve enough to think the entire civil service kept and searched its records in a similar way. Either I was wrong, or government IT has in some respects actually changed for the worse, rather than the better, over the past decade ...

22 Nov 2007
