UK Covid-19 data underreported due to file size limit exceeded
134 points by slashdotdash 4 years ago | 22 comments
- mellosouls 4 years ago Note: for all the people reckoning this is an Excel issue as reported on Twitter:
a) The Twitter link is referencing the Daily Mail. https://www.dailymail.co.uk/news/article-8805697/Furious-bla...
b) The Mail does not source its claim.
c) All those Excel references in the news seem to postdate (at time of writing) speculation on Excel in IT-heavy forums like this.
While I can believe the Excel conjecture is correct, I wish people would stop referencing it like it's a proven fact, or provide an authoritative source for the Excel claim.
I hate to be that guy, but...
- mellosouls 4 years ago BBC say they have 'confirmed' it's Excel, though still a bit mysterious on how they've done that: https://www.bbc.co.uk/news/uk-54422505
In particular, they reckon the older XLS format was in use rather than XLSX, with its lower limit of roughly 65,000 rows (65,536, to be exact) - although that seems more like a deflection from the fact that functionality limits weren't properly handled, tested or designed for.
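To make "limits weren't properly handled" concrete, here is a minimal sketch of the kind of guard that would fail loudly instead of silently dropping rows. The row limits are Excel's documented worksheet limits (65,536 for .xls, 1,048,576 for .xlsx); the helper `rows_to_spare`, the `SheetOverflowError` class and the one-record-per-row layout are hypothetical and are not taken from any report on the actual pipeline.

```python
# Documented worksheet row limits for the two Excel file formats.
MAX_ROWS = {"xls": 65_536, "xlsx": 1_048_576}


class SheetOverflowError(ValueError):
    """Raised when a batch would not fit in a single worksheet."""


def rows_to_spare(n_records: int, fmt: str = "xls", header_rows: int = 1) -> int:
    """Return how many rows stay free, or raise if the batch does not fit.

    Hypothetical helper: assumes one record per row plus `header_rows`
    header lines at the top of the sheet.
    """
    limit = MAX_ROWS[fmt]
    needed = n_records + header_rows
    if needed > limit:
        # Refuse to continue rather than truncate the data silently.
        raise SheetOverflowError(
            f"{needed} rows needed but .{fmt} holds at most {limit}; "
            f"{needed - limit} rows would be lost"
        )
    return limit - needed
```

Calling `rows_to_spare(65_536, "xls")` raises, because the header row pushes the batch one row past the limit, while the same batch fits comfortably with `fmt="xlsx"`.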
- dang 4 years ago That article seems both more informative and more up-to-date, so we'll merge most of the comments from this thread into https://news.ycombinator.com/item?id=24689247. Thanks!
- zelos 4 years ago I hope you're right, but most reports do at least mention "exceeding maximum file size". That seems like a fairly basic error: what reasonable (national-scale) data storage formats would have an arbitrary file size limit?
- josephpmay 4 years ago We’ve actually had a similar issue in the US with some Electronic Initial Case Report (eICR) documents. (To be clear, we mostly rely on Electronic Lab Results (ELR) rather than eICR in the US to count COVID numbers, so this hasn’t resulted in an under-count.)
Rather than being a problem with the database having arbitrary file size limits, it was actually a problem with file size limits within network intermediaries. In most cases, the issue is less the limit and more that the files themselves were too large because of how they were created. I don’t think this is the same issue as the UK (I don’t think they are using the eICR standard there yet), but it’s an example of how you could have a problem with file size limitations without storing data in Excel.
- netsharc 4 years ago It's probably more the web server configured to only allow x MB maximum for file uploads. Or since the government has been shown to be incompetent, maybe it's the maximum size for email attachments..
- bar8uyr 4 years ago Originally I presumed this was an error importing Excel files and an upload file size limit..
- bArray 4 years ago A database running from a FAT disk partition for example.
- s_dev 4 years ago A normalised, correctly structured SQL DB?
- zelos 4 years ago I'm not a backend developer, but isn't that likely to be in the terabytes? This seems to be issues around tens of thousands of rows.
- zelos 4 years ago Seems like other (slightly) more reputable sources are now reporting that it's Excel: https://www.standard.co.uk/news/uk/covid-testing-technical-i... A Telegraph article mentions a Press Association report.
- JonoBB 4 years ago Apparently each case was recorded in a column, and the number of columns had reached the maximum (16,384).
- arethuza 4 years ago The document referenced in that tweet doesn't seem to mention columns?
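Whether records sat in rows or columns is unconfirmed in this thread, but the orientation matters a lot for capacity. A small sketch comparing the two layouts, using Excel's documented per-sheet limits (the function `sheet_capacity` and its parameter names are hypothetical, for illustration only):

```python
# Documented per-worksheet limits for the two Excel file formats.
MAX_ROWS = {"xls": 65_536, "xlsx": 1_048_576}
MAX_COLUMNS = {"xls": 256, "xlsx": 16_384}


def sheet_capacity(fmt: str, one_record_per: str = "row") -> int:
    """Maximum records on one sheet for the given layout (headers ignored)."""
    if one_record_per == "row":
        return MAX_ROWS[fmt]
    if one_record_per == "column":
        return MAX_COLUMNS[fmt]
    raise ValueError(f"unknown layout: {one_record_per!r}")


# How many times more records a row-per-record layout holds than a
# column-per-record layout in the same format.
for fmt in ("xls", "xlsx"):
    ratio = sheet_capacity(fmt, "row") // sheet_capacity(fmt, "column")
    print(f".{fmt}: row layout holds {ratio}x as many records")
```

Under these limits, a column-per-record .xlsx sheet tops out at 16,384 records, which is below even the row limit of the legacy .xls format.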
- riffraff 4 years ago thank you for posting this, I could not understand why everybody kept referencing Excel while TFA didn't.
I thought people had somehow done some math on #cases/sheets to arrive at an obvious-but-not-to-me fact :)
- slashdotdash 4 years ago “The issue occurred because some files containing positive test results exceeded the maximum file size that takes these data files and loads then into central systems, officials said.”
- mytailorisrich 4 years ago Latest news reports suggest that this is Excel spreadsheets' size limit.
- spuz 4 years ago Where did you see this?
- benaadams 4 years ago https://twitter.com/MaxCRoser/status/1313046638915706880
> The reason was apparently that the database is managed in Excel and the number of columns had reached the maximum.
- tus88 4 years ago Are they using msdos or something?
- tonyedgecombe 4 years ago I don't think they are even using grey matter.