UK Covid-19 data underreported due to file size limit exceeded

134 points by slashdotdash 4 years ago | 22 comments
  • mellosouls 4 years ago
    Note: for all the people reckoning this is an Excel issue as reported on Twitter:

    a) The Twitter link is referencing the Daily Mail. https://www.dailymail.co.uk/news/article-8805697/Furious-bla...

    b) The Mail does not source its claim.

    c) All those Excel references in the news seem to postdate (at time of writing) speculation on Excel in IT-heavy forums like this.

    While I can believe the Excel conjecture is correct, I wish people would stop referencing it like it's a proven fact, or provide an authoritative source for the Excel claim.

    I hate to be that guy, but...

    • mellosouls 4 years ago
      BBC say they have 'confirmed' it's Excel, though still a bit mysterious on how they've done that: https://www.bbc.co.uk/news/uk-54422505

      In particular, they reckon the older XLS format was being used rather than XLSX, with its lower limit of 65,536 rows, although that seems more like a deflection from the fact that functionality limits weren't properly handled, tested or designed for.
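
      A minimal sketch of the sort of guard that would turn a silent truncation into a loud failure, assuming the data really was being written to a single worksheet. The limits are the ones Microsoft documents for each format; `check_fits` is a hypothetical helper for illustration, not anything known to exist in the actual pipeline.

```python
# Worksheet size limits documented by Microsoft for each Excel file format.
EXCEL_LIMITS = {
    "xls": {"rows": 65_536, "cols": 256},
    "xlsx": {"rows": 1_048_576, "cols": 16_384},
}


def check_fits(n_rows: int, n_cols: int, fmt: str = "xls") -> None:
    """Raise ValueError if the data would not fit on one worksheet.

    Hypothetical helper: the name and signature are assumptions made
    for this example only.
    """
    limits = EXCEL_LIMITS[fmt]
    if n_rows > limits["rows"]:
        raise ValueError(
            f"{n_rows} rows exceeds the {fmt} limit of {limits['rows']}"
        )
    if n_cols > limits["cols"]:
        raise ValueError(
            f"{n_cols} columns exceeds the {fmt} limit of {limits['cols']}"
        )


check_fits(50_000, 10, "xls")  # fits, so nothing is raised

try:
    check_fits(70_000, 10, "xls")  # too many rows for the old format
except ValueError as err:
    print(err)
```

      Failing before the write, rather than letting rows fall off the end, is the "properly handled" part; which format was actually in use does not change that.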

    • zelos 4 years ago
      I hope you're right, but most reports do at least mention "exceeding maximum file size". That seems like a fairly basic error: what reasonable (national-scale) data storage formats would have an arbitrary file size limit?
      • josephpmay 4 years ago
        We’ve actually had a similar issue in the US with some Electronic Initial Case Report (eICR) documents. To be clear, we mostly rely on Electronic Lab Results (ELR) rather than eICR in the US to count COVID numbers, so this hasn’t resulted in an under-count.

        Rather than being a problem with the database having arbitrary file size limits, it actually was a problem with file size limits within network intermediaries. In most cases, the issue is less the limit, and more that the files themselves were too large because of how they were created. I don’t think this is the same issue as the UK (I don’t think they are using the eICR standard there yet), but it’s an example of how you could have a problem with file size limitations without storing data in Excel.

        • netsharc 4 years ago
          It's more likely that the web server is configured to allow only x MB maximum for file uploads. Or, since the government has been shown to be incompetent, maybe it's the maximum size for email attachments.
            • bar8uyr 4 years ago
              Originally I presumed this was an error importing Excel files combined with an upload file size limit.
              • bArray 4 years ago
                A database running from a FAT disk partition for example.
                • s_dev 4 years ago
                  A normalised correctly structured SQL DB?
                  • zelos 4 years ago
                    I'm not a backend developer, but isn't that likely to be in the terabytes? These seem to be issues around tens of thousands of rows.
                • zelos 4 years ago
                  Seems like other (slightly) more reputable sources are now reporting that it's Excel: https://www.standard.co.uk/news/uk/covid-testing-technical-i... A Telegraph article mentions a Press Association report.
                  • JonoBB 4 years ago
                    Apparently each case was recorded in a column, and the number of columns had reached the maximum (16,384).

                    https://twitter.com/MaxCRoser/status/1313046638915706880
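
                    If the column account is accurate (it is unconfirmed here), the orientation alone costs a factor of 64 in capacity. A rough sketch using the documented XLSX worksheet limits; `sheets_needed` is a hypothetical helper and the case count is an illustrative number, not a real figure.

```python
import math

# Documented worksheet limits for the XLSX format.
XLSX_MAX_ROWS = 1_048_576
XLSX_MAX_COLS = 16_384


def sheets_needed(n_cases: int, cases_per_sheet: int) -> int:
    """Number of worksheets required to hold n_cases records."""
    return math.ceil(n_cases / cases_per_sheet)


n_cases = 50_000  # illustrative only, not an actual case count

print(sheets_needed(n_cases, XLSX_MAX_COLS))  # one case per column -> 4
print(sheets_needed(n_cases, XLSX_MAX_ROWS))  # one case per row    -> 1
print(XLSX_MAX_ROWS // XLSX_MAX_COLS)         # capacity ratio      -> 64
```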

                    • arethuza 4 years ago
                      The document referenced in that tweet doesn't seem to mention columns?
                    • riffraff 4 years ago
                      thank you for posting this, I could not understand why everybody kept referencing Excel while TFA didn't.

                      I thought people had somehow done some math on #cases/sheets to arrive at an obvious-but-not-to-me fact :)

                    • slashdotdash 4 years ago
                      “The issue occurred because some files containing positive test results exceeded the maximum file size that takes these data files and loads them into central systems, officials said.”
                    • tus88 4 years ago
                      Are they using msdos or something?
                      • tonyedgecombe 4 years ago
                        I don't think they are even using grey matter.