UK air traffic control outage caused by bad data in flight plan
24 points by orobinson 1 year ago | 20 comments
- mytailorisrich 1 year ago"Nats said the failure was due to “an extremely rare set of circumstances” with two identically named but separate waypoint markers outside the UK’s airspace"
Sounds like a fairly common error case to check for although they say that it never happened before.
ID collisions are always something to check for, not least when the data are user inputs.
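As a rough illustration of how cheap such a check can be, here is a minimal Python sketch. The record layout `(ident, lat, lon)` and the identifiers are made up for the example and are not taken from the NATS system or its report.

```python
from collections import defaultdict


def find_id_collisions(waypoints):
    """Return identifiers that map to more than one distinct position.

    `waypoints` is an iterable of (ident, lat, lon) tuples; this layout
    is a hypothetical one chosen for the sketch.
    """
    positions = defaultdict(set)
    for ident, lat, lon in waypoints:
        positions[ident].add((lat, lon))
    # Exact duplicates of the same record collapse in the set, so only
    # genuinely different positions sharing a name are reported.
    return {ident: sorted(pos) for ident, pos in positions.items() if len(pos) > 1}


# Hypothetical data: two different places share the identifier "ABC".
waypoints = [
    ("ABC", 49.0, 0.0),
    ("ABC", 48.0, -99.0),
    ("XYZ", 51.5, -0.1),
]
collisions = find_id_collisions(waypoints)
```

A single pass over the table is enough, so the cost grows linearly with the number of waypoints.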
- bell-cot 1 year agoAnd checking for ID collisions is generally extremely easy, both in code and compute.
And the ID collision was in user data which the air traffic system has to continuously accept during operations.
And it sounds like that data breaks down into individual flight plans - so it might be trivial to reject just one flight plan, and allow the rest to proceed.
BUT...doubtless the UK's flight control software came out of some multi-billion-pound government boondoggle. So we should be grateful that it doesn't crash planes into each other, or send innocent postal workers to jail for theft, and overlook these sorts of failures.
- cameronh90 1 year agoMost bugs are pretty easy to solve once you know they're there.
It's all well and good to say that software just shouldn't have bugs, but that's pretty much an unsolved problem at this point. The NATS system has a relatively good track record, and even companies with exemplary engineering standards have occasionally had large system failures.
Let he who is without sin cast the first stone.
- matthewmacleod 1 year agoI have no doubt that there are huge numbers of issues with many large-scale IT projects, but this sort of cynical and hyperbolic armchair analysis makes it even harder to have rational conversations that help prevent systems failures in the future.
Consider reading the actual initial Nats report https://publicapps.caa.co.uk/docs/33/NERL%20Major%20Incident... – this provides a bunch of interesting analysis and technical information.
I'm sorry for being mean about it, but it's a personal bugbear of mine when complex systems failures are boiled down to lazy analysis.
- robjampar 1 year ago"reject just one flight plan, and allow the rest to proceed."
Rejecting a plan wouldn't necessarily mean the flight no longer exists or won't take off, so that doesn't sound sensible.
- bell-cot 1 year agoFlight delays / cancellations / diversions (due to mechanical problems, weather, etc.) are a very regular thing - the airlines, ground crews, commercial pilots, and control towers have lots of experience with "Flight 1234 won't be taking off..." and "Flight 2345 is being diverted to...".
Or, if it's a "Bob owns a Cessna, and took off anyway" situation - well, Bob's license to fly a private airplane will probably be taken away. Maybe his Cessna, too. And (post-9/11) Bob could be spending some time in uncomfy little rooms with bars on the windows.
- mytailorisrich 1 year agoI admit I have no idea how the system works, but if there is an obligation to submit a flight plan in advance then there should also be a standard procedure not to let planes take off or enter airspace if they don't. At the very least there should indeed be a procedure to reject the flight plan even if the flight cannot be stopped.
- Scoring6931 1 year agoWith regards to postal workers: https://www.bbc.com/news/business-56718036
- darkclouds 1 year agoIt may well have happened before, but upgrades to a new language version, updated libraries, and other changes meant to minimise bugs inevitably introduce new ones when there is no unit/component test to flag them. It also suggests different programmers have worked on the system.
- nickdothutton 1 year agoHaving read the actual report... insufficiently rigorous validation of inputs leads to discovery of corner case.
They could have probably found this sooner with either fuzzing or perhaps some sort of digital twin model.
Finally, there's no exit clause to reject a flight plan from an "upstream"? That is a worry.
- zooFox 1 year agoIf there are two waypoint markers with the same name, how did the flight control and/or plane software know which one was being referred to? Assuming closest, it would have had to special-case it already, no?
e.g. if I want to drive to Springfield, it needs to know which one out of 67 I'd like to go to...
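To make the "assuming closest" guess concrete, here is a small Python sketch that picks, among places sharing a name, the one nearest to the previous point on the route. This only illustrates the guess in the comment above, not how the NATS or aircraft software actually resolves names. The function names, the `"SPRINGFIELD"` key and the approximate coordinates are all assumptions for the example.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def resolve(ident, candidates, previous_position):
    """Pick the candidate position for `ident` closest to `previous_position`.

    `candidates` maps a name to a list of (lat, lon) positions; this is a
    hypothetical lookup table, not a real navigation database.
    """
    matches = candidates.get(ident, [])
    if not matches:
        raise KeyError(ident)
    return min(matches, key=lambda pos: haversine_km(pos, previous_position))


# Two of the many Springfields (approximate coordinates, illustrative only).
candidates = {"SPRINGFIELD": [(39.8, -89.6), (42.1, -72.6)]}
near_chicago = resolve("SPRINGFIELD", candidates, (41.9, -87.6))
near_boston = resolve("SPRINGFIELD", candidates, (42.4, -71.1))
```

Starting near Chicago selects the first entry and starting near Boston selects the second, which shows why the answer depends on context the name alone does not carry.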
- abenbow 1 year agoLink to the report from NATS (PDF) https://publicapps.caa.co.uk/docs/33/NERL%20Major%20Incident...
- speg 1 year agoMy current project involves processing flight plans. I believe the company even helped build part of NATS. There must be something else going on to crash the whole system.
We get so many invalid flight plans from third parties (e.g., ForeFlight) that the system would never be up if we didn’t mark them as invalid and move on to the next.
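The "mark them as invalid and move on" approach can be sketched in Python roughly as follows. The `CALLSIGN:WP1 WP2 ...` text format, `parse_plan` and `InvalidFlightPlan` are invented for this example and do not reflect any real flight plan format or system.

```python
class InvalidFlightPlan(Exception):
    """Raised when a single plan cannot be parsed (hypothetical error type)."""


def parse_plan(raw):
    """Parse a made-up 'CALLSIGN:WP1 WP2 ...' string into a dict."""
    callsign, sep, route = raw.partition(":")
    waypoints = route.split()
    if not sep or not callsign or len(waypoints) < 2:
        raise InvalidFlightPlan(f"malformed plan: {raw!r}")
    return {"callsign": callsign, "route": waypoints}


def process_all(raw_plans):
    """Process each plan independently so one bad plan cannot stop the rest."""
    accepted, rejected = [], []
    for raw in raw_plans:
        try:
            accepted.append(parse_plan(raw))
        except InvalidFlightPlan as exc:
            # Record the failure for follow-up and continue with the next plan.
            rejected.append((raw, str(exc)))
    return accepted, rejected


accepted, rejected = process_all(["BA123:AAA BBB CCC", "garbage", "EZ456:DDD EEE"])
```

The point of the design is that the failure is scoped to the one plan that caused it. Only the expected error type is caught, so unexpected exceptions still surface instead of being silently swallowed.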
- ChrisArchitect 1 year agoFeels like a [dupe]
- ChrisArchitect 1 year agoNews from a week ago mostly, with a number of posts:
https://news.ycombinator.com/item?id=37320322
https://news.ycombinator.com/item?id=37328377
- ChrisArchitect 1 year agoMore earlier discussion over here: https://news.ycombinator.com/item?id=37401864