A Guide to Undefined Behavior in C and C++ (2010)
79 points by GarethX 3 months ago | 64 comments
- jasonthorsness 3 months ago This is an area the newer languages get right - I don’t think Rust or Go has any undefined behavior? I wish they would have some kind of super strict mode for C or C++ where compilation fails unless you somehow fix things at the call sites to tell the compiler the behavior you want explicitly.
- zyedidia 3 months agoI think data races can cause undefined behavior in Go, which can cause memory safety to break down. See https://research.swtch.com/gorace for details.
- jjmarr 3 months ago> I wish they would have some kind of super strict mode for C or C++ where compilation fails unless you somehow fix things at the call sites to tell the compiler the behavior you want explicitly.
The C++ language committee _does not_ want to add more annotations to increase memory safety.
- anon-3988 3 months ago Not even annotations. The committee standardized this: https://en.cppreference.com/w/cpp/container/span/operator_at
So they clearly don't care, and there's no point trying to convince them.
- maxlybbert 3 months agoThe committee tends to also provide bounds checked interfaces ( https://en.cppreference.com/w/cpp/container/span/at ). But that requires people read the documentation, and based on the number of people who I see write "std::endl" when they really want "'\n'", I don't have much hope for that ("std::endl" both sends '\n' to the stream, and flushes it; people are often surprised about the stream getting flushed).
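The flushing difference between "std::endl" and "'\n'" described above can be observed with a small sketch. The `FlushCountingBuf` class and both helper functions are hypothetical names invented for this illustration. The buffer discards its output and only counts how often the stream asks it to synchronize.

```cpp
#include <ostream>
#include <streambuf>

// Hypothetical helper: a stream buffer that discards output and counts flushes.
class FlushCountingBuf : public std::streambuf {
public:
    int flushes = 0;

protected:
    // No internal buffer is set up, so every character arrives here; accept and discard it.
    int_type overflow(int_type ch) override {
        return traits_type::not_eof(ch);
    }

    // Reached through std::ostream::flush(), which std::endl calls after writing '\n'.
    int sync() override {
        ++flushes;
        return 0;
    }
};

// Writes `lines` lines terminated with '\n' and returns the number of flushes seen.
int flushes_with_newline(int lines) {
    FlushCountingBuf buf;
    std::ostream out(&buf);
    for (int i = 0; i < lines; ++i) {
        out << "line" << '\n';
    }
    return buf.flushes;
}

// Writes `lines` lines terminated with std::endl and returns the number of flushes seen.
int flushes_with_endl(int lines) {
    FlushCountingBuf buf;
    std::ostream out(&buf);
    for (int i = 0; i < lines; ++i) {
        out << "line" << std::endl;
    }
    return buf.flushes;
}
```

With a real file or pipe behind the stream, each of those extra flushes can turn into a system call, which is presumably why the surprise matters in practice.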
- tialaramex 3 months ago Several programming languages can testify to the fact that a Benevolent Dictator For Life is not a panacea. Several more can testify that having a Committee to design the language is likewise not a panacea. Perhaps uniquely, C++ can clarify for us that both is in fact worse than either alone.
- pjmlp 3 months agoSame applies to C.
- steveklabnik 3 months agoSafe Rust has no undefined behavior. Unsafe Rust does.
- vlovich123 3 months agocough std::env::set_var cough :D.
- whytevuhuni 3 months agostd::env::set_var [1] has already been changed to unsafe in the 2024 edition of the compiler [2].
So yeah, such things exist, but what's important is what the compiler devs choose to do once such issues are found. The C++ compiler devs say "That's an unfortunate case that cannot be fixed." The Rust devs say "That's a bug, here's the issue link."
[1] https://doc.rust-lang.org/std/env/fn.set_var.html
[2] https://doc.rust-lang.org/edition-guide/rust-2024/newly-unsa...
- ultimaweapon 3 months ago To be fair, the UB caused by this function comes from the underlying C implementation, and this function is already marked as unsafe in the 2024 edition.
- pjmlp 3 months agoOlder languages as well, those that weren't a copy-paste from C with extras.
Modula-2, Ada, Object Pascal, Eiffel, Delphi,...
- almostgotcaught 3 months agoyou realize UB is basically an escape hatch from the standard for compilers right? it's not like a flaw in the language, it's gaps negotiated by the standards committee (well for the most part i guess). so the reason new languages don't have UB is because new languages don't have multiple implementations (go definitely doesn't, does anyone use rust-gcc?).
- Dylan16807 3 months agoYou only need implementation-defined behavior to grease the wheels of multiple implementations. You don't need the gaping void of undefined for that use.
There's a big difference between "it'll be some number, not promising which one" and "the program loses definition and anything can break, often even retroactively".
- almostgotcaught 3 months agoPotato potato. My point is UB isn't an accident it's intentional. Mind you I'm not saying it's great, just that it's not some kind of slipup.
- wolvesechoes 3 months ago"go definitely doesn't"
What? gccgo, TinyGo, and GopherJS.
- porridgeraisin 3 months agoMore prominently, microsoft go
- hoseja 3 months agoDon't let people performatively horrified of undefined behaviour know about Gödel.
- zombot 3 months agoThe most horrifying aspect of UB is that it can affect your program without the instructions triggering it ever being executed. And many greenhorns don't know that or even believe it to be false. So the effects of Dunning-Kruger may be more severe in C(++) than in other languages.
- maxlybbert 3 months agoIt’s not just “there’s a line in some file that’s undefined” partly because undefined behavior is often caused by the state of the world. Dereferencing a pointer is defined unless the pointer is invalid, and any particular pointer may be valid sometimes and invalid others.
But since a compiler can do anything when a program is ill-defined, if a line of code could be well-defined in some cases and ill-defined in others, a compiler is allowed to only handle the well-defined cases, knowing that it will do the “wrong” thing when something about the code is undefined (because in that case, there is no “right” or “wrong” behavior).
This does lead to weird things:
auto val = *ptr; if (!ptr) { . . . }
The compiler can delete the “if” statement because the potential undefined behavior happens before the check. Either the pointer is valid when dereferenced (and the “if” statement gets skipped), or the pointer is invalid, and skipping the “if” statement is acceptable for “the compiler can do anything, even things that can’t be expressed in the language.” But it only does the weird things in cases where an invalid pointer would have been dereferenced.
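For contrast, here is a minimal sketch of the ordering that keeps the check meaningful: test the pointer first, and dereference only on the path where it is known to be non-null. The function name `read_or_default` and its `fallback` parameter are hypothetical, chosen only for this illustration.

```cpp
// Hypothetical helper: returns the pointed-to value, or `fallback` when ptr is null.
int read_or_default(const int* ptr, int fallback) {
    // The null check comes before any dereference, so no undefined behavior
    // precedes it and the compiler has no grounds to assume ptr is non-null here.
    if (!ptr) {
        return fallback;
    }
    // Only reached when ptr is non-null. A dangling (non-null but invalid)
    // pointer would still be undefined behavior; this check cannot detect that.
    return *ptr;
}
```

The sketch only covers the null case from the example above. Pointers that are invalid for other reasons need to be ruled out by the program's design.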
- guimplen 3 months ago The first example (signed integer overflow) is no longer valid in newer standards of C. Now it should use two's-complement semantics, with no UB.
- Rusky 3 months agoI believe they only standardized the two's-complement representation (so casts to unsigned have a more specific behavior, for example) but they did not make overflow defined.
- LegionMammal978 3 months agoYeah, signed integer overflow is as UB as ever. I've heard the primary reason for it is to avoid the possibility of wraparound on 'for (int i = 0; i < length; i++)' loops where the 'length' is bigger than an int. (Of course, the more straightforward option would be to use proper types like size_t for all your indices, but it's a classic tradition to use nothing but char and int, and people judge compilers based on existing code.)
- ForTheKidz 3 months agoptrdiff_t is also useful in this case if signed semantics are desired.
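The index-type suggestion in this subthread can be sketched as follows. The function name `sum_elements` is hypothetical. The point is only that the loop counter has the same width as the length, so the counter cannot overflow before it reaches `length`.

```cpp
#include <cstddef>

// Hypothetical helper: sums `length` ints, indexing with std::size_t instead of int.
long long sum_elements(const int* data, std::size_t length) {
    long long total = 0;
    // i has the same type as length, so i < length is an exact comparison and
    // ++i cannot overflow before the loop ends (unsigned arithmetic is also
    // defined to wrap, so no signed-overflow UB is involved in the counter).
    for (std::size_t i = 0; i < length; ++i) {
        total += data[i];
    }
    return total;
}
```

If signed index arithmetic is wanted, `std::ptrdiff_t` could be substituted for the counter, as the comment above suggests.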
- vlovich123 3 months ago> I've heard the primary reason for it is to avoid the possibility of wraparound on
Making it UB doesn’t fix that in any way that I can think of.
- pajko 3 months ago UB can be converted to ID (implementation-defined behavior) by using -fwrapv (to "standardize" the wraparound, which does not necessarily help if the overflow was not intentional) or -ftrapv (generate a trap on overflow).
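Those flags change what the compiler does with an overflow that has already been written into the program. A complementary approach, sketched here under the assumption of a C++17 compiler, is to test for overflow before the addition so the undefined operation never executes. The name `checked_add` is hypothetical.

```cpp
#include <limits>
#include <optional>

// Hypothetical helper: returns a + b, or std::nullopt if the sum would not fit in int.
std::optional<int> checked_add(int a, int b) {
    // For positive b, a + b overflows exactly when a > INT_MAX - b.
    // INT_MAX - b itself cannot overflow because b > 0.
    if (b > 0 && a > std::numeric_limits<int>::max() - b) {
        return std::nullopt;
    }
    // For negative b, a + b underflows exactly when a < INT_MIN - b.
    // INT_MIN - b itself cannot overflow because b < 0.
    if (b < 0 && a < std::numeric_limits<int>::min() - b) {
        return std::nullopt;
    }
    return a + b;
}
```

Unlike -fwrapv, this makes the caller decide what an overflow should mean, which addresses the case where the overflow was not intentional.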
- fsckboy 3 months agomy opinion as a very experienced C system programmer:
there must be better sources to guide people than a poorly written and infantilizing article from 15 years ago.
- jcranmer 3 months agoMy experience is that self-described "very experienced C system programmers" are simultaneously the people who are most in need of a good explainer on undefined behavior and the most likely ones to throw a conniption fit halfway through and stop reading, for the hallmark of a good explainer on UB is that it will explain that a) it exists for a reason; b) no, just doing a "little" UB isn't safe; and c) it's not the compiler's fault that things go awry when you do UB, it's the programmer's fault.
One of the blog posts I've long queued up for writing is "In defense of undefined behavior." It's only half-written, though, but the gist is justifying UB by pointing out that you can't optimize C code without it (via an example using pointer provenance), then pointing out why uninitialized values look weirder than you think by reference to the effects of system libraries, and then I would actually walk through why specification authors should reach for undefined behavior in various places.
- raphlinus 3 months agoOh hey, I also have "in defense of undefined behavior" in the queue of blog posts I'd like to write some time, with that exact title. What a coincidence. That said, it's unlikely to get written as I have things that are more specific to my actual research ahead of it.
One of the things I'd want to say is that UB is a useful and accurate way to model what happens when, say, a program writes over memory used by the allocator. Languages like Odin might try to pretend they don't have UB, but in my opinion it's impossible to get there just by disabling certain compiler optimizations (see https://news.ycombinator.com/item?id=32800814 for an argument about this).
I see UB as essentially a proof obligation, to be discharged in some other way. A really good way is to have UB in the intermediate representation, and compile a safe language into it (with unsafe escape hatches when needed). But there are other ways, including formal methods, rigorous testing, or just being a really smart solo programmer who's learned how to avoid UB and doesn't have to work in a team.
Feel free to send me your draft.
- vlovich123 3 months ago Raph, I think you may be using a different definition of UB than what compiler authors are using? As I understand it in the language sense of the word, UB technically allows the compiler to interpret the code however it wants. To me, finding utility in UB means relying on some kind of well-defined behavior as the result, which would imply that you are either just relying on today’s behavior OR you are doing something that’s non-deterministic but not violating language rules? Or some intermediate definition where it’s violating language rules but no future version of the compiler is likely to be able to detect the UB and change behavior?
UB is very useful for compiler authors because they can apply very useful optimizations with “illegal” code and then emit illegal code constructs when they want those optimizations to apply. I have a hard time understanding how that’s useful to language users though.
- pjmlp 3 months ago Using protective gear, or making cars safer for crashes, also slows down physics versus not using them at all, yet lives are saved every year where people would otherwise die or be crippled.
As someone that rather prefers Wirth culture on programming languages, UB at the expense of safety isn't a clear win, that is why we end up with security exploits or hardware mitigations for UB based optimizations gone too far.
- fsckboy 3 months ago>most likely ones to throw a conniption fit halfway through and stop reading
listen to you, you didn't read the article it's clear, because the article doesn't agree with the rest of what you said. So, since you're not defending an article, you're just attacking me, which is pretty schmucky
- jcranmer 3 months agoI did read the article. I have my disagreements with John Regehr on UB, though, as he's definitely in the camp of "let's try to specify it out entirely from the compiler," which doesn't entirely work.
- staunton 3 months agoBeing a very experienced programmer, I'm sure you know many such sources. Can you share any?
- camel-cdr 3 months agoThe C standard Annex J has a list of undefined behavior: https://port70.net/~nsz/c/c99/n1256.pre.html#J
- AlotOfReading 3 months agoAnnex J is a list of explicit undefined behavior. It can't and doesn't attempt to enumerate the vastly larger universe of implicit undefined behavior.
There's also no official list for C++, just a proposal to make one that's languished in committee for the past 6ish years.
- rberg 3 months agoAgreed. Running with a basketball is very much possible, I'm unsure as to why John thinks otherwise.
- ultrarunner 3 months agoPerhaps you could draw on your wealth of experience to write one. I’d love to read it!
- imtringued 3 months agoIn my opinion it's not infantilizing enough. If you are a C developer and have never heard of model checking, then you are grossly incompetent and should never be allowed near a computer.
- pjmlp 3 months ago If only folks would write code in such a way that infantilizing articles from 15 years ago weren't as relevant as ever.
- mwkaufma 3 months ago Despite how often the alarmist "formatting your HDD" is cited in discussions of UB, I've never seen it happen. Surely there exist real examples of catastrophic failures which could actually teach us something, beyond making a hyperbolic point.
- vlovich123 3 months agoIt was intentionally hyperbolic tongue in cheek and understood to be as such. The reason is to fight through the discounting people (at least at the time) had that UB was just a segfault or something. Here’s a kernel exploit that was a result of UB [1]. It’s not hard to imagine that hypothetically UB in the kernel could result in “just so” corruption that would call the “format your HDD routine” even if in practice it’s extremely unlikely (& forensically it would basically be impossible to prove that it was UB that caused it).
- AlotOfReading 3 months ago One example is control flow bending [0], which uses carefully crafted undefined behavior to "bend" the CFG into arbitrary, Turing-complete shapes despite CFI protections. The author abused this to implement tic-tac-toe in a single call to printf for a prior obfuscated C contest [1].
Of course, that misses the real point that "formatting your HDD" is simply an allowed possibility rather than a factual statement on the consequences.
[0] https://www.usenix.org/conference/usenixsecurity15/technical...