Silkenweb Example: Hackernews Clone

CVE-2024-6409: OpenSSH: Possible remote code execution in privsep child

141 points by andreyv 11 months ago | 56 comments

stncls 11 months ago
No vulnerability name, no website, concise description, neutral tone, precise list of affected distros (RHEL + derivatives and some EOL Fedoras) and even mention of unaffected distros (current Fedoras), plain admission that no attempt was made to exploit. What a breath of fresh air!
(I am only joking of course. As a recovering academic, I understand that researchers need recognition, and I have no right to throw stones -- glass houses and all. Also, this one is really like regreSSHion's little sibling. Still, easily finding the information I needed made me happy.)
- tptacek 11 months ago
  I don't think recognition for researchers is the big win for named vulnerabilities. In the places that matter, they can just describe their findings in a short sentence and get all the recognition that matters. The names are mostly for the benefit of users.
  - ericpauley 11 months ago
    Security researchers definitely do the naming gimmick for personal brand purposes. This may not be as obvious when it’s successful, but academic papers routinely name vulnerabilities when there is no real benefit to users.
    - tptacek 11 months ago
      The whole point of naming vulnerabilities is to establish a vernacular about them, so it's not surprising that academic papers name them. The literature about hardware microarchitectural attacks, for instance, would be fucking inscrutable (even more than it is now) without the names.
- AndyMcConachie 11 months ago
  The author of the mail is Solar Designer, a bit of a legend AFAIC. He has no need to pump up his brand and he really really knows what he's doing.
  - formerly_proven 11 months ago
    Yeah. He created openwall and the oss-security list.
- dimask 11 months ago
  At least they do not name them after themselves.

hannob 11 months ago

For clarification, the bug is in a patch applied by red hat, not in openssh itself.

bonzini 11 months ago

Technically the bug is in upstream code, but it is latent without the Red Hat patch:

> cleanup_exit() was not meant to be called from a signal handler [...] Fedora 38+ has moved to newer upstream OpenSSH that doesn't have the problematic cleanup_exit() call.

> This extra problematic logic only existed in upstream OpenSSH(-portable) for ~9 months

The fix also doesn't touch the Red Hat-specific code:

     diff -urp openssh-8.7p1-38.el9_4.1-tree.orig/sshd.c openssh-8.7p1-38.el9_4.1-tree/sshd.c
     --- openssh-8.7p1-38.el9_4.1-tree.orig/sshd.c 2024-07-08 03:42:51.431994307 +0200
     +++ openssh-8.7p1-38.el9_4.1-tree/sshd.c 2024-07-08 03:48:13.860316451 +0200
     @@ -384,7 +384,7 @@ grace_alarm_handler(int sig)
      
       /* Log error and exit. */
       if (use_privsep && pmonitor != NULL && pmonitor->m_pid <= 0)
     -  cleanup_exit(255); /* don't log in privsep child */
     +  _exit(1); /* don't log in privsep child */
       else {
        sigdie("Timeout before authentication for %s port %d",
            ssh_remote_ipaddr(the_active_state),

They suggest applying it even on non Red Hat distros.

loeg 11 months ago
Sort of. The upstream bug isn't thought to be exploitable alone.

londons_explore 11 months ago
Couldn't this entire class of bug be solved by annotating signal handlers in the source code and checking at compile time that anything called from a signal handler is async-signal-safe?
- dgrunwald 11 months ago
  There are some static analysis tools that can check this.
  Cert's SIG30 rule page has a list: https://wiki.sei.cmu.edu/confluence/display/c/SIG30-C.+Call+...
  Also there's https://clang.llvm.org/extra/clang-tidy/checks/bugprone/sign...
- klysm 11 months ago
  Sounds reasonable, but since the language layer has no knowledge of signal handlers or what that means, it would be a separation of concerns problem. I'm sure you could get clang to do it, but still a tricky thing to design around.
  Ultimately it's an example of an invariant where it's clear that programmers can't be trusted to uphold it. In this case, the consequences can be very significant.
  - Joker_vD 11 months ago
    > the language layer has no knowledge of signal handlers or what that means
    Despite the fact that there is explicit runtime support for signal handlers in the language runtime (i.e. libc).
    - ori_b 11 months ago
      Libc isn't the language runtime. The runtime is '/use/lib/crt*.o', which has no concept at all of signal handling.
      Libc isn't particularly intrinsic to the language, and outside of some assembly to make syscalls, you can implement an alternative with a completely different interface, purely in C.
    - sqeaky 11 months ago
      Yep, such is C.
  - 11 months ago
- loeg 11 months ago
  Static analysis tools would go a long way here, yes, and it should be a relatively straightforward analysis. You probably don't even need to explicitly annotate signal handlers, just examine arguments to calls to signal() and sigaction().
  - nuijakas 11 months ago
    [dead]
  - nuijakas 11 months ago
    [dead]
  - nuijakas 11 months ago
    [dead]
  - nuijakas 11 months ago
    [dead]
  - nuijakas 11 months ago
    [dead]
  - nuijakas 11 months ago
    [dead]
  - nuijakas 11 months ago
    [dead]
  - nuijakas 11 months ago
    [dead]
  - nuijakas 11 months ago
    [dead]
  - nuijakas 11 months ago
    [dead]
  - nuijakas 11 months ago
    [dead]
  - nuijakas 11 months ago
    [dead]
  - nuijakas 11 months ago
    [dead]
  - nuijakas 11 months ago
    [dead]
  - nuijakas 11 months ago
    [dead]
  - nuijakas 11 months ago
    [dead]
  - nuijakas 11 months ago
    [dead]
  - nuijakas 11 months ago
    [dead]
  - nuijakas 11 months ago
    [flagged]
  - nuijakas 11 months ago
    [flagged]
  - nuijakas 11 months ago
    [flagged]
  - nuijakas 11 months ago
    [flagged]
- immibis 11 months ago
  This entire class of bug could also be solved by avoiding signal handlers. You can still use SIGALRM for a timeout, but don't log it. If you need complex processing, use signalfd to read signals in the event loop.
- bregma 11 months ago
  Well, the compiler has no way of knowing if a function will later be a signal handler after linking, or even dynamic loading.
  There is no portable way to annotate all functions ever written or ever will be written as being async signal safe.
  Which functions are async signal safe varies with the operating system and runtime (eg. an unsafe function in linux-gnu might be safe in linux-musl or linux-bionic).
  Other than those insurmountable problems, yeah, good idea.
  - fanf2 11 months ago
    What I want when writing a signal handler is to be able to say, this function must be async-signal-safe and therefore all the functions it calls must be async-signal-safe. That can be done purely at compile time; I don’t need to worry about linking.
    The annotation does not need to be portable; if it’s present on one system then other systems still benefit because the code is written to pass the check.
    The list of async-signal-safe functions is well documented and quite short, so it would not be much work to add the annotations to the header files. It’s OK if some safe functions are omitted, because signal handlers should be written to do the absolute bare minimum.
    - bregma 11 months ago
      No, any function can call another function from another translation unit (at link time) or load and call a function from another translation unit (at runtime). How will the compiler enforce the propagation of the requirement in those cases?
  - adrianmonk 11 months ago
    > compiler has no way of knowing if a function will later be a signal handler after linking, or even dynamic loading
    You could check it at runtime.
    Just like with array bounds checking, in many cases the compiler could sometimes prove the runtime check isn't necessary and eliminate it.
    > Which functions are async signal safe varies with the operating system and runtime
    Annotations could enumerate specific platforms where it is safe or unsafe. Or you could annotate based on specific attributes of platforms that make it safe or unsafe.
  - account42 11 months ago
    > Well, the compiler has no way of knowing if a function will later be a signal handler after linking, or even dynamic loading.
    That's why GP suggested annotating them. Typically this would be done via a function attribute.
    > There is no portable way to annotate all functions ever written or ever will be written as being async signal safe.
    This is not a requirement for such an annotation to exist and to be used by projects that care about security or even just correctness.
    > Which functions are async signal safe varies with the operating system and runtime (eg. an unsafe function in linux-gnu might be safe in linux-musl or linux-bionic).
    And libc implementations already annotate many of their functions to tell the compiler how they work. Compilers are also more than happy to assume behavior of standard function matches the C/C++ standards in non-freestanding environmnets.
    > Other than those insurmountable problems, yeah, good idea.
    All fairly trivial problems that have already been solved many times for similar issues.
    I'd like a more general attribute though to declare that a particular funcion is in some abstract domain and then annotations that certain functions may or may not be called in certain domains. This could come useful in cases where you want some functions to only be called from special threads.
    - bregma 11 months ago
      > That's why GP suggested annotating them. Typically this would be done via a function attribute.
      That won't help when you link external functions or worse, dynamically load them. Those are things done long after the compiler has run.
      > And libc implementations already annotate many of their functions to tell the compiler how they work. Compilers are also more than happy to assume behavior of standard function matches the C/C++ standards in non-freestanding environmnets.
      We're not talking about standard functions here, we're talking about any function any developer could ever call in a signal context. Ever. Like, for example, a libssh shutdown function that invokes a callback that calls a syslog function that does some socket operation on a buffer that some other thread has already freed. Which of those functions needs the annotation, and how does dlsym() deal with it?
qalmakka 11 months ago
This is why I've always disliked Debian and Red Hat.
1. I hate the fact they have the hubris to think they can be smarter than the upstream developers and patch old versions
2. I hate the fact they don't ship vanilla packages, but instead insist on patching things for features that nobody relies on anyway, __because they're not upstream__.
Maintainers should stick to downloading tarballs, building them and updating them promptly when a new version is out. If there's no LTS available, pay upstream and get an LTS, don't take a random version and patch it forever just to keep the same version numbers, it's nonsensical and it was only a matter of time before people tried to exploit it. Just look at the XZ backdoor for instance, which relied on RedHat and Debian deploying a patched libsystemd.
- gruturo 11 months ago
  Enterprises don't go for RHEL because it's free software, yay freedom!
  They go for it because it gives a very stable, solid foundation. They don't want a fragile base layer prone to breaking every day of the week.
  This involves backporting a lot of stuff (primarily security fixes) because you can't just upgrade any package to its latest version, it will have entirely new dependencies, potentially breaking changes etc.
  What should RedHat do, which does not:
  1) make them lose their enterprise customers wanting a stable base
  2) have unpatched security holes all over their distros
  3) not cause them to backport stuff (we are here at the moment) ?
  - JackSlateur 11 months ago
    Compagnies I've worked for that uses redhat do so because they think paying will prevent them from working. As if, sudenly, running a nearly 10y-old code was no longer stupid, because you paid for it.
    - fusslo 11 months ago
      > As if, suddenly, running a nearly 10y-old code was no longer stupid, because you paid for it.
      I love this
    - bonzini 11 months ago
      Of course it is stupid, the question is whether you have an alternative that isn't stupid and respects all the regulation/certification requirements.
    - worthless-trash 11 months ago
      I work in the kernel maintenance group, the "10 year old code" is having select important and critical flaws applied.
      If you think about it, all maintained projects of old code has this same mechanism, just has more frequent updates.
      How does the age metric now work once you know this ?
  - qalmakka 11 months ago
    I understand the business logic behind that. The point is, maybe they should consider paying the upstream developers to backport the stuff themselves instead of dabbling with C code they somewhat understand?
    - sqeaky 11 months ago
      C isn't magic, plenty of people understand it and lots of these projects move quite slow. That these things CVEs on ssh are so rare shows how well this process normally works. These past couple of weeks have had 3(?) ssh vulnerabilities? We often go years with one, and not all are a result of packaging some come from upstream.
      Any new process needs to not just fix this problem, but also all or at least most of the problems that the existing processes fixes.
    - gruturo 11 months ago
      Tracing thousands of developers across the planet and drafting contracts in hundreds of jurisdictions, including some where you don't even have a branch office or any kind of legal presence? Ugh. And what if one's from an embargoed country? What if a primadonna asks for a million a day, or half of them don't deliver in time, go on holiday, win the lottery, fall in love, get into a dispute, refuse to work with you, don't have access to all the architectures for comprehensive testing, lose interest, change employer (and can't work with you anymore), or sell to some dodgy entity preparing the next sBoM attack.
      ....better to pay your own people. Hire them if they're available, sure, otherwise task an engineer with this.
- rlpb 11 months ago
  Distributions patch to get consistent and integrated behaviour across the platform, including new features that require implementation into multiple places to be even minimally useful, and ports to newer APIs to make everything work against a single version of each library so that they do not conflict.
  Upstreams generally don't do this until prompted, and sometimes resist until the path is proven and becomes best practice because distributions pushed it.
  This process is mostly invisible to you because distributions have been successful at getting the changes needed for sane and consistent behaviour embedded into tools and expectations by default. All you see are the patches that are in flight or didn't make it.
  If you want a distribution that does only minimal patching then there are distributions for that. The fact that the major distributions do patch speaks volumes about which approach results in a better experience for users.
- Diederich 11 months ago
  Do you think there is a reason that some distros go through all of this additional trouble?
candiddevmike 11 months ago
The risk you take when you use a distribution that modifies upstream. Debian has had similar issues in the past (maybe not CVEs, but certainly packager-created bugs).
- 2OEH8eoCRo0 11 months ago
  It's risks all the way down. There are risks to not patching upstream as well.
- klysm 11 months ago
  Debian has a fairly famous one: CVE-2008-0166
  - meowface 11 months ago
    Ouch, that one's bad: https://github.com/g0tmi1k/debian-ssh#the-bug
    >These lines were removed because they caused the Valgrind and Purify tools to produce warnings about the use of uninitialized data in any code that was linked to OpenSSL. Removing this code has the side effect of crippling the seeding process for the OpenSSL PRNG. Instead of mixing in random data for the initial seed, the only "random" value that was used was the current process ID. On the Linux platform, the default maximum process ID is 32,768, resulting in a very small number of seed values being used for all PRNG operations.
  - rlpb 11 months ago
    In that particular case upstream _was_ consulted and had acked the patch.
    - immibis 11 months ago
      Upstream was consulted for a similar change in another location, where the code was actually unnecessary.
ta988 11 months ago
My understanding here is that it only impacts Redhat (and maybe derivatives)?
- stncls 11 months ago
  Yes, only RHEL 9 (the current version of RHEL) and its upstreams/downstreams (CentOS Stream 9, Rocky Linux 9, Alma Linux 9,...).
  Also affected: Fedora 37, 36 and possibly 35, which are all end-of-life (since December 2023 in the case of Fedora 37).
  Not affected: Fedora 38 (also EOL), 39 (maintained) and 40 (current).
password4321 11 months ago
Is this in any way related to CVE-2024-6387 "RegreSSHion" discussed last week?
https://news.ycombinator.com/item?id=40843778
Edit: Ok it seems very closely related; I was just surprised no one had linked the previous discussion.
abdil07 11 months ago
[flagged]
crest 11 months ago
It's almost as if you should understand security critical C code before you start patching it to death.
- 11 months ago