Curl HTTP/3 Performance

157 points by BitPirate 1 year ago | 118 comments
  • hlandau 1 year ago
    Author of the OpenSSL QUIC stack here. Great writeup.

    TBQH, I'm actually really pleased with these performance figures - we haven't had time yet to do this kind of profiling or make any optimisations. So what we're seeing here is the performance prior to any kind of concerted measurement or optimisation effort on our part. In that context I'm actually very pleasantly surprised at how close things are to existing, more mature implementations in some of these benchmarks. Of course there's now plenty of tuning and optimisation work to be done to close this gap.

    • apitman 1 year ago
      I'm curious if you've architected it in such a way that it lends itself to optimization in the future? I'd love to hear more about how these sorts of things are planned, especially in large C projects.
      • hlandau 1 year ago
        As much as possible, yes.

        With something like QUIC, "optimisation" breaks down into two areas: performance tuning in terms of algorithms, and tuning for throughput or latency in terms of how the protocol is used.

        The first part is actually not the major issue, at least in our design everything is pretty efficient and designed to avoid unnecessary copying. Most of the optimisation I'm talking about above is not about things like CPU usage but things like tuning loss detection, congestion control and how to schedule different types of data into different packets. In other words, a question of tuning to make more optimal decisions in terms of how to use the network, as opposed to reducing the execution time of some algorithm. These aren't QUIC specific issues but largely intrinsic to the process of developing a transport protocol implementation.

        It is true that QUIC is intrinsically less efficient than say, TCP+TLS in terms of CPU load. There are various reasons for this, but one is that QUIC performs encryption per packet, whereas TLS performs encryption per TLS record, where one record can be larger than one packet (which is limited by the MTU). I believe there's some discussion ongoing on possible ways to improve on this.
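
        To make the per-packet vs per-record point concrete, here's a back-of-the-envelope sketch. The 16 KiB record and ~1350-byte packet payload sizes are illustrative assumptions, not what any particular stack uses:

            #include <stdio.h>

            /* Rough count of AEAD operations needed to move 1 MiB of
             * application data, assuming 16 KiB TLS records vs ~1350-byte
             * QUIC packet payloads (both sizes assumed for illustration). */
            int main(void) {
                const long total = 1024L * 1024L;
                const long tls_record = 16L * 1024L;
                const long quic_payload = 1350L;

                printf("TLS  AEAD ops: %ld\n", (total + tls_record - 1) / tls_record);     /* 64 */
                printf("QUIC AEAD ops: %ld\n", (total + quic_payload - 1) / quic_payload); /* 777 */
                return 0;
            }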

        There are also features which can be added to enhance performance, like UDP GSO, or extensions like the currently in development ACK frequency proposal.
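
        For the curious, UDP GSO on Linux boils down to handing the kernel one large buffer plus a segment size, so that many QUIC packets go out in a single sendmsg() call instead of one syscall each. A minimal sketch, assuming Linux >= 4.18 and a connected UDP socket; the segment size is illustrative:

            #include <sys/types.h>
            #include <sys/socket.h>
            #include <netinet/udp.h>   /* SOL_UDP, UDP_SEGMENT */
            #include <stdint.h>
            #include <string.h>

            /* Send a buffer holding several back-to-back QUIC packets; the
             * kernel splits it into seg_size-byte datagrams on the way out. */
            ssize_t send_gso_batch(int fd, const void *pkts, size_t total_len,
                                   uint16_t seg_size /* e.g. 1350 */) {
                char cbuf[CMSG_SPACE(sizeof(uint16_t))] = {0};
                struct iovec iov = { (void *)pkts, total_len };
                struct msghdr msg = {0};

                msg.msg_iov = &iov;
                msg.msg_iovlen = 1;
                msg.msg_control = cbuf;
                msg.msg_controllen = sizeof(cbuf);

                struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
                cm->cmsg_level = SOL_UDP;
                cm->cmsg_type = UDP_SEGMENT;
                cm->cmsg_len = CMSG_LEN(sizeof(uint16_t));
                memcpy(CMSG_DATA(cm), &seg_size, sizeof(seg_size));

                return sendmsg(fd, &msg, 0);   /* one syscall, many wire packets */
            }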

        • Matthias247 1 year ago
          Actually the benchmarks just measure the first part (CPU efficiency) since it's a localhost benchmark. The gap is most likely due to missing GSO, if it's not implemented. It's such a huge difference, and pretty much the only thing which can prevent QUIC from being totally inefficient.
          • apitman 1 year ago
            Thank you for the details!
        • benreesman 1 year ago
          Thank you kindly for your work. These protocols are critically important, and the more high-quality, open implementations exist, the more likely they are to be free and inclusive.

          Also, hat tip for such competitive performance on an untuned implementation.

          • spullara 1 year ago
            Are there good reasons to use HTTP3/QUIC that aren't based on performance?
            • zamadatix 1 year ago
              I suppose that depends on your definitions of "good" and what counts as being "based on performance". For instance QUIC and HTTP/3 support better reliability via things like FEC and connection migration. You can resume a session on a different network (think going from Wi-Fi to cellular or similar) instead of recreating the session and FEC can make the delivery of messages more reliable. At the same time you could argue both of these ultimately just impact performance depending on how you choose to measure them.

              Something more agreeably not performance-based is that the security is better. E.g. more of the conversation is enclosed in encryption at the protocol layer. Whether that's a good reason depends on who you ask, though.

              • Matthias247 1 year ago
                We need to distinguish between performance (throughput over a congested/lossy connection) and efficiency (CPU and memory usage). QUIC can achieve higher performance, but will always be less efficient. The linked benchmark actually just measures efficiency, since it's about sending data over loopback on the same host.
                • jzwinck 1 year ago
                  What makes QUIC less efficient in CPU and memory usage?
                • o11c 1 year ago
                  TCP has at least one unfixable security exploit: literally anybody on the network can reset your connection. Availability is 1/3 of security, remember.
                • foofie 1 year ago
                  How awesome is that? Thank you for all your hard work. It's thanks to people such as yourself that the whole world keeps on working.

                  Obligatory:

                  https://xkcd.com/2347/

                • vitus 1 year ago
                  It is promising to see that openssl-quic serial throughput is within 10-20% of more mature implementations such as quiche. (Which quiche, though? Is this Google's quiche, written in C++, or Cloudflare's quiche, written in Rust? It turns out that's approximately the only word that starts with "quic" that isn't a derivative of "quick".)

                  One of QUIC's weaknesses is that it's known to be much less CPU efficient, largely due to the lack of things like HW offload for TLS.

                  > Also, the HTTP/1.1 numbers are a bit unfair since they do run 50 TCP connections against the server.

                  To emphasize this point: no modern browser will open 50 concurrent connections to the same server for 50 GET requests. You'll see connection pooling of, uh, 6 (at least for Chrome and Firefox), so the problems of head-of-line blocking that HTTP/2 and HTTP/3 attempt to solve would have manifested in more realistic benchmarks.
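
                    (For what it's worth, a browser-like cap is easy to approximate with libcurl's multi interface; a rough sketch, where the URL, port and request count are made up for illustration:)

                        #include <curl/curl.h>

                        /* Sketch: issue 50 GETs but cap parallel connections to the
                         * host at a browser-like 6, rather than letting all 50 run
                         * concurrently as the benchmark apparently did. */
                        int main(void) {
                            curl_global_init(CURL_GLOBAL_DEFAULT);
                            CURLM *multi = curl_multi_init();
                            curl_multi_setopt(multi, CURLMOPT_MAX_HOST_CONNECTIONS, 6L);

                            for (int i = 0; i < 50; i++) {
                                CURL *h = curl_easy_init();
                                curl_easy_setopt(h, CURLOPT_URL, "https://localhost:8443/object"); /* placeholder */
                                curl_multi_add_handle(multi, h);
                            }

                            int running = 1;
                            while (running) {
                                curl_multi_perform(multi, &running);
                                curl_multi_poll(multi, NULL, 0, 1000, NULL);
                            }
                            curl_multi_cleanup(multi);   /* per-handle cleanup omitted for brevity */
                            return 0;
                        }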

                  Some questions I have:

                  - What kind of CPU is in use? How much actual hw parallelism do you have in practice?

                  - Are these requests actually going over the network (even a LAN)? What's the MTU?

                  - How many trials went into each of these graphs? What are the error bars on these?

                  • jsty 1 year ago
                    • pclmulqdq 1 year ago
                      Hardware offload should be protocol-independent, but I suppose most network cards assume some stuff about TLS and aren't set up for QUIC?
                      • Matthias247 1 year ago
                        NICs assume stuff for TCP (segmentation offload) that they can’t do for UDP, or can only do in a very limited fashion (GSO).

                        TLS offloads are very niche. There's barely anyone using them in production, and these benchmarks were very likely run without them.

                      • ndriscoll 1 year ago
                        > To emphasize this point: no modern browser will open 50 concurrent connections to the same server for 50 GET requests.

                        They will. You just need to go bump that number in the settings. :-)

                        • secondcoming 1 year ago
                          Browsers aren't the only things that connect to servers that speak HTTP.
                        • BitPirate 1 year ago
                          The performance difference between H1/H2 and H3 in this test doesn't really surprise me. The obvious part is the highly optimised TCP stack. But I fear that the benchmark setup itself might be a bit flawed.

                          The biggest factor is the caddy version used for the benchmark. The quic-go library in caddy v2.6.2 lacks GSO support, which is crucial to avoid high syscall overhead.

                          The quic-go version in caddy v2.6.2 also doesn't adjust UDP buffer sizes.
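
                          (For reference, "adjusting UDP buffer sizes" usually just means something like the following at the socket level; the 2 MiB value is an illustrative assumption, not what quic-go actually picks:)

                              #include <sys/socket.h>

                              /* Ask the kernel for larger UDP socket buffers so bursts of QUIC
                               * packets aren't dropped; the kernel may clamp the value to
                               * net.core.rmem_max / net.core.wmem_max. */
                              int bump_udp_buffers(int fd) {
                                  int sz = 2 * 1024 * 1024;   /* 2 MiB, illustrative */
                                  if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &sz, sizeof(sz)) < 0)
                                      return -1;
                                  return setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &sz, sizeof(sz));
                              }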

                          The other thing that's not clear from the blog post is the network path used. Running the benchmark over loopback only would give TCP-based protocols an advantage if the QUIC library doesn't support MTU discovery.

                          • Etheryte 1 year ago
                            I don't think taking shots at the Caddy version not being the latest is fair criticism, to be honest. Version 2.6.2 was released roughly three months ago, so it's not like we're talking about anything severely outdated; most servers you run into in the wild will be running something older than that.
                            • zamadatix 1 year ago
                              I think you mixed up what year we're in now :). Caddy 2.6.2 was released October 13, 2022, so it's been not 3 but 15 months since release.

                              Even more relevantly, HTTP/3 was first supported out of the box in 2.6.0, released Sep 20, 2022. Even if 2.6.2 had been just 3 months old, the fact that it's from the first 22 days of HTTP/3 being supported out of the box, rather than one of the versions from the following 3 months, would definitely be relevant criticism to note.

                              https://github.com/caddyserver/caddy/releases?page=2

                              • francislavoie 1 year ago
                                This is why I'm not a fan of Debian. (I assume OP got that version from Debian because I can't think of any other reason they wouldn't have used the latest.) They packaged Caddy, but they never update it at the rate we would reasonably expect. So users who don't pay attention to the version number have a significantly worse product than is currently available.

                                We have our own apt repo which always has the latest version: https://caddyserver.com/docs/install#debian-ubuntu-raspbian

                                • Etheryte 1 year ago
                                  Oh, right you are, somehow I completely mixed that up. Thanks for clarifying.
                            • nezirus 1 year ago
                              Maybe a shout-out to the HAProxy people: like many, they've observed performance problems with the OpenSSL 3.x series. But having good old OpenSSL with QUIC would be so convenient for distro packages etc.

                              https://github.com/haproxy/wiki/wiki/SSL-Libraries-Support-S...

                              • samueloph 1 year ago
                                Nice write-up.

                                I'm one of the Debian maintainers of curl and we are close to enabling http3 on the gnutls libcurl we ship.

                                We have also started discussing the plan for enabling http3 on the curl CLI in time for the next stable release.

                                Right now the only option is to switch the CLI to use the gnutls libcurl, but it looks like it might be possible to stay with openssl, depending on when non-experimental support lands and how good openssl's implementation is.
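
                                If it helps anyone experimenting: a quick runtime check for whether the libcurl you're linked against was built with HTTP/3 support is the HTTP3 feature bit from curl_version_info(). A minimal sketch (double-check the flag name against your curl headers):

                                    #include <curl/curl.h>
                                    #include <stdio.h>

                                    /* Report whether this libcurl build advertises HTTP/3 support. */
                                    int main(void) {
                                        curl_version_info_data *info = curl_version_info(CURLVERSION_NOW);
                                        printf("libcurl %s, HTTP/3 %s\n", info->version,
                                               (info->features & CURL_VERSION_HTTP3) ? "available" : "not built in");
                                        return 0;
                                    }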

                              • londons_explore 1 year ago
                                Anyone else disappointed that the figures for localhost are in MB/s not GB/s?

                                The whole lot just seems an order of magnitude slower than I was hoping to see.

                                • zamadatix 1 year ago
                                    A core of the 4770 (curl is single-threaded) can't even manage a full order of magnitude more plain AES encryption throughput, and that's ignoring that the encryption also has to be done on small packets and the result decrypted on the same machine.
                                • mgaunard 1 year ago
                                  HTTP/1 remains the one with the highest bandwidth.

                                  No surprise here.

                                  • 1vuio0pswjnm7 1 year ago
                                      It comes from a time before websites sucked because they are overloaded with ads and tracking.

                                    For non-interactively retrieving a single page of HTML, or some other resource, such as a video, or retrieving 100 pages, or 100 videos, in a single TCP connection, without any ads or tracking, HTTP/3 is overkill. It's more complex and it's slower than HTTP/1.0 and HTTP/1.1.

                                    • sylware 1 year ago
                                        I have a domestic web server whose code I implemented myself, and the most important thing was for HTTP1.[01] to be very simple to implement, to lower the cost of implementing my real-life HTTP1.[01] alternative (we all know Big Tech does not like that...).

                                        The best would be to have something like SMTP: the core is extremely simple and yet works everywhere in real life, and via announced options/extensions it can _optionally_ grow in complexity.

                                    • BitPirate 1 year ago
                                      It's a bit like drag racing. If all you care about is the performance of a single transfer that doesn't have to deal with packet loss etc, HTTP/1 will win.
                                      • vbezhenar 1 year ago
                                          Yesterday I was trying to track down a weird bug. I moved a website to Kubernetes and its performance was absolutely terrible. It was loading in 3 seconds on the old infra and now it consistently spends 12 seconds loading.

                                          Google Chrome shows that 6 requests require 2-3 seconds to complete simultaneously. 3 of those requests are tiny static files served by nginx, 3 of those requests are very simple DB queries. Each request completes in a few milliseconds using curl, but takes a few seconds in Google Chrome.

                                          Long story short: I wasn't able to track down the true source of this obviously wrong behaviour. But I switched ingress-nginx to disable HTTP 2, and with HTTP 1.1 it worked as expected, instantly serving all requests.

                                        I don't know if it's Google Chrome bug or if it's nginx bug. But I learned my lesson: HTTP 1.1 is good enough and higher versions are not stable yet. HTTP 3 is not even supported in ingress-nginx.

                                        • dilyevsky 1 year ago
                                          > I moved a website to Kubernetes and its performance was absolutely terrible. It was loading for 3 seconds on old infra and now it spends consistently 12 seconds loading.

                                            My guess is it has more to do with the resources you probably allocated to your app (especially the CPU limit) than any networking overhead, which should be negligible in such a trivial setup if done correctly.

                                          • mgaunard 1 year ago
                                            Kubernetes makes everything slow and complicated.

                                            Why do you even need to have proxies or load balancers in between? Another invention of the web cloud people.

                                            • xyzzy_plugh 1 year ago
                                              nginx is notoriously bad at not-HTTP 1.1. I wouldn't even bother trying.

                                              Envoy is significantly better in this department.

                                              • apitman 1 year ago
                                                What were your reasons for moving the site to kubernetes?
                                              • varjag 1 year ago
                                                It runs over TCP, you don't need to deal with packet loss.
                                                • vlovich123 1 year ago
                                                  What they’re suggesting is that under packet loss conditions QUIC will outperform TCP due to head of line blocking (particularly when there are multiple assets to fetch). Both TCP and QUIC abstract away packet loss but they have different performance characteristics under those conditions.
                                              • frankjr 1 year ago
                                                It's a mystery, it's almost as if people have spent decades optimizing it.
                                                • mgaunard 1 year ago
                                                  Or rather, it was simply designed correctly.
                                                  • k8svet 1 year ago
                                                      I know you think you're coming off smarter than everyone else, but that's not how it's landing. It turns out things can't be reduced to that extent in the real world.
                                                • foofie 1 year ago
                                                  > HTTP/1 remains the one with the highest bandwidth.

                                                  To be fair, HTTP/2 and HTTP/3 weren't exactly aimed at maximizing bandwidth. They were focused on mitigating the performance constraints of having to spawn dozens of connections to perform the dozens of requests required to open a single webpage.

                                                  • Beldin 1 year ago
                                                    Too bad that the alternative option - not requiring dozens of requests just for initial rendering of a single page - didn't catch on.
                                                    • GuestHNUser 1 year ago
                                                      Couldn't agree more. So many performance problems could be mitigated if people wrote their client/server code to make as few requests as possible.

                                                      Consider the case of requesting a webpage with hundreds of small images, one should embed all of the images into the single webpage! Requiring each image to be fetched in a different http request is ridiculous. It pains me to look at the network tab of modern websites.

                                                      • foofie 1 year ago
                                                        I don't think it's realistic to expect a page load to not make a bunch of requests, considering that you will always have to support use cases involving downloading many small images. Either you handle that by expecting your servers to open a dedicated connection for each download request, or you take steps for that not to be an issue. Even if you presume horizontal scaling could mitigate that problem from the server side, you cannot sidestep the fact that you could simply reuse a single connection to get all your resources, or not require a connection at all.
                                                      • eptcyka 1 year ago
                                                          HTTP3 also wants to minimize latency in bad network environments, not just mitigate the issue of too many requests.
                                                      • kiitos 1 year ago
                                                        Assuming SSL, HTTP/1 does not deliver better throughput than HTTP/2 in general.

                                                        I'm not sure why you believe otherwise. Got any references?

                                                        • drowsspa 1 year ago
                                                          Honestly, one would think that the switch to a binary protocol and then to a different transport layer protocol would be justified by massive gains in performance...
                                                          • kiitos 1 year ago
                                                            It's definitely not a given that a binary protocol + etc. will yield a "massive gain in performance" versus naive e.g. gzipped JSON over HTTP.
                                                            • vlovich123 1 year ago
                                                              The website being tested probably isn’t complicated enough to demonstrate that difference.
                                                              • drowsspa 1 year ago
                                                                Even then, I remember the sales pitches all mentioning performance improvements in the order of about 10-20%
                                                          • apitman 1 year ago
                                                            Very nice. I would love to see some numbers including simulated packet loss. That's theoretically an area h3 would have an advantage.
                                                            • 1 year ago
                                                              • 1vuio0pswjnm7 1 year ago
                                                                  Would it be worthwhile to test QUIC using some other TLS library besides OpenSSL, e.g., wolfSSL? I think I read that the cURL author is working with them, or for them. Apologies if this is incorrect.
                                                                • jupp0r 1 year ago
                                                                    Great writeup, but the diagrams are downright awful. I'd separate the different facets visually to make it easier to see the differences, rather than relying on those different colors.
                                                                  • superkuh 1 year ago
                                                                      Can cURL's HTTP/3 implementation work with self-signed certs? Pretty much every other HTTP/3 lib used by major browsers does not. And since HTTP/3 does not allow for a null cipher or TLS-less connections, this means that in order to establish an HTTP/3 connection a third-party CA must be involved.

                                                                      As it is right now, it is impossible to host an HTTP/3 server visitable by a random person you've never met without a corporate CA continually re-approving your ability to do so. HTTP/3 is great for corporate needs, but it'll be the death of the human web.

                                                                    • adobrawy 1 year ago
                                                                        Given that browsers discourage HTTP traffic (warning that the connection is insecure), given how easily free SSL certificates are available, and given that HTTPS is already the standard on small hobbyist sites, I don't expect the requirement for an SSL certificate to be a blocker for HTTP/3 adoption.
                                                                      • ndriscoll 1 year ago
                                                                        Do browsers warn for http (beyond the address bar icon)? I don't think they ever have for my personal site. I also don't think you can really say there's a "standard" for how hobbyists do things. I'm definitely in the bucket of people who use http because browsers throw up scary warnings if you use a self-signed cert, and scary warnings aren't grandma friendly when I want to send photos of the kids. The benefit of TLS isn't worth setting up publicly signed certs to me, and I don't want to invite the extra traffic by appearing on a CT log.

                                                                        Like the other poster said, it all makes sense for the corporate web. Not so much for the human web. For humans, self-signed certs with automatic TOFU makes sense, but browsers are controlled by and made for the corporate web.

                                                                    • jrpelkonen 1 year ago
                                                                        I really don't want to criticize anyone or their hard work, and I appreciate both curl and OpenSSL as a long-time user. That said, I personally find it disappointing that in 2024 major new modules are being written in C, especially given that a) QUIC modules written in Rust already exist, and b) there's a precedent for including Rust code in curl.

                                                                      Of course there are legacy reasons for maintaining existing codebases, but what is it going to take to shift away from using C for greenfield projects?

                                                                      • apitman 1 year ago
                                                                        Not saying you're wrong, but it's worth noting that switching to Rust is not free. Binary sizes, language complexity, and compile times are all significantly larger.
                                                                        • zinekeller 1 year ago
                                                                            For something like curl (which is also used in embedded systems), a legally-verified Rust compiler (compliant with ISO and other standards, for better or worse) that targets common microarchitectures is a definite first step. Fortunately, the first half of that exists (Ferrocene, https://ferrous-systems.com/ferrocene/). The second half is harder: there are architectures even GCC does not target (these rely on other compilers like the Small Device C Compiler (or a verified variant), or even a proprietary compiler), and LLVM only targets a subset of what GCC does. Even with a GCC Rust frontend (currently being developed, fortunately), you are still leaving out a lot of architectures.
                                                                          • jrpelkonen 1 year ago
                                                                              This is a good point: there are many niche architectures where Rust is not a viable option. But in this specific case, I don't see those systems benefiting from h3/QUIC. HOL blocking etc. will rarely, if ever, be a limiting factor for the use cases involved.
                                                                          • 1 year ago
                                                                            • secondcoming 1 year ago
                                                                              I'm personally disappointed you're aware of this issue and have done nothing about it.
                                                                              • teunispeters 1 year ago
                                                                                  If Rust could support all of C's processors and platforms and produce equivalent-sized binaries, especially for embedded, then it'd be interesting to switch to. (As a start, it also needs a stable and secure ecosystem of tools and libraries.)

                                                                                Right now, it's mostly a special purpose language for a narrow range of platforms.

                                                                              • throwaway892238 1 year ago
                                                                                Lol, wait, HTTP2 and HTTP1.1 both trounce HTTP3? Talk about burying the lede. Wasn't performance the whole point behind HTTP3?

                                                                                    This chart shows that HTTP2 is less than half as fast as HTTP1.1, and HTTP3 is half as fast as HTTP2. Jesus christ. If these get adopted across the whole web, the whole web's performance could get up to 75% slower. That's insane. There should be giant red flags on these protocols that say "warning: slows down the internet"

                                                                                • zamadatix 1 year ago
                                                                                  If the last decade of web protocol development seems backwards to you after reading one benchmark then why immediately assume it's insane and deserves a warning label instead of asking why your understanding doesn't match your expectations?

                                                                                      The benchmark is meant to compare how resource-efficient the new backend for curl is by using localhost connectivity. By using localhost connectivity, any real-world network considerations (such as throughput discovery, loss, latency, jitter, or buffering) are sidestepped to allow a direct measurement of how fast the backend alone is. You can't then assume those numbers have a meaningful direct extrapolation to the actual performance of the web, because you don't know how the additional things the newer protocols do impact performance once you add a real network. Ignoring that, you still have to consider notes like "Also, the HTTP/1.1 numbers are a bit unfair since they do run 50 TCP connections against the server." before making claims about HTTP2 being less than half as fast as HTTP1.1.

                                                                                  • CharlesW 1 year ago
                                                                                    > Wasn't performance the whole point behind HTTP3?

                                                                                        Faster, more secure, and more reliable, yes. The numbers in this article look terrible, but real-world testing¹ shows that real-world HTTP/3 performance is quite good, even though implementations are relatively young.

                                                                                    "…we saw substantially higher throughput on HTTP/3 compared to HTTP/2. For example, we saw about 69% of HTTP/3 connections reach a throughput of 5 Mbps or more […] compared to only 56% of HTTP/2 connections. In practice, this means that the video streams will be of a higher visual quality, and/or have fewer stalls over HTTP/3."

                                                                                    ¹https://pulse.internetsociety.org/blog/measuring-http-3-real...

                                                                                  • jgalt212 1 year ago
                                                                                    Does Curl performance really matter? i.e. if it's too performant, doesn't that increase the odds your spider is blocked? Of course, if you're sharding horizontally across targets, then any performance increase is appreciated.
                                                                                    • j16sdiz 1 year ago
                                                                                            libcurl is the backend for many (RESTful) API libraries.

                                                                                            Improving upload throughput to an S3 bucket would be great, right?

                                                                                    • zamadatix 1 year ago
                                                                                      What if you're not using curl as a spider? Even if you are I'd recommend some other spider design which doesn't rely on the performance of curl to set the crawling rate.