Curl HTTP/3 Performance
157 points by BitPirate 1 year ago | 118 comments
- hlandau 1 year ago Author of the OpenSSL QUIC stack here. Great writeup.
TBQH, I'm actually really pleased with these performance figures - we haven't had time yet to do this kind of profiling or make any optimisations. So what we're seeing here is the performance prior to any kind of concerted measurement or optimisation effort on our part. In that context I'm actually very pleasantly surprised at how close things are to existing, more mature implementations in some of these benchmarks. Of course there's now plenty of tuning and optimisation work to be done to close this gap.
- apitman 1 year agoI'm curious if you've architected it in such a way that it lends itself to optimization in the future? I'd love to hear more about how these sorts of things are planned, especially in large C projects.
- hlandau 1 year agoAs much as possible, yes.
With something like QUIC "optimisation" breaks down into two areas: performance tuning in terms of algorithms, and tuning for throughput or latency in terms of how the protocol is used.
The first part is actually not the major issue, at least in our design everything is pretty efficient and designed to avoid unnecessary copying. Most of the optimisation I'm talking about above is not about things like CPU usage but things like tuning loss detection, congestion control and how to schedule different types of data into different packets. In other words, a question of tuning to make more optimal decisions in terms of how to use the network, as opposed to reducing the execution time of some algorithm. These aren't QUIC specific issues but largely intrinsic to the process of developing a transport protocol implementation.
It is true that QUIC is intrinsically less efficient than say, TCP+TLS in terms of CPU load. There are various reasons for this, but one is that QUIC performs encryption per packet, whereas TLS performs encryption per TLS record, where one record can be larger than one packet (which is limited by the MTU). I believe there's some discussion ongoing on possible ways to improve on this.
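As a rough illustration (assuming a 16 KB TLS record and ~1,350 bytes of QUIC payload per packet): 16384 / 1350 ≈ 12, so QUIC ends up doing on the order of a dozen AEAD operations, each packet with its own header protection, for the same amount of data TLS can cover with a single record.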
There are also features which can be added to enhance performance, like UDP GSO, or extensions like the currently in development ACK frequency proposal.
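For the curious, UDP GSO on Linux is essentially one socket option plus large writes. Here is a minimal sketch (not curl's or OpenSSL's actual code; the segment size and the fallback defines are assumptions):

    /* Minimal UDP GSO sketch (illustrative only). One large sendto() is split
     * by the kernel/NIC into seg-sized datagrams instead of one syscall per packet. */
    #include <netinet/in.h>
    #include <netinet/udp.h>
    #include <stddef.h>
    #include <sys/socket.h>

    #ifndef UDP_SEGMENT
    #define UDP_SEGMENT 103     /* from linux/udp.h; absent in older userspace headers */
    #endif
    #ifndef SOL_UDP
    #define SOL_UDP IPPROTO_UDP
    #endif

    static int send_gso_batch(int fd, const struct sockaddr_in *dst,
                              const char *buf, size_t len)
    {
        int seg = 1200;         /* assumed QUIC payload size per datagram */
        if (setsockopt(fd, SOL_UDP, UDP_SEGMENT, &seg, sizeof(seg)) != 0)
            return -1;          /* kernel lacks UDP GSO: fall back to per-packet sends */

        /* len may cover dozens of packets' worth of data (up to ~64 KB). */
        return (int)sendto(fd, buf, len, 0,
                           (const struct sockaddr *)dst, sizeof(*dst));
    }

The win is fewer syscalls and fewer trips down the UDP stack per packet sent, which is where a lot of QUIC's CPU overhead goes.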
- Matthias247 1 year ago Actually the benchmarks just measure the first part (CPU efficiency) since it's a localhost benchmark. The gap will most likely be due to missing GSO if it's not implemented. It's such a huge difference, and pretty much the only thing which can prevent QUIC from being totally inefficient.
- apitman 1 year agoThank you for the details!
- benreesman 1 year ago Thank you kindly for your work. These protocols are critically important, and the more high-quality and open implementations exist, the more likely they are to be free and inclusive.
Also, hat tip for such competitive performance on an untuned implementation.
- spullara 1 year agoAre there good reasons to use HTTP3/QUIC that aren't based on performance?
- zamadatix 1 year agoI suppose that depends on your definitions of "good" and what counts as being "based on performance". For instance QUIC and HTTP/3 support better reliability via things like FEC and connection migration. You can resume a session on a different network (think going from Wi-Fi to cellular or similar) instead of recreating the session and FEC can make the delivery of messages more reliable. At the same time you could argue both of these ultimately just impact performance depending on how you choose to measure them.
Something more agreeably not performance based is the security is better. E.g. more of the conversation is enclosed in encryption at the protocol layer. Whether that's a good reason depends on who you ask though.
- Matthias247 1 year ago We need to distinguish between performance (throughput over a congested/lossy connection) and efficiency (CPU and memory usage). QUIC can achieve higher performance, but will always be less efficient. The linked benchmark actually just measures efficiency, since it's about sending data over loopback on the same host.
- jzwinck 1 year agoWhat makes QUIC less efficient in CPU and memory usage?
- o11c 1 year agoTCP has at least one unfixable security exploit: literally anybody on the network can reset your connection. Availability is 1/3 of security, remember.
- foofie 1 year ago How awesome is that? Thank you for all your hard work. It's thanks to people such as yourself that the whole world keeps on working.
Obligatory:
- vitus 1 year agoIt is promising to see that openssl-quic serial throughput is within 10-20% of more mature implementations such as quiche. (Which quiche, though? Is this Google's quiche, written in C++, or Cloudflare's quiche, written in Rust? It turns out that's approximately the only word that starts with "quic" that isn't a derivative of "quick".)
One of QUIC's weaknesses is that it's known to be much less CPU efficient, largely due to the lack of things like HW offload for TLS.
> Also, the HTTP/1.1 numbers are a bit unfair since they do run 50 TCP connections against the server.
To emphasize this point: no modern browser will open 50 concurrent connections to the same server for 50 GET requests. You'll see connection pooling of, uh, 6 (at least for Chrome and Firefox), so the problems of head-of-line blocking that HTTP/2 and HTTP/3 attempt to solve would have manifested in more realistic benchmarks.
Some questions I have:
- What kind of CPU is in use? How much actual hw parallelism do you have in practice?
- Are these requests actually going over the network (even a LAN)? What's the MTU?
- How many trials went into each of these graphs? What are the error bars on these?
- jsty 1 year agoLooks like Cloudflare quiche:
https://github.com/curl/curl/blob/0f4c19b66ad5c646ebc3c4268a...
- pclmulqdq 1 year agoHardware offload should be protocol-independent, but I suppose most network cards assume some stuff about TLS and aren't set up for QUIC?
- Matthias247 1 year agoNICs assume stuff for TCP (segmentation offload) that they can’t do for UDP, or can only do in a very limited fashion (GSO).
TLS offloads are very niche. There's barely anyone using them in production, and the benchmarks were very likely run without them.
- ndriscoll 1 year ago> To emphasize this point: no modern browser will open 50 concurrent connections to the same server for 50 GET requests.
They will. You just need to go bump that number in the settings. :-)
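For example, in Firefox this lives in about:config (pref name as of recent versions; treat the exact name as an assumption) and defaults to 6:

    network.http.max-persistent-connections-per-server = 50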
- secondcoming 1 year agoBrowsers aren't the only things that connect to servers that speak HTTP.
- BitPirate 1 year agoThe performance difference between H1/H2 and H3 in this test doesn't really surprise me. The obvious part is the highly optimised TCP stack. But I fear that the benchmark setup itself might be a bit flawed.
The biggest factor is the caddy version used for the benchmark. The quic-go library in caddy v2.6.2 lacks GSO support, which is crucial to avoid high syscall overhead.
The quic-go version in caddy v2.6.2 also doesn't adjust UDP buffer sizes.
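For reference, the usual fix quic-go's documentation suggests when it can't get large socket buffers is to raise the kernel limits; the exact values here are only illustrative, check its wiki for current recommendations:

    sysctl -w net.core.rmem_max=2500000
    sysctl -w net.core.wmem_max=2500000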
The other thing that's not clear from the blog post is the network path used. Running the benchmark over loopback only would give TCP-based protocols an advantage if the QUIC library doesn't support MTU discovery.
- Etheryte 1 year ago I don't think taking shots at the Caddy version not being the latest is fair criticism, to be honest. Version 2.6.2 was released roughly three months ago, so it's not like we're talking about anything severely outdated; most servers you run into in the wild will be running something older than that.
- zamadatix 1 year ago I think you mixed up what year we're in now :). Caddy 2.6.2 was released October 13, 2022, so it's been not 3 but 15 months since release.
Even more relevantly, HTTP/3 was first supported out of the box in 2.6.0, released Sep 20, 2022. Even if 2.6.2 had been just 3 months old, the fact that it comes from the first 22 days of HTTP/3 being supported out of the box, rather than from the versions released over the following 3 months, would definitely be relevant criticism to note.
- francislavoie 1 year ago This is why I'm not a fan of Debian. (I assume OP got that version from Debian, because I can't think of any other reason they wouldn't have used the latest.) They packaged Caddy, but they never update at the rate we would reasonably expect. So users who don't pay attention to the version number have a significantly worse product than is currently available.
We have our own apt repo which always has the latest version: https://caddyserver.com/docs/install#debian-ubuntu-raspbian
- Etheryte 1 year agoOh, right you are, somehow I completely mixed that up. Thanks for clarifying.
- nezirus 1 year ago Maybe a shout out to the HAProxy people: like many, they've observed performance problems with the OpenSSL 3.x series. But having good old OpenSSL with QUIC would be so convenient for distro packages etc.
https://github.com/haproxy/wiki/wiki/SSL-Libraries-Support-S...
- samueloph 1 year agoNice write-up.
I'm one of the Debian maintainers of curl and we are close to enabling http3 on the gnutls libcurl we ship.
We have also started discussing the plan for enabling http3 on the curl CLI in time for the next stable release.
Right now the only option is to switch the CLI to use the gnutls libcurl, but it looks like it might be possible to stay with openssl, depending on when non-experimental support lands and how good openssl's implementation is.
- pabs3 1 year agoAny chance of WebSocket being enabled too?
- samueloph 1 year agoThat's still an experimental feature on curl's side so I'm not sure. https://everything.curl.dev/helpers/ws/support
- mistrial9 1 year ago Maybe the right time to clean up the unexpected and awkward set of libs that are currently installed, too?
- londons_explore 1 year agoAnyone else disappointed that the figures for localhost are in MB/s not GB/s?
The whole lot just seems an order of magnitude slower than I was hoping to see.
- zamadatix 1 year ago A core of the 4770 (curl is single threaded) can't even manage a full order of magnitude more plain AES encryption throughput - and that's ignoring that it also has to be done in small packets and decrypted on the same machine.
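A quick way to sanity-check that ceiling on your own machine is OpenSSL's built-in benchmark (the cipher here is just an example; numbers vary by build and CPU):

    openssl speed -evp aes-128-gcm

That reports single-threaded bulk encryption throughput at various block sizes, which is an upper bound before you add packetization, the rest of the protocol, and decryption on the same box.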
- mgaunard 1 year agoHTTP/1 remains the one with the highest bandwidth.
No surprise here.
- 1vuio0pswjnm7 1 year ago It comes from a time before websites sucked because they are overloaded with ads and tracking.
For non-interactively retrieving a single page of HTML, or some other resource, such as a video, or retrieving 100 pages, or 100 videos, in a single TCP connection, without any ads or tracking, HTTP/3 is overkill. It's more complex and it's slower than HTTP/1.0 and HTTP/1.1.
- sylware 1 year ago I have a domestic web server and I implemented its code myself. The most important thing was for HTTP/1.[01] to be very simple to implement, to lower the cost of implementing my real-life HTTP/1.[01] alternative (we all know Big Tech does not like that...).
The best would be to have something like SMTP: the core is extremely simple and yet works everywhere in real life, and via announced options/extensions it can _optionally_ grow in complexity.
- BitPirate 1 year agoIt's a bit like drag racing. If all you care about is the performance of a single transfer that doesn't have to deal with packet loss etc, HTTP/1 will win.
- vbezhenar 1 year ago Yesterday I was trying to track down a weird bug. I moved a website to Kubernetes and its performance was absolutely terrible. It was loading in 3 seconds on the old infra and now it consistently takes 12 seconds to load.
Google Chrome shows that 6 requests require 2-3 seconds to complete simultaneously. 3 of those requests are tiny static files served by nginx, 3 of those requests are very simple DB queries. Each request completes in a few milliseconds using curl, but takes a few seconds in Google Chrome.
Long story short: I wasn't able to track down the true source of this obviously wrong behaviour. But I switched ingress-nginx to disable HTTP/2, and with HTTP/1.1 it worked as expected, instantly serving all requests.
I don't know if it's a Google Chrome bug or an nginx bug. But I learned my lesson: HTTP/1.1 is good enough and higher versions are not stable yet. HTTP/3 is not even supported in ingress-nginx.
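For reference, the switch I mean is a single key in the ingress-nginx controller's ConfigMap (key name per the ingress-nginx docs; the rest of the setup is assumed):

    # ingress-nginx controller ConfigMap entry
    use-http2: "false"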
- dilyevsky 1 year ago > I moved a website to Kubernetes and its performance was absolutely terrible. It was loading in 3 seconds on the old infra and now it consistently takes 12 seconds to load.
My guess is it has more to do with the resources you allocated to your app (especially the CPU limit) than with any networking overhead, which should be negligible in such a trivial setup if done correctly.
- mgaunard 1 year agoKubernetes makes everything slow and complicated.
Why do you even need to have proxies or load balancers in between? Another invention of the web cloud people.
- xyzzy_plugh 1 year agonginx is notoriously bad at not-HTTP 1.1. I wouldn't even bother trying.
Envoy is significantly better in this department.
- apitman 1 year agoWhat were your reasons for moving the site to kubernetes?
- varjag 1 year ago It runs over TCP; you don't need to deal with packet loss.
- vlovich123 1 year ago What they're suggesting is that under packet-loss conditions QUIC will outperform TCP because it avoids head-of-line blocking (particularly when there are multiple assets to fetch). Both TCP and QUIC abstract away packet loss, but they have different performance characteristics under those conditions.
- frankjr 1 year agoIt's a mystery, it's almost as if people have spent decades optimizing it.
- foofie 1 year ago> HTTP/1 remains the one with the highest bandwidth.
To be fair, HTTP/2 and HTTP/3 weren't exactly aimed at maximizing bandwidth. They were focused on mitigating the performance constraints of having to spawn dozens of connections to perform the dozens of requests required to open a single webpage.
- Beldin 1 year agoToo bad that the alternative option - not requiring dozens of requests just for initial rendering of a single page - didn't catch on.
- GuestHNUser 1 year agoCouldn't agree more. So many performance problems could be mitigated if people wrote their client/server code to make as few requests as possible.
Consider the case of requesting a webpage with hundreds of small images, one should embed all of the images into the single webpage! Requiring each image to be fetched in a different http request is ridiculous. It pains me to look at the network tab of modern websites.
- foofie 1 year agoI don't think it's realistic to expect a page load to not make a bunch of requests, considering that you will always have to support use cases involving downloading many small images. Either you handle that by expecting your servers to open a dedicated connection for each download request, or you take steps for that not to be an issue. Even if you presume horizontal scaling could mitigate that problem from the server side, you cannot sidestep the fact that you could simply reuse a single connection to get all your resources, or not require a connection at all.
- eptcyka 1 year ago HTTP/3 also wants to minimize latency in bad network environments, not just mitigate the issue of too many requests.
- kiitos 1 year agoAssuming SSL, HTTP/1 does not deliver better throughput than HTTP/2 in general.
I'm not sure why you believe otherwise. Got any references?
- drowsspa 1 year agoHonestly, one would think that the switch to a binary protocol and then to a different transport layer protocol would be justified by massive gains in performance...
- kiitos 1 year agoIt's definitely not a given that a binary protocol + etc. will yield a "massive gain in performance" versus naive e.g. gzipped JSON over HTTP.
- vlovich123 1 year agoThe website being tested probably isn’t complicated enough to demonstrate that difference.
- drowsspa 1 year agoEven then, I remember the sales pitches all mentioning performance improvements in the order of about 10-20%
- apitman 1 year ago Very nice. I would love to see some numbers including simulated packet loss. That's theoretically an area where h3 would have an advantage.
- 1vuio0pswjnm7 1 year ago Would it be worthwhile to test QUIC using some other TLS library besides OpenSSL, e.g., wolfSSL? I think I read that the cURL author is working with them, or for them. Apologies if this is incorrect.
- jupp0r 1 year ago Great writeup, but the diagrams are downright awful. I'd separate the different facets visually to make it easier to see the differences, rather than relying on those different colors.
- superkuh 1 year ago Can cURL's HTTP/3 implementation work with self-signed certs? Pretty much every other HTTP/3 lib used by major browsers does not. And since HTTP/3 does not allow for a null cipher or TLS-less connections, this means that in order to establish an HTTP/3 connection, a third-party CA must be involved.
As it is right now, it is impossible to host an HTTP/3 server visitable by a random person you've never met without a corporate CA continually re-approving your ability to do so. HTTP/3 is great for corporate needs, but it'll be the death of the human web.
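For curl itself, the ordinary -k/--insecure flag is the relevant knob; assuming a build with HTTP/3 enabled and a QUIC backend that honors it, something along these lines should accept a self-signed cert (hostname is made up):

    curl --http3 --insecure https://self-signed.example/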
- adobrawy 1 year ago Given that browsers discourage HTTP traffic (warning that the connection is insecure), given how easily free SSL certificates are available, and given that HTTPS is already the standard on small hobbyist sites, I don't expect the requirement for an SSL certificate to have been a blocker in HTTP/3 adoption.
- ndriscoll 1 year agoDo browsers warn for http (beyond the address bar icon)? I don't think they ever have for my personal site. I also don't think you can really say there's a "standard" for how hobbyists do things. I'm definitely in the bucket of people who use http because browsers throw up scary warnings if you use a self-signed cert, and scary warnings aren't grandma friendly when I want to send photos of the kids. The benefit of TLS isn't worth setting up publicly signed certs to me, and I don't want to invite the extra traffic by appearing on a CT log.
Like the other poster said, it all makes sense for the corporate web. Not so much for the human web. For humans, self-signed certs with automatic TOFU makes sense, but browsers are controlled by and made for the corporate web.
- jrpelkonen 1 year agoI really don’t want to criticize anyone or their hard work, and appreciate both curl and OpenSSL as a long time user. That said, I personally find it disappointing that in 2024 major new modules are being written in C. Especially so given that a) existing Quic modules written in Rust exist, and b) there’s a precedent for including Rust code in Curl.
Of course there are legacy reasons for maintaining existing codebases, but what is it going to take to shift away from using C for greenfield projects?
- apitman 1 year agoNot saying you're wrong, but it's worth noting that switching to Rust is not free. Binary sizes, language complexity, and compile times are all significantly larger.
- zinekeller 1 year ago For something like curl (which is also used in embedded systems), a legally-verified (compliant with ISO and other standards, for better or worse) Rust compiler that targets common microarchitectures is a definite first step. Fortunately, the first half of it exists (Ferrocene, https://ferrous-systems.com/ferrocene/). The second one is harder: there are architectures even GCC does not target (these architectures rely on other compilers like the Small Device C Compiler (or a verified variant) or even a proprietary compiler), and LLVM only covers a subset of GCC's targets. Even if there's a GCC Rust (currently being developed, fortunately), you are still leaving out a lot of architectures.
- jrpelkonen 1 year agoThis is a good point: there are many niche architectures where Rust is not a viable option. But in this specific case, I don’t see these system benefiting from h3/Quic. HOL blocking etc. will rarely, if ever, be a limiting factor for the use cases involved.
- secondcoming 1 year agoI'm personally disappointed you're aware of this issue and have done nothing about it.
- teunispeters 1 year agoIf rust could support all of C's processors and platforms and produce equivalent sized binaries - especially for embedded ... then it'd be interesting to switch to. (as a start, it also needs a stable and secure ecosystem of tools and libraries)
Right now, it's mostly a special purpose language for a narrow range of platforms.
- throwaway892238 1 year agoLol, wait, HTTP2 and HTTP1.1 both trounce HTTP3? Talk about burying the lede. Wasn't performance the whole point behind HTTP3?
This chart shows that HTTP2 is more than half as slow as HTTP1.1, and HTTP3 is half as slow as HTTP2. Jesus christ. If these get adopted across the whole web, the whole web's performance could get up to 75% slower. That's insane. There should be giant red flags on these protocols that say "warning: slows down the internet"
- zamadatix 1 year agoIf the last decade of web protocol development seems backwards to you after reading one benchmark then why immediately assume it's insane and deserves a warning label instead of asking why your understanding doesn't match your expectations?
The benchmark is meant to compare how resource efficient the new backend for curl is by using localhost connectivity. By using localhost connectivity, any real-world network considerations (such as throughput discovery, loss, latency, jitter, or buffering) are sidestepped to allow a direct measurement of how fast the backend alone is. You can't then assume those numbers have a meaningful direct extrapolation to the actual performance of the web, because you don't know how the additional things the newer protocols do impact performance once you add a real network. Ignoring that, you still have to consider notes like "Also, the HTTP/1.1 numbers are a bit unfair since they do run 50 TCP connections against the server." before making claims about HTTP2 being more than half as slow as HTTP1.1.
- CharlesW 1 year ago> Wasn't performance the whole point behind HTTP3?
Faster, more secure, and more reliable, yes. The numbers in this article look terrible, but real-world testing¹ shows that HTTP/3 performance is quite good in practice, even though implementations are relatively young.
"…we saw substantially higher throughput on HTTP/3 compared to HTTP/2. For example, we saw about 69% of HTTP/3 connections reach a throughput of 5 Mbps or more […] compared to only 56% of HTTP/2 connections. In practice, this means that the video streams will be of a higher visual quality, and/or have fewer stalls over HTTP/3."
¹https://pulse.internetsociety.org/blog/measuring-http-3-real...
- jgalt212 1 year agoDoes Curl performance really matter? i.e. if it's too performant, doesn't that increase the odds your spider is blocked? Of course, if you're sharding horizontally across targets, then any performance increase is appreciated.
- j16sdiz 1 year ago libcurl is the backend for many (RESTful) API libraries.
Improving upload throughput to an S3 bucket would be great, right?
- zamadatix 1 year agoWhat if you're not using curl as a spider? Even if you are I'd recommend some other spider design which doesn't rely on the performance of curl to set the crawling rate.