A tiny Docker image to serve static websites
374 points by nilsandrey 3 years ago | 150 comments
- KronisLV 3 years ago> My first attempt uses the small alpine image, which already packages thttpd:
    # Install thttpd
    RUN apk add thttpd
Wouldn't you want to use the --no-cache option with apk, e.g.:
    RUN apk add --no-cache thttpd
It seems to slightly help with the container size:
    REPOSITORY       TAG     IMAGE ID      CREATED          SIZE
    thttpd-nocache   latest  4a5a1877de5d  7 seconds ago    5.79MB
    thttpd-regular   latest  655febf218ff  41 seconds ago   7.78MB
It's a bit like cleaning up after yourself with apt-based container builds as well, for example (although this might not always be necessary):
    # Apache web server
    RUN apt-get update && \
        apt-get install -y apache2 libapache2-mod-security2 && \
        apt-get clean && \
        rm -rf /var/lib/apt/lists /var/cache/apt/archives
But hey, that's an interesting goal to pursue! Even though personally I just gave up on Alpine and similar slim solutions and decided to just base all my containers on Ubuntu instead: https://blog.kronis.dev/articles/using-ubuntu-as-the-base-fo...
- tadbit 3 years agoI love stuff like this.
People will remark about how this is a waste of time, others will say it is absolutely necessary, even more will laud it just for the fun of doing it. I'm in the middle camp. I wish software/systems engineers would spend more time optimising for size and performance.
- memish 3 years agoWouldn't removing Docker entirely be a good optimization?
- kube-system 3 years agoDocker adds other value to the lifecycle of your deployment. An "optimization" where you're removing value is just a compromise. Otherwise we'd all run our static sites on UEFI.
- jart 3 years agoRedbean supports UEFI too, although we haven't added a bare-metal implementation of Berkeley sockets yet. It's on the roadmap, though.
- ar_lan 3 years agoThis is a really good point, and something I think a lot of people forget. It's true, the most secure web app is one written with no code/no OS/does nothing.
Adding value is a compromise of some increased security risk - and it's our job to mitigate that as much as possible by writing quality software.
- vanviegen 3 years agoWhat value is that, for running such a simple piece of software?
- jamal-kumar 3 years agoyeah see some of us still do this on OSes that haven't turned into a giant bloated hodgepodge of security theatre and false panacea software.
docker has dead whale on the beach vibes. what value does it offer to those of us who have moved on from the mess linux is becoming?
- throwanem 3 years agoIn terms of CPU cycles and disk space, maybe. In terms of engineer cycles, absolutely not. Which costs more?
- danuker 3 years agoHmm, a SCP shell script on my laptop, prompting my SSH key's password and deploying the site to the target machine?
Or a constantly-updating behemoth, running as root, installing packages from yet another unauditable repository chain?
- marginalia_nu 3 years agoBuilding simpler systems allows you to save on all three.
- qbasic_forever 3 years agoI think the real value is just focusing on the absolute minimum necessary software in a production docker/container image. It's a good practice for security with less surface area for attackers to target.
- encryptluks2 3 years agoThe difference between a systems engineer and a software engineer is that to a systems engineer a half functioning 5MB docker image is okay but to a software engineer a fully functional 5GB Node image is fine.
- ttty 3 years agoPremature optimisation? 5 gb doesn’t matter. It’s not great, don’t get me wrong.
- mr-karan 3 years agoWhile this is a remarkably good hack and I did learn quite a bit after reading the post, I'm simply curious about the motivation behind it. A docker image, even if it's a few MBs with Caddy/NGINX, should ideally be pulled just once on the host and sit there cached. Assuming this is OP's personal server and there's not much churn, this image could be in the cache forever until a new tag is pushed/pulled. So, from a "hack" perspective, I totally get it, but from a bit more pragmatic POV, I'm not quite sure.
- throwaway894345 3 years agoIt gets pulled once per host, but with autoscaling, hosts come and go pretty frequently. It's a really nice property to be able to scale quickly with load, and small images tend to help with this in a variety of ways (pulling, but also instantiating the container). Most sites won't need to scale like this, however, because one or two hosts are almost always sufficient for all the traffic the site will ever receive.
- mr-karan 3 years agoI did mention that it's the OP's server which I presume isn't in an autoscale group.
Even then, saving a few MBs in image size is, in devops parlance, premature optimisation.
There's so much that happens in an autoscaling group before the instance is marked healthy to serve traffic that an image pull of a few MBs is, in the grand scheme of things, hardly ever an issue worth focusing on.
- throwaway894345 3 years agoYeah, like I said, I'm not defending this image in particular--most static sites aren't going to be very sensitive to autoscaling concerns. I was responding generally to your reasoning of "the host will just cache the image" which is often used to justify big images which in turn creates a lot of other (often pernicious) problems. To wit, with FaaS, autoscaling is highly optimized and tens of MBs can make a significant difference in latency.
- tuananh 3 years agoThis could be very useful in the serverless space, as Lambda does support container images now. The image will be pulled much more often.
- marginalia_nu 3 years agoThe less resources you use from your system, the more things you can do with your system.
- spicybright 3 years agoOnly matters if you're actually using those extra cycles or not. The majority of web servers hover at <10% CPU just waiting for connections.
- munk-a 3 years agoI don't know if that's really true - if you're renting the server from a cloud provider chances are you can bump down the instance size if you don't need the extra processing capacity... and if it's a server you manually maintain I think lighter usage generally decreases part attrition, though the other factors in that are quite complex.
- rektide 3 years agoI feel like there's a lot of low-hanging fruit on the table for containers, and it's weird we don't try to optimize loading. I could be wrong! This seems like a great sample use case: wanting a fast/low-impact simple webserver for any of a hundred odd purposes. Imo there are a lot of good strategies available for making even significantly larger containers start very fast!
We could be using container snapshots/checkpoints so we don't need to go through as much initialization code. This would imply, though, that we configure via the filesystem or something we can attach late, instead of 12-factor configuration via env vars, as is the standard/accepted convention these days. Actually, I suppose environment variables are writable, but the webserver would need to be able to re-read its config, accept a SIGHUP or whatever.
We could try to pin some specific snapshots into memory. Hopefully Linux will keep any frequently booted-off snapshot cached, but we could go further and try to make sure hosts have the snapshot image in memory at all times.
I want to think that common overlay systems like overlayfs or btrfs or whatever will do a good job of making sure that, if everyone is asking for the same container, they're sharing caches effectively. Validating and making sure of that would be great to see. To be honest, I'm actually worried the need-for-speed attempt to snapshot/checkpoint a container and re-launch it might conflict somewhat: rather than creating a container fs from existing pieces and launching a process mapped to that fs, I'm afraid the process snapshot might re-encode the binary? Maybe? We'd keep getting to read from the snapshot, I guess, which is good, but there'd be some duplication of the executable code across the container image and then again in the snapshotted process image.
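A rough sketch of the checkpoint/restore flow being discussed, using Docker's experimental CRIU-based checkpoint support (requires CRIU installed and the daemon running in experimental mode; the image and checkpoint names here are placeholders):
    # start the server once and let it finish initializing
    docker run -d --name web thttpd-image
    # snapshot the initialized process (the container is stopped by default)
    docker checkpoint create web warm
    # later: resume from the snapshot instead of cold-starting
    docker start --checkpoint warm web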
- kra34 3 years agoI love it! Can you add SSL though? Does it support gzip compression? What about Brotli? I like that it's small and fast, so in addition to serving static files, can it act as a reverse proxy? What about configuration? I'd like to be able to serve multiple folders instead of just one.
Where can I submit a feature request ticket?
- stevefan1999 3 years agohttps://github.com/weihanglo/sfz
check this out
- CameronNemo 3 years agoThis seems to be intended for local host usage exclusively. Is anyone using this for public or even internal http hosting?
- nilsandrey 3 years agoI think it's not exactly an active project, but it's open on GitHub[1], so I guess we can try to open issues there.
- 0xbadcafebee 3 years agoIf you use "-Os" instead of "-O2", you save 8kB!
However, Busybox also comes with an httpd... it may be 8.8x bigger, but you also get that entire assortment of apps to let you troubleshoot, run commands in an entrypoint, run commands from the httpd/cgi, etc. I wouldn't run it in production.... but it does work :)
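For reference, a minimal sketch (not from the article) of what the Busybox variant could look like; the busybox:1.36 tag and the /www path are arbitrary choices:
    FROM busybox:1.36
    COPY site/ /www/
    EXPOSE 80
    # -f keeps httpd in the foreground so the container stays alive
    CMD ["/bin/httpd", "-f", "-v", "-p", "80", "-h", "/www"]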
- kissgyorgy 3 years agoRedbean is just 155 KB, without the need for alpine or any other dependency. You just copy the Redbean binary and your static assets; no complicated build steps and no hundred-MB downloads necessary. Check it out: https://github.com/kissgyorgy/redbean-docker
- mrweasel 3 years agoThere's also the 6kB container, which uses asmttpd, a webserver written in assembler.
https://devopsdirective.com/posts/2021/04/tiny-container-ima...
- danuker 3 years agoWow! This is Redbean, which is an "Actually Portable Executable", i.e. a binary that can run on a range of OSes (Linux, Windows, macOS, BSDs).
- adolph 3 years agoWell worth a read:
I believe the best chance we have of [building binaries "to stand the test of time with minimal toil"], is by gluing together the binary interfaces that've already achieved a decades-long consensus, and ignoring the APIs. . . . Platforms can't break them without breaking themselves.
- tyingq 3 years agoAnd it does https/tls, where thttpd does not.
- somenewaccount1 3 years agoI'm confused how the author considers thttpd more 'battle tested' if it doesn't do HTTPS.
Either way, it's a great article and I'm glad the author took the time to write it. His docker practices are wonderful; I wish more engineers would use them.
- SahAssar 3 years agoThe term 'battle tested' has nothing to do with the number of features; it's about how proven the stability and/or security of the included features are. The term also usually carries a heavy weight towards older systems that have been used in production for a long time, since those have had more time to weather bugs that are only caught in real-world use.
- cassandratt 3 years ago"Battle tested" typically means that the code has been running for a long time, bugs found, bugs squashed, and a stability has been attained for a long time. It's usage predates the "information wars", back when we really didn't think about security that much because nothing was connected to anything else that went outside the companies, so there were no hackers or security battles back then. So I suspect this is the authors frame of reference.
- mg 3 years agoFor static websites, is there any reason not to host them on GitHub?
Since GitHub Pages lets you attach a custom domain, it seems like the perfect choice.
I would expect their CDN to be pretty awesome. And updating the website with a simple git push seems convenient.
- naet 3 years agoOnce you've used a couple more static hosts you'll find that gh pages is a second-tier host at best. It lacks some basic configuration options and tooling, can be very slow to update or deploy, the CDN actually isn't as good as others, etc. Github pages is great for hobby projects, and if you're happy with it by all means keep using it... but I wouldn't ever set up a client's production site on it.
If you're curious, Netlify is one popular alternative that is easy to get into even without much experience. I would say even at the free tier Netlify is easily a cut above Github for static hosting, and it hooks into Github near-perfectly straight out of the box if that is something you value.
- jason0597 3 years ago> is there any reason not to host them on GitHub?
Because some people may not want to depend even more on Big Tech (i.e. Microsoft) than they already do
- _-david-_ 3 years ago>For static websites, is there any reason not to host them on GitHub?
One reason would be if your site violates the TOS or acceptable use policy. GitHub bans "excessive bandwidth" without defining what that is for example. For a small blog about technology you are probably fine.
- marginalia_nu 3 years agoWanting to own your own web presence is reason not to host them on GitHub.
For static websites, CDNs are largely unnecessary. My potato of a website, hosted from a computer in my living room, has been on the front page of HN several times without so much as increasing its fan speed.
It took Elon Musk tweeting a link to one of my blog posts before it started struggling to serve pages. I think it ran out of file descriptors, but I've increased that limit now.
- flatiron 3 years agoWas it through a VPN? I feel like revealing my home IP to random people on the internet is a bad move.
- marginalia_nu 3 years agoNo VPN. I have the ports fairly well tightened down, though. I'm exposed to a zero-day in iptables itself or something, but whatever. Even if someone got in it would be an inconvenience at worst. It's not like I'm making money off this stuff.
- lostlogin 3 years agoAre you able to describe how you run yours? I skimmed your blog but didn't see anything about it.
- marginalia_nu 3 years agoThe static content is just nginx loading files straight off a filesystem. The dynamic content (e.g. the search engine) is nginx forwarding requests to my Java-based backend.
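As a rough illustration of that split (my own sketch, not marginalia's actual config; paths and the backend port are assumptions), the nginx side might look something like:
    server {
        listen 80;
        root /var/www/site;

        # static content straight off the filesystem
        location / {
            try_files $uri $uri/ =404;
        }

        # dynamic content forwarded to the Java backend
        location /search {
            proxy_pass http://127.0.0.1:8080;
            proxy_set_header Host $host;
        }
    }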
- Dma54rhs 3 years agoYou can serve a massive number of static requests from any potato, really; it's a solved problem.
- throwaway894345 3 years agoI'm sure their CDN is great, and I've used it in the past; however, I like to self-host as a hobby.
- enriquto 3 years ago> For static websites, is there any reason not to host them on GitHub?
I don't like github pages because it's quite slow to deploy. Sometimes it takes more than a couple of minutes just to update a small file after the git push.
- qbasic_forever 3 years agoI don't think you can set a page or URL on github to return a 301 moved permanently response or similar 3xx codes. This can really mess up your SEO if you have a popular page and try to move off github, you'll basically lose all the clout on the URL and have to start fresh. It might not matter for stuff you're just tossing out there but is definitely something to consider if you're putting a blog, public facing site, etc. there.
- nobodywasishere 3 years agoI have a few 301 redirects setup on github pages
    $ curl https://nobodywasishere.github.io   # moved to https://blog.eowyn.net
    <html>
    <head><title>301 Moved Permanently</title></head>
    <body>
    <center><h1>301 Moved Permanently</h1></center>
    <hr><center>nginx</center>
    </body>
    </html>

    $ curl https://blog.eowyn.net/vhdlref-jtd   # moved to https://blog.eowyn.net/vhdlref
    <html>
    <head><title>301 Moved Permanently</title></head>
    <body>
    <center><h1>301 Moved Permanently</h1></center>
    <hr><center>nginx</center>
    </body>
    </html>
- qbasic_forever 3 years agoIs that coming back with a HTTP 200 response though and the made up HTML page? That doesn't seem right... at least, I dunno if google and such would actually index your page at the new URL vs. just thinking "huh weird looks like blog.eowyn.net is now called '301 Moved Permanently', better trash that down in the rankings".
- chabad360 3 years agoYea, no.
A 301 (or 302) redirect means setting the status code header to 301 and providing a location header with the place to redirect to. Last I checked GitHub doesn't allow any of this, or setting any other headers (like cache-control). To work around this, I've been putting cloudflare in front of my site which lets me use page rules to set redirects if necessary.
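For context, a genuine redirect (whether it comes from nginx, Cloudflare, or anything else) means a response whose status line and Location header look roughly like this; the HTML body shown above is optional:
    HTTP/1.1 301 Moved Permanently
    Location: https://blog.eowyn.net/
    Content-Length: 0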
- marban 3 years agoNetlify FTW — For the rewrite rules alone.
- coding123 3 years agoWell, not everything is open source.
- tekromancr 3 years agoCan you do SSL?
- dewey 3 years agoYes, since 2018.
- jandeboevrie 3 years agoBut why would you prefer Docker like this over, for example, running thttpd directly? Wouldn't that save you a lot of RAM and indirection?
- qbasic_forever 3 years agoRun this on a linux host and it isn't that much different from running thttpd directly. There's just some extra chroot, cgroups, etc. setup done before launching the process but none of that gets in the way once it's running. Docker adds a bit of networking complexity and isolation, but even that is easily disabled with a host network CLI flag.
It's really only on windows/mac where docker has significant memory overhead, and that's just because it has to run a little VM with a linux kernel. You'd have the same issue if you tried to run thttpd there too and couldn't find a native mac/windows binary.
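For example (illustrative; the image name is a placeholder), the extra networking layer can be skipped entirely on Linux with:
    docker run --rm --network host thttpd-image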
- somenewaccount1 3 years agoFor one, because his home server provides multiple utilities, not just this one project, and without docker he starts to have dependency conflicts.
He also likes to upgrade that server close to the bleeding edge, and if that goes south, he wants to be able to rebuild and bring his static site up quickly, along with his other projects.
- gotaquestion 3 years agoI serve several sites off an AWS EC2 instance, all are dynamic REST endpoints with DBs in their own `tmux` instance. I also have a five line nodeJS process running on another port for just my static page. All of this is redirected from AWS/r53/ELB. The only pain in the arse is setting up all the different ports, but everything runs in its own directory so there are no dependency issues. I've tried to ramp up with docker, but I always end up finding it faster to just hack out a solution like this (plus it saves disk space and memory on my local dev machine). In the end my sol'n is still a hack since every site is on one machine, but these are just sites for my own fun. Perhaps running containers directly would be easier, but I haven't figured out how to deal with disk space (since I upload lots of stuff).
- Yeroc 3 years agoWell in the article he ended up compiling thttpd statically so he wouldn't have dependency conflicts if he ran it directly. Funny how there's overlap in docker solutions that solve different but related issues for non-docker deploys as well...
- hedora 3 years agoWithout docker, he'd need to install build dependencies on the host. Once it is in docker, why move it out?
- ttty 3 years agoI don’t want to touch the root of my server. I rather add a new container that doesn’t modify anything on the root.
Benefits: can cleanly and delete 100% of what was installed. If you use something on root can always infect, save cache, logs…
I don’t want to impact anything else running on my server. I don’t want anything to depend on that either silently.
Docker is the best thing. I just can’t understand how people still can’t get the benefits yet.
It's amazing to start a project you had 3 years ago and it just works: you can deploy without reading any docs. Just spin up a docker container. Easy, safe, and it just works.
- zahllos 3 years agoThe only thing I would change: I would use Caddy and not thttpd. This way the actual binary doing the serving is memory-safe. It may well require more disk space, but it is a worthwhile tradeoff I think. You can also serve over TLS this way.
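A minimal Caddyfile for that kind of setup might look like this (a sketch; example.com and the path are placeholders, and Caddy provisions TLS certificates automatically for a real domain):
    example.com {
        root * /srv/site
        file_server
    }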
- bachmitre 3 years agoHow many requests can thttpd handle simultaneously, compared to, say, nginx? It's a moot point being small if you then have to instantiate multiple containers behind a load balancer to handle simultaneous requests.
- 0xb0565e487 3 years agoI don't know why there is a big fish at the top of your website, but I like it a lot.
- adolph 3 years agoAgreed. GIS says at least some are from the NYPL:
https://nypl.getarchive.net/media/salmo-fario-the-brown-trou...
- ludwigvan 3 years agoMe too!
Also the other blog posts have different big fishes, so check them out as well.
- nitinagg 3 years agoFor static websites, hosting them directly on S3 with cloudfront, or on cloudflare might be a better option?
- flatiron 3 years agoHow’s the free tier on aws for s3 and cloudfront? I can think of free alternatives that are equally as good if not better.
- hedora 3 years agoS3 + cloudfront + lambda is costing me pennies per month for a trivial site. What are the free alternatives that beat it?
Requirements:
- rsync style publishing
- not supported by tracking users.
- raw static file hosting (including html)
- redirect foo.com/bar to foo.com/bar/index.html (this is why I need lambda...)
- zero admin https certificate management
- flatiron 3 years agoGitHub pages gives you all this except the redirect and replace rclone with…git and is free (although evil Microsoft blah blah)
- MuffinFlavored 3 years agoor https://pages.github.com/ maybe?
- souenzzo 3 years agoIs it smaller than darkhttpd?
- kristianpaul 3 years agoWhy do we need this when you can run a web server inside systemd?
- hedora 3 years agoThis doesn't hijack a bunch of stuff on the host OS and replace it with garbage versions.
I want things like DNS, X11 screen locking, ssh session management, syslog, etc. to just work. I can't figure out how to fix any of that stuff under systemd, and at least one is always broken by default in my experience.
- wereHamster 3 years agoI used this as a base image for a static site, but then needed to return a custom status code, and decided to build a simple static file server with go. It's less than 30 lines, and image size is <5MB. Not as small as thttpd but more flexible.
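Something in the spirit of that comment (my own sketch, not wereHamster's code; the /healthz path, the ./public directory, and the port are arbitrary) could look like:
    package main

    import (
        "log"
        "net/http"
    )

    func main() {
        // the "custom status code" case: a path that returns something
        // other than what a plain file server would
        http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
            w.WriteHeader(http.StatusNoContent)
        })

        // everything else is served from ./public as static files
        http.Handle("/", http.FileServer(http.Dir("./public")))

        log.Fatal(http.ListenAndServe(":8080", nil))
    }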
- rhim 3 years agoalthttpd beats this: https://hub.docker.com/r/rouhim/althttpd/tags (~63 KB)
- superkuh 3 years agoWell, this will definitely serve an unchanging static website. But unchanging static websites are just archives. Most static websites have new .html and other files added on a whim regularly.
- EnigmaCurry 3 years agoYou can just mount an external volume on top of /home/static and be able to change the files that way. But for a single-page app I think it works great to be able to version the entire site in the docker image tag.
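For instance (the host path and image name are assumed), overriding the baked-in files with a read-only bind mount:
    docker run -d -p 80:80 -v /srv/site:/home/static:ro my-thttpd-image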
- CameronNemo 3 years agoI do something similar at work for internal only static docs.
The image is a small container with an http daemon. It gets deployed as a statefulset and I mount a volume into the pod to store the static pages (they don't get put into the image). Then I use cert-manager and an Istio ingress gateway to add TLS on top.
Updating the sites (yes, several at the same domain) is done via kubectl cp, which is not the most secure but good enough for our team. I could probably use an rsync or ssh daemon to lock it down further, but I have not tried that.
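That kubectl cp step would look something like this (namespace, pod, and paths here are made up):
    kubectl cp ./public internal-docs/docs-0:/srv/www/docs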
- sgtnoodle 3 years agoSeems pretty silly. That being said, I did the exact same thing a couple years ago for work. My first attempt was to use busybox's built-in httpd, but it didn't support restarts. I vaguely recall settling on the same alpine + thttpd solution. The files being served were large, so the alpine solution was good enough.
- amanzi 3 years agoI assume the author would then publish this behind a reverse proxy that implements TLS? Seems like an unnecessary dependency, given that Docker is perfect for solving dependency issues.
- EnigmaCurry 3 years agoThat's certainly what I would do. I think it's great that thttpd does not include a TLS dependency itself. Every once in a while I find a project that forces its own TLS, and it's annoying to undo it.
- mro_name 3 years agoDocker, really?
Sounds like brain surgery in order to make a jam sandwich to me.
- nilsandrey 3 years agoLocally, it can be easier to rely on a Docker image running in the background instead of another console serving the files: just run it and forget it, and let it be used by the dependent project you're probably working on (the one that depends on the static content). I agree that in the cloud it's better to use the plethora of services available for static content directly, like Cloudflare.
- mro_name 3 years ago> plethora of services available for static content
when I think of static content, I think of buying a domain name + shared hosting for monthly EUR 2,-.
And not assigning rights nor control but having a legal claim on both service and name. Am I missing something?
- Casteil 3 years agoIt's a good way to compartmentalize if you've got a lot going on on a single machine.
- krnlpnc 3 years agoUp next: how to serve a LAMP site from a single docker image
- hedora 3 years agoIt's pretty easy. I put the data in a bind mount on btrfs on my synology NAS. It snapshots the FS and does an incremental backup with hyper backup each night. The backup is crash coherent, zero downtime, and the RDBMS doesn't need to know about it.
This is really useful for tiny little services that each want a different database server.
- patrakov 3 years agoThat's a good educational resource to show to people who need to learn about multi-stage Docker builds.
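To make the pattern concrete, here is a small multi-stage sketch of my own (not the article's Dockerfile): the first stage statically builds a file server (for example the Go one sketched further up), and the second stage ships only that binary plus the site on top of an empty image:
    FROM golang:1.21-alpine AS builder
    WORKDIR /src
    COPY main.go .
    # CGO_ENABLED=0 produces a fully static binary that can run on scratch
    RUN CGO_ENABLED=0 go build -ldflags="-s -w" -o /server main.go

    FROM scratch
    COPY --from=builder /server /server
    # the Go sketch serves ./public relative to its working directory (/)
    COPY ./public /public
    EXPOSE 8080
    ENTRYPOINT ["/server"]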
- pojzon 3 years agoTbh, the moment the author thought about self-hosting anything to serve static pages, it was already too much effort.
There are free ways to host static pages, and extremely inexpensive ways to host static pages that are visited millions of times per month, simply by using services built for that.
- cutler 3 years agoIs nothing sacred? The KuberDocker juggernaut leaves no stone unturned. Laughable given that Docker was originally designed for managing massive fleets of servers at FAANG-scale.
- riffic 3 years agothere are services specifically for static site hosting. I'd let them do the gritty devops work personally.
Netlify, Amplify, Cloudflare Pages, etc.
- nilsandrey 3 years agoI use them too. Sometimes I like to have some repos with the static content, which get deployed by a CD tool to those services. When debugging or testing locally on my PC or LAN, it's common for me to include a docker build for those repos which I don't use at production time, but only locally. Maybe it's not a big deal at all, but I use it that way, especially when the CDN used in a project is not a free one. Makes sense?
- riffic 3 years agojust working off the headline. visiting your link does a great job explaining the use case you have. I'll revisit tonight for a closer look.
- uoaei 3 years agoNail, meet hammer.