Redis re-implemented in Rust
400 points by wmwragg 10 years ago | 106 comments
- undefined0 10 years agoFor me, Redis was the first piece of software written in C that I could easily customize with additional features (my C knowledge is limited). It was written beautifully. I've been learning Rust, and I find it easier to learn than C; I also like that I can dive into software written in Rust without worrying about a GC. By writing Redis in Rust, you've given me the best of both worlds. That said, I still find the C Redis easier to read than your code, because yours lacks comments and well-named functions and variables. I admire your work nonetheless.
- seppo0010 10 years agoThat's a fair criticism, and well taken, but keep in mind I'm learning the language as I go, and most of the commits are just rewriting things because they were suboptimal, not idiomatic, or hard to read. At this stage, I would consider adding comments wasteful.
I also have no intention of making this project live as long or have as many users as Redis does.
- illumen 10 years agoLearning to write readable code is a good thing, and I think it's worth the effort.
Can Rust be readable?
- seppo0010 10 years agoIf you want good examples of Rust code, you should probably look into other repositories. Today I was checking out how Rust's HashMap works and I found it quite readable:
https://github.com/rust-lang/rust/blob/master/src/libstd/col...
You can also look at `mio`, the asynchronous IO library that's popular in Rust:
https://github.com/carllerche/mio/blob/master/src/notify.rs#...
- thedufer 10 years agoBefore anyone starts using this as a Redis replacement on Windows as the readme suggests, take a look at the TODO file. Notable missing features include:
- maxmemory key eviction
- hash values
- ~2/3 of the set operators
- multi/exec
- lua scripting
This is an interesting and potentially useful effort, but a replacement for Redis it is not.
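For example, anything built on MULTI/EXEC, like this rough transaction sketch against the redis-rs client (not rsedis's code; the key name is invented), presumably won't run against it:

    fn main() -> redis::RedisResult<()> {
        let client = redis::Client::open("redis://127.0.0.1/")?;
        let mut con = client.get_connection()?;
        // pipe().atomic() wraps the queued commands in MULTI/EXEC,
        // one of the features still listed in the TODO.
        let (hits,): (i64,) = redis::pipe()
            .atomic()
            .incr("hits", 1)
            .query(&mut con)?;
        println!("hits = {}", hits);
        Ok(())
    }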
- shankun 10 years agoBy the way, if you are looking for a production-quality Windows port of Redis, there is a fork available at https://github.com/MSOpenTech/redis. We (Microsoft) provide it in production as Azure's cache service today, and are committed to continuing to work on it.
- kawsper 10 years agoAlthough it sparked some debate when Microsoft ported Redis to Win32 with libuv (http://oldblog.antirez.com/post/redis-win32-msft-patch.html), I am impressed by their commitment: the fork is still going 4 years later.
- ddlutz 10 years agoAre you part of the Azure cache service? I was an intern on the Edge Caching and Storage team and am joining back full-time next month. If things are the way they were, it would be worth exploring Redis for our cache, and I'd like to talk details.
- shankun 10 years agoI am not, but I work with them closely. If you'd like to talk to the team involved, send me (shankun_at_microsoft_com) your email and I'll connect you up!
- netcraft 10 years agoI want to thank you for that work and look forward to 3.0 there.
- seppo0010 10 years agoAlso notice that the main developer only has two months of experience with Rust, so it is probably not as stable and well tested as Redis. As stated in the README, the main goal is to learn Rust.
- Zancarius 10 years agoThe README does seem a bit ambitious, but it notes that the purpose was to learn Rust.
I'm sure pull requests to bring it up to feature parity would be welcome!
- kibwen 10 years agoSince this seems to be just a learning project, note as well that there exist Rust bindings to Redis itself, from Armin Ronacher (though I'm not sure if they've yet been updated to work on 1.0): https://github.com/mitsuhiko/redis-rs
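A minimal usage sketch, assuming a recent version of the crate (connection URL and key name are made up):

    use redis::Commands;

    fn main() -> redis::RedisResult<()> {
        let client = redis::Client::open("redis://127.0.0.1/")?;
        let mut con = client.get_connection()?;
        let _: () = con.set("greeting", "hello")?; // SET greeting hello
        let val: String = con.get("greeting")?;    // GET greeting
        println!("{}", val);                       // prints "hello"
        Ok(())
    }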
- the_mitsuhiko 10 years agoYep, works with 1.0.
- wyaeld 10 years agoThe readme says it's a learning project.
It's a very interesting piece of work though.
I'll be interested to see Antirez's view on the trade-offs between C and Rust for this.
- seppo0010 10 years agoHe is at least curious. https://twitter.com/antirez/status/611189939519229952
- unfamiliar 10 years agoCould somebody give me a tl;dr on Redis? I keep hearing about it but from the summary I can't tell what kind of applications it is being used for.
- twic 10 years agoThe phrase I like most is "data structure server". It's basically a giant heap that you can fill with data structures - strings, lists, sets, sorted sets, maps, bitsets, and this wacky HyperLogLog thing:
http://redis.io/topics/data-types-intro
The data structures are all addressed by string keys.
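To make that concrete, here's a hedged sketch using the redis-rs client mentioned elsewhere in this thread (key names invented):

    use redis::Commands;

    fn main() -> redis::RedisResult<()> {
        let client = redis::Client::open("redis://127.0.0.1/")?;
        let mut con = client.get_connection()?;
        let _: () = con.rpush("recent:logins", "alice")?;  // list
        let _: () = con.sadd("tags:post:1", "rust")?;      // set
        let _: () = con.zadd("leaderboard", "alice", 42)?; // sorted set
        let _: () = con.hset("user:1", "name", "Alice")?;  // hash (map)
        let top: Vec<String> = con.zrevrange("leaderboard", 0, 9)?;
        println!("top players: {:?}", top);
        Ok(())
    }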
Redis can persist this heap to disk, and load it again, so you get a measure of durability, but the typical use case is for data you can afford to lose - caches, metrics, some kinds of events, etc.
Redis's key non-functional strengths are speed and robustness. Operations people love it because you stick it in production and it just quietly keeps on working without needing attention or setting your CPU on fire.
To my mind, any project should have PostgreSQL as its first data store. But it should probably have Redis as its second, when it finally has some data that needs to be changed or accessed so fast that PostgreSQL can't keep up.
(Kafka is third)
- aaggarwal 10 years agoFrom their official GitHub page, Redis is an in-memory database that persists on disk. The data model is key-value, but many different kinds of values are supported: Strings, Lists, Sets, Sorted Sets, Hashes, HyperLogLogs, Bitmaps.
This simply means that the key-value store is loaded directly into memory (RAM) for fast access, but the data is retained (persisted) even after the application is closed.
It is usually used as a cache store, or for queuing messages between processes, whether local or distributed.
- rakoo 10 years agoTo add to that, Redis is a TCP server, so you can talk to it from multiple processes on multiple machines (and Redis will easily support huge loads).
Its data structures cover a good part of what you'd need from generic data structures, which makes Redis an easy way to express logic like the set intersection of friends common to multiple people, or a sorted set of goods ranked by amount, all of it shared with other processes.
Redis also offers pubsub capabilities in two forms:
- A standard PUB/SUB pair, which does what you think it does
- A blocking pop on a list for one client, and a push from another client, which will "wake up" the first one with the value (sketched below)
It's a very versatile Swiss Army knife.
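A rough sketch of that second form with the redis-rs client (names invented; producer and consumer would normally be separate processes):

    use redis::Commands;

    fn main() -> redis::RedisResult<()> {
        let client = redis::Client::open("redis://127.0.0.1/")?;
        let mut con = client.get_connection()?;
        // Producer side: push a job, which wakes up one blocked consumer.
        let _: () = con.lpush("jobs", "send-email:42")?;
        // Consumer side: BLPOP blocks until a value lands on the list
        // (timeout 0 means wait forever).
        let (_list, job): (String, String) = redis::cmd("BLPOP")
            .arg("jobs")
            .arg(0)
            .query(&mut con)?;
        println!("woke up with {}", job);
        Ok(())
    }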
- kaitnieks 10 years agoIt's a memory cache to take some load off your DB. The good thing about Redis is the useful data structures it supports: lists, sets, hashes, bitmaps, and the somewhat more specialized sorted sets.
- iagooar 10 years agoPut key & read key with straightforward abstractions. Simple and beautiful.
It can be used for caching, queues and for applications with volatile data.
- shmerl 10 years agoI was just thinking that Rust is a great candidate for big data processing tools, much more so than Java (which is annoyingly used a lot there). Something like Spark and HDFS should be implemented in Rust.
- frankmcsherry 10 years agoYou can contribute to:
https://github.com/frankmcsherry/timely-dataflow
https://github.com/frankmcsherry/differential-dataflow
Or, just tell your friends. :)
Better yet, write some python / pandas / dataframes / whatever_the_cool_kids_need layer on top, and rule the next big data drama cycle.
- shmerl 10 years agoThanks, those look very interesting! I'll go read about Naiad first :)
- pron 10 years agoThe more cores you have and the more RAM, the bigger advantage GC has. The thing with having lots of RAM is that it's very hard to take advantage of it with on-stack data (which can, at most, use about 1-2% of the total RAM available -- do the math) and thread-local heaps. Once you use thread-local heaps/arenas, you need to shard your data. Any cross-shard access would mean locking, which doesn't scale very well. That's exactly where GCs shine: they let you have scalable, concurrent access to data with arbitrary lifespan. That's why Java is used for those kind of applications -- it performs and scales much better than Rust can hope to do on large machines.
You are right, though, that if the processing is extremely "batchy" and all data dies at the same time, then it doesn't make a difference.
- shmerl 10 years ago> That's why Java is used for those kind of applications
I'm not convinced that's the reason why Java is used for it. There are native alternatives like HPCC which claim to perform better.
As was noted, concurrent access to shared data is not very common in such distributed computation scenarios. Well-designed processing will avoid it, and thus avoid the need for locking as well.
- pron 10 years ago> There are native alternatives like HPCC which claim to perform better.
The goal with performance is almost never to get the maximum possible performance but the best performance/effort ratio for your needs. This is true even for performance-critical applications. As there are diminishing returns, every increase in performance costs more than the previous one. Very, very few applications are willing to invest an extra 30-50% effort to go from 98% (of maximum performance) to 100%.
As to concurrent access -- see my other comment (depends on use-case).
- Meai 10 years agoRust has heap memory as well...?
- pron 10 years agoOf course, but concurrent access to shared data in Rust (let alone when that data has an arbitrary lifetime) carries more overhead (and is much trickier) than in GCed environments. As a general rule, GCs decrease overhead for concurrent memory access.
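To illustrate with a minimal sketch (not a benchmark): in safe Rust, cross-thread access to shared data with arbitrary lifetime typically goes through an Arc plus a lock, and every access pays for both.

    use std::collections::HashMap;
    use std::sync::{Arc, Mutex};
    use std::thread;

    fn main() {
        let shared = Arc::new(Mutex::new(HashMap::<String, u64>::new()));
        let handles: Vec<_> = (0..4)
            .map(|i| {
                let shared = Arc::clone(&shared);
                thread::spawn(move || {
                    // Each access pays for the atomic refcount and the lock;
                    // a tracing GC lets threads share structures without either.
                    let mut map = shared.lock().unwrap();
                    *map.entry(format!("worker-{}", i)).or_insert(0) += 1;
                })
            })
            .collect();
        for h in handles {
            h.join().unwrap();
        }
        println!("{:?}", shared.lock().unwrap());
    }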
- maxdemarzi 10 years agoLittle secret... with Rust you can just do big data processing on a single core => http://www.frankmcsherry.org/graph/scalability/cost/2015/01/...
- shmerl 10 years agoIn many cases distributed computation is needed because there is BIG data that won't fit on a laptop like in that example. That is, distribution is needed not only because the computation can't be done on one node (it would take too long), but because the data can't possibly fit on any one node.
- maxdemarzi 10 years agoSee his next post: http://www.frankmcsherry.org/graph/scalability/cost/2015/02/...
How big are the jobs being done? I remember reading a statistic[1], a couple of years old, that 50% of big data jobs were less than 10 GB, 90% were less than 100 GB, and only something like 1% were bigger than a TB. You can fit multiple terabytes on a single machine. I wonder if anyone has newer statistics about this.
[1]: http://www.msr-waypoint.com/pubs/204499/a20-appuswamy.pdf
- sampo 10 years agoMany parallel algorithms become slower if you make them use too many cores, as they spend most of their time in communication and very little in actual computation. Maybe the parallel systems would have been faster on fewer cores than 128?
- nickpsecurity 10 years agoThat's a pretty sad result to get from 128 cores. I've seen amateur Beowulf clusters get better results.
- tacone 10 years agoAlso GNU utils: http://aadrake.com/command-line-tools-can-be-235x-faster-tha... The above experiment (which has an interesting GitHub repo) is somewhat contrived (and unusable in the real world), but it is still eye-opening. Hadoop and Spark bring so much complexity that looking for simpler solutions is worth considering.
- r0naa 10 years agoImpressive!
Could someone (or OP) elaborate on the value that re-implementing a whole piece of software in a new language provides, compared to just building an interface "bridging" both worlds?
To clarify, my metric for "value" is usefulness to other people. That is, without considering the (interesting) learning opportunity it represents for the author.
For example, someone developed a Python interface to the Stanford Core-NLP library (written in Java). Would re-writing the Core-NLP library in Python be useful to the community? How do you figure out what people need?
I am asking because while I think it would be a ton of fun and let me learn a lot, I also value building useful software, and re-writing a whole system sounds like overkill except in a few very niche cases.
And if I am not mistaken, you would need a team at least as large as the parent project's to implement new features, fix bugs, and keep pace with it. Looking forward to hearing what HNers think!
edit: clarified ambiguities
- nickpsecurity 10 years agoAside from the project's stated reason, there's one very good reason to re-implement a bunch of stuff in Rust: testing whether it delivers. It's being pushed as a safer, better systems language. So, let's take many different C/C++ apps, rewrite them in Rust, and see what the results are across many metrics. By the end of it, we should know Rust's strengths and weaknesses in real-world coding.
- themckman 10 years agoThe README answers this: to learn Rust.
Edit: It also mentions not being tied to UNIX and appears to claim it will run on Windows. That's certainly something.
- r0naa 10 years agoSorry if I wasn't clear, but I am looking for a more general answer! I would like to know in which cases it is useful (to other people), and to discuss its value compared to writing interfaces to other languages.
- seppo0010 10 years agoI had no intention of making the end result useful, but I ran into interesting problems.
First, I wanted to make it as pure Rust as possible. I tried to avoid UNIX-specific code, and since there is no Rust library with Windows support for asynchronous IO, I was pushed into spawning a thread per client and blocking while waiting for its data. I quickly noticed that the benchmark was way below Redis (around 60% of ops/sec with 50 clients). But then someone pointed out to me[1] that I was running the tests on a machine with only two cores, and this approach may actually be better on machines with more cores[2]. I have yet to try that out and benchmark the results.
So far the Rust API has been disappointing for network operations. For example, `TcpStream.read()`[3] and `TcpListener.incoming()`[4] do not support timeouts. Maybe that's because its development is driven by Servo's needs rather than by servers'.
I have thought about doubling down on multithreading: instead of the global database lock rsedis uses now, have one lock per key (or some other arbitrary partition) and allow concurrent operations, which is hard to do safely in C. But I have not gotten there yet.
[1] https://github.com/jonhoo/rucache/issues/2
[2] https://github.com/jonhoo/volley/
[3] http://doc.rust-lang.org/1.0.0-beta/std/net/struct.TcpStream...
[4] http://doc.rust-lang.org/1.0.0-beta/std/net/struct.TcpListen...
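For the curious, a minimal sketch of the thread-per-connection shape described above (illustrative only, not rsedis's actual code; the "protocol" here is just one key per line):

    use std::collections::HashMap;
    use std::io::{BufRead, BufReader, Write};
    use std::net::{TcpListener, TcpStream};
    use std::sync::{Arc, Mutex};
    use std::thread;

    type Db = Arc<Mutex<HashMap<String, String>>>;

    fn handle_client(mut stream: TcpStream, db: Db) {
        let mut reader = BufReader::new(stream.try_clone().expect("clone stream"));
        let mut line = String::new();
        // Blocking read: without async IO, this thread just sits here
        // waiting for client data.
        while reader.read_line(&mut line).unwrap_or(0) > 0 {
            let key = line.trim().to_string();
            // Global database lock: every command serializes here. The
            // per-key locks floated above would reduce this contention.
            let value = db.lock().unwrap().get(&key).cloned();
            let _ = writeln!(stream, "{}", value.unwrap_or_default());
            line.clear();
        }
    }

    fn main() -> std::io::Result<()> {
        let db: Db = Arc::new(Mutex::new(HashMap::new()));
        let listener = TcpListener::bind("127.0.0.1:6379")?;
        for stream in listener.incoming() {
            let stream = stream?;
            let db = Arc::clone(&db);
            thread::spawn(move || handle_client(stream, db));
        }
        Ok(())
    }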
- coldtea 10 years ago>Could someone (or OP) elaborate on the value that re-implementing a whole software to a new language provide comparatively to just building an interface "bridging" both worlds?
Making it safer and even catching bugs in the original implementation (both things Rust will help with)?
Making it integrate seamlessly with the new language's ecosystem? E.g. Lucene is Java, and someone could use that, but there are tons of ports of it in C, C++, Python etc., providing the convenience of integrating it with projects in those languages.
>And if I am not mistaken you would need a team at least as large as the parent project to implement new features, fix bugs and keep pace with it.
Not necessarily. A project with 10 part-time contributors could be matched, or even surpassed, by a project with 2-3 full-time competent hackers, for example.
- frik 10 years ago> Lucene is Java ... there are tons of ports of it in C, C++, Python, etc.
There used to be several ports, though most are dead and/or several major versions behind. A new C++ or Rust port would be great, though unrealistic given the huge project size.
- coldtea 10 years ago>though unrealistic given the huge project side.
It could be done with some automated translation, and cleaned up from there.
- deet 10 years agoThis project specifically is probably not useful to others just because it is written in another language (unless it were to succeed in fixing problems or improving security or performance). Redis is a server application, not a library, so there is already a clearly defined, interoperable protocol bridging the Redis server to other languages. Writing a Redis client for the other language would be more useful.
More generally, as coldtea mentions, integration into the rest of a language's ecosystem is the primary benefit of rewriting in another language.
The value of such a port to others depends on how easy it is to integrate the two languages, either via libraries or other methods. The harder they are to integrate (and the fewer automated translation tools exist), the greater the value of the rewrite to others.
Your Core-NLP example is actually an interesting one, because that library has already been ported to other languages... It is available for the C#/F# ecosystem (http://sergey-tihon.github.io/Stanford.NLP.NET/).
- kbenson 10 years agoIn this case, it's a learning exercise. Sometimes, it's because there are other benefits, such as making it easier to distribute or easier to tie into specific parts of the algorithm than may be possible by calling a library that provides a high level interface.
- unoti 10 years agoFrom a learning perspective, reimplementation has a key advantage over other kinds of projects: the design is completely done, so you can focus exclusively on the mechanics of implementation.
- resca79 10 years agoI like this kind of project. But the use case of Redis is a bit extreme: its main features are speed and the way memory consumption is handled. If those requirements are not satisfied, it is only a very good way to learn Rust (the author's goal) and Redis internals.
- GeertVL 10 years agoSo how do you re-implement something like Redis in another language? Is it more of a translation job, or do you start by breaking down the concepts and implementing them? Or do you take the idea and go your own way with the implementation?
- sudhirj 10 years agoI'm trying the same thing, for similar reasons, in Go, and I'm wondering if at some point a Go version would perform better than C. On a machine with a large number of cores, perhaps?
GitHub.com/sudhirj/restis
Also wondering if some rethinking is possible - would an HTTP interface a la DynamoDB be more useful? Could complexity be reduced and performance increased by using a purely in-memory backend with no disk persistence? If there were pluggable backends, would a Postgres or DynamoDB backend be more useful for terabytes / petabytes of data? Is the beauty of Redis the API or the implementation?
- endymi0n 10 years ago> but I'm wondering if at some point a Go version would perform better than C.
The answer is "no" with a certain amount of probability. Redis isn't single threaded by lack of capability, but by design. Concurrency for multiple CPUs will actually slow down a lot of the stuff you see, as you will need to introduce locking mechanisms.
Also, garbage collection is highly tuned and customized in Redis to the use case of an in-memory-DB (in stark contrast to usual allocation patterns of an application), up to the point where it's almost impossible to replicate the performance in a garbage collected language.
I love Go and we're a 100% Go (and Angular) shop, but for an in-memory DB it wouldn't be a sane choice.
- vidarh 10 years agoYou can turn off disk persistence in Redis. My main usage of Redis is with disk persistence turned off (it handles a few hours' worth of samples of data that we don't care if we lose - we only care about long-term averages of the data in question).
There should be minimal overhead from having the capability in Redis, due to the way it implements disk snapshots: RDB snapshots are done by fork()'ing and relying on copy-on-write, letting the main process keep doing its thing while the child process writes the snapshot, so the main process doesn't need to care. Beyond that, Redis offers logging/journalling of changes, but the cost of having that as an option is trivial if it's switched off.
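That fork()-plus-copy-on-write trick is simple enough to sketch. A hedged toy version, assuming the nix crate (the snapshot format here is made up):

    use std::collections::HashMap;
    use std::fs::File;
    use std::io::Write;
    use std::process;

    use nix::unistd::{fork, ForkResult};

    fn snapshot(db: &HashMap<String, String>) -> nix::Result<()> {
        match unsafe { fork() }? {
            ForkResult::Child => {
                // The child sees a frozen, copy-on-write image of `db`
                // and can serialize it at its leisure.
                let mut f = File::create("dump.snapshot").expect("create file");
                for (k, v) in db {
                    writeln!(f, "{}\t{}", k, v).expect("write entry");
                }
                process::exit(0);
            }
            ForkResult::Parent { child } => {
                // The parent returns immediately and keeps serving clients;
                // the OS shares pages until either side writes to them.
                println!("snapshot started in child {}", child);
                Ok(())
            }
        }
    }

    fn main() -> nix::Result<()> {
        let mut db = HashMap::new();
        db.insert("key".to_string(), "value".to_string());
        snapshot(&db)
    }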
Having pluggable backends for things like Postgres or DynamoDB seems a bit at odds with the purpose of Redis, which is exactly that you pay the (very low) cost of in-memory manipulation of simple data structures. Though if a single Redis server could partition the key space between plugins, it might be useful, by letting you e.g. move keys between backends while still accessing them transparently from the client.
For the samples I mentioned above, we currently roll up and transfer data to a CouchDB instance for archival (it doesn't really matter that it's CouchDB - we just need a persistent key-value store; Postgres or DynamoDB would both have worked fine), but if I could archive them while still having them visible in a Redis-like server together with the in-memory keys, that'd make the client a tiny bit simpler.
For most Redis usage, I think paying the cost of connection setup and teardown and sending HTTP headers etc. would slow things down immensely. At least it would for my usage. Having an HTTP interface as an addition might be useful in some scenarios to enable new use cases, but as a replacement for the current Redis API it would be a disaster.
If you want to explore alternative interfaces, I'd instead suggest going in the opposite direction and experimenting with a UDP interface. In a typical data centre setting, packet loss is low enough that while you'd need retry logic, it wouldn't necessarily get exercised much in normal situations.
(On the other hand, for the typical request/reply cycle it might very well not give any benefit vs. TCP in scenarios where multiple request/replies are done over a single connection, amortising the connection setup cost - that would be interesting to benchmark, though.)
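To illustrate the shape of the UDP idea, a hypothetical minimal server using only std (it just echoes; parsing and retry logic are left out):

    use std::io;
    use std::net::UdpSocket;

    fn main() -> io::Result<()> {
        let socket = UdpSocket::bind("127.0.0.1:6380")?;
        let mut buf = [0u8; 1500]; // keep each request within one datagram
        loop {
            // One datagram in, one datagram out: no connection setup at all.
            let (len, peer) = socket.recv_from(&mut buf)?;
            // ... parse the command and compute a real reply here ...
            socket.send_to(&buf[..len], peer)?; // echo as a stand-in reply
        }
    }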
- alexchamberlain 10 years agoAh, HTTP is your hammer
- clu3 10 years agoMan you should have named it Rudis
- beyondcompute 10 years agoSpectacular! Could you add synchronous replication though? And coalescing of queries (so that the entire system processes queries in batches, say 300 times per second)?
- vicpara 10 years agoWhy would someone do that? To what end? Why isn't anyone re-writing Redis in assembler to have it kick ass like the pros? Can you write Windows in Rust?
- derefr 10 years agoWhat I'm personally really surprised about is that nobody's rewriting Redis as a unikernel to clear away all the OS context-switching/networking overhead from its basic operations.
- Jweb_Guru 10 years agoRedis is already leaving plenty of performance on the table, e.g. by not having any concurrent shared-memory data structures (the fastest concurrent hash tables achieve better throughput, even on inserts, than the fastest single-threaded ones). It does this in the name of implementation simplicity. People focused on implementation simplicity don't generally abandon the operating system.
- derefr 10 years agoPeople run Redis mostly in Linux-in-a-VM (with Redis being the only "user-serving" process) already, though, no? I would think Redis-as-the-entire-VM would be less to think about, operation-wise, at least if your cloud or data-center templates its VMs with something like EC2 AMIs. You would just launch a "Redis appliance" AMI and move on.
It would feel less like maintaining boxes, and more like paying a Redis-as-a-Service provider.
- itamarhaber 10 years agoThe "Why" is @seppo010's to answer (but having it run as is on all OSs is a big plus for one). As for writing it in Assembler, that makes little practical sense since Redis is written in (ANSI) C and it quite well optimized. In fact, if you profile Redis you'll see that very little time is actually spent by the code itself - OS, storage and network are the real bottlenecks usually.
- adamrt 10 years agoIt's at the very top of the readme. Here is a direct link to your question: https://github.com/seppo0010/rsedis#why
- vamitrou 10 years agoIs it compatible with Redis .rdb dumps?
- ahmetmsft 10 years agoCare to post details about this? Is it actually fast? Does it implement all the features and guarantees of Redis? Should anybody actually use this in production (maybe because it works on Windows)? Is it well tested?
Looks like a really cool effort, but authors of open source projects often think people will read the code and figure it all out; the truth is, people usually look at what's in the README, and that's all the attention span most will give it. My 2c: improve your README.md.
- detaro 10 years agoHe links a list of missing stuff in the readme.
And if you read "Why? To learn Rust" and ask "should I use this in production"...