Practical Reed-Solomon for Programmers
229 points by heydenberk 4 years ago | 43 comments- h2odragon 4 years agoAmusing: the linked library page, https://www.ka9q.net/code/fec/ says (from 2007) "The new Intel Macs are not yet supported. Stay tuned."
Ignore an issue for long enough, and it stops being an issue :)
- coder543 4 years agoIf anyone wants to read more about Reed Solomon, this is the article that helped me understand Reed Solomon earlier this year: https://innovation.vivint.com/introduction-to-reed-solomon-b...
- ahubert 4 years ago(author here) Nice! Added that to the page, very worthwhile. Thanks.
- vmilner 4 years agoThis BBC engineering paper is also useful.
https://downloads.bbc.co.uk/rd/pubs/whp/whp-pdf-files/WHP031...
- vmilner 4 years ago
- ahubert 4 years ago
- boredprograming 4 years agoJust to add to this topic, usually Reed Solomon codes are interleaved with Turbo codes or LDPC.
The reason is this: Reed Solomon corrects entire symbols, but can only correct so many symbols per block. LDPC and Turbo correct single bit errors. If you have a lot of single bit errors, mixed with completely corrupt blocks, it's best to use both.
Interleaving usually mixes the bits around too, to spread long chains of errors out between symbols.
This setup is used in digital radio, 3G, CD and DVD, DSL, etc
- conductor 4 years agoThanks for the article and code.
Also check out https://github.com/Parchive/par2cmdline
I wish something like this was integrated into a modern and free archive format. RAR has it. RAR also got volume recovery: when creating split multi-volume archives you can also create recovery volumes - each recovery volume can replace any other one missing volume. It was a life-saver in floppy-disk times.
- dale_glass 4 years agoWe're not using floppies anymore, and media is extremely reliable.
On the internet you can just retransmit if something went wrong, and if you're worried about the media specifically then there are media specific measures, like RAID and filesystem level checksums.
- macintux 4 years agoCosmic rays are a thing, and wasn’t there just a study linked here a few days ago about how unreliable hardware is at scale?
- dale_glass 4 years agoOh, I'm all for ECC and checksumming and whatnot.
All I'm saying is that adding recovery info to archives is probably not ideal -- we can do it in a task or media specific way that works best for whatever context the data is in.
Redundancy in .RAR made perfect sense back when you were transporting data on floppies, and if the floppy was bad, it was a huge pain. These days we transport over the network and don't need to carry any redundant data at all. All we need is a checksum and to retransmit whatever didn't arrive correctly.
- dale_glass 4 years ago
- macintux 4 years ago
- dale_glass 4 years ago
- ithkuil 4 years agoThere are other erasure codes that go beyond RS. For example LRC (Local Reconstruct Codes) is very interesting; it improves on reed Solomon by requiring less bandwidth/IO for repair reads.
https://www.usenix.org/conference/atc12/technical-sessions/p...
- xrd 4 years agoI loved this article and going down the rabbit hole with the associated links was excited to see a discussion from Russ Cox (golang) about how QR codes rely on RS (and another great article)
- crazygringo 4 years agoAre there any uses cases for Reed-Solomon at the application level?
My impression was always that error checking was implemented at the hardware level, or else at the OS/driver level.
But just curious if there are some applications I'm missing.
- gopalv 4 years ago> uses cases for Reed-Solomon at the application level
Hadoop has an RS implementation inside the filesystem (called "erasure coding"), instead of storing 3 copies of the same data, it can actually instead store ~1.5 copies as (6+3) or (10+4).
Previously, I've run into this tech in satellite internet gateways, but distributed filesystems is where I've gone through the math & probabilities of failure properly.
I work on perf & the extra network hops (with 3 replicas, you read 100% of data local, when you stripe it that doesn't work) and math for the error correction are hot spots when you are trying to keep all cores busy.
- elisbce 4 years agoLinux RAID-6: RS(10,8) Google File System II (Colossus): RS(9,6) Quantcast File System: RS(9,6) Intel & Cloudera’ HDFS-EC: RS(9,6) Yahoo Cloud Object Store: RS(11,8) Backblaze’s online backup: RS(20,17) Facebook’s f4 BLOB storage system: RS(14,10) Baidu’s Atlas Cloud Storage: RS(12, 8) .....
- oofabz 4 years agoAre the numbers the size in bits? If so, those are very small blocks. It seems like you would get more effective redundancy with larger blocks. Does Reed-Solomon scale so poorly that only tiny blocks are computationally feasible?
- sliken 4 years agoGenerally blocks. So backblaze takes 17 blocks of input, and generates 20 blocks out output. Then recoveries work if 17 or more blocks are recovered.
- sliken 4 years ago
- oofabz 4 years ago
- the_benno 4 years agoI worked on a research project in undergrad that used Reed-Solomon -- we showed a series of images to users, who would later pick those images out of a set of un-trained images in order to authenticate or access some privileged data, sans encryption. the error-correcting allowed for some degree of misremembering or forgetfulness, as is normal in human image recognition tasks.
The professor later spun up a startup that has since been acquired by Dropbox, but I'm unaware what kind of product they're currently working on.
- 4 years ago
- mook 4 years agoParchive 2 was, I believe, an implementation of Reed-Solomon at the application level; mostly used (when it was introduced) to counter loss from missing chunks on binary newsgroups. I think RAR might have picked up a similar feature around that time too?
I believe parchive version 1 was simple XOR parity.
- akalin 4 years agoNo, parchive v1 also used Reed-Solomon. The main difference between v1 and v2 was that v1 worked on the file level, but v2 divided the set of files into blocks.
Also, the parity matrix for v1 would be sometimes singular (non-invertible), which v2 tried to fix but it didn't quite work.
- akalin 4 years ago
- curtisf 4 years agoYou may want to be able to run highly resilient databases even on non-error-detecting hardware. (Most consumer disks, and even a lot of enterprise disks, don't reliably detect/report disk corruption)
You might also be building a distributed system, where blocks are spread across multiple disks. In that case, an "error" you're trying to correct might be the loss of a disk or server in the distributed system; there is no one single disk or OS responsible for all the relevant blocks in that case.
But, both of these are assuming you're building some kind of data-store, which is not a typical user application. Writing robust data-stores is hard for many reasons, error correcting is just one of them.
- alphabet9000 4 years agoi used a reed solomon library recently in a project that encodes and decodes binary data stored in a video channel. i added it to compensate for compression/noise introduced in the re-encoding process on third party sites like youtube. here's an example of what it looks like: [0]
- cycomanic 4 years agoQR codes use Reed Solomon codes
- jtlienwis 4 years agoSo do pdf-417 barcodes. I remember writing the code for pdf417_decode and pdf417_encode (freeware) back in the day. The Blahut book on error correcting codes was a great help, but all the examples where using binary galois fields. The pdf-417 used a field of powers of 3 mode 929, so my brain had to figure out how that worked and put it into code.
- jtlienwis 4 years ago
- kevin_thibedeau 4 years agoIf you need to transmit data over an unreliable link where retransmission isn't practical you will want some sort of FEC. If hardware isn't available to do the job then it gets done in software.
- sparc24 4 years ago
- gopalv 4 years ago
- sparc24 4 years agoThese 2 papers helped me better understand some more practical aspects of it.
https://mirrors.edge.kernel.org/pub/linux/kernel/people/hpa/...https://www.usenix.org/legacy/event/fast08/tech/full_papers/...
- moreati 4 years agoA semi-related question/puzzle. Is it possible to reverse engineer parameters of a Reed Solomoon function?
Given one or more known inputs/outputs of a Reed Solomon function with known symbol size, message length, number of parity symbols. Is it possible to calculate the other parameters (polynomial coefficients, primitive element etc)?
I failed to find an answer a few years ago, when trying to interoperate with chirp.io (now gone). https://math.stackexchange.com/questions/663643/discover-par...
- xorvoid 4 years agoDoesn’t seem too hard actually, given enough example input. The parity is just a linear combination of the input so you can set up a linear system and solve for those things encoding matrix rows pretty easily. From there you may need to make some assumptions about what type of matrix was used (e.g. vandermonde is popular) and perhaps the field size being used (e.g. GF(2^8), and with those, it should be easy to determine the reducing polynomial. Some of these choices are small enough to be able to brute force (finite fields used for practical problems tend to be small, naturally). If you have a dataset, I might actually toy with it myself.. sounds fun.
- moreati 4 years agoThanks. I have 30 examples I got from the API, many moons ago https://github.com/moreati/chirppy/blob/master/examples.json. My finite field knowledge is not up to solving it analytically, my attempts to brute force it are in https://github.com/moreati/chirppy/blob/master/trials.py. That includes functions to go between the characters, and integer representations.
- moreati 4 years ago
- xorvoid 4 years ago
- nayuki 4 years agoMathematical derivation and runnable code for a Reed-Solomon error-correcting code decoder: https://www.nayuki.io/page/reed-solomon-error-correcting-cod...
- Tcepsa 4 years agoAt the risk of oversimplifying, is there any value in the analogy, "It's like RAID but for data streams"?
- frabert 4 years agoProblem with that is, as the article states, RAID6 _can_ use RS as its error correction method, which turns this description into something a bit too circular for my taste
- frabert 4 years ago
- IAmLiterallyAB 4 years agoWhat is the difference between RS and Hamming codes?
- rurban 4 years agoHamming Codes can only recover from 2 wrong bits per byte, with RS it's configurable.
- rurban 4 years ago
- alessandroetc 4 years agowhat are your thoughts on storj.io? they utilize R-S for all their entire storage platform.
- mrfusion 4 years agoIs this what tcp/ip uses?
- rl1987 4 years agoNo. TCP, IP and UDP are all using Internet Checksum (RFC 1071). Ethernet uses CRC.
- znpy 4 years agoI went and skimmed the rfc you're citing and it's fro 1988, it's old enough to have samples in C code where the "register" keyword is still used (afaik nowadays most compilers will ignore it) along with assembly code for motorola 68020, Cray CPUs and the IBM-370.
EDIT: for reference: https://datatracker.ietf.org/doc/html/rfc1071.html
- brandmeyer 4 years agoAll of these are techniques for verifying a message. Galileo also uses a 24-bit CRC to verify its messages after performing forward error correction.
- znpy 4 years ago
- jandrese 4 years agoTCP doesn't do forward error correction at all. It uses a retransmission scheme instead. TCP's error correction is not only to insure a consistent datastream, but also to more equitably share limited bandwidth.
- stagger87 4 years agoForward error correction is usually at the PHY layer, not the transport layer. So the more appropriate question is, does 10GbE/100GbE/WLAN/etc use RS coding? (and yes some of them do.)
- TonyTrapp 4 years agoNo, but Reed-Solomon codes are used on on CDs.
- sliken 4 years agoIndeed, I think there's a 3 layers of encodings to protect against different corruption scenarios. Trick is some equipment only does a partial implementation, thus the need for a tool like cdparanoia to recovery data from corrupted CDs.
- sliken 4 years ago
- rl1987 4 years ago