Strtod Is Wild

62 points by turtledragonfly 9 months ago | 14 comments
  • lifthrasiir 9 months ago
    To be clear, David Gay's dtoa is not a modern implementation of decimal-to-float conversion. There had been several much simpler and performant alternatives, including:

    - Google's double-conversion [1], which is best known for introducing the Grisu family of new float-to-decimal algorithms but also has a much less documented float-to-decimal algorithm via successive approximations AFAIK.

    - The Eisel-Lemire algorithm [2], which is a Grisu3-like algorithm and returns either correct digits or a much rare fallback signal and currently in the standard libraries of Go and Rust.

    - I believe Microsoft's own C Runtime (msvcrt, later ucrt) also has a completely separate code which algorithm is roughly similar to one of above.

    These implementations also clearly demonstrate that such conversion only needs a bigint support of the bounded size (~3 KB) and can be done in much smaller code than dtoa.

    [1] https://github.com/google/double-conversion

    [1] https://lemire.me/blog/2020/03/10/fast-float-parsing-in-prac...

    • jezek2 9 months ago
      Some time ago I've used a simple algorithm by using 128bit floats for doubles (and 64bit for floats) which seems to work very nicely and is straightforward to implement. It passed the full tests for conversion for 32bit floats and the sparse tests for doubles.

      I did a blog post about it: https://www.fixscript.org/blog/math-library (includes interactive demos)

      Any opinions?

      • turtledragonfly 9 months ago
        Author here; thanks for this info! I'll add some references to this stuff in the article.
      • Neywiny 9 months ago
        I think recently I had to make my own because strtod was too large for my embedded system. I just wanted simple xx.yy type strings, so it was as easy as parsing 2 integers, convert, divide, add, done. I often wish these functions had more variants for stuff like that.
        • nly 9 months ago
          Parsing decimal strings to double in that way will produce the wrong bit value in many cases.
          • Neywiny 9 months ago
            I just needed simple values. It worked fine for me. I should've fuzzed it to check, but really I think all I needed was 1-2 decimal places.
            • yjftsjthsd-h 9 months ago
              If it's only 2 decimals, surely you could trivially test every single possible input? I think of fuzzing as random, but that's a specific optimization that you might not need.
        • fph 9 months ago
          I hate the anti-responsive design of this page: I zoom in (ctrl-plus in Firefox), and the text gets smaller.
          • turtledragonfly 9 months ago
            Author here — I apologize for that; it's on my short list of things to fix.

            I actually hadn't fully understood that the "Ctrl+" zoom in desktop browsers shrinks the viewport rather than doing an actual zoom (unlike pinch-to-zoom on a touch device) until I had put the media queries in place — whoops.

            ----

            As an aside, you or another might be interested: you can configure Firefox to do a "pinch-zoom" style zoom when you do "Ctrl+scrollwheel" on the mouse:

            Short version: set "mousewheel.with_control.action = 5" in about:config.

            Longer discussion: https://www.tenforums.com/browsers-email/204202-firefox-enab...

            That way, you can always zoom with confidence that a site won't rejigger its layout as a result.

            (but obviously I'll update my rules to make the default behavior better, too)

          • lifthrasiir 9 months ago
            I wondered why too, and it seems that it has way too many media queries to distinguish screen width in 5cm increments (which equals to ~189px in CSS). Don't think that many queries are needed nowadays.
          • jancsika 9 months ago
            Would be cool to see a histogram of all of the web's textual representation of floats based on character length.

            In other words-- what percentage of outstanding publicly-accessible data sets require an implementation of strod which can allocate memory on the heap?

            Even this article, which talks about millions of digits, could be parsed just fine with a strod that's limited to 64 characters.