Three Pillars of Reproducible Builds

82 points by spatten 3 years ago | 14 comments
  • FartyMcFarter 3 years ago
    One of the most fun non-determinism bugs I have worked on was the result of using an associative container with the key type being a pointer (like a std::map<void*, int> or similar), and then iterating over this container.

    Since the order and value of dynamically allocated pointers is non-deterministic, this resulted in diverging behaviour at some point.

    You'd better make sure that none of the tools used during your build do this kind of thing either.
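    The same trap exists outside C++. Here is a minimal Python sketch; the `Node` class and both functions are hypothetical. In CPython, objects that don't define their own `__hash__` hash by identity, and that hash is derived from their memory address. Iterating a set of them therefore has the same problem as iterating a `std::map<void*, int>`. The usual fix is to sort by a stable, content-derived key before anything order-dependent reaches the output.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # eq=False keeps object's identity-based __hash__
class Node:
    name: str


def emit_unstable(nodes: set[Node]) -> list[str]:
    # Set iteration follows hash order; for identity-hashed objects CPython
    # derives the hash from the memory address, so the order can change
    # between runs (allocation patterns, ASLR, threads...).
    return [n.name for n in nodes]


def emit_stable(nodes: set[Node]) -> list[str]:
    # Sort on a stable, content-derived key before producing output.
    return [n.name for n in sorted(nodes, key=lambda n: n.name)]


nodes = {Node("b"), Node("a"), Node("c")}
print(emit_stable(nodes))  # ['a', 'b', 'c'] on every run
```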

    • aidenn0 3 years ago
      With ASLR off, the order and values should be identical between runs using the same malloc implementation, since stochastic allocators are not in common use.
      • FartyMcFarter 3 years ago
        Not when multithreading is involved, I would think. The same goes for timing-dependent code that makes allocations.
  • pabs3 3 years ago
    These three aren't enough. You also need to avoid storing build timestamps, hostnames, and timezones, ensure a stable sort order, and more:

    https://reproducible-builds.org/docs/
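    For timestamps, those docs describe a `SOURCE_DATE_EPOCH` environment variable that tools can honour instead of reading the clock. Below is a minimal Python sketch of a build tool respecting it. The `build_timestamp` and `version_banner` names are hypothetical. The sketch formats in UTC so the host's timezone cannot leak into the output.

```python
import os
import time
from datetime import datetime, timezone


def build_timestamp() -> datetime:
    # Honour SOURCE_DATE_EPOCH (seconds since the Unix epoch) when it is set;
    # only fall back to the wall clock when it is not.
    raw = os.environ.get("SOURCE_DATE_EPOCH")
    # int() raises ValueError on a malformed value, so a bad setting fails
    # loudly instead of silently producing a different output.
    seconds = int(raw) if raw is not None else int(time.time())
    # Always UTC, so the build host's timezone cannot leak into the output.
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def version_banner(name: str) -> str:
    # Hypothetical banner: deliberately no hostname, username, or local time.
    return f"{name} built {build_timestamp():%Y-%m-%dT%H:%M:%SZ}"
```

    A common choice is to set it from the last commit, e.g. `SOURCE_DATE_EPOCH=$(git log -1 --format=%ct)`, so the embedded time changes only when the source does.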

    • chriswarbo 3 years ago
      Some of that is mentioned, e.g.

      > Build steps that use system time to generate timestamps.

      > Builds that change behavior based on currently set environment variables but don’t commit environment variable configurations.
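      One way to stop a build from picking up whatever happens to be set in the developer's shell is to run each step with an explicit, committed environment instead of the inherited one. This is a minimal Python sketch assuming a POSIX system. The `BUILD_ENV` values and the `run_build_step` helper are illustrative assumptions, not taken from any particular tool.

```python
import subprocess
import sys

# An explicit build environment, committed alongside the build scripts.
# These particular values are illustrative assumptions.
BUILD_ENV = {
    "PATH": "/usr/bin:/bin",
    "LANG": "C",
    "LC_ALL": "C",
    "TZ": "UTC",
    "SOURCE_DATE_EPOCH": "0",
}


def run_build_step(cmd: list[str]) -> str:
    # Passing env= replaces the inherited environment entirely, so variables
    # from the developer's shell (HOME, USER, ad-hoc feature flags, ...) are
    # invisible to the step unless they are listed above.
    result = subprocess.run(
        cmd, env=BUILD_ENV, capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    # The child sees TZ from BUILD_ENV and no HOME, whatever the caller has set.
    print(run_build_step([sys.executable, "-c",
                          "import os; print(os.environ.get('TZ'), 'HOME' in os.environ)"]))
```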

    • jiehong 3 years ago
      On the JVM, Maven doesn’t make this particularly easy.

      It’s possible to store dependencies locally instead of in the shared global .m2 repository, but it’s difficult to stop Maven from embedding the current time in JARs or WARs…

      It’s as if all the default settings are the opposite of what they should be for reproducible builds.
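      JARs and WARs are ZIP archives. The timestamp problem therefore comes down to the per-entry modification times, plus entry order and permission bits. Below is a minimal Python sketch of writing an archive that is byte-identical across runs. It assumes that a fixed timestamp of 1980-01-01, the earliest a ZIP entry can record, is acceptable. `deterministic_zip` is a hypothetical helper, not part of Maven or any JVM tool.

```python
import io
import zipfile

# Earliest timestamp a ZIP entry can record (the MS-DOS date epoch).
FIXED_DATE_TIME = (1980, 1, 1, 0, 0, 0)


def deterministic_zip(files: dict[str, bytes]) -> bytes:
    """Build a ZIP (e.g. JAR-shaped) archive whose bytes depend only on `files`."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in sorted(files):  # entry order must not depend on input order
            info = zipfile.ZipInfo(name, date_time=FIXED_DATE_TIME)
            # Deflate output is stable for a given zlib version and level.
            info.compress_type = zipfile.ZIP_DEFLATED
            info.create_system = 3  # record "Unix" regardless of build host
            info.external_attr = 0o644 << 16  # fixed rw-r--r-- permissions
            zf.writestr(info, files[name])
    return buf.getvalue()
```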

      Any idea if there is a project trying to improve things with Maven or another JVM tool (Gradle, sbt, etc.)?

      • mchmarny 3 years ago
        If you have the option to containerize the app, Jib may be what you're looking for. It plugs into Maven, and the same source/content always generates the same image: https://github.com/GoogleContainerTools/jib
      • chriswarbo 3 years ago
        > Any idea if there is a project to try to improve things with maven or with another JVM tool? (Gradle, sbt, etc.)

        We've found SBT to be less reproducible than Maven. In particular, its "configuration file" (build.sbt) is actually executable Scala code (and highly imperative too, e.g. appending to mutable dependency lists). I've seen projects which choose different dependencies based on env var settings, string matches, etc.

        I've also seen projects that add pre/post steps to a test suite for spinning up and tearing down a mock database (the dynamodb-local SBT plugin). The crazy part is that SBT only becomes aware of the plugin when it's about to execute the test suite, so it doesn't appear in any dependency list and we can't automatically fetch it ahead of time. By the way, that plugin itself works by downloading and running a "latest.zip" file from an AWS URL...
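        Fetching a moving "latest" file is what breaks reproducibility here, because the same build can receive different bytes on different days. A common remedy is to pin an exact, versioned URL together with a checksum and to reject anything that doesn't match. This is a minimal Python sketch. `PinnedArtifact`, `verify`, and `fetch` are hypothetical names, and the example URL in the usage is a placeholder.

```python
import hashlib
import urllib.request
from dataclasses import dataclass


@dataclass(frozen=True)
class PinnedArtifact:
    url: str     # an exact, versioned URL, never a moving "latest" alias
    sha256: str  # hex digest recorded when the dependency was first vetted


class ChecksumMismatch(Exception):
    """Raised when downloaded bytes do not match the pinned digest."""


def verify(artifact: PinnedArtifact, data: bytes) -> bytes:
    actual = hashlib.sha256(data).hexdigest()
    if actual != artifact.sha256.lower():
        raise ChecksumMismatch(f"{artifact.url}: expected {artifact.sha256}, got {actual}")
    return data


def fetch(artifact: PinnedArtifact) -> bytes:
    # Downloading is fine; the result is only accepted if it matches the pin.
    with urllib.request.urlopen(artifact.url) as resp:
        return verify(artifact, resp.read())
```

        Because the pin lives in the repository, it also shows up in dependency reviews, which the plugin-discovered-at-test-time pattern above avoids.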

        • robto 3 years ago
          Huawei just published a paper (Towards Build Verifiability for Java-based Systems[0]) on trying to get the JVM ecosystem reproducible. It looks like it's early days, but I'm paying attention.

          [0] https://arxiv.org/abs/2202.05906

          • zzandd 3 years ago
            https://reproducible-builds.org/docs/jvm/, which links to https://maven.apache.org/guides/mini/guide-reproducible-buil...

            I haven't tried this myself, as I don't particularly like Maven, but it should be possible.
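            Whatever tool you use, the practical check is to build twice, ideally on different machines, as different users, or on different days, and compare the outputs byte for byte. This is a minimal Python sketch of that comparison. `tree_digest` and `diff_builds` are hypothetical helpers. Dedicated tools such as diffoscope go much further by showing why two files differ.

```python
import hashlib
from pathlib import Path


def tree_digest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root, POSIX-style) to its SHA-256."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def diff_builds(a: Path, b: Path) -> list[str]:
    """Return relative paths whose contents differ or exist on only one side."""
    da, db = tree_digest(a), tree_digest(b)
    return sorted(p for p in da.keys() | db.keys() if da.get(p) != db.get(p))
```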

          • cies 3 years ago
            How can you discuss this without mentioning Nix (or the like)?
            • _3u10 3 years ago
              I guess any stubs the compiler adds will also have to be reproducible, big whoop.