Architecture for a JavaScript to C compiler

90 points by timruffles 6 years ago | 45 comments
  • snek 6 years ago
    I don't understand why people keep making js-to-native compilers. The result is always less performance than what modern JS engines can do to the original source code, because all those runtime behaviours the author mentions build up really fast. JS engines actually run the code and figure out the types and such to create highly optimized machine code.
    • nicoburns 6 years ago
      Yeah, the JS runtimes are really impressive. I'd be interested to see what a compiler could do with TypeScript though. There could perhaps be guidelines on which features you can use in hot code paths in order to get optimised code, which the compiler could enforce for sections of the code that you mark...
      • tjelen 6 years ago
        I think that the AssemblyScript project is halfway there. It compiles a stricter subset of TypeScript to the WebAssembly format, which can then be translated to CPU-specific machine code.
        • masklinn 6 years ago
          > Yeah, the JS runtimes are really impressive. I'd be interested to see what a compiler could do with TypeScript though.

          You can probably just compare AOT and JIT Java or C# implementations and get something of an upper bound.

          Actual improvements will likely depend on your use case: for short-running programs the overhead of the parsing, JIT machinery & interpretation might dwarf the execution time, so even a straightforward compilation full of virtual calls will edge out the pre-JIT interpreter; for longer-running programs, however, the JIT should be able to devirtualise, inline and optimise the code much better than the AOT compiler will.

          Inlining is the most important optimisation, and it's hard to AOT inline dynamic dispatch.
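
          To make that concrete, here's a toy C sketch (an invented example, nothing to do with any particular engine): an AOT compiler sees only the indirect call below, whereas a speculating JIT can guard on the one target it has actually observed and inline it.

              /* Dynamic dispatch: the AOT compiler only sees an indirect call
                 through a function pointer, so it can't inline the callee. */
              typedef double (*area_fn)(double);

              static double circle_area(double r) { return 3.14159265358979 * r * r; }

              double total_area(area_fn f, const double *xs, int n) {
                  double sum = 0.0;
                  for (int i = 0; i < n; i++)
                      sum += f(xs[i]);              /* opaque call, every iteration */
                  return sum;
              }

              /* What a speculating JIT effectively emits after observing that f
                 has only ever been circle_area: a cheap guard plus the inlined
                 body, keeping the indirect call around as the fallback path. */
              double total_area_speculated(area_fn f, const double *xs, int n) {
                  double sum = 0.0;
                  for (int i = 0; i < n; i++) {
                      if (f == circle_area)
                          sum += 3.14159265358979 * xs[i] * xs[i];  /* inlined fast path */
                      else
                          sum += f(xs[i]);                          /* fallback */
                  }
                  return sum;
              }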

          • chrisseaton 6 years ago
            I would imagine JS JITs can determine more specific types at runtime than you can express in TypeScript source code.
            • jdc 6 years ago
              Could you ELI5 why you think that?
          • ridiculous_fish 6 years ago
            Modern JS engines rely on tracing JITs to achieve good performance. These optimizations only kick in once code has executed multiple times.

            But a lot of JS code executes only once, such as layout code running at app launch. Heck, lots of JS code executes zero times. This code still imposes a cost (parsing, etc).

            Consider the complaints about app launch time of Electron apps. A static compiler can be more effective at the runs-once or runs-zero cases.

            • pizlonator 6 years ago
              Not tracing. Speculation.

              The difference is that you compile user control flow as-is unless you have overwhelming evidence that you should do otherwise.

              And yes you are right. This is promising for run-once code, but all of the cost will be in dynamic things like the envGet. An interpreter can actually do better here because most environment resolution can be done as part of bytecode generation. So it’s possible that this experiment leads to something that is slower than JSC’s interpreter.
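
              Rough C sketch of that difference (toy code, not JSC's or the article's actual runtime): a straight-line compiled envGet has to search by name on every access, whereas a bytecode generator can resolve the name once, up front, to a (hops, slot) pair.

                  #include <string.h>

                  /* Toy lexical environment: parent pointer, names, and value slots. */
                  struct env {
                      struct env *parent;
                      int count;
                      const char **names;
                      double *slots;
                  };

                  /* What a naive compiler might emit for every variable access. */
                  double env_get_by_name(struct env *e, const char *name) {
                      for (; e != NULL; e = e->parent)
                          for (int i = 0; i < e->count; i++)
                              if (strcmp(e->names[i], name) == 0)
                                  return e->slots[i];
                      return 0.0;   /* toy stand-in for a ReferenceError */
                  }

                  /* What bytecode generation buys you: the name is resolved once
                     to (hops, slot), so each access is just pointer chasing. */
                  double env_get_resolved(struct env *e, int hops, int slot) {
                      while (hops-- > 0)
                          e = e->parent;
                      return e->slots[slot];
                  }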

              • ridiculous_fish 6 years ago
                Agreed that envGet() will be brutal, but optimizations can eliminate a lot of that, e.g. by promoting locals to stack variables instead of heap variables.

                It is possible to statically decide which environment contains each upvar. So I don't understand your conclusion that this experiment may be slower than JSC's bytecode interpreter. What information can JSC exploit at this stage that is statically unavailable?
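
                For example (hypothetical generated C, not this project's actual output): once a capture analysis has run, only the variables that inner functions actually close over need a heap-allocated environment record; everything else can be an ordinary C local.

                    #include <stdlib.h>
                    #include <stdio.h>

                    /* JS being compiled:
                         function counter(start) {
                           var step = 1;               // captured by the closure -> heap
                           var initial = start + step; // never captured -> C stack local
                           console.log(initial);
                           return function () { return (start += step); };
                         }                                                              */

                    struct counter_env {        /* holds only the captured variables */
                        double start;
                        double step;
                    };

                    struct closure {
                        double (*call)(struct counter_env *);
                        struct counter_env *env;
                    };

                    static double counter_inner(struct counter_env *env) {
                        return env->start += env->step;
                    }

                    struct closure counter(double start) {
                        struct counter_env *env = malloc(sizeof *env);
                        env->start = start;
                        env->step = 1.0;
                        double initial = env->start + env->step;  /* stays on the C stack */
                        printf("%f\n", initial);
                        struct closure c = { counter_inner, env };
                        return c;
                    }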

              • lttlrck 6 years ago
                • ridiculous_fish 6 years ago
                  v8 is remarkable but what is described in those links is primarily a JIT with a cache tacked on. A pipeline optimized for static compilation would make different tradeoffs.
                • chrisseaton 6 years ago
                  > Modern JS engines rely on tracing JITs

                  Which modern JS engine still relies on tracing? I thought they’d all moved on from that technique many years ago, but I’m not an expert in JS.

                  • ridiculous_fish 6 years ago
                    Yes, I misused a term of art, thank you for the correction. I meant only to distinguish modern engines that rely on runtime profiling, from the wholly static approach in this post.
                  • maxxxxx 6 years ago
                    Can the compiler really do much considering how dynamic JavaScript is?
                    • ridiculous_fish 6 years ago
                      Depends on your definition of "much", but more than you might expect.

                      Pre-parsing is an obvious big win, but there's also classical compiler optimizations: constant propagation, DCE, certain inlining, etc. Local variables can be statically determined to be non-captured, which means they can be stack allocated.

                      One totally bizarre but profoundly useful property of JS is the restriction on eval. eval has the ability to run new code, but this code is globally scoped unless eval is called directly, by name. [1] Thus it's easy to decide which local variables are immune to eval; this in turn unlocks lots of optimizations. To my knowledge this weirdo behavior is not present in Python or other dynamic languages.

                      1: https://es5.github.io/#x15.1.2.1.1
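
                      As a hypothetical compiled-output sketch of why that matters (invented names, not the article's real output): because the function below contains no direct eval, its local can stay a plain C variable.

                          /* JS being compiled:
                               function f(s) {
                                 var x = 1;
                                 (0, eval)(s);   // indirect eval: always runs in global scope
                                 return x;
                               }
                             Nothing the evaluated string does can read or write x, so x can
                             stay a C local.  A direct eval(s) call, by contrast, would force
                             every local of f into a heap-allocated, name-addressable
                             environment.                                                    */

                          static void runtime_indirect_eval(const char *source) {
                              (void)source;   /* stub; a real runtime would run it globally */
                          }

                          double f_compiled(const char *s) {
                              double x = 1.0;           /* provably immune to the eval'd code */
                              runtime_indirect_eval(s);
                              return x;
                          }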

                  • timruffles 6 years ago
                    Oh sure, it's just for fun/learning! Check my first post.
                    • a13n 6 years ago
                      Maybe you want to port some existing JS to another platform (e.g. OS X, iOS, Android) where you want to run it natively rather than spinning up a JS engine and dealing with the bridge.
                      • roryrjb 6 years ago
                        It may be slower, but I'd give up speed if it resulted in lower memory usage.
                        • pizlonator 6 years ago
                          On the one hand, you’re totally right.

                          But maybe there will be some breakthrough, so it’s important to stay open-minded. That breakthrough may be something modest like if this style of JS execution was better for some niche use case.

                          • jacobush 6 years ago
                            I could imagine a use case: you want to include some JavaScript code base in your own EXE, but you don't want to drag in node.js or something heavy like that.
                          • truth_seeker 6 years ago
                             I wonder how much optimization it will bring compared to existing JS runtimes such as V8.

                             Thanks to the competitive world of web browsers, JS runtimes not only efficiently parse to optimized native code but also provide really good JIT compilation benefits.

                            Speculative optimization for V8 - https://ponyfoo.com/articles/an-introduction-to-speculative-...

                            Parallel and Concurrent GC - https://v8.dev/blog/trash-talk

                            Good summary on 10 years of V8 - https://v8.dev/blog/10-years

                            • ridiculous_fish 6 years ago
                              v8-style engines do not parse to optimized native code.

                              As described in the links, v8 parses to an AST, which then is compiled to bytecode. A bytecode VM then executes the JS, collecting runtime type information, which is input (along with the bytecode itself) into the next compilation tier; only at that point is machine code generated.

                              The key idea is that v8 expects to execute the JS code before it can generate native code. It won't generate native code from parsing alone.
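
                              A loose C sketch of that feedback-collection step (invented names and layout, not V8's actual data structures): the bytecode handler for `a + b` does the add and records which operand types it has actually seen; the optimizing tier later reads that record and emits a guarded fast path only for those types.

                                  #include <stdbool.h>

                                  enum feedback { FB_NONE, FB_SMALL_INT, FB_ANY };

                                  struct value {
                                      bool is_int;
                                      union { long i; double d; } u;
                                  };

                                  struct add_site {
                                      enum feedback seen;   /* per-call-site type feedback */
                                  };

                                  struct value interp_add(struct add_site *site, struct value a, struct value b) {
                                      struct value r;
                                      if (a.is_int && b.is_int) {
                                          if (site->seen == FB_NONE)
                                              site->seen = FB_SMALL_INT;
                                          r.is_int = true;
                                          r.u.i = a.u.i + b.u.i;
                                      } else {
                                          site->seen = FB_ANY;   /* anything else poisons the site */
                                          r.is_int = false;
                                          r.u.d = (a.is_int ? (double)a.u.i : a.u.d) +
                                                  (b.is_int ? (double)b.u.i : b.u.d);
                                      }
                                      return r;
                                  }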

                            • rkeene2 6 years ago
                              A lot of research and hard work has gone into TclQuadCode [0], which compiles Tcl (which is even more dynamic than JavaScript) into machine code via LLVM.

                              The authors indicated at one point it took around 5 PhDs to get it going.

                              [0] https://core.tcl.tk/tclquadcode/dir?ci=trunk

                              • ridiculous_fish 6 years ago
                                This is bizarre and fascinating. I had no idea there were Tcl codebases of a size that could benefit from this sort of perf work. How much Tcl is out there?
                              • ndesaulniers 6 years ago
                                I actually think this is possible, and started prototyping it (because esprima is awesome, and not too many other languages have an equivalent that's so easy to use).

                                Some thoughts: I think it's easier to target C++ than C, since C++ can help you write more type-generic code. I think it's easy to generate tagged unions, then for optimizations try to prove monomorphism. Finally, it may be simpler to start off with support for TypeScript, and fail to compile if there are any `any` types. I do think it's possible though. JS/TS -> C++ -> WASM (yes, I was out of my mind when I thought of this)
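
                                Roughly what I have in mind, sketched in C for brevity even though C++ would make the generic plumbing nicer (all names here are made up):

                                    #include <stdbool.h>

                                    enum tag { TAG_UNDEFINED, TAG_BOOL, TAG_NUMBER, TAG_STRING, TAG_OBJECT };

                                    struct js_value {
                                        enum tag tag;
                                        union {
                                            bool        b;
                                            double      num;
                                            const char *str;
                                            void       *obj;
                                        } as;
                                    };

                                    /* Fully generic addition has to branch on the tags... */
                                    struct js_value js_add(struct js_value a, struct js_value b) {
                                        struct js_value r;
                                        if (a.tag == TAG_NUMBER && b.tag == TAG_NUMBER) {
                                            r.tag = TAG_NUMBER;
                                            r.as.num = a.as.num + b.as.num;
                                            return r;
                                        }
                                        r.tag = TAG_UNDEFINED;   /* string concat, ToPrimitive, etc. omitted */
                                        return r;
                                    }

                                    /* ...but a call site proven monomorphic on numbers drops them entirely. */
                                    double js_add_num(double a, double b) { return a + b; }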

                                • johnhenry 6 years ago
                                  Isn't this kind of what V8 and other modern JavaScript engines do on the fly already?
                                  • mikece 6 years ago
                                    Yes: one of the Google engineers working on V8 talked about it here: https://softwareengineeringdaily.com/2018/10/03/javascript-a...

                                     It was this conversation that made me wonder if at some point in the future V8 might have experimental support natively for TypeScript, but it probably makes more sense to compile WebAssembly to a native binary instead. Who knows? It's an awesome time to be a programmer!

                                • maxxxxx 6 years ago
                                  With all the dynamic stuff JavaScript has, it seems really difficult to create performant C code.

                                  There is a PHP to .NET compiler which probably has similar problems. On second thought, that one is probably easier because .NET has a dynamic runtime.

                                  • nicoburns 6 years ago
                                    That, and in practice modern PHP is often relatively statically typed (classes declare their fields, etc.), and many codebases even include type annotations (which are supported in PHP and usually enforced at runtime).
                                  • tannhaeuser 6 years ago
                                     Congrats on completing this project. What's the status, and what are the further plans for it? I didn't find a license.
                                    • ridiculous_fish 6 years ago
                                      How are exceptions handled with this design? For example the `n < 3` may throw (it invokes the valueOf method).
                                    • magwa101 6 years ago
                                      Write it in C, there, problem solved.
                                      • maxgraey 6 years ago
                                         I wonder why people still try to write transpilers from JS to C/C++ or LLVM (the latter at least makes sense). It's not a performant approach and usually produces much bigger overhead than a JIT VM that uses speculative optimizations.

                                         Some projects:

                                         1. https://github.com/fabiosantoscode/js2cpp
                                         2. https://github.com/raphamorim/js2c
                                         3. https://github.com/ammer/js2c
                                         4. https://github.com/NectarJS/nectarjs
                                         5. https://github.com/ovr/StaticScript