Architecture for a JavaScript to C compiler
90 points by timruffles 6 years ago | 45 comments- snek 6 years agoI don't understand why people keep making js-to-native compilers. the result is always less performance that what modern js engines can do to the original source code, because all those runtime behaviours the author mentions build up really fast. js engines actually run the code and figure out the types and such to create highly optimized machine code.
- nicoburns 6 years agoYeah, the JS runtimes are really impressive. I'd be interested to see what a compiler could do with TypeScript though. There could perhaps be guidelines of which features you can use in hot code paths in order to get optimised code, which could be enforced by the compiler for sections of the code which you mark...
- tjelen 6 years agoI think that the AssemblyScript project is halfway there. It compiles a stricter subset of Typescript to webassembly format, which can be then translated to CPU-specific machine code.
- masklinn 6 years ago> Yeah, the JS runtimes are really impressive. I'd be interested to see what a compiler could do with TypeScript though.
You can probably just compare AOT and JIT Java or C# implementations and get something of an upper bound.
Actual improvements will likely depend on your use case: for short-running programs the overhead of the parsing, JIT machinery & interpretation might dwarf the execution time so even a straightforward compilation full of virtual calls will edge out the pre-JIT interpreter, however for longer-running programs the JIT should be able to devirtualise, inline and optimise the code much better than the AOT compiler will.
Inlining is the most important optimisation, and it's hard to AOT inline dynamic dispatch.
- chrisseaton 6 years agoI would imagine JS JITs can determine more specific types at runtime than you can express in TypeScript source code.
- jdc 6 years agoCould you ELI5 why you think that?
- jdc 6 years ago
- tjelen 6 years ago
- ridiculous_fish 6 years agoModern JS engines rely on tracing JITs to achieve good performance. These optimizations only kick in once code has executed multiple times.
But a lot of JS code executes only once, such as layout code running at app launch. Heck, lots of JS code executes zero times. This code still imposes a cost (parsing, etc).
Consider the complaints about app launch time of Electron apps. A static compiler can be more effective at the runs-once or runs-zero cases.
- pizlonator 6 years agoNot tracing. Speculation.
The difference is that you compile user control flow as-is unless you have overwhelming evidence that you should do otherwise.
And yes you are right. This is promising for run-once code, but all of the cost will be in dynamic things like the envGet. An interpreter can actually do better here because most environment resolution can be done as part of bytecode generation. So it’s possible that this experiment leads to something that is slower than JSC’s interpreter.
- ridiculous_fish 6 years agoAgreed that envGet() will be brutal, but optimizations can eliminate a lot of that, e.g. by promoting locals to stack variables instead of heap variables.
It is possible to statically decide which environment contains each upvar. So I don't understand your conclusion that this experiment may be slower than JSC's bytecode interpreter. What information can JSC exploit at this stage that is statically unavailable?
- ridiculous_fish 6 years ago
- lttlrck 6 years agoThere has been a huge amount of work in V8 addressing these cases and concerns:
https://v8.dev/blog/launching-ignition-and-turbofanhttps://v8.dev/blog/v8-release-66https://v8.dev/blog/background-compilation
- ridiculous_fish 6 years agov8 is remarkable but what is described in those links is primarily a JIT with a cache tacked on. A pipeline optimized for static compilation would make different tradeoffs.
- ridiculous_fish 6 years ago
- chrisseaton 6 years ago> Modern JS engines rely on tracing JITs
Which modern JS engine still relies on tracing? I thought they’d all moved on from that technique many years ago, but I’m not an expert in JS.
- ridiculous_fish 6 years agoYes, I misused a term of art, thank you for the correction. I meant only to distinguish modern engines that rely on runtime profiling, from the wholly static approach in this post.
- ridiculous_fish 6 years ago
- maxxxxx 6 years agoCan the compiler really do much considering how dynamic Javascript is?
- ridiculous_fish 6 years agoDepends on your definition of "much", but more than you might expect.
Pre-parsing is an obvious big win, but there's also classical compiler optimizations: constant propagation, DCE, certain inlining, etc. Local variables can be statically determined to be non-captured, which means they can be stack allocated.
One totally bizarre but profoundly useful property of JS is the restrictions on eval. eval has the ability to run new code, but this code is globally scoped unless it's run as the "eval" function itself. [1] Thus it's easy to decide which local variables are immune to eval; this in turn unlocks lots of optimizations. To my knowledge this weirdo behavior is not present in Python or other dynamic languages.
- ridiculous_fish 6 years ago
- pizlonator 6 years ago
- timruffles 6 years agoOh sure, it's just for fun/learning! Check my first post.
- a13n 6 years agoMaybe you want to port some existing JS to another platform (eg. OS X, iOS, Android) where you want to run it natively rather than spinning up a JS engine and dealing with the bridge.
- roryrjb 6 years agoIt may be slower, but I'd give up speed if this would result in less memory usage.
- pizlonator 6 years agoOn the one hand, you’re totally right.
But maybe there will be some breakthrough, so it’s important to stay open-minded. That breakthrough may be something modest like if this style of JS execution was better for some niche use case.
- jacobush 6 years agoI could imagine a use case: you want to include some Javascript code base into your own EXE, but you don't want to drag in node.js or something heavy like that.
- nicoburns 6 years ago
- truth_seeker 6 years agoI wonder how much optimization will it bring to compared to existing JS runtimes such as V8.
Thanks to competing world for web browsers, JS runtimes not only efficiently parse to optimized native code but also provide really good JIT compilation benefits.
Speculative optimization for V8 - https://ponyfoo.com/articles/an-introduction-to-speculative-...
Parallel and Concurrent GC - https://v8.dev/blog/trash-talk
Good summary on 10 years of V8 - https://v8.dev/blog/10-years
- ridiculous_fish 6 years agov8-style engines do not parse to optimized native code.
As described in the links, v8 parses to an AST, which then is compiled to bytecode. A bytecode VM then executes the JS, collecting runtime type information, which is input (along with the bytecode itself) into the next compilation tier; only at that point is machine code generated.
The key idea is that v8 expects to execute the JS code before it can generate native code. It won't generate native code from parsing alone.
- ridiculous_fish 6 years ago
- rkeene2 6 years agoA lot of research and hard work has gone into TclQuadCode [0], which compiles Tcl (which is even more dynamic than JavaScript) into machine code via LLVM.
The authors indicated at one point it took around 5 PhDs to get it going.
- ridiculous_fish 6 years agoThis is bizarre and fascinating. I had no idea there were Tcl codebases of a size that could benefit from this sort of perf work. How much Tcl is out there?
- ridiculous_fish 6 years ago
- ndesaulniers 6 years agoI actually think this is possible; and started prototyping it (because esprima is awesome, and not to many other languages have an equivalent that's so easy to use).
Some thoughts: I think it's easier to target C++ than C, since C++ can help you write more type generic code. I think it's easy to generate tagged unions, then for optimizations try to prove monomorphism. Finally, it may be simpler to start off with support for typescript, and fail to compile if there are any ANY types. I do think it's possible though. JS/TS -> C++ -> WASM (yes, I was out of my mind when I thought of this)
- johnhenry 6 years agoIsn't this kind of what V8 and other modern JavaScript engines do on the fly already?
- mikece 6 years agoYes: one of the Google engineers working on V8 talked about it here: https://softwareengineeringdaily.com/2018/10/03/javascript-a...
It was this conversation that make me wonder if at some point in the future V8 might have experimental support natively for TypeScript but it makes more sense that compiling web assembly to native binary would make more sense. Who knows? It's an awesome time to be a programmer!
- mikece 6 years ago
- johnhenry 6 years ago
- maxxxxx 6 years agoWith all the dynamic stuff Javascript has it seems really difficult to create performant C code.
There is a PHP to .NET compiler which probably has similar problems. On second thought that one is probably easier because .NEt has a dynamic runtime.
- nicoburns 6 years agoThat, and in practice modern PHP is often relatively staticly typed (classes declare their fields, etc), and many codebases even include type annotations (which are supported in PHP and usually enforced at runtime).
- nicoburns 6 years ago
- tannhaeuser 6 years agoCongrats to completing this project. What's the status and further plans for it? I didn't find a license.
- ridiculous_fish 6 years agoHow are exceptions handled with this design? For example the `n < 3` may throw (it invokes the valueOf method).
- timruffles 6 years agoI went for longjmp(), non-local goto, for precisely the reason you highlighted: I realised pretty well everything in JS can throw, so needed something easy to trigger from anywhere
See https://github.com/timruffles/js-to-c/blob/1befbf4220753576e...
- timruffles 6 years ago
- magwa101 6 years agoWrite it in C, there, problem solved.
- maxgraey 6 years agoI wonder why people still try write transpiler from JS to C/C++ or LLVM (which make sense at least). But this not performant way and usually produce much bigger overhead than jit vm which use speculative optimizations.
Some projects: 1. https://github.com/fabiosantoscode/js2cpp 2. https://github.com/raphamorim/js2c 3. https://github.com/ammer/js2c 4. https://github.com/NectarJS/nectarjs 5. https://github.com/ovr/StaticScript