Interface Dispatch (2018)

82 points by matthewwarren 6 years ago | 36 comments
  • barrkel 6 years ago
    Some JVMs use fat pointers where the pointer embeds the dynamic type of the object rather than the statically declared type. This permits inline caches to verify the type of the instance without any indirections.

    Delphi implemented interfaces like the C++ approach, and in order to be compatible with COM, it didn't have a huge amount of flexibility: interfaces are mandated by the COM ABI to be called like this:

        reference->vtable->method(reference)
    
    This pretty much forces the vtable references to be stored in the object instance, one per implemented interface. Casting an instance to an interface means adding the offset to the vtable in the object data to the instance pointer.

    The vtables for every interface point at stubs which subtract the same offset and then jump to the implementation, so that object data is where it's expected for a normal instance method call.

    COM influenced the internal design of Java interfaces too - the first three slots were reserved for COM's lifetime management and typecasting methods. I'm not sure anything practical ever came of that, though.

    I reused the interface calling mechanism when I implemented anonymous methods in Delphi. Anonymous methods can capture variables from their declaring scope; the captured state needs lifetime management in the absence of GC, so I piggy-backed on COM's lifetime management. This reduced the effort to interop with C++ too.

    • KyleJ61782 6 years ago
      I concur with the description of the Delphi implementation, as I've navigated it numerous times myself. I guess I don't understand the author's negative tone with regard to the C++/Delphi implementation. There's going to be overhead with any choice made and C++ and Delphi made the choice to sacrifice memory usage for speed. Furthermore, the only part that really scales with the number of objects is the extra vtable pointer for each implemented interface in each newly constructed object. Hardly a significant overhead in today's memory-prolific computing environment. Also, pointer fixup thunks and unconditional jumps are easy to speculatively handle.
      • latk 6 years ago
        The author is interested in PL design, and thinks that while C++ style object implementations are great at offering compatibility with certain systems or interfaces, other approaches allow for far nicer semantics in a greenfield programming language.

        For example, fat pointers make it possible to implement interfaces for existing types. I have experienced that as game-changing in Haskell, Go, and Rust.

        I also disagree that any memory usage can be handwaved as “no longer relevant today”. Electron, or garbage collection: perhaps. But object size affects cache pressure, and cache utilization is critical to performance. C++ is typically good at keeping objects small, except when it comes to multiple inheritance.

        • KyleJ61782 6 years ago
          Honest question: do you have a real-world example indicating that the extra pointers added by multiple inheritance actually result in performance degradation due to extra cache pressure?
    • userbinator 6 years ago
      With the C++ approach, there is also an optimisation which can be performed: put the method pointers in the object itself, instead of in a vtable. This is beneficial when there aren't many objects or methods, since it eliminates one indirection. Instead of obj->table.method() it turns into obj->method(). When I write "OOP" code in C, I usually do this instead of using a separate vtable.

      As an aside, I noticed the word "klass", then looked up at the domain name and confirmed my suspicions. ;-)

    • nwellnhof 6 years ago
      Here's another interesting approach for Java:

      https://dl.acm.org/citation.cfm?id=504291

      Interface methods are looked up in a hash table containing function pointers (the table index can be computed at compile time). If there's a collision, a small stub is generated that dispatches to the right method.

      • chrisseaton 6 years ago
        > C# generally follows the same approach as Java, but tries to perform more aggressive optimizations.

        I don't understand this, given that it doesn't have deoptimization.

        The Java example has a single condition on the fast path, while the C# one has two nested loops with a condition inside them!

        • latk 6 years ago
          Author here. Both the JVM and CLR do some pretty aggressive optimizations even for normal interface calls. In the Java section I also show inline caching, a standard optimization approach. The CLR calls its approach to caching “virtual stub dispatch”. Speculative optimizations and inlining would be more advanced optimizations that I did not cover.

          But what about the worst case – the method lookup when the method is not in the cache? There, the CLR introduces an additional level of indirection for calls through interface types, which has benefits for memory usage and possibly dispatch time. But since this is the worst case, the complexity doesn't matter that much as long as the important cases (which can be cached) work well.

          • chrisseaton 6 years ago
            > In the Java section I also show inline caching, a standard optimization approach.

            Thanks. Does the CLR do inline caching for interface calls which cannot be devirtualised without speculation? Or not because of the lack of deoptimisation? Or does it have self-modifying code instead?

            • pjmlp 6 years ago
              It depends very much on which .NET runtime we're talking about (NGEN, pre-RyuJIT, RyuJIT, IL2CPP, Mono, MDIL, .NET Native, .NET Core, Burst), and which version of it.
              • latk 6 years ago
                The CLR routes interface calls through an indirect cell (or pad) that can be quickly rewritten to point at a different dispatch stub. Some stubs perform caching, whereas the generic resolver behaves as explained by the pseudocode in my article.

                The “book of the runtime” has a graph visualizing these possibilities: https://github.com/dotnet/coreclr/blob/master/Documentation/...

                So this is a self-modifying code approach.

            • lozenge 6 years ago
              Perhaps it's a memory optimisation? Especially considering all the instantiations of generic types that Java wouldn't have.
            • carapace 6 years ago
              If you liked this you might also like the COLA system: https://en.wikipedia.org/wiki/COLA_(software_architecture)

              > COLA is a system for experimenting with software design currently being investigated by the Viewpoints Research Institute. "COLA" stands for "Combined Object Lambda Architecture". A COLA is a self-describing language in two parts, an object system which is implemented in terms of objects, and a functional language to describe the computation to perform.

              • gumby 6 years ago
                Fat pointer cost can be mitigated by doing pointer preunpacking in hardware. That kind of support fell out of favor when RISC started beating CISC architectures, but if dynamic dispatch became a significant performance issue, I could see it going into certain processors. We have happily returned to an era of processor experimentation.
                • Insanity 6 years ago
                  Speaking of interfaces in Go, can anyone explain this particular result? We recently ran a benchmark at work to test which representation of a set would be best. (Linux, go1.12)

                        map[string]bool
                        map[string]struct{}
                        map[string]interface{} // where we put an empty struct in it.
                  
                  results:

                       BenchmarkBoolSet-4              5886572           223 ns/op          39 B/op           0 allocs/op
                       BenchmarkStructSet-4            6716685           206 ns/op          32 B/op           0 allocs/op
                       BenchmarkInterfaceSet-4         4299072           313 ns/op         115 B/op           0 allocs/op
                  
                  
                  (I'm aware this is a bit of a tangent to TFA)
                  • bouk 6 years ago
                    Can’t really say what’s going on here without seeing some code.
                    • Insanity 6 years ago
                      Well, the code is the same for all of them (except that for the bool map we assign true instead of an empty struct). So we run this code inside a benchmarking function:

                            m := make(map[int]interface{})
                            // or m := make(map[int]struct{})
                            for i := 0; i < t.N; i++ {
                                 m[i] = struct{}{}
                            }
                      
                      edit: fix code.
                • abecedarius 6 years ago
                  I'm curious about the history of the fat-pointer approach. The first exposition I personally ran across was http://erights.org/enative/fatpointers.html in the 90s. Can anyone point out others?
                • Anonymous4C54D6 6 years ago
                  The ASCII art is broken.
                  • latk 6 years ago
                    (Author) I am aware of the problem. For me it works on desktop but not on mobile browsers – apparently the Unicode box drawing characters are not really fixed-width on all systems?

                    One day I'll write a Pandoc filter that pipes these diagrams through a Shaky or Ditaa type tool first…

                    • munificent 6 years ago
                      > For me it works on desktop but not on mobile browsers

                      Looks weird for me on Mac desktop Chrome. It looks like the default monospace font, Courier, lacks box drawing characters. So it probably falls back to some other monospace font which likely has different metrics.

                      A good CSS font stack for monospace fonts is something like:

                          font-family: Consolas, Monaco, monospace;
                      
                      The first catches almost all Windows users, the second almost all Mac users, and the last falls back to the default.
                    • gumby 6 years ago
                      Not in Mac Safari
                      • signa11 6 years ago
                        not on firefox/chromium