Silkenweb Example: Hackernews Clone

Show HN: An interactive transformer “debugger” I've been working on in my free time

8 points by robkop 2 years ago | 4 comments

My focus has been shifting towards the ML alignment space recently, in particular the ability to translate large transformer models into human-understandable circuits and algorithms. This problem may not be solvable, but it is one that some groups have had success with after large amounts of effort.

To help with this, I've been developing Transpector, a tool that scales up the techniques these teams have had success with and lowers the barrier to entry for using them. These techniques aim to understand the internal mechanics of the model. Currently the tool is focused on model activations, but free time permitting I'm planning to expand it into the gradient and weight spaces as well.

If you have some free time of your own, I encourage you to give it a try. I've found it's not only a bit of fun, but it's also been a good way to help others build intuition for these models.

quickthrower2 2 years ago
hey, this looks pretty cool. I was about to start researching the tools you use to do things like find hyperparameters, debug the network and so on. Karpathy’s YT series alludes to the need to do such things, but he hasn’t yet dug into that rabbit hole. I hope I get some time to try this out. The visuals look great and make me think this would be worth trying as a learning (as in me learning!) tool.
- robkop 2 years ago
  Appreciate the kind words. Honestly, Karpathy’s YT series is one of the best kickoff series I've ever seen. He has a certain ability to simplify complex problems and ideas that feels a bit Feynmanesque.
  And yes, please do, and if you have any feedback I'd love to hear it! Half the motivation for this tool is trying to find a better way to build intuition for how these complex models actually function. I believe the best way to do this is by reducing iteration times as much as possible and by bringing models into worlds we understand: laying their components out spatially and letting us toy with them, see what the impacts are, and play some more. At the end of the day, these models are so high dimensional that it's just not possible to dig in and understand them from the ground floor upwards; we need better ways to build intuition.
segmondy 2 years ago
Looks very neat. How do you use it? Looks like it's just to inspect a personal model and can't be applied to external models, is that right?
- robkop 2 years ago
  There's a task on my list to write a full tutorial using it to replicate some recent interpretability research (finding induction heads is up first). But even without a full tutorial, I've been surprised how quickly people have been able to pick it up and understand it just by selecting a model and playing around.
  If you are interested, there is this brilliant tutorial [1] by Callum McDougall for the Transformer Lens library. Going through its steps but completing them in Transpector would be a great way to learn it and build intuition about transformers and where research is today.
  On the model side, I've added a supported model list [2] and a gif of how to switch between models [3]. I appreciate the feedback on what information is most useful for the readme. Also, in case your question was about API-access-only models (GPT4, Bard...): unfortunately Transpector requires access to the model weights and activations, so it currently can't be used with those.
  [1]: https://colab.research.google.com/drive/1LpDxWwL2Fx0xq3lLgDQ...
  [2]: https://github.com/R0bk/Transpector/blob/main/docs/supported...
  [3]: https://github.com/R0bk/Transpector/blob/main/README.md
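
For readers who have not met the induction-head search mentioned above, here is a minimal sketch of one common heuristic for it. It is not taken from Transpector or the linked tutorial. It assumes you already have one head's attention pattern for a prompt made of a token sequence repeated twice, with no leading BOS token, stored as a plain list of lists. The function names are hypothetical. The score is the average attention that each second-half position pays to the token that followed the earlier copy of the current token.

```python
def induction_score(pattern, half_len):
    """Mean attention from each second-half query to the token that came
    right after the previous occurrence of the current token.

    pattern[q][k] is the attention weight from query position q to key
    position k, for a prompt of `half_len` tokens repeated twice.
    """
    offset = half_len - 1  # distance back to "the token after the earlier copy"
    weights = [pattern[q][q - offset] for q in range(half_len, len(pattern))]
    return sum(weights) / len(weights)


def ideal_induction_pattern(half_len):
    """Synthetic pattern for a perfect induction head (illustration only)."""
    n = 2 * half_len
    pattern = [[0.0] * n for _ in range(n)]
    for q in range(n):
        # Second half looks at the induction target; first half looks at position 0.
        k = q - (half_len - 1) if q >= half_len else 0
        pattern[q][k] = 1.0
    return pattern


def uniform_causal_pattern(half_len):
    """Synthetic pattern for a head that spreads attention evenly over the past."""
    n = 2 * half_len
    return [[1.0 / (q + 1) if k <= q else 0.0 for k in range(n)] for q in range(n)]


induction_like = induction_score(ideal_induction_pattern(4), 4)
uniform = induction_score(uniform_causal_pattern(4), 4)
print(induction_like, uniform)
```

A head scoring close to 1 on real activations would be a candidate induction head, while a head that spreads attention evenly scores much lower. With a real model the pattern would come from the cached attention activations rather than these synthetic matrices, and the offset would need adjusting if the prompt starts with a BOS token.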