Show HN: TokenFlow – Visualize LLM inference speed

1 point by davely 4 months ago | 0 comments
How fast are your favorite LLMs? I recently saw a Reddit post where someone got a distilled version of DeepSeek R1 running on a Raspberry Pi. It could generate output at a whopping 1.97 tokens per second. That sounds slow. Is that even usable? I don’t know!

Meanwhile, Mistral announced that its Le Chat platform can output 1,100 tokens per second! That sounds pretty fast. But how fast, exactly? I don’t know!

So I put together TokenFlow. It’s a (very!) simple webpage that lets you see the (theoretical) speed of different LLMs in action. You can select from a few preset models and services, or enter a custom speed in tokens per second. Then you can watch it spit out tokens in real time, showing you exactly how fast a given inference speed is and how it affects the user experience.
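The core mechanic is simple enough to sketch. Below is a rough TypeScript approximation of how a page like this can pace output at a chosen rate; the whitespace "tokenization", function names, and element IDs are my own illustration, not the actual TokenFlow code:

```typescript
// Hypothetical sketch: emit "tokens" at a fixed rate to simulate inference speed.
// Splitting on whitespace is a crude stand-in for real tokenization.
function streamTokens(
  text: string,
  tokensPerSecond: number,
  onToken: (token: string) => void,
  onDone?: () => void
): () => void {
  const tokens = text.split(/\s+/).filter(Boolean);
  const intervalMs = 1000 / tokensPerSecond;
  let i = 0;

  const timer = setInterval(() => {
    if (i >= tokens.length) {
      clearInterval(timer);
      onDone?.();
      return;
    }
    onToken(tokens[i++] + " ");
  }, intervalMs);

  // Return a cancel function so the demo can be stopped or restarted.
  return () => clearInterval(timer);
}

// Usage: render at a Raspberry Pi-like 1.97 tokens per second.
const output = document.getElementById("output")!;
const cancel = streamTokens(
  "The quick brown fox jumps over the lazy dog...",
  1.97,
  (tok) => { output.textContent += tok; }
);
```

One caveat with this naive approach: at very high speeds (say, 1,100 tokens per second) the interval drops below a millisecond and browsers clamp setInterval timers, so a real implementation would likely emit several tokens per tick or drive rendering from requestAnimationFrame instead.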

Check it out: https://dave.ly/tokenflow/

GitHub: https://github.com/daveschumaker/tokenflow