LLM price vs. performance (Google sheet)

9 points by harlanlewis 1 year ago | 2 comments

harlanlewis 1 year ago
I created this dense visual comparison to better understand and contextualize the precise relationships between capability, cost, and speed for text LLMs widely available via cloud providers today.
All values are sourced externally from publicly available data.
This sheet is only as good as the data I've found for it. Some values change over time (e.g. the 0-100 normalized index), while others have contradictory sources. For example, OpenAI's self-reported metrics for GPT-4-turbo are close but not identical between their simple-evals repo[1] and the charts in the GPT-4o announcement[2]. In other cases, strong benchmark scores are prominent on marketing pages while weaker scores require some digging.
As a general rule of thumb, I've tried to: a) Include every metric I can find to help mitigate cherry-pick bias. b) Resolve conflicts by selecting what I consider to be either the more current or more trustworthy source. For what it's worth, I haven't come across any evaluation discrepancies with a meaningful margin of difference.
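A rough sketch of rule (b) in Python, for anyone who wants to apply the same idea to their own data. Everything here is an assumption for illustration: the `Report` record, the manual `trust` ranking, the choice to let trust win over recency, and the placeholder values, none of which come from the sheet.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Report:
    """One published value for a metric (hypothetical record shape)."""
    source: str      # where the value was published
    value: float     # the reported score
    published: date  # publication date of the source
    trust: int       # hand-assigned ranking, higher means more trusted


def resolve(reports: list[Report]) -> Report:
    """Pick one report: highest trust first, most recent as the tie-breaker."""
    return max(reports, key=lambda r: (r.trust, r.published))


# Placeholder values only, not real benchmark numbers.
reports = [
    Report("source-a", 80.0, date(2024, 1, 1), trust=1),
    Report("source-b", 80.5, date(2024, 5, 1), trust=1),
]
chosen = resolve(reports)
print(chosen.source, chosen.value)
```

With equal trust the newer source wins. Raising `trust` on the older one would flip the result, which is where the judgment call lives.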
The folks I've shared this with so far have found it useful - I hope you do as well!
[1] https://github.com/openai/simple-evals
[2] https://openai.com/index/hello-gpt-4o/
Sebmono 1 year ago
Love this!