Ask HN: How expensive are LLMs to query, really?

5 points by teach 1 month ago | 3 comments
I'm starting to see things pop up from well-meaning people worried about the environmental cost of large language models. Just yesterday I saw a meme on social media suggesting that "ChatGPT uses 1-3 bottles of water for cooling for every query you put into it."

This seems unlikely to me, but what is the truth?

I understand that _training_ an LLM is very, very expensive. (Although so is spinning up a fab for a new CPU.) But it seems to me the incremental cost of querying a model should be relatively low.

I'd love to see your back-of-the-envelope calculations for how much water and especially how much electricity it takes to "answer a single query" from, say, ChatGPT, Claude-3.7-Sonnet or Gemini Flash. Bonus points if you compare it to watching five minutes of a YouTube video or doing a Google search.

Links to sources would also be appreciated.
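For what it's worth, here's the shape the arithmetic could take. Every constant below is an assumption I picked for illustration (GPU count, power draw, throughput, response length, overhead multiplier, water-use factor), not a measured figure for any real deployment — plug in better numbers if you have them:

```python
# Back-of-the-envelope energy/water per LLM query.
# ALL constants are illustrative assumptions, not measurements.

GPU_POWER_W = 700        # assumed: one datacenter GPU at full load
NUM_GPUS = 8             # assumed: one inference node serving the model
OVERHEAD = 1.5           # assumed multiplier for cooling, CPUs, networking
TOKENS_PER_SEC = 1000    # assumed aggregate token throughput of the node
TOKENS_PER_QUERY = 750   # assumed length of a typical response

# Time the node is "occupied" by one query, amortized across its throughput
seconds_per_query = TOKENS_PER_QUERY / TOKENS_PER_SEC

# Energy in watt-hours: power * time, converted from seconds to hours
energy_wh = GPU_POWER_W * NUM_GPUS * OVERHEAD * seconds_per_query / 3600

# Water: assumed liters of cooling water evaporated per kWh consumed
WATER_L_PER_KWH = 1.8
water_ml = (energy_wh / 1000) * WATER_L_PER_KWH * 1000

print(f"~{energy_wh:.2f} Wh and ~{water_ml:.1f} mL of water per query")
```

With these made-up inputs it comes out to a couple of watt-hours and a few milliliters of water per query, i.e. orders of magnitude below "1-3 bottles" — but the point is the template, not my guesses: the answer is dominated by whichever assumptions you plug in for throughput and overhead.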