Ask HN: How to execute a 180B+ LLM on a Turing machine?
Picture this: you're stuck with your “potato computer” (small RAM, no external GPU, very large SSD), and your LLM is saved on an external SSD.
Your task: run that LLM on your “potato PC” and try to achieve reasonable response times (e.g., 1 h to 24 h). Response times of a year or more would be impractical for most use cases.
And on a side note: how would you estimate the response time of a language model on low-end devices (e.g., Raspberry Pi, business laptops, MSP430)? Would you take basic linear-algebra operations as a given and estimate the number of steps from there?
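
For the estimation question, one rough model: when the weights don't fit in RAM, every generated token has to stream the full set of weights past the CPU, so the SSD's read bandwidth usually dominates over raw FLOPS. A minimal back-of-the-envelope sketch (all device numbers here are illustrative assumptions, not measurements):

```python
# Back-of-the-envelope latency estimate; hypothetical numbers, not benchmarks.
# Autoregressive decoding touches every weight once per generated token, so
# on a machine whose RAM can't hold the model, SSD read bandwidth is usually
# the bottleneck rather than compute.

def seconds_per_token(n_params, bytes_per_param, ssd_gbps, cpu_gflops):
    weight_bytes = n_params * bytes_per_param
    io_time = weight_bytes / (ssd_gbps * 1e9)          # stream all weights from SSD
    compute_time = 2 * n_params / (cpu_gflops * 1e9)   # ~2 FLOPs per param per token
    return max(io_time, compute_time)                  # slower side dominates

# 180B model, 4-bit quantized (~0.5 byte/param), 0.5 GB/s SSD, 50 GFLOPS CPU
t = seconds_per_token(180e9, 0.5, ssd_gbps=0.5, cpu_gflops=50)
print(f"{t:.0f} s/token -> {t * 500 / 3600:.1f} h for a 500-token reply")
```

With those made-up numbers, I/O outweighs compute by more than an order of magnitude, which is exactly the bottleneck the techniques below try to attack.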
I expect the usual suspects to come up in this discussion:
— Memory-mapped I/O, aka treating an I/O device such as an SSD as if it were actual RAM (mmap); see the sketch after this list. BTW: `mmap` makes our secondary storage somewhat akin to an infinite tape in a Turing machine
— “LLM in a flash: Efficient Large Language Model Inference with Limited Memory”, https://arxiv.org/html/2312.11514v2 (04 Jan 2024)
— SSD “wear and tear” (a read-only mmap of the weights mostly issues reads; it's swap writes that burn through flash endurance)
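
On the mmap point, here's a minimal sketch of what that looks like in practice, assuming a hypothetical raw fp16 weight dump at /mnt/ssd/weights.bin (the path, dtype, and sizes are made up for illustration):

```python
import numpy as np

# Map the (hypothetical) weight file on the external SSD into virtual
# memory. mode="r" means read-only: the OS pages chunks into the small
# RAM on demand and can evict them freely, so no swap writes occur.
weights = np.memmap("/mnt/ssd/weights.bin", dtype=np.float16, mode="r")

# Slicing creates a view; still no I/O has happened.
layer0 = weights[0 : 4096 * 4096]

# Only when the data is actually touched does the kernel fault the
# backing pages in from the SSD.
acc = np.asarray(layer0).astype(np.float32).sum()
print(acc)
```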