Gemini Diffusion
61 points by og_kalu 1 month ago | 7 comments

- heliophobicdude 1 month ago: I've been let off the waitlist. So far, I'm impressed with Instant Edits. It's crazy fast. I can provide a big HTML file and prompt it to change a color theme, and it makes careful edits to just the relevant parts. It seems able to parallelize the same instruction across multiple parts of the input. This is incredible for refactoring.
I copied a Shadertoy example and asked it to rename all the variables to be more descriptive, and it edited just the variable names. I was able to compile and run it in Shadertoy.
- adt 1 month ago: Good to see some more diffusion models:
- gs17 1 month ago: It's ludicrously fast, but it's not ludicrously intelligent, so trying their examples simply led to it failing 100x faster than normal Gemini. Still impressed, though. It made a nice tic-tac-toe-ish game, except the computer player became a human player after a few moves, and it couldn't fix the bug.
- minimaxir 1 month ago: > Diffusion models work differently. Instead of predicting text directly, they learn to generate outputs by refining noise, step-by-step. This means they can iterate on a solution very quickly and error correct during the generation process. This helps them excel at tasks like editing, including in the context of math and code.
This is deliberately unhelpful, as it raises the obvious question: why hasn't anyone else made a good text diffusion model in the years the technology has been available?
The answer is that, unlike latent diffusion for images, which can be fuzzy and imprecise before the final image is generated, text has discrete outputs and therefore demands more precision. Google is presumably using some secret sauce to work around that limitation, and is keeping it annoyingly close to the chest.
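One published workaround for the discrete-output problem is masked (absorbing-state) discrete diffusion: start from a fully masked sequence and, over a few parallel refinement steps, commit the model's most confident token predictions while re-predicting the rest. A toy sketch of that decoding loop — the "denoiser" here is an illustrative stand-in that just knows the target, not Google's unpublished method:

```python
import random

MASK = "<mask>"

def toy_denoiser(tokens, target):
    """Stand-in for a trained model: for each masked slot, return the
    'predicted' token plus a fake confidence score."""
    return [(i, target[i], random.random())
            for i, tok in enumerate(tokens) if tok == MASK]

def sample(target, steps=4, seed=0):
    """Start from all masks; each step, commit the highest-confidence
    fraction of predictions in parallel and re-predict the rest."""
    random.seed(seed)
    tokens = [MASK] * len(target)
    for step in range(steps):
        preds = toy_denoiser(tokens, target)
        if not preds:
            break
        preds.sort(key=lambda p: -p[2])           # most confident first
        k = max(1, len(preds) // (steps - step))  # unmask a fraction per step
        for i, tok, _ in preds[:k]:
            tokens[i] = tok
    return tokens

print(sample("the quick brown fox jumps".split()))
# → ['the', 'quick', 'brown', 'fox', 'jumps']
```

Because whole groups of positions are filled in per step, the same instruction can effectively be applied to many parts of the sequence at once, which matches the parallel-editing behavior people report above.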
- smallerize 1 month ago: Mercury Coder https://news.ycombinator.com/item?id=43187518 has a paper and published code. https://huggingface.co/papers/2503.07197
- heliophobicdude 1 month ago: Thank you for sharing this. I'm amazed! Are there any known emergent abilities of it? I ran my evals and it seems to struggle in very similar ways to smaller transformer-based LLMs.
- erinaceousjones 1 month ago: I heard it's been a difficult project to justify spending the research/compute time on at scale, because the models use an equivalent amount of compute for training and inference, but it's more parallelizable. So 5 times more compute units can be required, and they get the work done 5 times faster. On a Google scale, that meant the hard internal sell of justifying burning through $25 million worth of compute units over 1 day instead of $5 million each day for 5 days. Something like that.
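The claim above is just a tradeoff between wall-clock time and parallel compute at constant total spend; with the (hearsay, not official) numbers from the comment:

```python
# Hypothetical figures from the comment above, not official numbers.
serial_cost_per_day = 5_000_000   # $5M/day
serial_days = 5
parallelism = 5                   # 5x the compute units at once

total_serial = serial_cost_per_day * serial_days           # $25M over 5 days
parallel_days = serial_days / parallelism                  # 1 day
parallel_cost = serial_cost_per_day * parallelism * parallel_days

# Same total cost either way; only the wall-clock time changes.
assert total_serial == parallel_cost == 25_000_000
```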