Silkenweb Example: Hackernews Clone

Meta scrambling 'war rooms' of engineers to figure out DeepSeek's AI

27 points by zathan 5 months ago | 19 comments

ahzhou 5 months ago
I might be missing something, but DeepSeek’s recipe is right there in plain sight. Most of the cost efficiency of DeepSeek v3 seem to be attributable to MoE and FP8 training. DeepSeek R1s improvements are from GRPO-based RL.
Interesting to note - we have no idea how much R1 cost to train. To speculate - maybe DeepSeek’s release made an upcoming Llama release moot in comparison.
- pptr 5 months ago
  What is different about Deepseek's use of MoE vs all the other MoE models that makes training more efficient?
  FP8 training and GRPO make sense to me, but that only gets you a 4x improvement total, right?
  - ahzhou 5 months ago
    They slightly restructure their MoE [1], but I think the main difference is that other big models (e.g Llama 504B) are dense and have higher FLOP requirements. MoE should represent a ~5x improvement. FP8 should be about a ~2x improvement.
    We don’t know how much of a speed improvement GRPO represents. They didn’t say how many GPU hours went into to RLing DeepSeek-r1 and we don’t have a o1 numbers to compare.
    There’s definitely lots of misinformation spreading though. The $5.5m number refers to Deepseek-v3, not Deepseek-r1. I don't want to take away from HighFlyer's accomplishment, though. I think a lot of these innovations were forced to work around H800 networking limitations, and it's impressive what they've done.
    [1] https://arxiv.org/abs/2401.06066
    - karmakaze 5 months ago
      It's interesting that only having access to less powerful hardware motivated/necessitated more efficient training--like how tariffs can backfire if left in place too long.
marjann 5 months ago
What a time to be alive. Chinese companies were copying everything from the west, now it seems the opposite.
- tetris11 5 months ago
  (annoying voice) "Hold on to your papers...!"
- floydnoel 5 months ago
  there's entire books on this phenomenon, it isn't at all unusual. happened to japan, korea, etc. moving up the value chain!
bamboozled 5 months ago
Can anyone explain why Meta's share price was untouched by the deep seek announcement ? They have spent billions on AI infra?
According to this article they are rattled in some way...
- alecco 5 months ago
  OpenAI and others are valued for expected future revenue of running the models. And they were also valued as having magic "secret sauce" in their closed source models. Investors are now pulling back from this kind of company.
  Deepseek is open source and based on Meta's open source Llama models. So Meta can easily run Deepseek on their pipeline.
  The revenue model for both Meta and Deepseek is to apply the model to their business, not just sell it as a chatbot or API. That's why they publish it, they benefit from the community improvements and ironing out bugs.
- bravetraveler 5 months ago
  My guess: they're somewhat uniquely positioned for the data. With 'the feeds' they're closer to a source/can withstand more. They plan to monetize another way
  I'm imagining four rooms of candlelight and collective reading of publications. "War room" is executive-speak for "Important/Urgent Panic" or "rearranging deck chairs on the Titanic"
  Four war rooms to read a document; so Meta
- edmundsauto 5 months ago
  This interpretation is heavily based on the journalists choice of words designed to create drama. If Meta can recreate this success in llama, they just cut their power bill by 80+%. That deserves jumping on something immediately and not waiting for next half’s planning cycle.
  Spun differently - Meta just reacted to take advantage of a new opportunity in just a couple of weeks. Completely reshooting an entire years worth of work for dozens of engineers. That sounds… appropriate? For an announcement big enough to chop $600M off nvidia market cap.
  Come to think of it, I wonder how much meta spends on AI power. 80% of that number could be a billion dollars.
- Ekaros 5 months ago
  They are still social-media company. And make most money from there. AI is like metaverse bets. And AI being cheaper to create might even be positive for them, if they can figure out a use case.
- rchaud 5 months ago
  They make all their money on ads in FB and IG. It's how their stock barely budged despite losing $30b on a VR ghost town.
- YetAnotherNick 5 months ago
  They are the users of AI, not sellers of AI. Better and cheaper AI would benefit them, no matter who trained it.
- znpy 5 months ago
  i think it's because openai makes a bunch of money off "AI stuff" by being regarded the best at this game... and guess what, there's a new player that makes "AI stuff" as good as them (or possibly better) and maybe even cheaper. this could be a threat to their source of revenue.
  Meta on the other hand makes money off whatsapp, facebook, instagram and threads. for meta an additional provider of "AI stuff" is not a threat to their source of revenue.
- maxglute 5 months ago
  Expensive models are AI companies core business.
  Meta can use cheap models to enhance core business.
OfCounsel 5 months ago
Meta has been aware of DeepSeek for a long time (as Zuckerberg mentioned the company by name in his podcast with Joe Rogan) and a “war room” is just a meeting room.
- ryandrake 5 months ago
  My experience is that a "War Room" is just a meeting room, but one where 1. engineers are rounded up to work in (because as we all know, developers type code faster when co-located in a single room under pressure), and 2. where panicked executives occasionally wander in to say things like "How are things going?" and "What's the current status?" and "Do you have an ETA for when we can stop panicking?"
hulitu 5 months ago
> Meta scrambling 'war rooms' of engineers to figure out DeepSeek's AI
"Gentlemen, you can't fight in the war room."
terryjiao 5 months ago
[dead]