Jan Leike joins Anthropic on their superalignment team
99 points by icpmacdo 1 year ago | 33 comments
- Lerc 1 year ago: I was very impressed with Anthropic's paper on concept mapping.
Post https://www.anthropic.com/news/mapping-mind-language-model
Paper https://transformer-circuits.pub/2024/scaling-monosemanticit...
This seems like a very good starting point for alignment. One could almost see a pathway to making something like the laws of robotics from here. It's a long way to go, but a good first step.
- mvkel 1 year ago: These superaligners.
"I am breaking out on my own! Together we will do bigger and better things!!!"
"Ok I'll join the other guys."
I think it's pretty clear that the capital markets have next to no interest in alignment pursuits, and only the most-funded labs apply a token amount of investment toward it.
- whimsicalism 1 year ago: @dang - I find topics like these quite interesting. Are they downweighted for being AI-related (or for being Twitter links), or are they just getting flagged a lot?
- Imnimo 1 year ago"Automated alignment research" suggests he's still interested in following the superalignment blueprint from OpenAI. So what do you do while you're waiting for the AI that's capable of doing alignment research for you to arrive? If you believe this is a viable path, what's the point of putzing around doing your own research when you'll allegedly have an army of AI researchers at your command in the near future?
- solveit 1 year ago: Well, I presume you have to figure out how to evaluate their output, especially for trustworthiness. And the core of that is something you have to do yourself, no matter how many AI researchers you have.
- Imnimo 1 year ago: The premise of the plan is that evaluating output is easier than producing it, such that a human researcher could look at the AI researcher's output and tell whether it's correct and trustworthy. If this is true, what else is there to figure out?
- whimsicalism 1 year ago> what do you do while you're waiting for the AI that's capable of doing alignment research for you to arrive
Nobody interested in superalignment is interested in waiting until actually threatening AI gets here.
- Imnimo 1 year ago: But that's the fundamental superalignment plan - train a human-level alignment researcher AI, run a bunch of them in parallel, and review their research output to see if they solve the alignment problem. You can't do the plan until the human-level alignment researcher AI already exists.
- whimsicalism 1 year ago: A large part of the idea is that you can develop techniques for aligning sub-human AI using even stupider AI, and hope/pray that this continues to generalize once you get to super-human AI being aligned by human-level AI.
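For concreteness, here is a toy sketch of that weak-to-strong setup (my own illustration, not OpenAI's code; the dataset and model choices are stand-ins): a weak supervisor is trained on a small amount of ground truth, a stronger student trains only on the weak model's labels, and the student is then scored against ground truth it never saw.

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.neural_network import MLPClassifier

    # Synthetic stand-in for a task we want labeled correctly.
    X, y = make_classification(n_samples=4000, n_features=20, random_state=0)
    X_gt, y_gt = X[:200], y[:200]          # small ground-truth set
    X_pool = X[200:3200]                   # unlabeled pool for the student
    X_test, y_test = X[3200:], y[3200:]    # held-out evaluation

    # "Stupider" supervisor: a small model trained on limited ground truth.
    weak = LogisticRegression(max_iter=1000).fit(X_gt, y_gt)

    # Stronger student: trains only on the weak supervisor's imperfect labels.
    strong = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
    strong.fit(X_pool, weak.predict(X_pool))

    print("weak supervisor accuracy:", weak.score(X_test, y_test))
    print("weak-to-strong student:  ", strong.score(X_test, y_test))

The open question the comment points at is whether a student recovering more than its supervisor's accuracy keeps holding as the capability gap grows.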
- DalasNoin 1 year ago: Current systems are already (in a limited way) helping with alignment: Anthropic is using its AI to label the sparse features from its sparse autoencoder approach. I think the original idea of labeling neurons with AI came from William Saunders, who also left OpenAI recently.
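For reference, the sparse autoencoder technique in miniature (a generic sketch, not Anthropic's actual code; the dimensions and L1 coefficient are illustrative assumptions):

    import torch
    import torch.nn as nn

    class SparseAutoencoder(nn.Module):
        """Decompose model activations into a sparse, overcomplete feature basis."""
        def __init__(self, d_model=512, d_features=4096, l1_coeff=1e-3):
            super().__init__()
            self.encoder = nn.Linear(d_model, d_features)
            self.decoder = nn.Linear(d_features, d_model)
            self.l1_coeff = l1_coeff

        def forward(self, acts):
            feats = torch.relu(self.encoder(acts))   # sparse feature activations
            recon = self.decoder(feats)              # reconstruction of the input
            # MSE keeps reconstructions faithful; the L1 term keeps features sparse.
            loss = ((recon - acts) ** 2).mean() + self.l1_coeff * feats.abs().mean()
            return feats, recon, loss

The AI-labeling step the comment mentions is downstream of this: collect the text snippets that most strongly activate each feature and ask an LLM for a one-line description of what they share.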
- warkdarrior 1 year ago: I think his tweet can be read as "research in (1) scalable oversight, (2) weak-to-strong generalization, and (3) automated alignment".
- smountjoy 1 year ago"Superalignment" is (was?) OpenAI's term, so it might be more accurate to say he is joining Anthropic to work on alignment.
- sp332 1 year ago: Looks like superalignment was Jan Leike's term, since the team at OpenAI dissolved immediately without him.
- eminence32 1 year ago: Is there a difference between "superalignment" and "alignment"?
- rfw300 1 year ago: Yes. “Superalignment” (admittedly a corny term) refers to the specific case of aligning AI systems that are more intelligent than human beings. Alignment is an umbrella term which can also refer to basic work like fine-tuning an LLM to follow instructions.
- thefaux 1 year ago: Is this not something of an oxymoron? If there exists an AI that is more intelligent than humans, how could we mere mortals hope to control it? If we hinder it so that it cannot act in ways that harm humans, can we really be said to have created superintelligence?
It seems to me that the only way to achieve superalignment is to not create superintelligence, if that is even within our control.
- afefers 1 year ago: Huh! All this time I thought the "super" was just for branding/differentiation.
- halfjoking 1 year ago: Then why don't they call politicians "super-politicians"?
Their purpose is to control the population by being lesser beings who feed off corporations and just push their message.
- exe34 1 year ago: I suppose the difference between imaginary and "super"-imaginary isn't very important from a practical point of view.
They worry about alignment for AI; I worry about alignment for the corporations that wield technology, any technology.
- throw5345346 1 year ago: Oh yes. One is super.
- htrp 1 year ago: It's also completely theoretical, until it isn't (ref: paperclip maximizers).
- andrewfromx 1 year ago: I keep getting the names Anthropic and Extropic (Guillaume Verdon / Beff Jezos) mixed up. Anthropic is Claude; Extropic is thermodynamic hardware many orders of magnitude faster and more energy-efficient than CPUs/GPUs.*
* parameterized stochastic analog circuits that implement energy-based models (EBMs). Stochastic computing is a computing paradigm that represents numbers using the probability of ones in a bitstream.
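To make the footnote concrete, the bitstream encoding fits in a few lines of Python (a toy illustration of stochastic computing in general, not Extropic's hardware):

    import random

    def to_bitstream(p, n=10_000):
        """Encode a value p in [0, 1] as a random bitstream with P(bit = 1) = p."""
        return [1 if random.random() < p else 0 for _ in range(n)]

    def from_bitstream(bits):
        """Decode by taking the fraction of ones."""
        return sum(bits) / len(bits)

    a = to_bitstream(0.8)
    b = to_bitstream(0.5)
    product = [x & y for x, y in zip(a, b)]   # multiplication is one AND gate per bit
    print(from_bitstream(product))            # ~0.4, up to sampling noise

The appeal is that arithmetic collapses into single gates that tolerate noise; the cost is that precision only improves with longer streams.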
- whimsicalism 1 year ago: yes, one is a real company and one is...
- sanjeetsuhag 1 year ago> Thermodynamic hardware many orders of magnitude faster and more energy efficient than CPUs/GPUs.
I’m sorry, but is this thermodynamic hardware real? Are there any benchmarks? Those claims are pretty strong.
- DiabloD3 1 year ago"Yes", but only extremely simple demo circuits.
Basically, they are betting on the following: when you perform a calculation, the electricity that went into the circuit only exits as the answer; anything that didn't become the answer turns into waste heat and electromagnetic fields. So what if you reversed the calculation, such that the only waste produced is the transmission of the answer?
If you know anything about EE, you'd know that what I said is an extremely simplified view of how modern ALUs are made, and it ignores the past 40+ years of optimizations; however, they believe that "undoing" those optimizations and "redoing" them as entirely reversible operations will not only work, but will be the final optimization we can make.
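The textbook illustration of that reversible-calculation idea is the Toffoli gate, which computes AND without erasing its inputs (a conceptual sketch, not anything Extropic has published):

    def toffoli(a, b, c):
        """CCNOT: flip c iff a and b are both 1. The gate is its own inverse."""
        return a, b, c ^ (a & b)

    # With c = 0 the third output is a AND b, and the inputs are carried along
    # instead of being erased; Landauer's principle says erasing bits is what
    # forces a minimum amount of waste heat.
    state = toffoli(1, 1, 0)
    print(state)             # (1, 1, 1)
    print(toffoli(*state))   # (1, 1, 0): applying the gate again recovers the inputs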
There will be no benchmarks of the kind you want, because that isn't the issue. I can take any CPU off the shelf today and run it 10 times faster: it will melt from self-generated heat, but for a glorious microsecond it will be the fastest CPU on earth.
They are stating that they have potentially fixed one of the largest generators of waste heat, which would allow us, using all of our existing technology, to start ramping up our clock speeds; the true final frontier would then be trace lengths at macro scale (already a problem at the clock speeds we use for DDR5 and PCIe 6).
However, given that Extropic's website says none of what I just said, they're probably just some startup trying to ride the AI wave, and will close shop in a few years. I doubt they've magically figured out one of the hardest problems in EE at the moment. They're also not the only company in this space; every major semiconductor company in the world is trying to solve it.
- whimsicalism 1 year ago: From my understanding, this will only be able to accelerate EBMs (energy-based models), which they could scale up in simulation to show that they would be useful.
EBMs as of now are not really that useful at all.
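For anyone unfamiliar, an energy-based model in its simplest form is just an energy function plus a sampler; this toy Ising-style example (my sketch, not Extropic's circuits) shows the kind of workload such hardware would target:

    import numpy as np

    rng = np.random.default_rng(0)

    # E(x) = -0.5 * x^T W x over spins x_i in {-1, +1}, with symmetric couplings W.
    W = rng.normal(scale=0.5, size=(8, 8))
    W = (W + W.T) / 2
    np.fill_diagonal(W, 0.0)

    def energy(x):
        return -0.5 * x @ W @ x

    def gibbs_step(x):
        """Resample each spin from its conditional under the Boltzmann distribution."""
        for i in range(len(x)):
            field = W[i] @ x
            p_up = 1.0 / (1.0 + np.exp(-2.0 * field))
            x[i] = 1.0 if rng.random() < p_up else -1.0
        return x

    x = rng.choice([-1.0, 1.0], size=8)
    for _ in range(100):
        x = gibbs_step(x)
    print(energy(x))   # sampling concentrates on low-energy configurations

The pitch is that analog noise does this sampling for free; the catch, as the comment says, is that EBMs haven't been competitive with mainstream deep learning.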
- geodel 1 year ago: Well, you've got Beff Jezos. This is as real as it gets.