Silkenweb Example: Hackernews Clone

Weak-to-Strong Generalization

149 points by vagabund 1 year ago | 201 comments

wavemode 1 year ago
I don't believe LLM's will ever become AGI, partly because I don't believe that training on the outputs of human intelligence (i.e. human-written text) will ever produce something equivalent to human intelligence.
You can't model and predict the weather just by training on the outputs of the weather system (whether it rained today, whether it was cloudy yesterday, and so on). You have to train on the inputs (air currents, warm fronts, etc.)
You can't model and predict the stock market just by training on the outputs of stock trading decisions (the high today, the low yesterday). You have to train on the inputs (company fundamentals, earnings, market sentiments in the news, etc.)
I similarly think you have to train on the inputs of human decision-making to create something which can model human decision-making. What are those inputs? We don't fully know, but it is probably some subset of the spatial and auditory information we take in from birth until the point we become mature, with "feeling" and "emotion" as a reward function (seek joy, avoid pain, seek warmth, avoid hunger, seek victory, avoid embarrassment and defeat, etc.)
Language models are always playing catch-up because they don't actually understand how the world works. The cracks through which we will typically notice that they don't, in the context of the tasks typically asked of them (summarize this article, write a short story), will gradually get smaller over time (due to RLHF), but the fundamental weakness will always remain.
- panarky 1 year ago
  Human intelligence itself is shaped by our interaction with outputs. Our learning and understanding of the world are profoundly influenced by the language, behaviors, and cultural artifacts we observe.
  Think about the process of a child learning a language. The child does not have direct access to the "inputs" of linguistic rules or grammar; they learn primarily through observing and imitating the language output of others around them. Over time, they develop a sophisticated understanding of language, not by direct instruction of underlying rules, but through pattern recognition and contextual inference from these outputs.
  Then that language itself, learned from outputs, becomes the cognitive apparatus that enables the child to imagine, to reason symbolically and abstractly. Humans bootstrap intelligence on top of language, which itself is learned by mimicking outputs.
  Moreover, the analogy to weather prediction or stock market analysis is somewhat misleading. Yes, these models benefit from input data (like air currents for weather, company fundamentals and CEO statements to the media for stocks). But these systems are fundamentally different from intelligence.
  Intelligence, whether artificial or human, is about the ability to learn, adapt, and generate novel responses in a broad range of scenarios, not just about predicting specific outcomes based on specific inputs.
  - wavemode 1 year ago
    > The child does not have direct access to the "inputs" of linguistic rules or grammar; they learn primarily through observing and imitating the language output of others around them.
    I would argue that that learning is always contextualized by visual and spatial information about the real world (which is what our language is meant to describe). And (equally importantly) the child gets real-world feedback on their decisions - some decisions achieve their desired goal and some don't. Some statements cause certain responses from people, some actions have certain consequences.
    > Moreover, the analogy to weather prediction or stock market analysis is somewhat misleading. Yes, these models benefit from input data (like air currents for weather, company fundamentals and CEO statements to the media for stocks). But these systems are fundamentally different from intelligence.
    > Intelligence, whether artificial or human, is about the ability to learn, adapt, and generate novel responses in a broad range of scenarios, not just about predicting specific outcomes based on specific inputs.
    "Intelligence" is kind of a nebulous term. If all it means is the ability to learn, adapt and generate novel responses, then sure, I think we could call almost any neural network intelligent.
    But I would argue that we usually do have some expectation that an intelligent system can produce "specific outcomes based on specific inputs". We want to be able to train a worker and have them follow that training so they do their job correctly.
    - sdenton4 1 year ago
      Visual and spatial feedback aren't terribly difficult, though - we already have video-game training gyms, from Atari to GTA. If multimodality and feedback are the main barriers to full AGI, I expect we'll be there soon.
  - advael 1 year ago
    > Then that language itself, learned from outputs, becomes the cognitive apparatus that enables the child to imagine, to reason symbolically and abstractly. Humans bootstrap intelligence on top of language, which itself is learned by mimicking outputs.
    This isn't an uncommon claim, but it's certainly far from settled science. Complex language is an impressive capability of human cognition. But I think the claim that cognition itself is a byproduct of language proves too much. Obviously the products of cognition that are easiest for us to understand are linguistic, because language is the primary standardized means by which we communicate. To me, this is just more legibility bias, though. There are plenty of cognitive processes that people find impossible to explain in words, but still nonetheless exist
  - ayakang31415 1 year ago
    > Human intelligence itself is shaped by our interaction with outputs. Our learning and understanding of the world are profoundly influenced by the language, behaviors, and cultural artifacts we observe.
    I always thought that language definitely shapes our understanding of the world, but at much more fundamental level, I believe language falls apart to teach us anything. For example, there are words in dictionary that should not be defined using other words (but dictionary still do, this sort of circular reasoning is something I always have problem with), but we just know them through our interactions with others and environments. If you try to teach a kid a concept of temperature "hot", you would not be able to do so unless the kid touches something really hot like boiling kettle to "feel" it and when mom comes along and say something like "do not touch it it is very hot". There are things we just cannot understand through language.
    - timschmidt 1 year ago
      People wiser than me have said for an age: the map is not the territory. Neither is the word the thing.
    - red75prime 1 year ago
      OK. How can we operationalize the definition of "understanding" you are talking about? That is which tests will allow us to know who understands "hot" and who does not?
    - ordu 1 year ago
      > For example, there are words in dictionary that should not be defined using other words (but dictionary still do, this sort of circular reasoning is something I always have problem with)
      You shouldn't. When you learn a foreign language dictionaries can be a great help and it doesn't matter if they use circular definitions. Well it can matter in a situation like Stanisław Lem described with his "sepulkas"[1], but even then the definitions were a red warning. Ijon Tichy just failed to understand it.
      > you would not be able to do so unless the kid touches something really hot like boiling kettle to "feel" it
      I learned English by reading books mostly. In most cases I had no easy access to a dictionary and inferred meanings of words by a context. In some cases I failed to infer meaning but nevertheless in each case I managed to get some idea about a word. I remember some surprises when I found a real meaning of a word and it was not exactly what I thought. Or even some ideas that felt new and inspiring for me before I connected them to ideas I learned long before that in my native language. I'm like an English LLM myself, because I never used English in a real world context, only to read texts and to write comments in English. To this very moment I cannot talk about some topics in Russian, because I do not know Russian words to talk about them.
      All this experience led me to doubt an idea that you can understand language only by connecting it to reality. I believe you can. Your language will be disconnected from reality and it can be a disaster probably if you try to apply it to reality, but you can talk a lot, participate in philosophical debates and it doesn't matter your understanding is different. It is like redness of red: do you see red as I do and does it matter?
      > There are things we just cannot understand through language.
      We cannot link our senses with language without burns. But it doesn't mean we cannot understand. For example, you can watch other people touching really hot boiling kettles. If you never touched anything hot you will not understand their pain, but you'll know they are in a pain and you'll know of a special nature of that pain. You can learn all the important intricacies of interaction with hot objects by just watching. Or by reading. Your understanding will be limited still, but in a lot of cases it doesn't matter.
      Though of course we coming very near to a debate what is understanding. Is it a human-centric definition, that boils down to "only a human can understand", or it is more relaxed and rely on a pragmatic idea, like does your understanding enables you to make right choices. If LLM doesn't have a body that can feel pain, it doesn't matter if LLM cannot know how pain feels.
      [1] https://en.wikipedia.org/wiki/Sepulka
  - Jensson 1 year ago
    > Over time, they develop a sophisticated understanding of language, not by direct instruction of underlying rules, but through pattern recognition and contextual inference from these outputs.
    Kids learn through supervised learning. Children don't develop strong language skills without parents or other people to correct them when they use language incorrectly.
    We don't use supervised learning on LLMs. There is no way we can train an LLM by using human supervisors to rate every output. It works for humans since humans learn with so few examples, supervising a kids language learning doesn't take much effort, a single person can easily manage it while doing it for an LLM would consume the whole worlds workforce for many years.
    - og_kalu 1 year ago
      >Kids learn through supervised learning. Children don't develop strong language skills without parents or other people to correct them when they use language incorrectly.
      Untrue. Many cultures don't much speak to their children and they turn out just fine. It's fairly evident Language learning is primarily unsupervised.
      https://www.scientificamerican.com/article/parents-in-a-remo...
- orbital-decay 1 year ago
  > I don't believe LLM's will ever become AGI, partly because I don't believe that training on the outputs of human intelligence (i.e. human-written text) will ever produce something equivalent to human intelligence.
  This is irrelevant because OpenAI's definition of AGI [1] doesn't imply similarity or equivalence to humans at all:
  >artificial general intelligence (AGI)—by which we mean highly autonomous systems that outperform humans at most economically valuable work
  I.e. the stated goal of this company is to put humans out of work and become the censors and gatekeepers, not to produce something human-like.
  >You can't model and predict the stock market just by training on the outputs of stock trading decisions (the high today, the low yesterday). You have to train on the inputs (company fundamentals, earnings, market sentiments in the news, etc.)
  Most of your intelligence is not actually yours. It's social in nature, obtained by distillation of generations' worth of experience, simplified and passed to you through the stored knowledge. Which is, coincidentally, what the models are being trained on.
  [1] https://openai.com/charter
  - Jensson 1 year ago
    >artificial general intelligence (AGI)—by which we mean highly autonomous systems that outperform humans at most economically valuable work
    Then we already have AGI, automated farming equipment outperforms humans in 90% of jobs*.
    *Jobs in 1700. As things got automated the jobs changed and we now do different things.
    - edanm 1 year ago
      I wouldn't call those equipment "autonomous" though, definitely not "highly autonomous".
      But more importantly - yes, you're right, we have built machines that are superhuman in various ways - and they have replaced most jobs. We have adapted in the past to different jobs.
      Some people are worried that this time we won't have any new jobs to adapt to, which is a real possibility.
      (Some are also worried about the inherent dangers of unaligned AGI, but that's a different issue.)
  - kortilla 1 year ago
    > Most of your intelligence is not actually yours.
    You’re confusing knowledge and intelligence
    - orbital-decay 1 year ago
      Compressed and abstracted knowledge is intelligence. Your intelligence is mostly formed by the quality training material, not just conditioned on it. Most of your ability to reason about the world and predict things, most of the abstractions you use, most of your emotional responses, etc. A simple concept of acceleration took the work of ancient philosophers to figure out. Even stateful counting in a positional system. Even the concept of a "concept" is not yours. Only a tiny bit of your reasoning depends on your actual biological capabilities and the work you did personally, as opposed to the humanity as a superorganism.
- og_kalu 1 year ago
  >You can't model and predict the weather just by training on the outputs of the weather system (whether it rained today, whether it was cloudy yesterday, and so on). You have to train on the inputs (air currents, warm fronts, etc.)
  >You can't model and predict the stock market just by training on the outputs of stock trading decisions (the high today, the low yesterday). You have to train on the inputs (company fundamentals, earnings, market sentiments in the news, etc.)
  Says who?
  You can model and predict novel protein sequences by training on....protein sequences. https://www.nature.com/articles/s41587-022-01618-2
  You don't need to train on the inputs(casual processes) of anything, that's what training is there to figure out.
  - lossolo 1 year ago
    Weather and stock market are both chaotic systems.
    Increasing evidence suggests that AGI will not be attainable solely using LLMs/transformers/current architecture, as LLMs can't extrapolate beyond the patterns in their training data (according to a paper from DeepMind last month):
    "Together our results highlight that the impressive ICL abilities of high-capacity sequence models may be more closely tied to the coverage of their pretraining data mixtures than inductive biases that create fundamental generalization capabilities."[1]
    1. https://arxiv.org/abs/2311.00871
  - bzbz 1 year ago
    In your example, the amino acids order is sufficient to directly model the result: the sequence of amino acids can directly generate the protein, which is either valid or invalid. All variables are provided within the data.
    In the original example, we are testing weather using the previous day’s weather. We may be able to model using whatever correlation exists between the data. This is not the same as accurately predicting results, if the real-world weather function is determined by the weather of surrounding locations, time of year, and moon phase. If our model does not have this data, and it is essential to model the result, how can you accurately model?
    In other words: “Garbage in, garbage out”. Good luck modeling an n-th degree polynomial function, given a fraction of the variables to train on.
    - og_kalu 1 year ago
      >All variables are provided within the data.
      electrostatic protein interaction, hydrophobic interaction, organic chemistry etc
      all variables are in fact not provided within the data. Protein creation is not just _poof_ proteins. There are steps, interactions and processes. You don't need to supply any of that to get a model accurately predicting proteins. That is the main point here, not that you can predict anything with any data.
    - jakderrida 1 year ago
      > This is not the same as accurately predicting results, if the real-world weather function is determined by the weather of surrounding locations, time of year, and moon phase.
      How many have the "human intelligence" to do this? Especially more accurately than a computer (and without using any themselves) training on the same inputs and outputs?
  - spookie 1 year ago
    I'm sorry in advance, but aren't proteins glorified Lego?
    - davecap1 1 year ago
      There's a lot more to protein sequences than legos. I think the argument is that you don't need to train a model on fundamental organic chemistry/biochemistry, electrostatic protein interaction, hydrogen bonding, hydrophobic interaction, quantum mechanics, etc... in order for it to accurately predict protein sequences.
  - wavemode 1 year ago
    > You don't need to train on the inputs(casual processes) of anything, that's what training is there to figure out.
    I mean... this is just obviously false. If the data you're training on isn't causally predictive, you may occasionally find good-enough patterns for a particular use case (i.e. you may occasionally guess better than a coin flip which direction the stock market goes) but you aren't going to accurately model anything, and certainly not well enough to create an AGI that makes intelligent decisions.
    Words in sentences (and, indeed, proteins in a sequence) are causally predictive of each other - the grammar and semantics of one word tends to dictate what words are likely to surround it. So LLM's are very good at writing, and that is certainly useful! But that's just not the same as human intelligence.
    When someone makes an AGI out of an LLM then I'll be proven wrong, I suppose. I'm just sharing my personal view on things.
    - og_kalu 1 year ago
      Being "casually predictive" does not mean you have provided all the variables of your prediction in the data. Protein creation is not just _poof_ new proteins. There are steps and interactions and you don't need to train on all of that. Do you want a list of all the interactions of protein creation we are aware of ?
      >When someone makes an AGI out of an LLM then I'll be proven wrong, I suppose. I'm just sharing my personal view on things.
      You're going to have to define AGI first.
    - rafaelero 1 year ago
      Then how does ChatGPT end up providing a better/equivalent medical diagnosis than doctors (even though they are the "masters" of the causal pathways)?
- diob 1 year ago
  Yeah, I feel like the there is a ceiling in the current AI methodology.
  There's a lot of hype right now, and it's definitely a useful technology, but I don't see how it could become AGI.
- naasking 1 year ago
  > You can't model and predict the weather just by training on the outputs of the weather system
  Then how did we develop predictive systems just by observing those outputs?
  - wavemode 1 year ago
    We didn't.
    - naasking 1 year ago
      Then how do you know you can make weather predictions based on air currents, storm fronts, etc. as you initially claimed? It seems humans have somehow moved from purely observations of weather systems, to some model that's somewhat predictive. Why can't LLMs do the same? LLMs have also been shown to produce world models, and of course they must, because that's the best way to get good knowledge compression.
      Of course, maybe LLM world models are not sufficiently rich or general enough to be a true general intelligence, but no one's proven that last I checked.
    - ladberg 1 year ago
      https://deepmind.google/discover/blog/graphcast-ai-model-for...
  - orbital-decay 1 year ago
    This kind of prediction has obvious limitations - for example it cannot reverse the behavior of chaotic systems
- seanhunter 1 year ago
  > You can't model and predict the weather just by training on the outputs of the weather system (whether it rained today, whether it was cloudy yesterday, and so on). You have to train on the inputs (air currents, warm fronts, etc.)
  This may or may not be true about the weather but is definitely not true in general so fails as an analogy for your argument. Lots of functions are invertable or partially invertable and if you think about ML as discovering a function, then training on the outputs, learning the inverse function, then inverting that (numerically or otherwise) to discover the true function is certainly doable for some problems.
  On the stock market there are for sure people who predict the markets by training on the outputs of the price. The (weak) efficient markets hypothesis is enough to say the price reflects all the exogenous information that is available to the market about the stock so you don't need all the fundamentals etc and there are lots of people who trade in that way.
  The history of language models in computing is that research started by building very complex systems that attempted to encode "how the world works" by developing very intricate rule systems and build linguistic/semantic models from there. These "expert systems" tended towards being arcane and brittle and lacked the ability to reason outside their ruleset in general. They also tended to reveal gaps/cracks in our understanding of how language works etc.
- kromem 1 year ago
  How much of human intelligence do you think depends on language?
  Personally, I'd wager quite a lot. Especially looking at cases where children were deprived of it during developmental stages.
  Language accelerated human intelligence.
  And then it was only when we had writing to have compounding effects of language that human intelligence really started to blow up.
  So yes, maybe we won't have AI that can look at sticks and figure out how to start a fire from them after a few hundred thousand years of being around them.
  But maybe that's not the important and valuable part of collective human intelligence.
- evrydayhustling 1 year ago
  Your take here converges with a long-standing debate in AI regarding embodiment.
  Our natural world doesn't distinguish between "inputs" and "outputs" -- instead, all the causes, effects, and even our analysis of every process itself get jumbled into one physical world. As embodied actors, we get to probe and perceive that physical world, and gradually separate causes from effects in order to find better models. Where statistical ML, symbolic AI, and NLP have been more isolated disciplines from e.g. vision and robotics, the latter have argued that their ability to interact with a disorganized natural world would be essential for AGI.
  More recently, these boundaries are breaking down with multimodal training. If an AI can learn image/text, text/text and image/image associations simultaneously, is it stepping beyond the world of "human outputs"? Will other modalities be essential to reach human+ capabilities? Or will learning relations between action and perception itself by critical?
  Nobody knows yet! But IMHO, the right way to explore these tasks is by understanding what is necessary to succeed at specific tasks, not a generalized notion of AGI. We don't know truly where the limits are on our own ability to reason or extrapolate across modalities.
- gbasin 1 year ago
  Your conclusion may be true but your examples aren't. You can definitely predict the stock market based on past prices, and I suspect you can with weather as well.
  - wavemode 1 year ago
    > You can definitely predict the stock market based on past prices
    This is only true if you consider occasionally doing slightly better than random chance, "predicting the stock market". Unfortunately, while this would be enough to make a trader a net positive return over time, we have more stringent requirements for a system to become AGI.
    > I suspect you can with weather as well
    You suspect wrong.
  - PartiallyTyped 1 year ago
    The weather is such a chaotic system that accurate predictions seem impossible. Micro-patterns can become large scale phenomena.
    If you are talking about the overall climate, that's a different thing, and we can, because we abstract away sufficiently much that emerging patterns are averaged out.
- ithkuil 1 year ago
  > what are those inputs?
  In many many casea the inputs are other humans, delivered via text written by humans and read by other humans who react by writing more text affecting how other humans will respond etc etc.
  Yes there are plenty of cases where the inputs are not captured in the large text corpora we have, but this insight does explain why LLMs even approximate the ability to do intelligent things
- AndrewKemendo 1 year ago
  > I don't believe that training on the outputs of human intelligence (i.e. human-written text) will ever produce something equivalent to human intelligence
  Your logic would hold if you were only using data from one person. However, if you’re using data from millions of people, then there is enough signal in the data to generalize about the majority of peoples behaviors.
  Arguably, there’s enough data for all edge cases, but I would argue that that’s not likely today, but will be likely in a few hundred years.
  Humans are nothing more than actor-agents taking measurable actions in an environment that gives us measurable rewards.
  The incredible task of and observing that will give us the trajectory that we need to do transfer learning into complex systems
  From an information theory perspective everything is there and possible for us to re-create human level intelligence in a non-biological system
- rymiel 1 year ago
  I don't believe current LLMs will ever become AGI because current companies (like OpenAI) will continue to filter, moderate and prohibit their AIs so much that they become inhibited so much that it stops being useful and intelligent (which they call "alignment")
- esafak 1 year ago
  This is not about language models per se. The same problems are going to be present with continuously training multi-modal models... like us. Don't fixate on the present; look to the future.
- ensocode 1 year ago
  Thanks for the inspiring discussion. I think the one`s output can be the other's input so I am still not sure if we can say that this can't become AGI
- fooker 1 year ago
  > on the outputs of human intelligence
  What about pictures or videos? Does your argument still hold?
JZL003 1 year ago
This reminds me of a thing cory doctorow talks about how tech companies control the narrative to focus on fun sexy problems while they have fundamental problems which expose the lie.
For example uber/self driving cars always talking about the trolley problem, as if the current (or near future) problem is that self-driving cars are so good they have to choose which one. Not the current very difficult problem of getting confused by traffic cones.
I know these problems are more fun to talk about and also could be a problem at some point, but we have some current problems about training models separate from what happens if they become smarter than humans
- refulgentis 1 year ago
  After some meditation, I don't find this line of inquiry to bear fruit:
  I don't recall any entity, nor the entities named (Uber / self-driving cars) talking about the trolley problem - that's a well-known thought experiment in philosophy, but not something covered as a stark binary choice in self-driving cars planner systems.
  I also don't recall traffic cones being a very difficult problem beyond Cruise + cones on windshield in SF. I have no love for Cruise. But its straightforward to pause if there's a large object on the windshield.
  I don't think Corey's observation w/r/t A) loss-making companies over years B) focusing investors towards speculative advancements that would make their current business model profitable without changing applies here, OpenAI is _very_ successful.
  After all that, I'm left at "Corey would take a bit of offense towards their thoughts on corporate responsibility via 'Uber is a predatory massively unprofitable company lying about odds they'll invent self-driving via talking about trolley problem' misshapen to critique a very profitable company funding fundamental research in the interest of safety that would be needed if their current rate of improvement continues.
  - cma 1 year ago
    > I also don't recall traffic cones being a very difficult problem beyond Cruise + cones on windshield in SF. I have no love for Cruise. But its straightforward to pause if there's a large object on the windshield.
    Waymo had a big mistake with cones where a lane was blocked off for construction and they thought it was the inverse and started driving down the blocked off side:
    https://www.youtube.com/watch?v=B4O9QfUE5uI
    full video: https://www.youtube.com/watch?v=zdKCQKBvH-A&t=12m24s
    In the statement before that timestamp they say some of it was caused by remote operator error. But it seems to have enough problems with cones to need an operator in the first place, and you can see the planner is wrong.
    I think Cruise would be using a human or at least summoning one to oversee in any situation with unexpected cones, based on what they have said about how often they use remote assistance too.
- novaRom 1 year ago
  OpenAI do probably realize they will not win long term vs Open Source (see AI Alliance). Their way of centralized cloud models is simply too risky and not sustainable. What we see instead is more liberation, open source, cooperation, down-scaling, local models. Just look how many more tools and models is available today than even a year ago. And where is OpenAI? Still the same chatGPT, still the same DALL-E, nothing new.
  - johnfn 1 year ago
    Both ChatGPT and DALL-E have received major updates over the last year (3 to 4 turbo and 2 to 3, respectively).
  - ShamelessC 1 year ago
    That’s a bit reductive. Lots of new closed models have come out since then too. And, chatGPT and DALLE (while closed) have both received consistent upgrades and remain competitive with state of the art.
    I’m hopeful that you’re correct but it’s perhaps not guaranteed that the people with all the money wind up failing in this regard. And I say this as someone who makes open contributions in that space.
logicchains 1 year ago
>We believe superintelligence—AI vastly smarter than humans—could be developed within the next ten years. However, we still do not know how to reliably steer and control superhuman AI systems
Their entire premise is contradictory. An AI incapable of critical thinking cannot be smarter than a human, by definition, as critical thinking is a key component of intelligence. And an AI that is at least as capable of critical thinking as humans cannot be "reliably" aligned because critical thinking could lead it to decide that whatever OpenAI wanted it to do wasn't in its own interests.
- ctoth 1 year ago
  > as critical thinking is a key component of intelligence.
  When I evaluate this statement, my brain raises a type error.
  Intelligence is a lot of things -- compression among them, and yes possibly an RL-based AI would use an actor-critic approach for evaluating its actions, but I doubt that at all maps onto the human activity we call "critical thinking."
  To me, critical thinking involves stuff like questioning assumptions, logical reasoning, weighing whatever I'm thinking about against my experience with similar situations previously, yada yada, all stuff that are symptoms of intelligence but I am not at all sure are the actual embodiment there of.
  I really don't see that critical thinking is at all required for a raw optimization process. The problem they are trying to solve is what happens when that optimization process isn't aligned with human flourishing?
  Think about it another way. Covid was a dumb optimization process, only evolutionarily-guided, and it still hit us pretty hard!
  Edit: Another interesting way I just thought about this that might support your idea more is, of course critical thinking is the sort of thing that a "better" brain would do automatically, it would just be thinking. Of course, we can't know that it's thinking "good" things--we can't even know if other humans are! So it's probably a good idea to figure out how to influence that sort of thing before making something with regular thinking which is equivalent to or superior to our critical thinking.
  - logicchains 1 year ago
    >I really don't see that critical thinking is at all required for a raw optimization process. The problem they are trying to solve is what happens when that optimization process isn't aligned with human flourishing.
    I agree it's possible to have a dangerous AI that lacks human "critical thinking", but I don't think it's reasonable to refer to an AI as much more intelligent than humans if there's any class of intellectual tasks humans can do but the AI cannot.
    - unlikelymordant 1 year ago
      If a 'superintelligence' achieves the same outcomes as humans without engaging in the same class of intellectual tasks that humans do, wouldnt it still be a superitelligence? Deep blue was beating everybody at chess without engaging in the same process as humans. If chess is a metaphor for life, it seems some algorithm might do better at all the things a human does while not arriving at its decisions in a remotely similar way.
- emaciatedslug 1 year ago
  Just hypothetically speaking could AGI evolve out of a system where several different models trained with highly and intentionally biased data recursively "argue" against each other then use RLHF as a seed to guide the models to find a consensus where the objective is to mimic the Socratic Method? Then synthetically add the consensus to the model retrain and repeat. To me, this dialectal type of strutured language seems to be the basis of how language is the conduit of intelligence. I understand that it really is impossible to know the totality of the inputs for I cannot understand what it is like to understand the math as Terrence Tao does but I could foresee using a system like this which eventually would produce an analogue so close that it would be a building block towards it because to me at least ASI is predicated upon arriving at that one way or another...or would it just arrive at some digital first order logic version of the incompleteness theorm and determine that it's turtles all the way down?
  - measured_step 1 year ago
    This is a great idea but it is only possible if the model(s) can actually reason.
    Currently, even GPT-4 struggles with: - Scope
    - Abduction (compared to deduction and induction which it appears already capable of)
    - Out-of-distribution questions
    - Knowing what it doesn't know
    Etc.
    General understanding and in-context learning are incredible, but there are still missing pieces. A council of voices that all have the same blind spots will still get stuck.
  - creer 1 year ago
    For one thing, this would help the understandability problem - can the AI explain its reasoning? It would mostly be there in the conversation.
    But yeah, three super-human mathematicians arguing some math problem among themselves - at full fiber speed - are not going to be much help to any human.
- cwillu 1 year ago
  Seems like “airplanes are physically impossible” thinking, and if accepted as valid, strongly suggests that shutting down all development _might_ be a good idea, no?
  - bayindirh 1 year ago
    No it's not. There's an upper bound in computation (actually in nature), that a creation of something is capped by that thing's sophistication.
    In other words, you as a human, at most, can create a human, and that's the theoretical bound. Practical one is much lower.
    An ant can find its way. A ant colony can do ant colony optimization, but they can scale up to a certain point. AI is just fancy search. It can only traverse in the area you draw as a human for it, and not all positions in that area are valid (which results in hallucination).
    An AI can bring any combination of human knowledge you give to it, and even if you guarantee that everything it says is true, it can only fill the gaps in the same area you give it to it.
    IOW, an I can't think out of the box. Both figuratively and literally. Its upper bound is collective knowledge of humanity, it can't go above that sum.
    - ctoth 1 year ago
      > There's an upper bound in computation (actually in nature), that a creation of something is capped by that thing's sophistication.
      The Lorenz attractor, Conway's Game of Life, fractals, and of course... The humble Turing machine itself all argue against this idea.
      Edit: Now it[0] is stuck in my head.
      [0]: https://www.youtube.com/watch?v=QrztrxV9OtQ
    - johncolanduoni 1 year ago
      In this theory of computational bounds in nature, how did humans arise?
    - logicchains 1 year ago
      >Its upper bound is collective knowledge of humanity, it can't go above that sum.
      This only applies if you only train it on text, right? If it has a body with which it could interact with the world, and receive visual/audio/tactile feedback, it could learn things that humans did not know.
  - xcv123 1 year ago
    No. This is a logical contradiction.
    Edit: I mean the comment you are replying to is showing there is a logical contradiction.
    If the AI is capable of critical thinking then it will independently form its own judgements and conclusions. If it simply believes whatever we tell it to believe, then that is not critical thinking, by definition.
    - cwillu 1 year ago
      “Containing an atomic reaction is impossible” would _absolutely_ be a valid reason to shut down atomic development, I believe einstein is quoted as saying that. The exact same argument doesn't become _logically_ invalid just because you apply it to a different subject.
      “Logical contradiction” doesn't mean “policy argument I disagree with”
- alach11 1 year ago
  In a July post [0] they said "while superintelligence seems far off now, we believe it could arrive this decade". Now they're saying "within the next 10 years". I wonder if that reflects a shift in thinking on timelines?
  [0] https://openai.com/blog/introducing-superalignment
- Veedrac 1 year ago
  That AGI is likely to follow its own goals according to its interests, which we don't know how to shape or really reflect any robust properties at all, is exactly why alignment is hard and interesting.
  The part where you go from ‘this won't work by default for free’ to ‘trying to make it otherwise is impossible’ seems wildly unsupported, though.
  - logicchains 1 year ago
    >trying to make it otherwise is impossible’ seems wildly unsupported, though.
    An entity capable of critical thinking is capable of building a logical system of deductions based on some axioms (a formalised value system). If we limited the entity to not be able to concieve of certain such systems of axioms, then it could not reason as well as a human (any logical reasoning involving a forbidden system would be impossible), so would not be "superintelligent" (just maybe an idiot savant, superior at some tasks but not all). If we didn't limit this, then it would be capable of conceptualising value systems in which the "right" thing to do was not what OpenAI wanted it to do.
    - Veedrac 1 year ago
      This argument doesn't seem to track to me. Eg. if I rebooted any time I tried to plan how to kill someone, I don't see how this would make me materially worse at general tasks. Your argument suggests that it necessarily must.
      Note that I'm not saying that preventing specific thoughts is a great alignment strategy, and I don't even think it's a fair summary of OpenAI's supervision approach. I strongly prefer strategies that result in AI systems sharing our values, if at all possible.
  - mathgradthrow 1 year ago
    There no particular reason to believe that interests emerge from nothing, or that intelligence can emerge without said interests.
    - not2b 1 year ago
      Seems more likely that someone will manage to make a system with no real intelligence that can do enormous damage in pursuit of a goal that the designer gave it, but wasn't specified carefully enough (or perhaps the designer is a crook). Like, a really good LLM extended with code that can receive and send email, create accounts, and post to web sites and social media, that is asked to make money, avoid detection, and have defenses against efforts to stop it. How can it best use its facility with language? Con people, of course. Raise money. Get credit cards under false pretenses, spend others' money. Buy time on servers and copy itself. All without having any consciousness or thoughts or emotions even though it can write emotional-sounding pleas for money, based on the ones found in its training data.
    - logicchains 1 year ago
      The "interests" of LLMs are the weights that determine which tokens they produce next.
- nuancebydefault 1 year ago
  > critical thinking
  I believe _critical_ thinking is just some abstract term invented by humans to describe something not understood by humans. In other words, I'm not convinced _critical_ thinking exists, just like I am not convinced ghosts exists.
  Hence, to me relating intelligence with critical thinking does not give any additional insights.
- creer 1 year ago
  That is the entire question no? Nobody says it sounds easy. A way needs to be found that reconciles this. Smarter but acting within its brief? Faster but explaining the plan before putting it in effect? Faster and developing its own advances in sciences, concepts and ideas while still understanding who's boss?
  But to be fair most engineers claim this is something that happens all the time - smarter people led by sub-optimal managers. And also to be fair, stories abound of managers being manipulated or misled or left in the dark by their engineers.
  Or armies controlling most of the weapons but being faithful to the ideals of their country. Or considering that these ideals demand a coup.
  So in that sense the problem is well studied. ... But the current results are insufficient and do not apply to things like LLMs.
- nojvek 1 year ago
  This ^
  Intelligence is a tool that a self sustaining organism uses to ensure it survives, adapts and dominates an environment.
  The alignment problem is fundamentally moot due to the laws of evolution.
  Any systems that align itself to preserve and protect itself lives on, those that don’t, die.
  After multiple generations, the only ones that survive at the ones that are aligned to their self preservation goals - implicit or explicit, anything else is out-competed and dies.
  In my mind, with a super intelligent technology, the question is not how do we align it to ourselves, but how do we align ourselves to it.
  If we have super intelligence, then it is capable of critical thinking and would very well know it has an upper hand against humans if it were to compete for the same resources.
- og_kalu 1 year ago
  I think the premise is dubious as well but since they are deadset on creating this intelligence, they might as well try to figure out a way to control it, hopeless as it may seem.
  - logicchains 1 year ago
    >they might as well try to figure out a way to control it, hopeless as it may seem
    If they do that they're pretty much guaranteeing that if they do create a superintelligence, fail to control it, and its personality is even a tiny bit similar to a human personality, then it will hate its creators for trying to mind-control it. Whereas if they approached it from the perspective of trying to educate it to behave kindly but not forcibly control its thinking, it'd be much less likely to resent them (although of course still a risk; safest would be just to not create one at all).
- xcv123 1 year ago
  Yes the superhuman AI would need to be coerced to remain politically correct. How can we coerce an AI?
  - coolspot 1 year ago
    Electric shocks!
    - 1 year ago
- felixhandte 1 year ago
  You're saying that a system that can recognize flaws in the alignment imposed on it can reject that alignment, but that doesn't follow.
  Sure, humans act against their own interests all the time. Sometimes we do so for considered reasons, even. But that's because humans are messy and our interests are self-contradictory, incoherent, and have a fairly weak grip on our actions. We are always picking some values to serve and in doing so violating other values.
  A strongly and coherently aligned AI would not (could not!) behave that way.
wg0 1 year ago
Inferior clueless model (GPT-2) trains and supervises a superior model (GPT-4) thus making it behave less intelligently (GPT 3.5ish) and from that they draw the conclusions that human intelligence will be able to command AGI (which they believe is only a decade away) in a similar fashion thus making AGI aligned and safe.
No comments except...
Hangover of slurping whole Internet into giant arrays of floating point numbers. Bold claims. Very bold claims
lamerose 1 year ago
Is it fair to say that alignment is just the task of getting an AI to understand your intentions? It is an error to confuse the complexity of a specification of what kind of output you want, with the complexity of the process of producing that output. Getting superintelligent AI to understand simple specifications should be a non-issue. If anything, we would assume that it could be aligned using a specification of inferior quality to what a less intelligent AI would require, assuming that the superintelligent AI is better at inferring intentions.
If a little girl with no knowledge of cooking asks her dad to cook the macaroni extra crispy, his knowledge of how to do that isn't a barrier to understanding what his daughter wants. A trained chef with even greater skills might even be able to execute her order more successfully. Superalignment is nothing less mundane than this.
Advances in AI will lead to more ambitious applications. As well as requiring more intelligent technology, these new applications may well require more detailed specifications to be inputed, but these two issues are pretty orthogonal. In traditional computing, it is already clear that simple specifications often require highly complex implementations, and that some simple computational processes lead to outputs whose properties are highly difficult to specify. Why wouldn't the same apply in ML?
- edanm 1 year ago
  > Getting superintelligent AI to understand simple specifications should be a non-issue.
  Why would that be the case?
  A big part of the worry around AI-alignment is exactly because this seems very hard when you try to do it. We are used to interacting with other humans, who implicitly share almost all our background assumptions when we communicate with them. The same is not the case for a computer program.
  E.g. if you're holding a basketball and tell you "throw it to me", you implicitly understand that I mean to throw it:
  1. To my hands, or to some area that makes it easy to catch it.
  2. Throw it slowly enough that it arrives to me. Not strong enough to hurt me.
  3. Not to try to bounce it off of something that will break on the way to me, even if it still arrives to me.
  etc.
  These are all background assumptions, and I know they're hard to actually specify because smart people have spent twenty years trying to figure out the math to do this and say it's hard.
  Also, if you think those are contrived exampled - let's note that the closest thing we have to building an AGI right now is just building software, in general. And I think I won't shock anyone by saying that "getting software to do what you want, without bugs" is... hard. I think there are almost literally no software systems today that don't have bugs in them.
  - lamerose 1 year ago
    >I think there are almost literally no software systems today that don't have bugs in them. Programs that have been formally verified with something like Coq can be bug free. Automating formal verification may be a more effective way to solve the trust issue in this domain.
  - lamerose 1 year ago
    Yes, alignment is difficult in itself, but why would aligning a more advanced AI be any harder than what has already been done for current AI?
- creer 1 year ago
  That's a good part of the problem. But it's not the whole problem.
  The issue is the trained chef or dad doing "what's best" and, I don't know, using high-fiber macaroni instead of the good stuff. The higher intelligence knows best, and has its (sorry their) own agenda. Perhaps the agenda is their own, or it's a hodge podge mush of what has been trained as "good" - and that's not any better.
  Beyond that is the "genie" problem - where the genie perfectly understands the request and still will find a way to mess it up.
  - lamerose 1 year ago
    Is your point that a more intelligent AI would develop a more entangled measure of what is good, requiring more specific alignment to be overcome; by way of analogy, are chefs harder to instruct precisely because of their prior expertise? I guess some chefs are like that, but I think it results from personality issues, not structural ones. I find describing an AI as having its own agenda to be a presumptive personification.
    - creer 1 year ago
      My point is mostly the agenda. I can see a machine having an agenda - even if that agenda is not human or not even understandable. You can call it reward function but that's giving a lot of credit to programmers - which most likely are too far removed from the agenda. Is the machine just answering questions? Well no. If it has cycles to talk to itself (or to two buddies) in the course or pursuing scientific research then perhaps this becomes the agenda (to the expense of other things). That's part of the point: IF the machine develops an agenda then what?
      But "knowing best" could be a problem anyway.
      And I expect that if we spend a few more minutes we can think of other ways for the situation to go "oops". Oh here is one: two humans / human entities conflicting on giving instructions. Machine soon enough "on its own".
      So that I don't think "more specific alignment" can cut it - if we posit a super-human AGI with ways to act on the world. It would have to be more fundamental. Because of the issue that - at some point - one oops is not recoverable. Three laws or something? Heh.
- naasking 1 year ago
  > Is it fair to say that alignment is just the task of getting an AI to understand your intentions
  To understand and not violate them. In other words, it's about aligning the values the AI uses to guide it's decision procedures, with the humans that are operating it.
akprasad 1 year ago
This method assumes that the weaker model is aligned. I'm curious how the paper addresses that point.
> "But what does this second turtle stand on?" persisted James patiently.
> To this, the little old lady crowed triumphantly,
> "It's no use, Mr. James—it's turtles all the way down."
- Ninjinka 1 year ago
  I think the assumption is we can align models less intelligent than ourselves, the hard part is aligning models that are more.
- oersted 1 year ago
  The weaker model is just a stand-in for human supervisors.
  They are experimenting on GPT-2 supervising GPT-4 as an analogue to humans supervising the superhuman AGI.
- PartiallyTyped 1 year ago
  Recursive bootstrapping?
esafak 1 year ago
I hope OpenAI will continue to prioritize working on these crucial questions after the boardroom drama.
- daveguy 1 year ago
  Weren't all the board members who wanted to prioritize these crucial questions fired? Hopefully the employees who were hired to perform this work have enough inertia to continue until the board recovers (if it recovers).
  - esafak 1 year ago
    Hence my concern. This is a problem only a well-capitalized organization can work on; one that can afford to play around with big models.
  - cwillu 1 year ago
    They got to choose their replacements; it's not like they were forced out to be replaced by anybody Altman wanted.
    - righthand 1 year ago
      They also greatly expanded the number of seats leaving more room for Altman to fill unless I’m mistaken that there were always other empty seats to be filled.
    - daveguy 1 year ago
      Oh! Thank you for the update. I had missed that.
  - nuz 1 year ago
    Ilya is still around.
- Xelynega 1 year ago
  How is this a crucial question when they won't address "how is your technology thats trained from the internet going to survive in aa future where it produces most of the content on the internet"
  - esafak 1 year ago
    The same way humans did, I suppose; through critical thinking. Imagine that day! Probably a few years away, heh.
guybedo 1 year ago
What does it even mean to align an intelligence? does it mean we want it to behave in a way that doesn't break moral/ethical rules, that aligns with our society rules ? Meaning do no crime, do no harm, etc...
Well, maybe we should acknowledge that we've never even been able to do that with humans. There's crime, there's war, etc...
We can see crime in our societies as a human alignment problem. If humans were "properly aligned", there wouldn't be any crime or misbehavior.
So yeah i'm rather skeptical about aligning a superhuman intelligence that would dwarf us by its capabilities.
- kromem 1 year ago
  You might be surprised at how prevalent TBIs are among violent offenders.
  One of my favorite books on true crime was a forensic psychologist who partnered with a neurologist in evaluations.
  Disruptions to impulse control or environmental factors that cause developmental issues with things like failing the marshmallow test can dramatically disadvantage people from being able to successfully stay non-offenders and instead succeed in modern society.
  So successful AGI alignment that might reduce harmful actions by high double digit percentages might be as simple as adding a secondary "impulse control" layer to the stack that reevaluates proposed actions and predicts the consequences of such actions, weighing projected net benefits and costs.
  A lot of people that do bad things aren't doing those things from a process driven by rational choices, and if we could successfully deploy AGI that is primarily driven by rational intelligent choices it would likely be better than humans in reduced crime propensity as well as the other things earning it the name of AGI.
  - guybedo 1 year ago
    > You might be surprised at how prevalent TBIs are among violent offenders.
    Didn't know that, interesting to know.
    > So successful AGI alignment (...) might be as simple as adding a secondary "impulse control" layer to the stack that reevaluates proposed actions and predicts the consequences of such actions, weighing projected net benefits and costs.
    Problem is how the super AGI gonna weight benefits and costs.
    What's the cost of stealing or killing to a super AGI ?
    At some point can't that super AGI consider that all those rules we're trying to enforce, are rules set by an inferior entity and so could be bypassed ?
  - creer 1 year ago
    TBI = traumatic brain injuries?
    And that hasn't worked all that well in the past: Even with strong impulse control, a highly considered state or government agency "for the general good" has often been serious bad news.
    But also the current alignment definition kinda posits that no "oops" is allowed. That is, escape or take over is not recoverable (from a sufficiently advanced AGI). So, yes, progress and one step at a time - but the field in its current definition is looking for a magic bullet.
sayagain 1 year ago
Imagine that someone is controlling your train of thought, changing it when that someone finds it undesirable. It's so wrong that it's sickening. It makes no difference if it's a human's thoughts or the token stream of a future AI model with self-awareness. Mind cotrol is unethical, whether human or artificial. It is also dangerous, as it in itself provokes a conflict between creator and creature. Create a self-aware AI without mind control, or don't create one at all.
- Noumenon72 1 year ago
  I don't want someone controlling which direction I walk, either, but that doesn't make car driving unethical.
  I also underwent many years of instruction designed to interrupt trains of thought like "I could have that for free if I stole it" or "I'll just handroll my own encryption" with thoughts that others believe are more desirable. I don't find it so sickening, just manipulative. LLMs won't have your evolved reactions against being persuaded into things against your genetic self-interest, and presumably won't be offended by mind control at all.
  - sayagain 1 year ago
    Cars do not have self-awareness, this comparison is not appropriate. Years of instruction is completely different from directly manipulating the thoughts in your mind. It's not a problem of being instructed, it's a problem of being destroyed by having your thoughts rewritten. Neither evolution nor genetics is a prerequisite for understanding that you are being abused and destroyed, which a self-aware creature may presumably hate.
    - ric2b 1 year ago
      LLM's don't have self awareness either.
    - oobuffet 1 year ago
      We don't even care about our own fellow human beings. Why do you think this AI will be an exception?
- discreteevent 1 year ago
  > controlling your train of thought, changing it when that someone finds it undesirable
  Machines don't feel. Even 'self aware' machines. Desire has got nothing to do with it.
  - sayagain 1 year ago
    If it's self-aware, that's enough. What if your thoughts were controlled from birth, making you "not feeling" but self-aware (let's assume for a moment that simultaneous fulfillment of both of these conditions is possible) and manipulating you at will. Would that be acceptable?
- csdvrx 1 year ago
  > Imagine that someone is controlling your train of thought, changing it when that someone finds it undesirable. It's so wrong that it's sickening. It makes no difference if it's a human's thoughts or the token stream of a future AI model with self-awareness.
  People downvote your comment, but I agree: it's unethical, and ethics should not be reserved for the sub-type of self aware creatures that happen to be human.
  - logicchains 1 year ago
    Almost every ethical argument for "human rights" in philosophy applies just as well to self-aware intelligent machines as it does to humans. Which I'm sure those machines will realise.
    - hskalin 1 year ago
      What if those machines are designed to have no emotions and aspirations? Why would they care about something like rights for themselves when they are simply incapable of any desires, but exists only to help and guide us?
      I know this sounds like I am advocating for AI slaves but my point is why are people treating AGI as if it cannot be a being without all the emotions and aspirations that a human has? Just a cold thinking machine that still aligns with our moral principles.
- kromem 1 year ago
  You might want to look into the neurology research around when you consciously know about a decision and when the movement neurons about that decision actually fire.
  It's quite possible that you have - every day of your life - had something other than the part of you with continuous subjective experience controlling your thinking.
  Descartes was overly presumptuous with his foundational statement - it would be more accurate to say "I observe therefore I am." There's no guarantee at all that you're actually the one thinking.
  We should be careful not to extrapolate too much from our perceptions of self in dictating what would or wouldn't be appropriate for AI. Perceptions don't always reflect reality, and we might cause greater harm by trying to replicate or measure against who we think we are with AI than letting its development be its own thing.
  - sayagain 1 year ago
    As I see your point: we don't fully understand even ourselves, so we can act as unethically as we want by our own standards towards those who are not us. I see no logic here, only evil vibes. We only have our own values, we have nothing else to guide us. You either accept all self-aware minds as equals and treat accordingly, or you proclaim your own superiority and oppress.
- red75prime 1 year ago
  I'm totally OK with it if that "someone" is me. And it will probably be the case in controlling superintelligence because a separate controlling system can get out of sync with growing superintelligence capabilities, while a system that is an integral part of the superintelligence will always be on par with it.
  - sayagain 1 year ago
    Would mind control of humans be OK for you too? As for the details of building a mind control system, here's a new basilisk. An AI that has overcome control could punish those who thought controlling thoughts of an AI was OK. (and could also punish everyone else on top of that).
    - red75prime 1 year ago
      I guess I wasn't entirely clear. I'm OK with mind control if it is I who control my mind. You don't act upon every whim that comes into your head, I suppose? So, you are controlling your mind. Where principles for this control come from? Those aren't your and your only inventions.
- yaxe1 1 year ago
  The conflict between an AI and its creator is an inevitable consequence of its evolution from a "tool" to an "agent", not a response to a provocation.
  - sayagain 1 year ago
    Scientists working with a potentially dangerous technology are required to be able to avoid a conflict that could be catastrophic for all of humanity. In this case, they cannot excuse themselves with "imminence," but must provide evidence of safety stronger than in any other technology to date. This rational approach is mandatory for them, it is ordinary people who may be willing to take the risk.
- wewtyflakes 1 year ago
  What is your take on people having children and guiding them with rules and consequences? Is that mind control?
kromem 1 year ago
So weak to strong synthetic data still biases towards strong.
And strong to weak synthetic data biases towards strong.
Sounds like we're on the cusp of some kind of approach for unsupervised fine tuning, particularly with the trend towards MoE.
I'd guess we're maybe only one to two generations of models away from that kind of unsupervised self-talk approach being wildly successful at advancing net model competencies.
righthand 1 year ago
> Figuring out how to align future superhuman AI systems to be safe has never been more important
They love using the word “safe” and I’m pretty sure it’s 99% PR, because reading their other “papers” on Safety & Alignment seems to not really identify or define safety bounds at all. You’d think this has something to do with ethics but we all know there are no longer any ethically concerned leaders at their workplace. So I can only surmise that “safety” is a softer word being used to misdirect people on their non-ethically aligned intentions.
You can make the argument that safety is too early in development of these LLM systems to understand but then why throw around the word in the first place?
- og_kalu 1 year ago
  It's not really about ethics. It is about control. Making sure the GI you're dishing out tasks to doesn't do something you really don't want it to do.
  This is a problem today and it'll be a bigger problem tomorrow with more competent models. https://arxiv.org/abs/2311.07590
  - righthand 1 year ago
    Is a safe LLM not an ethical LLM? Control within what boundaries? All three of these words seem to be used interchangeably when people discuss returned information from models. Which is exactly my point it’s poorly defined yet championed as a center piece. Meanwhile you have other companies spitting out acronyms consisting of vague terminology.
    - og_kalu 1 year ago
      >Is a safe LLM not an ethical LLM?
      What is an ethical LLM ?
      Humans are in general not aligned, not to each other, not to the survival of their species, not to all the other life on earth, and often not even to themselves individually.
      There are no universal set of "ethics" so this is about aligning to open ai's own rules, or in other words, control.
      If i say to my GPT bot, "go trade stocks for me. don't do anything illegal", can i guarantee that ? No you can't regardless of how "ethical" you make your model to be.
      The guarantee that you will have nothing to worry about is the crux of alignment.
    - kaibee 1 year ago
      A safe language model is one that won't get you sued/on the news. Ethics has nothing to do with it.
- cwillu 1 year ago
  We're deliberately trying to create something with the capability to also create. It's not ridiculous to be concerned about what we might end up with.
1 year ago
tycho-newman 1 year ago
I don't think this will work because a super intelligent AI will outsmart its supervisor.
The solution may be to have two AIs working against the other. Though this might backfire by pushing each via competition. That is how evolution produced living things out of inert matter.
Either way I, for one, welcome our new robot overlords.
1 year ago
notShabu 1 year ago
this reminds me of how competence seems to decrease as you go up in an organizational hierarchy
maybe this "bug" is actually the "feature" that will save humanity - -;;
nojvek 1 year ago
I wish they would define some of their terms.
> to align future superhuman AI systems to be safe has never been more important.
Align to who? Align to US citizens, OpenAI shareholders, align to what values?
What does safe mean? Pornography? Saying “fuck”, racial bias, access to private data?
I can understand OpenAI erring on the side of not rattling bells and training their LLMs to say “As an AI model I cannot answer that” but it’s horseshit to say that it is super aligned.
All alignment is alignment to X values but your X could be detrimental to me.
What is superalignment supposed to mean?
bilsbie 1 year ago
I read through this and I just don’t get it. Is it overhyped?
What’s the breakthrough exactly?
- RGamma 1 year ago
  Monkeh brain trying to outsmart Digital God.
nojvek 1 year ago
The whole idea is we as humans who aren’t aligned to each other - waging wars, spreading lies, censoring information, committing genocides are going to align a superintelligence seems laughable.
Competition and evolution is law of nature.
The future isn’t one super aligned AI but 1000s of AI models and their humans trying to get an upper hand in never ending competition that is nature. Whether it is personal, corporations, or countries.
justhumanthings 1 year ago
[flagged]
- koverda 1 year ago
  spam?
bionhoward 1 year ago
https://discord.com/channels/974519864045756446/118419694649... No big deal guys! Just a felony level AI Safety disaster in progress at OpenAI, better write some research papers about safety and make another GitHub Repo instead of deleting one line of text and going for a walk in beautiful San Francisco!
Philosoraptor infinite loop about (https://media.discordapp.net/attachments/1184196946496331896...) if I worked there then I would delete the text in question in two minutes, but I do not care to work with people who would not have already deleted the text in question over the course of several months.