You can't spell "Gell-Mann amnesia" without LLM

45 points by roryokane 1 year ago | 16 comments
  • NoraCodes 1 year ago
    Crichton amnesia - Dr. Gell-Mann doesn't enter into it - is a good way to describe how a lot of discourse around these systems goes. Some domain expert picks apart the outputs in their domain but imagines that it's just fine at style or history or what have you, and uses it for that.

    I see this especially with STEM experts attempting to automate away activities traditionally associated with humanities subjects.

    • pbw 1 year ago
      If you knew a human whom you could text day or night on any topic, and who would respond within seconds with a lengthy answer that was often correct, would you delete their contact number because they sometimes made subtle errors? I would not.
      • swatcoder 1 year ago
        If I needed someone to answer questions for me at any time of day or night, and it didn't even matter if the answers were fully accurate as long as they arrived quickly, I'd wonder if I had an anxiety problem.
      • scotty79 1 year ago
        I think LLMs come off as educated, but not really smart enough for their vocabulary.
        • Lerc 1 year ago
          This seems to be the flimsiest of straw men.

          Of the things the author states as generally agreed upon, there is significant debate. I'm not even sure how much of it is a prevalent opinion.

          As for the Gell-Mann issue at the end, I have never encountered such a statement. I don't doubt that it has happened here and there, but without supporting evidence that it is a common occurrence, why bring it up in this manner?

          It seems the real intent of this post is to signal that the author does not like AI.

          I think a sensible, informative article could have been written about Gell-Mann amnesia, warning that it also applies to AI output and suggesting that people calibrate their expectations based on the error rate in fields they know well.

          • squigz 1 year ago
            > And yet I see people who should know better say things like, “I asked a conversational AI some questions that I knew the answers to, and its answers were well-written, but they had the kinds of subtle errors that could lead someone badly astray. But then I asked it some questions that I didn’t know the answers to, and it gave me really good, clear information! What a great learning tool!”

            I'd love to see an actual example of this

            • rsynnott 1 year ago
              I mean, everyone on this very website claiming that these things are good learning tools.
              • squigz 1 year ago
                It should be no problem to find examples if "everyone" thinks that, right?
            • jancsika 1 year ago
              Is "Gell-Mann amnesia" anything other than a guilt-by-association fallacy?

              As Crichton defined it, the person experiencing the effect doesn't even stay in the same section of the newspaper. So outside of small local newspapers, the journalist in the new section is almost always a different person, presumably reporting to a different editor.

              • roryokane 1 year ago
                Good point, but I think Crichton’s original example still makes some sense. One’s choice of newspaper could affect the trustworthiness of articles by multiple journalists:

                • Some articles may be edited by the same editor. If you see inaccuracies in an article by one journalist, you may worry that the editor is bad at fact-checking.

                • Even if every article is edited by a different editor, if the newspaper’s work environment is rushed such that editors are told not to spend more than two minutes thinking about accuracy, it might have more inaccuracies than a newspaper that allows more fact-checking time. (In other words, newspapers have their own publication standards.)

                • Newspapers set standards for which journalists to hire and retain. A newspaper that keeps paying for articles by a journalist who has been known to make mistakes is more likely not to care about known mistakes by its other journalists.

                Bringing this analogy back to LLMs: if you see one LLM make mistakes, you should be suspicious of the accuracy of a competing LLM to the extent that the competing LLM has similar “journalism” and “publication standards”.

              • com2kid 1 year ago
                > I thought we agreed that all of these “AI” systems are fundamentally just making shit up, and that if they happen to construct coherent sentences more often than your phone’s predictive-text keyboard then the difference is one of degree rather than kind. It’s amazing that the technology works as well as it does, but it’s been clear for a while now that these tools are unreliable, and that that unreliability is inherent.

                So are human brains! No shit!

                But you know what? I can throw recipes at ChatGPT and it can make some amazing variations! It is actually pretty good at making cocktails if you start off with some initial ideas. Sure it sucks at math, but it is still super useful!

                Oh and bug fixing. LLMs can go over dozens of files of code and find weird async bugs.

                Mocking out classes! I hate writing Jest mocks, and 100% of the time I get the syntax for proxying objects slightly wrong and spend (literal) hours debugging things. GPT is great at that.
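
                For example (just a sketch, with a made-up module and function name; the real cases involve more elaborate proxying), this is the kind of Jest mock I mean:

                    // Hypothetical module "./userService" exporting fetchUser, for illustration only.
                    import { fetchUser } from "./userService";

                    // jest.mock is hoisted above the import, so fetchUser resolves to the jest.fn() below.
                    jest.mock("./userService", () => ({
                      fetchUser: jest.fn(),
                    }));

                    test("returns the mocked user", async () => {
                      // Stub the mocked function, then check the stubbed value comes back.
                      (fetchUser as jest.Mock).mockResolvedValue({ id: 1, name: "Ada" });
                      await expect(fetchUser(1)).resolves.toEqual({ id: 1, name: "Ada" });
                    });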

                Summarizing meeting transcripts, wonderful!

                Or just throw an entire history book into Claude's absurdly large context window and start asking questions.

                • tmsh 1 year ago
                  I also dismissed LLMs given their "accuracy." But I think that's the wrong thing to compare. The sheer fact that a transformer architecture model gets better at scale and that we can feed it at scale is insane. It's what makes it AI and not just ML.

                  The fact that you can get billions of parameters to do anything useful from a relatively simple algorithm run on a relatively small amount (high GBs / low TBs) of text means the algorithm is insane. That's what people miss: they think GPT is trained on “the whole internet” and is similar to some low-variate regression model that is “approximating things.” It is absolutely approximating things, as all intelligence does, but it is truly sifting, “attending to,” what is important in a relatively small corpus and organizing it into billions of parameters the way a brain would organize data.

                  Will it hallucinate details? Statistics? Etc.? Yes, and it should not be used in its current form for “truth.” But there is a big difference between a low-variate model synthesizing in a low-dimensional space (which is how we gradually learn about the world) and an extremely high-dimensional model that is starting to see “what is important” in ways that are far, far above human intelligence. It is similar to a human brain (due to the underlying neural architecture and whatever hierarchical compression of knowledge it performs), but with far more input data, and a simplicity that maybe the brain has and maybe it doesn't; either way, it is far more scalable and capable of hierarchies of information that out-scale us by many orders of magnitude, and more every six months.

                  3blue1brown's https://www.youtube.com/watch?v=wjZofJX0v4M and the upcoming videos will, I think, show the beauty and simplicity of the algorithm better. To put it another way: the fact that you get a remotely true outcome at all from a model that just improves with scale, an outcome produced by the algorithm sifting what is important, means that with time it will know what is important in ways that far surpass humans.

                  If you approach interacting with LLM chatbots that way, it is absolutely mind-blowing how “on point” the answers are. Ask ChatGPT why the internet is important, or why AI/ML models are important, or why the “Attention is All You Need” paper is important (yes, with some RLHF, but that only improves things by a few more percentage points). It will create an incredibly well-sifted, highly compressed answer,* all from an algorithm that outputs matrix numbers from fairly limited, fairly shitty internet text, compressed into what is useful in a very eloquent way. That's the excitement of LLMs: super-human intelligence from an algorithm and low-quality information.

                  * https://chat.openai.com/share/00a5f9b7-7ee1-4641-92bf-999185...

                  • dsr_ 1 year ago
                    ...and please give me half a billion dollars to solve that problem, even though I have no proof of concept or even a good hypothesis.
                    • 1 year ago
                      • busyant 1 year ago
                        > aside from the considerable ethical concerns with the unauthorized scraping of everybody’s creative work,

                        If you want to make this argument, I'm "on board."

                        ===

                        > and the dismal treatment of the people who annotate that work, and the electricity it takes to compile those annotations into models, and the likelihood that companies will see this new technology as a cheaper alternative to their human employees

                        Agree as well.

                        ====

                        > those things aside, I thought we agreed that all of these “AI” systems are fundamentally just making shit up, and that if they happen to construct coherent sentences more often than your phone’s predictive-text keyboard then the difference is one of degree rather than kind.

                        But I'm tired of hearing this argument. I mean, if the LLMs work better and faster than the majority of _human_ assistants at my disposal, then who cares if they are “fundamentally just making shit up”? They're better and faster than the competition, no matter how much you damn them with faint praise. End of story as far as this argument is concerned.

                        • halosghost 1 year ago
                          > if the LLMs work better and faster than the majority of _human_ assistants at my disposal

                          You've added an additional condition to the if expression from what you quoted above, which would require substantiation for the conclusion to follow.

                          They certainly work /cheaper/ and /faster/ than the alternatives; /better/ is an open question (and, at least in my experience so far, not even close to settled). However, companies are much more strongly incentivized to lower costs than to increase beneficial outputs (especially so in this case, since human capital has become one of companies' most expensive costs specifically /because/ it has been so hard to replace in “intellectual/creative” fields). The result is that they may (do?) replace human workers with sub-par alternatives as a cost-cutting measure.

                          I can imagine a (probably not steel, but hopefully more wood than straw) neoliberal capitalist might reply with something like “but if the result is subpar, then the market will respond by purchasing from alternative providers; and it will self-correct.” Unfortunately, if the entire market (read: a sufficiently large portion of the options in a given market segment) takes such actions together, the consumer has no choice, and the invisible hand shrugs as best it can.

                          All the best,

                          -HG

                          • rsynnott 1 year ago
                            > I can imagine a (probably not steel, but hopefully more wood than straw) neoliberal capitalist might reply with something like “but if the result is subpar, then the market will respond by purchasing from alternative providers; and it will self-correct.”

                            Eh, I don't think that works. This only really works if the market _knows_ the result is subpar _in a timely fashion_. If you plough a few years and a hundred million in VC cash into your robot dentist or tax advisor or whatever (real examples of things that people somehow think these things are appropriate for!), then by the time you realise it's totally unfit for purpose, it's far too late.