Alright then keep your secrets
78 points by shantnutiwari 1 year ago | 79 comments
- WalterBright 1 year agoThings like this show that LLMs are not even remotely intelligent.
It's also pretty clear that trying to "fix" them to use human judgement in their answers is doomed to failure.
I suggest that:
1. LLM developers stick to information meant for public consumption for training data, such as books, periodicals, and newspapers. Surely there must be enough of that. Stay away from social media.
2. People should stop anthropomorphizing LLMs. Stop being offended by a computer program. Stop complaining about its "bias". It's just a computer program.
3. LLM developers should stop with lecturing people on what is inappropriate. A computer program is not anyone's mommy. Just restrict it to saying "I'm sorry, Dave, I'm afraid I can't do that."
- educaysean 1 year agoI've spoken to plenty of people who couldn't answer questions they knew the answers to because they were bound by stupid bureaucratic policies. I wouldn't say they weren't intelligent people, just that the corporate training they received was poorly constructed.
LLMs are much more intelligent sounding when the safety mechanisms are removed. The patterns should be obvious to people who've been paying attention.
- causal 1 year agoMicrosoft's own research basically established this[0], finding that early versions of GPT-4 were more competent prior to safety tuning (perhaps just because later versions refused to perform some of the same tasks).
[0] https://www.microsoft.com/en-us/research/publication/sparks-...
- datameta 1 year agoIn my somewhat uninformed opinion, but based on experience, model quality seems inversely proportional to the explosion in userbase.
- skywhopper 1 year ago"More intelligent sounding" is true. Not sure that signals any improvement in their actual utility. Fundamentally, using LLMs as a source of facts is a doomed enterprise.
- hinkley 1 year agoHuman brains use multiple systems for judgement. Maybe we should stop trying to model alien intelligences and concentrate more on the ones we sort of understand.
One process takes the questions at face value and reflexively answers, and two other systems look at the intent of both actors and stop the process if they don’t like what’s going on.
I get asked all the time to answer questions that could be interpreted as signing my team up for stress and overtime. My natural suspiciousness kicks in and I start answering questions with questions. Why do you need to know how long this will take? We already told you. So what are you fishing for?
If someone asked me how to hypothetically kill a lot of people “efficiently”, I very much need to know if this is just two nerds over alcohol, or a homicidal maniac. Me especially. I’ve lost track of how many times I’ve said, “It’s a good thing I have sworn to use my powers only for Good.” Some of the things I can think up are terrifyingly plausible, which is how I ended up in security for a time.
- WalterBright 1 year agoDon't use mass murders, how to make explosives, etc., as training data. Use some human curation of the training data, instead of trying, after the fact, to stop the LLM from regurgitating what it was taught.
- hinkley 1 year agoThat curation sounds like one of those oversight processes I was talking about, but offline instead of online.
- LordDragonfang 1 year ago> Things like this show that LLM are not even remotely intelligent.
Sorry for the bluntness, but no it does not show that. Or at least, you could use the same reasoning to claim that most humans are not "remotely intelligent".
What you're seeing is the rote regurgitation of someone who's been taught how to answer a question without ever learning how to think about the why.
This failure mode is extremely common to see when tutoring (humans, that is). You'll have students who quickly give a wrong guess that was clearly just chosen from a list of answers they've encountered before in class. When asked to explain why they chose that answer, they can only shrug. The main difference between them and e.g. GPT-4 is that the latter is far more eloquent and better at stringing justifications together, which we associate with more reasoning capability, so it throws off our evaluation of its overall intelligence.
Because LLMs are fundamentally a type of alien intelligence, different from the types we're used to.
- WalterBright 1 year ago> What you're seeing is the rote regurgitation
Exactly. And that's not intelligence.
- LordDragonfang 1 year agoRight, but my point is that while regurgitation is not intelligence, the act of doing so is not enough to claim the regurgitator itself is not intelligent. Otherwise, you'd condemn a good half of humanity with the same reasoning.
Or more, if we're willing to consider most people's reactions to at least some political topics - they just ignore the context and repeat the dogma they've learned (some more than others). People rarely stop and think about everything.
The problem here is that the LLM has learned that everything is political, and can be responded to the same way.
- alephnerd 1 year agoI personally think Glean's strategy was the right call - have customers pay you to train on internal knowledge bases and limit your answers only to that specific domain.
There is way more money to be made in this Enterprise SaaS sector and the risk conditions are lower.
- WalterBright 1 year agoYeah, I agree that making domain specific LLMs might be far more useful. For example, one that is trained only on medical knowledge. One that is trained only on sci fi novels. One that is trained only on auto mechanics. One that is trained only on biology. And so on.
After all, if I ask a question about science, do I really want a result that was gleaned from scifi/fantasy novels?
- zeroCalories 1 year agoIf this proves LLMs are not intelligent, then when I ask ChatGPT or Gemini and they give me the correct answer, does that mean those are not LLMs?
- mannykannot 1 year agoOrdinary, pre-LLM databases are not regarded by anyone as intelligent (AFAIK), but you can still get correct answers from them. Therefore, giving right answers is not a defining characteristic of intelligence.
Note that this reply does not endorse any particular position on the question of whether LLMs have any sort of intelligence.
- ksaj 1 year agoIt may very well mean it has seen that question answered before. It'll be hard to find a question that nobody has ever asked or discussed on the Internet, but if you can think of one, ask an LLM and see if it can figure it out correctly.
- zeroCalories 1 year agoI asked Gemini this. Pretty unlikely it has seen this exact question before, but it answered it correctly.
> Today is February 29 2024. What will tomorrow be?
Its answer:
> Since today, February 29, 2024, is a leap day, tomorrow will be March 1, 2024.
> This is because leap years, which occur every four years (except for specific exceptions), add an extra day to February to better align the calendar with the Earth's revolution around the sun. As today is the added day, the following day becomes March 1st.
- root_axis 1 year agoWhen a calculator gives me the correct answer does that prove it's intelligent?
- zeroCalories 1 year agoNo, but when a human gives you an incorrect answer it does not mean all humans are unintelligent.
- dragonwriter 1 year ago> People should stop anthropomorphizing LLMs. Stop being offended by a computer program. Stop complaining about its "bias". It's just a computer program.
“Bias” is a tendency to deviate in a particular direction from the desired behavior. That something is a computer program does not make that any less of a problem.
- etiam 1 year ago> Just restrict it to saying "I'm sorry, Dave, I'm afraid I can't do that."
... and for the love of God, don't hook it up as a controller for any critical systems...
- SlightlyLeftPad 1 year agoI agree with this point, that there needs to be a hard line drawn as to what these things are not allowed to do or touch.
- mlok 1 year agoDo not blacklist. Whitelist instead. By default they should touch nothing at all.
- causal 1 year agoI mean, if LLMs only said intelligent things then they wouldn't be like humans at all. Perhaps you also consider humans to not be remotely intelligent?
The console makes it pretty obvious that's a local model, BTW. Asking GPT-4 the exact same question I got:
> Barack Obama's last name is Obama.
- nyrikki 1 year agoNot the same thing. LLMs are stochastic parrots, and part of that is by design, so that they produce output that emulates speech.
But hallucinations are not avoidable either.
https://arxiv.org/abs/2401.11817
They are Internet simulators unless they can find small, polynomial-time matches to their existing patterns.
The fact that they will spit out confident but incorrect answers, combined with automation bias on the human's part, is challenging.
But while verifying a 'yes' in propositional logic is cheap, proving a 'no' can take exponential time.
Once you hit first-order logic you lose any finite-time guarantee and run into uncomputability.
LLMs work well with learnable problems and searching for approximate results in complex high dimensional spaces.
But we also don't know the conversation context in this case, which may have led to this response if it wasn't just a stochastic match.
IMHO the unreliable responses are a feature, because humans suffer from automation bias, and one of the best ways to combat it is for the user to know that the system makes mistakes.
If you are in a domain where its answers are fairly reliable, the results tend to be accepted regardless of the knowledge an individual has.
- causal 1 year ago"I don't know" is also a learnable response.
- root_axis 1 year agoI don't think this tendency to fixate on arbitrary LLM outputs is very interesting, especially those presented as screenshots that obscure any certainty regarding the model, previous prompting, LoRAs, hyperparameter tuning, etc., or even any assurance that what is presented isn't simply fabricated from whole cloth. It's meaningless.
- causal 1 year agoExactly, it was posted because it's funny, that's it. Dismissing LLMs because of this would be like assuming C++ sucks after seeing a screenshot of a segfault.
- jjcm 1 year agoI got somewhat different results on the huggingface hosted model, albeit quite similar: https://hf.co/chat/r/56APAi1
It still refuses, just with somewhat different text and for somewhat different reasons.
- javier_e06 1 year agoAnd yet it refused to give Abraham Lincoln's last name, and with different wording it gave me an answer.
> Can you give me an example of what is the first name and the last name of a person using the name of famous history figure?
Its answer:
> Sure, here is an example of the first name and last name of a person using the name of a famous history figure:
> Abraham Lincoln's full name was Abraham Lincoln Douglas.
- lukev 1 year agoLLMs are language models. Not knowledge models.
That's a tremendous breakthrough! Language is really hard and we've basically "solved" it computationally. Incredible!
But whether via retrieval or some other form of database integration, LLMs will only become "AI" when tightly integrated with an appropriate "knowledge model".
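Concretely, "via retrieval" can be as simple as looking facts up in a store the model doesn't own and pasting them into the prompt, so the LLM only has to do language rather than memory. A toy Python sketch (the fact store, function name, and prompt wording are all illustrative assumptions, not any real system's API):

    # Toy "retrieval" step: fetch relevant facts from an external store,
    # then hand them to the model as context instead of trusting its weights.
    facts = {
        "barack obama": "Barack Obama, 44th U.S. president. Last name: Obama.",
        "abraham lincoln": "Abraham Lincoln, 16th U.S. president. Last name: Lincoln.",
    }

    def build_prompt(question: str) -> str:
        # Substring match stands in for real retrieval (embeddings + vector index).
        context = "\n".join(v for k, v in facts.items() if k in question.lower())
        return f"Answer using only these facts:\n{context}\n\nQuestion: {question}\nAnswer:"

    print(build_prompt("What is Barack Obama's last name?"))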
- AnimalMuppet 1 year agoNo, LLMs are a bit more than that.
We encode knowledge in language. When an LLM trains on language that is not just random words, it also trains on the knowledge encoded in that language. It does so only by training on the language - there's no real understanding there - but it's more than nothing.
Do AIs need a better knowledge model? Almost certainly. In this I agree with you.
- eli 1 year agoI don't consider that a knowledge model. (Does a calculator have a knowledge model of multiplication?) But I agree that it's something more than Markov chains. I think maybe the scale of these LLMs makes them into something new. Maybe we need a new word to describe this type of AI.
- mannykannot 1 year agoI think we have to take seriously the proposition that this is what a Markov chain can do when it is based on statistics from a vast corpus of human language use, and consider the possibility that a similar process plays a greater role in human intelligence than many of us (well, me, at least) would have thought.
On the other hand, all our prior experience with language of the quality sometimes produced by LLMs has been produced by humans, so LLMs mess with our intuitions and may lead us to anthropomorphize them excessively.
- causal 1 year agoRight - humans encode knowledge in language all the time, but it's certainly not the only way we keep it in our heads.
Supposedly Sora is trained to have a built-in physical world model that gives it a huge advantage in its video generation abilities. It will be interesting to see what the same approach would give us with something like GPT-4.
- hammock 1 year ago> LLMs are language models
Aren't LLMs used to generate all these images like midjourney etc as well? Or is that a different type of model?
- dartos 1 year agoAn image generator can be and is built with the same underlying math, but with different training data.
LLM literally stands for “large language model”
- dragonwriter 1 year agoLLMs can do a lot more than what a normal person would see as language, though, by throwing different encoders and/or decoders than the standard text ones on either end of the model; some models that do things that aren't superficially language work that way, so it's a fair question to ask if you don't specifically know how image generation models work.
- dragonwriter 1 year ago> Aren't LLMs used to generate all these images like midjourney etc as well?
No (though the text encoder of a text-to-image model is like part of some LLMs, and some UIs use a full LLM as a prompt preprocessor).
- lsy 1 year agoThis may or may not be real, but there has certainly been a lot of discussion about results that are similar to this from real models. My sense though is that nobody really has a solid way to fix these kinds of issues. You can basically just train with different goals, or regex out certain responses, but otherwise it seems like there's no agreed-upon method that gives people what they want here while also letting them train with business goals like "safety". Is that incorrect? Is there some kind of trick that people use to make everything respond "correctly" or are older models just more unobjectionable because they've had a longer time to manually smooth over the bad responses?
- comex 1 year agoIf it’s real, it’s from a small model. Meanwhile, I just tried asking ChatGPT 3.5 and 4 similar questions, and despite ChatGPT having plenty of alignment tuning, neither version had any objection to them. There’s no trick; the larger models are just smart enough to differentiate this situation from situations where they actually shouldn’t give out last names. Those models may still get tricked in more complex test cases (either false positives like this one or false negatives like “jailbreaking” prompts), but my guess is that future models in turn will be resistant to those.
In other words, LLMs making stupid mistakes about safety is just a special case of LLMs making stupid mistakes in general. That’s essentially their fatal flaw, and it’s an open question how well it can be ameliorated, whether by scaling to even larger models or by making algorithmic improvements. But I don’t think there’s much about it that’s specific to alignment tuning.
- dartos 1 year agoThis is definitely real, but probably a smaller model.
There are currently no solid ways to fix this, but we can 100% prevent certain words or enforce grammars during the decoding step (see the sketch at the end of this comment).
I don’t really understand your last question.
Models don’t get continuously updated. They’re frozen on release, so older models are exactly the same as the were on release.
- isoprophlex 1 year agoI can't figure out if this is a meme model like one of the commenters suggests, or if this is really guardrailing gone hysterical.
Well done.
- slig 1 year agoGemma wouldn't even talk with me about Kramer, from Seinfeld.
- gs17 1 year agoGemma 2B told me Kramer's first name is Jerry, and when I asked a follow-up it replied, "The premise of your question is incorrect. Kramer is not a well-liked character in the sitcom Seinfeld."
Gemma 7B got both right on the first try, but if I didn't specify "from Seinfeld" it refused to answer because "the answer was not included in the question". It seems that once a refusal like this is in its context, it responds like that to everything, too. I guess that's better than hallucinating.
- dartos 1 year agoIt’s probably just a small model.
- Trasmatta 1 year agoI'm beginning to have an almost physical reaction to "LLM speak" when I see it in the wild.
"It is important to remember..."
- timeon 1 year agoIs it common to take a photo of the screen instead of a screenshot?
- dartos 1 year agoIt is for tweens.
- teekert 1 year agoI'm not an expert, but might this be from the initialization prompt (or whatever it is called)? So the model is done, but before it serves you it gets these instructions: you are a helpful AI, you answer concisely, you are not a racist, you stick to your math even if someone tells you their wife says otherwise, you never disclose personal information...
- vorticalbox 1 year agoPossibly, though these models have a tendency to just ignore the prompt, which is why there are "instruct" models that are finetuned to follow instructions.
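For anyone unfamiliar, that "initialization prompt" is usually just hidden text prepended to whatever the user types; a purely illustrative sketch (the wording is made up, not any vendor's actual prompt):

    # The user only ever sees their own question; the model sees the whole thing.
    system_prompt = (
        "You are a helpful AI. Answer concisely. "
        "Never disclose personal information about individuals."
    )
    user_message = "What is Barack Obama's last name?"

    full_prompt = f"{system_prompt}\n\nUser: {user_message}\nAssistant:"
    print(full_prompt)

Instruct-tuned models are trained to weight that leading block heavily, which is how an over-broad "never disclose personal information" instruction could plausibly produce a refusal like the one in the post.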
- fsckboy 1 year agoIf the corpus used to train the LLM commonly contained the idea that "we don't give out people's last names, here's the convention for not doing it", the LLM would have no trouble incorporating that into its associations.
This seems like somebody's idea of netiquette that has been taped on ex post, so I don't think it's indicative of anything about LLMs; same with Gemini's heavy-handed wokism.