AI Reflections

6 points by lonk11 2 years ago | 5 comments
  • ftxbro 2 years ago
    > GPT is great until you realize 50% of its output is a totally manufactured confabulation with zero connection to reality.

    GPT-4 might not be as smart as people, but it performs as well as humans on many kinds of AP tests, medical certification tests, bar exams, SATs, GREs, etc. You can see this in the GPT-4 technical report and the "Sparks of AGI" paper. I guess the author thinks those tests are bad, or he didn't read about those results, or he thinks those papers lied about the results, or he is being willfully misleading; I don't know which.

    I don't know the author (maybe he's famous, maybe not, idk), but it looks like he may be promoting a competitor to GPTs: "I’m proposing a new machine learning meta-architecture for learning forward models. The architecture is called Predictive Vision Model (PVM)."

    • lsy 2 years ago
      Those tests don't have "construct validity" (i.e. they don't accurately test what they claim to test) with regards to LLMs. Word problems can be used to test human abilities because humans use words to represent interior understanding. LLMs have no interior understanding, so "performing well" on a set of word problems only tells us that the LLM's training set contains enough form to predict the answer, but not that the LLM has any concept of what the words in the question mean or how they might apply to novel situations in the real world.
      • moreice 2 years ago
        You can test this hypothesis by questioning GPT-4 until you hit a detail that it doesn't know. It will say something like "Without more context or a clearer reference to this detail in my training data, I can't provide a definitive answer.". Then you ask it to speculate about what the answer might be.
      • IIAOPSW 2 years ago
        You should be more skeptical of their benchmarks, since the people choosing the tests are also the ones trying to show their system passes them. Their thumb is totally on the scale. Just look at the press release for GPT-4, namely the graph showing where it improved most relative to 3.5. The largest margin of improvement was the uniform bar exam. Do you think that's a coincidence that happens by just agnostically feeding in more data and doing more training without favoring anything in particular? Of course not. Law is a potential multibillion-dollar annual revenue stream. "The largest power of 2 less than 1031" is a multi-dozen-dollar annual revenue stream. Screw the goal of AGI; an AI lawyer that can kinda do non-law stuff is what they can attain today!

        Any exam you can study for is a bad test of the ability to reason through novel situations, and AP tests are no exception.

        Oh come on, how can a vision model be a competitor to a language model? That's a very tenuous leap of logic. I'll stop short of calling it motivated reasoning, but it's certainly biased reasoning bordering on rationalizing.

      • sashank_1509 2 years ago
        I used to agree with everything Piekniewski said. And I can concede everything he says about deep learning right now and still point out that:

        1. It appears that scaling these models will give us such high accuracies that they will solve the problem.

        That’s just what I feel after seeing ChatGPT and Meta’s Segment Anything.