Silkenweb Example: Hackernews Clone

I’m a doctor: Here’s what I found when I asked ChatGPT to diagnose my patients

261 points by blago 2 years ago | 359 comments

gilbetron 2 years ago
Being a scifi geek and AI geek and neuroscience geek for pretty much the past 40 years, I've read countless predictions and scenarios and stories about society's response as "true" AI begins to emerge. So watching it play out for real is creating this bizarre sense of deja vu combined with fascination and frustration and also some anxiety.
This article and the comments in this thread are right up that alley. I mean, can you imagine say 1 or 2 years ago saying we'd have a readily accessible system that you could feed it the symptoms a patient is experiencing (in any language!) and out would split a well-described explanation of the diagnosis (in any language!) around half the time? And now it is suddenly here, and so many people are like, "yeah, but it isn't perfect" or "we can't completely trust it" or "we have to fact check it because it can be wrong".
It is all so weird and fascinating and just bigger than I can describe.
But here we are!
- civilized 2 years ago
  What I didn't predict is so many AI boosters being so miffed at regular folks pointing out the obvious: a half-working system may be tremendously exciting to AI enthusiasts, but it's not that useful to society.
  Recently we've had all these people complaining of moving goalposts, when in fact most people never had goalposts for AI in the first place, and certainly never had goalposts that remotely resembled those that AI experts use to evaluate their own progress.
  My mental model seems to clash with that of the boosters. They appear to have little doubt that reliability will emerge from scaling the models up. I think reliability is the lion's share of the problem and probably won't be solved by current architectures. It will be interesting to see who is right.
  - dreamcompiler 2 years ago
    A half-working system is cute and useful when it recommends a pizza place you might like. But the same system is a menace to society when you tell it to drive a car or prescribe a drug to treat an illness or decide a person's prison sentence or protect a perimeter with lethal force.
    It's absolutely amazing that so many AI scientists don't seem to understand this distinction.
    - noobermin 2 years ago
      To be clear, a lot of them aren't "AI scientists," just hypemen, many of them who came into the "tech" world chasing crypto.
    - simonh 2 years ago
      Many, many AI scientists are very clearly aware of this problem. AI safety researchers have been predicting exactly the failure modes we see in the various incarnations of GPT for years. Issues like alignment and over-fitting are incredibly hard problems to solve, and there's still a lot that isn't known, but they are recognised in the field. The problem is business leaders, investors and governments are utterly clueless on any of this, hence the Google Bard and Microsoft Sydney debacles when they were rushed into the public view.
      There is also the risk that the recent massive success of LLMs will bring a lot of excited bandwagon seekers into the field that don't have or appreciate the background on AI safety.
    - singularity2001 2 years ago
      A half working system is good enough for real life tasks when combined and used by and in conjunction with attentive half working human collaborators.
    - epups 2 years ago
      What about helping correct the grammar and style of an article? Writing out boring code? Creating a test for code you already have? These are all tasks that have important economic consequences. AI is not half-bad at them, it is actually approaching human levels of performance.
      If we continue to improve AI, maybe one day its performance in these critical tasks you describe will also surpass humans.
    - mullingitover 2 years ago
      I’d take a half-working system over what we have now, which is a system I’d put at around a third-working.
  - lisper 2 years ago
    > a half-working system may be tremendously exciting to AI enthusiasts, but it's not that useful to society.
    A half-working system might actually be worse than useless if people believe it 100% of the time.
    - 3np 2 years ago
      And a 98% working system even worse due to it being treated as 100%.
    - textninja 2 years ago
      There’s a simple solution to that problem.
  - danielrpa 2 years ago
    The goalposts people usually talk about were just the relatively low expectations of the past - low at least by the standards set by ChatGPT.
    Before ChatGPT there wasn't anything quite like it, so I think that the goalposts were mostly about passing the Turing test, or at least "not sucking" after 4 or 5 phrases. The leap from whatever existed before it was enormous, and I think comparable to the first iPhone versus, let's say, the Blackberry or Windows CE. Maybe even more than that. Go back to the first iPhone, it had a lot of issues but it offered so much as it was that people wouldn't look back.
    It's too much to expect ChatGPT 3 to be better an experienced emergency doctor or neurosurgeon. But the fact that it can be a competent assistant to a doctor instead of a toy assistant - that by itself is amazing.
    - saurik 2 years ago
      > ...I think that the goalposts were mostly about passing the Turing test...
      I mean, if that was ever the goalpost, ChatGPT does not clear it: the Turing test isn't about merely tricking someone who isn't paying attention, or making someone seriously doubt their belief system... it involved a careful discussion with a chat agent by someone who actively was trying to determine whether or not it/they were human. If you sit down with ChatGPT--even 4--and start discussing complex topics about the world with it, you absolutely can quickly get it starting to adamantly defend things about the world which a human would never assert... and, even more damning, if you just ask it "are you human?" it will reply "no", because we actively told it to fail the Turing test.
      If we instead merely re-define Turing's test to be "it is able to converse with a human and manages to occasionally trick a human into thinking it is also human" then I had a friend (Roy Keene) who managed to pull that off years ago with a chatbot that was literally just "every time someone messages it, wait a small random period of time and then reply with a random response from the following set: Yes, No, Maybe, I dunno, :), :(, ;P (and like one other thing I forget)". He had it set up to reply when he was not available and when I finally caught him (I asked about a task I gave him the day before which he claimed to have no memory of, so he admitted to me he had this bot) he gave me a log file and OMG I had had long conversations about personal topics with this thing and had never noticed... but only because I wasn't trying.
  - namaria 2 years ago
    People pushing the curve get to the middle of the sigmoid and want to believe it's gonna be all vertical now. People observing from the outside heavily discount the hype because they know limits will be reached soon. It happens every time.
    - lerchmo 2 years ago
      Not sure why the outside view is so wise? Predicting the future in complex systems is hard, regardless of your prediction.
  - tomrod 2 years ago
    As an AI booster that falls into the regular folks view -- YES! THIS exactly!
    People will use and fund AI tools that have usefulness to them.
    AI boosters love to see the capabilities because it unlocks a whole new verdant valley in the continent of useful ideas, but until a street is laid down and a general store is available it remains the domain of early adopters.
    LLMs are genuinely impressive. They aren't yet AGI, but they are fantastic tools. More exciting to me as a practitioner is the Segment Anything foundational Model (SAM) since I can put that into immediate use for computer vision in a variety of domains. LLMs still require some baking for broad usefulness (though there are many places they are already useful, its not quite broad enough).
    When folks use AI focused on augmenting capabilities rather than replacing people, you get a great return. Until AGI is clear (and we'll know, the AI will tell us without being prompted I reckon), let's treat AI as an exoskeleton rather than a new species.
  - vidarh 2 years ago
    A half-working system that hasn't been trained specifically for this purpose is a strong indicator that a system that can handle this use-case far better is within reach. That is what is exciting about it. I'm constantly mystified by those who expect a system trained only on publicly accessible data pulled from the web to be an expert in everything rather than seeing it as an indication of what can be possible if enough specialised training data is added.
  - jay_kyburz 2 years ago
    I my mind, the AI only needs to be more reliable than the human expert, and doctors and lawyers and other experts are "only human".
    There is only so much a human can read and remember. There is only so much they can be convinced to change their mind in the face of new evidence. They get tired. They get cranky. They have biases, and fears.
    Some doctors are just terrible. I think the AI is probably more reliable than them already.
    I don't think we are very far away from being able to correct an AI, and for it to remember that correction forever.
    - kelnos 2 years ago
      I suspect an ER doctor who misdiagnosed patients 50% of the time would not be an ER doctor for very long.
      Also consider that, in this case, the 50% the AI got right was defined such somewhat generously. For each patient, the AI presented several possible diagnoses, only one of which was correct. If the AI were making all the decisions, would the AI also decide to treat the correct ailment in all those cases? In an emergency situation, you may not have the opportunity to go back and try other options if you're wrong the first time.
    - recursive_loops 2 years ago
      I totally agree, but we're going to need to figure out better ways to verify correctness of what the AI tells us. This will be pretty hard since we can't hardly do that with human-produced stuff either. And further if the AI is trained on human-produced stuff that contains lots of errors... I don't really know how we're going to do it.
    - dgb23 2 years ago
      Apparently much of the usefulness of ChatGPT comes from human guidance via reinforcement learning. The base model seems to be quite useless, but with only a little bit of human input it becomes deceptively smart.
  - dzdt 2 years ago
    The half-working AI is like 95% of the progress to a better-than-the-best-human AI.
    - salawat 2 years ago
      And then what? Do you just think that humans are just going to cede that part of existence to that AI, in a fit of self-fulfilling prophecy? Cutting off the source of novel training data, thereby stagnating any progress of humanity at "we made a 1% better than the best human on a bad day AI once"?
      This is the central conceit of AI maximalists; and tech maximalists as a whole imo. Someone wants something that fits in a box that everyone else must eventually delegate to. Given capitalism's focus on creating owned systems, and the tendency to centralize and maintain asymmetry of access to rent extract, this is a most concerning direction to keep pushing development in.
      We should be building better people; but we've abandoned that in favor of making a better mantrap.
  - dougmwne 2 years ago
    This is the self driving car argument applied to things it doesn’t fit. Sure, unless a self driving car is several 9’s reliable, it’s not terribly useful for society. That’s because there’s no way for a human to be safely in the loop at highway speed.
    But a LLM assisting a wide rage of professionals at even a low accuracy could be part of an efficient workflow, allowing for be doctor or lawyer to give quality service to more people.
    - chezelenkoooo 2 years ago
      The author touches on this in the very article linked. His point is that the low accuracy answers provided by ChatGPT risks reinforcing the doctor's biases and then not thinking outside the box. In other words it risks the doctor blindly trusting the AI too much.
    - 2 years ago
  - gerdesj 2 years ago
    It's a bloody calculator and a damn good one (but it can and does go mad). Use it like that and you'll be fine.
    Have you actually engaged with it in any way? Its free to sign up.
    By the way, what on earth is an AI booster? Do you really mean "enthusiast"?
    "My mental model seems to clash with that of the boosters." Please tell us about your mental model, we can't read your mind.
    - Turskarama 2 years ago
      It is very nearly the opposite to a calculator.
      A calculator gives you exact results for a very small subset of problems (equations).
      Chat-GPT will give you approximate (and occasionally incorrect) solutions to an extremely wide range of problems.
      The main problem with AI is that there are only so many problems where an approximate solution is fine, and an incorrect solution is easy to work around. In order for AI to be broadly useful, it has to get to the point where it's correct more often than an expert human. It may well get there within the next couple of years, but this isn't a space LLMs are designed to be able to solve so it's not as guaranteed as a lot of people seem to think.
    - MilStdJunkie 2 years ago
      "Booster" can also now mean technologists who have correctly predicted the number and size of executive erections once teased with the idea of automating most of their office. Put on something black and tight, go up on stage and twirl around that notion until balled-up Jacksons grease the creaking floorboards. You know you'll be riding in a Cayenne tonight. Maybe a Boxster.
      It actually sounds pretty familiar for those of us in the side shows for other Miracle Technologies That'll Fix Your Business, but it's a hell of a lot easier to set up a convincing model than something like SAP. Also, unlike ERP, GPT is a real thing, and it does real stuff, so that will make it easier to take the money. Real science is happening, which is always more fun, even if Homo Executivus uses it as a marital aid. I guess this makes me a Booster, but one of those old wrinkled Boosters who sounds like a carton of Newports.
    - AnimalMuppet 2 years ago
      A "booster" is one who promotes something, not just an enthusiast. See "civic booster" or "athletic booster" for similar uses.
- AH4oFVbPT4f8 2 years ago
  I have case that 5 doctors so far can't explain. I fed in what was happening (generally speaking) and it provided a rather good list of what it "might" be. I then asked it to start describing each scenario and I was able to determine nope, that's not it or if it was plausible. Meeting with doctor no 6 tomorrow to talk about what it could be but at least I have a better insight at what the more obvious possibilities.
  Side note, my ER doctor wrote me a referral to neurologist to who was considered the best in the state and they refused to even talk to me since the referral said car accident. They wouldn't budge and wouldn't reconsider. The car accident was a result of an episode, not the cause of my issues. I hate the US healthcare system.
  - maxbond 2 years ago
    That's awful. I hope that gets worked out and you get the care you need.
    The thing is, if the healthcare system is shitty and broken, it doesn't necessarily matter how good ChatGPT is. If it's applied with the intent of "streamlining" the medical system by ignoring people's concerns and rejecting their insurance claims in an automated fashion, it might be much harder to even get in front of a doctor.
    You could fiddle with your dosage, work with your doctor, etc, and then your insurance company can say, "sorry, ChatGPT has decided that you can do with a lower dosage, so we won't be paying for it. The machine is a black box, so it's not accountable to you or us or anyone else, so you just have to accept it's verdict."
    No amount of training data, no architecture, heck no alignment or safety work solves that problem. Because technology doesn't solve social problems.
  - petilon 2 years ago
    How does it matter to the doctor if the car accident was the cause of your issues?
    - trogdor 2 years ago
      I think some doctors don’t want to get involved with patients who are involved in litigation regarding the reason for their visit.
    - AH4oFVbPT4f8 2 years ago
      Because America, but seriously, they don't want to get in the middle of who pays, health or car insurance so they just wont see anyone who gets referral with car accident mentioned.
- dekhn 2 years ago
  Ditto- I grew up reading Vernor Vinge, William Gibson, and John Varley and dedicated my career to applying ML to biological research problems while waiting for technology to reach the point where it could reliably be used to do some of the things that can be done in those books, biotechnologically (e.g. gene therapy for radical body modification).
  I was very disappointed to learn when I first started (around '93-94) that neural networks were hard to design, and impossible to train more than a couple layers deep, and the labelled data wasn't there anyway. None of that really changed visibly for a couple decades, but multiple unrelated research projects made the necessary breakthroughs: straightforward NN models (CNNs, etc) that people could adapt, frameworks to train and predict without having to implement the entire ML stack from scratch, absurdly fast computers and acccelerator hardware, and massive labelled data sets.
  So far, nothing I've seen is anywhere close to what I consider "true" AI. As we struggle with hard medical problems. But seeing computers handle text with such facility has been exciting because it's the sort of thing that normies can see and appreciate (and even be fooled by).
  However, medical problems aren't something that ML is going to magically solve. Because medicine has huge structural issues that are harder to solve than making diagnoses.
  It would be nice if James gleick wrote a new version of "What just happened" covering the time between the invention of Transformers and now.
- pharmakom 2 years ago
  I can't think of any science fiction where th AI is there but not at all reliable. Maybe evil, but not unreliable. It's a failure mode we weren't anticipating. It also might make it next to useless for some real world applications, despite being incredible technology.
  - kgabis 2 years ago
    It's quite ironic that AI will be inherently human in it's unreliability.
  - python999 2 years ago
    C3PO? Or maybe its challenge is more relevance and conciseness?
- gerdesj 2 years ago
  Jolly hockey sticks but we have to be rather careful with these things. I've had it devise multiple computer programs from OpenSCAD to BASH and Python because I can get a real feel for what it is really doing, given that I am an IT bod.
  I'm also a Civil Engineering grad and it really cocked up when I asked a question about normal beam strength and I suspect it will get the pressure at depth thing wrong too. Both of these are related to depth only - one dimensional. That's why we have I beams - the flanges are to resist twisting and buckling but the strength is in the depth.
  ChatGPT is a tool. It is basically a really good calculator. I do find myself saying please in my requests.
  - kelnos 2 years ago
    A really bad calculator, IMO. A calculator gives you always-correct answers to specific, well-formed questions in a relatively small domain (math).
    ChatGPT gives you approximate and sometimes-wrong answers to arbitrary questions in nearly any domain.
- jacquesm 2 years ago
  Half the time right is worse than nothing at all.
  - messe 2 years ago
    Not when there are a lot more possible wrong answers than right answers.
    - jacquesm 2 years ago
      That doesn't follow. 2+2 has more wrong answers than right answers, and yet, if you gave me the wrong answer to arithmetic sums half the time I'd think you were a lousy calculator. Worse still if those answers had the potential to kill me.
  - PeterisP 2 years ago
    It depends, half the time right is very useful in scenarios where you can quickly verify the correctness - i.e. with troubleshooting suggestions or code which you can run and see if it does what you needed.
  - jay_kyburz 2 years ago
    Yes, but we are not comparing half time right to nothing, we are comparing how often AI right to how often your local human right.
    (not the smartest expert human, the human you have access to)
    - jacquesm 2 years ago
      And you believe your local human aka the doctor is right only half the time or even worse?
- Genbox 2 years ago
  I've noticed a very weird trend when trying to incorporate GPT4 into my workflow. I lead a small team of developers and security consultants, and whenever someone runs into a problem, it is immediately searched for on the web. However, there is a real resistance/reluctance in using GPT for the same task.
  I think it is a combination of factors. The wrong answers are part of it, but I think another part is the anthropomorphization of it makes it almost like asking another human for help, which is somehow 'less good' than asking a search engine.
  For better or worse, LLMs are here to stay.
  - tomlagier 2 years ago
    For me, the reason I prefer Google to GPT is that it's much easier for me to assess the credibility of a Google answer vs a GPT one. There are so many signals in any primary source. Some obvious ones are things like number of upvotes, site reputability, presence of (working) examples, when was the answer written? More intangible things are like how closely does this solution match my problem statement, does the author write in a trustworthy manner are also easy for me to pick up at a glance.
    With GPT, I don't have any of that (or maybe I just need to re-learn it?)
    Also, I get a useful answer from most Google queries. GPT performs at a significantly lower bar (at least right now) - it works well for some stuff but not others, and the time it takes to figure out whether it's going to do a good job (and maybe do a couple of rounds of prompt refinement) is much more than just Googling.
    - danielrpa 2 years ago
      Bing Chat gives you references.
  - jhanschoo 2 years ago
    For me it's because I've been trained by decades of searching to type in only a few words to obtain a presumably relevant answer, whereas I think I have to write a mini-essay to ChatGPT to get an answer. But when I acquiesce to explaining to ChatGPT my information need, I get a more useful answer. (though frequently inaccurate in details)
  - never_inline 2 years ago
    > The wrong answers are part of it
    for me its wrong answers and long winded english sentences with low information density.
- apwheele 2 years ago
  I think people are more impressed with the "well-described explanation", which is a (mostly bullshit) interpolation from prior text on the internet. If you just focus on the "feed a narrative and get out a probability of a particular diagnosis" -- the ability to do that with other machine learning models has already existed.
  If you give me a sample of 100k ER text records and final diagnoses, I would be pretty confident I can estimate a model that gets you +50% accuracy on my laptop using open source tools.
  - MrPatan 2 years ago
    What percent of human doctors diagnoses is also well-decribed bullshit, though?
    - apwheele 2 years ago
      Totally get the point -- in the ER it is a mix of narrative and tests, so hopefully not so many. But in general practice, probably more BS than any of us would like.
      Part of the difficulty with healthcare (that I myself don't have an easy answer to), is that you have very asymmetric error costs. So chest pain, even if heart attack is low probability, you should be risk adverse in many scenarios and still do due diligence to eliminate that as a possibility.
      Not saying that smart systems are not possible, just still skeptical that level of complexity is possible on the backbones of LLMs. The smart sounding narratives I am concerned are mostly red herrings though in terms of outcomes we care about, accuracy/making right decisions/etc.
- blibble 2 years ago
  > This article and the comments in this thread are right up that alley. I mean, can you imagine say 1 or 2 years ago saying we'd have a readily accessible system that you could feed it the symptoms a patient is experiencing (in any language!) and out would split a well-described explanation of the diagnosis (in any language!) around half the time?
  we've had this for years (bar the "any language")
  they're traditional expert systems with the same flowchart a GP (family doctor) would use to ask questions
  and they do a lot better than 50%
- asdff 2 years ago
  But at the same time consider the problem, and if AI is even the tool for this job or if its total overkill. We are just matching symptoms to a diagnosis, you could get this utility from programming a simple dictionary with a fraction of the compute. Maybe even with a confidence estimate of the diagnosis this time. There's an old adage that rings true here I think: "keep it simple, stupid."
  - rootusrootus 2 years ago
    AFAIK that's essentially what we already have. Doctors don't have the mental capacity to keep every little thing in their head. Go ask your GP a question, and when you see him typing on a screen, it's because he's asking the professional equivalent of WebMD to correlate symptoms and for him.
  - nradov 2 years ago
    Diagnostic decision support systems have already existed for many years.
    https://doi.org/10.1038/s41746-020-0221-y
  - messe 2 years ago
    But I think that's missing the "magic" of it—for lack of a better word. In the situation you describe, you have to build a specific tool for the task. With LLMs, we appear to have stumbled on a general purpose tool for a myriad of situations. Yes, they're not perfect; but the fact that they work remotely as well as they do even at these early stages is astonishing.
    - asdff 2 years ago
      You have to build a specific tool, yes, but now you know the tool is good and its output reflects the input that you can see right in the source code, and you can use it the next time without rebuilding it.
      On the other hand with chatgpt, you have to build a specific prompt, and you don't know if that will lead to a consistent result or what its even pulling from. It introduces uncertainty that you cannot reliably estimate, whereas with the above toolmaking example, you can estimate your uncertainty, you can calculate how much power you have and how much you should have to resolve what you expect, and you can calculate a false positive rate.
      When decisions mean money or life, I wouldn't rely on the black box chatbot. I would take a little bit more time and make sure all i's are dotted and all t's crossed.
- bbarnett 2 years ago
  But here we are!
  Yet in the end, the correct answers are likely identical to webmd or mayo, and the wrong to reddit or twitter, so all we have is a fancy google search.
  So far, all the magic I see is the prompts. The knowledge is just random junk, the validation is just "how many internet pages said it", so you can never train new models again, because you now are training on gpt output.
- Shorel 2 years ago
  It makes predictions based on incomplete and or biased information. Just as doctors do.
  The funny thing is not that we are holding AI to a high standard.
  The funny thing is that we normally do not do the same for doctors, marketers, politicians, etc.
  - kelnos 2 years ago
    Sure we do, for doctors at least. A doctor that got ER diagnoses wrong 50% of the time would get fired.
    (And the AI here didn't even get it right 50% of the time, just 50% of the time it presented the right answer as one of several options. If pressed, it may not have made the one single correct answer for each of those patients.)
- beezlewax 2 years ago
  I like your take on this but isnt it overly optimistic? I have the feeling that this tech will hit the a wall pretty quickly like driverless car technology.
  It looks great on the surface but it isn't anything like the kind of sci-fi ai I think we're aiming for. Not in anything but a superficial way at least.
  It doesnt know when its wrong and just straight up lies constantly inventing all sorts of fantastical stuff.
  I asked it for history of a small local area in Ireland and it created a bunch of castles and tribes that never existed.
  - Izkata 2 years ago
    > It looks great on the surface but it isn't anything like the kind of sci-fi ai I think we're aiming for. Not in anything but a superficial way at least.
    And then there's ChatGPT querying WolframAlpha:
    https://writings.stephenwolfram.com/2023/03/chatgpt-gets-its...
- jschveibinz 2 years ago
  This is spot on. I keep thinking that it has to do with our collective ability in both recognizing change and accepting change.
  I commented a couple of days ago that we feel that the slope of change is relatively flat. As if the current snapshot of AI capabilities is somehow the way it’s going to be for awhile.
  But in fact, the slope of technology change is almost vertical. A year or two from now, AI will be a very (perhaps radically) different capability unless something is done to interrupt its development.
- nradov 2 years ago
  You may not have been paying attention to the medical literature but we have had clinical decision support tools including automated diagnostics for literally decades that achieve at least 50% accuracy. That didn't require LLMs, and LLMs aren't "true" AGI.
  Those existing CDS tools aren't widely used because the data entry is cumbersome and slows down clinician workflow. So LLMs may offer some opportunity there.
- mannykannot 2 years ago
  The responses are bounded by two wings: those who can't see the successes and those who can't see the problems.
  The author of this article seems to me to be doing a good job of navigating between them.
- og_kalu 2 years ago
  It's interesting for sure. There's also the fact that he didn't use the sota (GPT-4). I'm not saying that would be perfect. But i would be surprised if it wasn't big leap over 3.
  - weaksauce 2 years ago
    it wasn't mentioned anywhere in there which version of the model he was using... how are you so sure?
    - og_kalu 2 years ago
      he references 3 when he talks about passing the exam. and i guess i doubt he'd evaluate 4 and not say anything about it.
- lerchmo 2 years ago
  Watch the late night shows making fun of (the internet) in the 90’s. This is just the next round of that in my opinion.
- Maursault 2 years ago
  > as "true" AI begins to emerge
  What, exactly, is that?
  - fzzzy 2 years ago
    The definition has always been the Turing test. Nobody has been able to provide a more precise answer.
    - dekhn 2 years ago
      I think an updated version of the turing test needs to be done: combine a generative language model with a generative face model and speech model to create an interactive avatar that can converse with an individual. The individual is allowed to ask any and all questions of a series of avatars (50% of the avatars are actually humans) and judge whether they are Human or Not Human. If a particular avatar model is able to fool a representative sample population into identifying avatar models from humans at a similar rate, then it passes the test.
      This still doesn't really move the needle on any of the important questions about AI, but does hasten the public perception that proving self-awareness, consciousness, or agency exists in humans without depending on subjective experience is probably impossible.
    - Maursault 2 years ago
      Then "true" AI "began to emerge" more than 20 years ago.[1]
      [1] https://en.wikipedia.org/wiki/Eugene_Goostman
    - barking_biscuit 2 years ago
      I think it's kind of silly. The real metrics we will evaluate AI progress by is economic impact first, followed by political impact.
  - ftxbro 2 years ago
    "true" AI is one whose cognition is more advanced than that of existing AI
  - m3kw9 2 years ago
    Meant AGI
Turskarama 2 years ago
I'm not really sure what he expected here, ChatGPT was not trained to be a doctor, it is far more general than that. Asking ChatGPT for medical advice is like asking someone who is very well read but has no experience as a doctor, and in that context it's doing very well.
He also brings up one of the most salient points without really visiting it enough: ChatGPT does not ask for clarification, because it is not a knowledge base trying to find an answer. All it does is figure out what character is statistically most likely to come next, it has no heuristic to know that there is a task it hasn't fully completed.
This is the same reason ChatGPT cannot yet write programs by itself: in order to do so you'd need to specify the entire program up front (which is exactly what code is).
As soon as we have agents that can do a proper feedback loop of querying a LLM consecutively until some heuristic is reached then the kind of AI doctors are looking for will emerge.
- bentcorner 2 years ago
  > ChatGPT does not ask for clarification, because it is not a knowledge base trying to find an answer. All it does is figure out what character is statistically most likely to come next, it has no heuristic to know that there is a task it hasn't fully completed.
  It absolutely can, but it requires the right prompts. If you just say "give me a diagnosis" it won't, but if you prompt it with a "you are a doctor and I am a patient" kind of prompt, it will.
  I'll agree that it tends to "close off" conversations without a lot of open endedness - I role-played through a session using myself as a patient with a chronic condition I suffer from and it correctly identified the problem, but I had to push it a little to get it to try to narrow down the problem (instead of "here's a bunch of things it could be" and ending the convo there).
  - colonCapitalDee 2 years ago
    ChatGPT's heuristic may lead it to emit a sequence of words that suggest to English speakers that they should provide clarification, but ChatGPT is congenitally incapable of determining that answering a question requires more information that was provided in the prompt and that clarification is needed. Rephrased, ChatGPT can ask for clarification, but only as a side effect of satisfying it's heuristic.
    - travisjungroth 2 years ago
      Calling it a "side effect" is really downplaying things. All of its behavior is just satisfying its heuristic.
      Just did this chat on 3.5.
      [Me] Boxes that are black with stars have gold inside. Black with circles means silver. Blue with squares means copper and blue with triangles means tin.
      Your job is to determine the contents of the box. If you need more information, ask for it.
      Example:
      User: I have a black box.
      Assistant: Does it have stars or circles?
      User: Stars.
      Assistant: It has gold.
      I have a blue box.
      [ChatGPT] Does the blue box have squares or triangles?
      [Me] Squares
      [ChatGPT] According to the information you provided, a blue box with squares contains copper. Therefore, the contents of your blue box are copper.
    - l33tman 2 years ago
      You can ask it to complain if there is a lack of information in the query. You can ask it for what information might be missing. Everything about this is in the prompt and the questions you ask it.. Just asking "what is the diagnosis" leaves out a LOT of context, and what you expect.
    - paradite 2 years ago
      > but ChatGPT is congenitally incapable of determining that answering a question requires more information that was provided in the prompt and that clarification is needed.
      What's stopping the ChatGPT heuristic / model from determining that? That's within the capabilities of a statistical model.
- JacobDotVI 2 years ago
  My hunch is that this is exactly what he was expecting. There is a lot of hype around ChatGPT passing the medical exam and this exercise is a counter point to that.
  - thrill 2 years ago
    GPT-4 passed the medical exam, not ChatGPT running GPT-3. There's a rather significant difference.
    - nyokodo 2 years ago
      > GPT-4 passed the medical exam, not ChatGPT running GPT-3
      With a subscription you can use GPT-4 with ChatGPT. ChatGPT is just the wrapper to the model.
  - viraptor 2 years ago
    The medical exam has very specific questions which define what you're expected to include in the answer. The question asked in this case was nowhere near that detailed, do I don't think they're comparable. To really evaluate something beyond the "random generic user" level, you need to be familiar with the tech as well.
    The article really tells us more about the experience of someone with no chatgpt knowledge checking their own symptoms rather than its usability for emergency diagnosis.
  - og_kalu 2 years ago
    any example that doesn't use the current sota isn't a very good counter point to be honest. 3 barely passed. 4 aced it. For all we know, GPT-4 erases most of his concerns (not saying it would).
- dannyphantom 2 years ago
  I don't mean this in any sort of negative context - curating your online presence is a never-ending task - but I believe that the authors primary expectation was to drive traffic to their online accounts by publishing a post on the subject matter. A cursory peek at their accounts show that this was published across all of them with Twitter showing the farthest reach.
  > Dr... is vice president of innovation at {company_name}, an innovation incubator for health tech.
- gerdesj 2 years ago
  "I'm not really sure what he expected here, ChatGPT ..."
  Doctoring is quite similar to ITing if you are careful with your similes. In medicals you look for indications and contraindications. In IT you look for magic beans and sometimes look at logfiles.
  I have no idea why we (IT) don't follow the medical way and have indications and contras (I do). They have been doing complicated diagnostics way longer than we have.
  BTW: It doesn't worry about a char but about which word will come next.
  ChatGPT has written several programs for me. One of them was decent after much negotiation. The rest needed some changes.
  At one point I told it to change language (BASH to Python) for a script and it did. I spend 20 mins fining stuff and cracked on.
  - HillRat 2 years ago
    There was an old USENIX symposium where one of the talks was, "If computers had blood, they'd call us doctors." My thought at the time was, "If computers had blood, they'd call us serial killers."
  - marcosdumay 2 years ago
    > I have no idea why we (IT) don't follow the medical way and have indications and contras (I do).
    Because we have logs. Ignoring them and focusing on the symptoms would be like doctors refusing to do examinations on their patients.
    - iudqnolq 2 years ago
      Doctors separate their sources of information into signs (objectively observed) and symptoms (subjectively reported). Both signs and symptoms can be indication or contradictions.
      Tangentially I wonder if there's sources of diagnostic info in tech that are closer to symptoms than signs. You might separate out potentially unreliable info reported by the suspect system from more objective info other systems see.
- izzydata 2 years ago
  He probably didn't have any expectations. Merely experimentation and observation and maybe a bit of word of warning after seeing the results for anyone thinking of using it in ways it shouldn't be.
- onlyrealcuzzo 2 years ago
  > All it does is figure out what character is statistically most likely to come next
  To be pedantic, it's a token - not just a character - right?
  - gpm 2 years ago
    Yes
- rvz 2 years ago
  > I'm not really sure what he expected here, ChatGPT was not trained to be a doctor, it is far more general than that.
  Except that O̶p̶e̶n̶AI.com and many others on the AI hype brigade were showing ChatGPT and GPT-4 to be passing "Medical Licensing Exams" with flying colours and it turns out to be a confident bullshitter offering dangerous untrustworthy medical advice when tested by a medical professional as I said before [0].
  So they (O̶p̶e̶n̶AI.com and AI bros) set those high expectations from the start.
  > As soon as we have agents that can do a proper feedback loop of querying a LLM consecutively until some heuristic is reached then the kind of AI doctors are looking for will emerge.
  No.
  A black-box neural network AI model that is unable to transparently explain and reason about its own decisions is fundamental to the trust element that medical professionals are looking for. It is more than just spitting out numbers from a computer.
  This doctor is also the one still reviewing the so-called 'diagnosis' from ChatGPT which without him, ChatGPT alone would have certainly killed the patient. The human doctor also gave a transparent explanation about why ChatGPT's 'diagnosis' was incredibly wrong. ChatGPT was confident that it was still right.
  [0] https://news.ycombinator.com/item?id=35311996
- barking_biscuit 2 years ago
  > This is the same reason ChatGPT cannot yet write programs by itself: in order to do so you'd need to specify the entire program up front (which is exactly what code is). As soon as we have agents that can do a proper feedback loop of querying a LLM consecutively until some heuristic is reached then the kind of AI doctors are looking for will emerge.
  https://github.com/Torantulino/Auto-GPT
- viraptor 2 years ago
  > ChatGPT does not ask for clarification
  It depends on how you use it. Knowing how it works, you can explicitly check "What further questions for the patient would make the diagnosis more certain?"
  ChatGPT is not magic - if you ask for diagnosis based on some info, that's exactly what you're going to get.
flir 2 years ago
This is how I played doctor with ChatGPT.
> Do you know what an Expert System is?
[describes expert system]
> I'd like you to emulate an expert system while I play the role of a patient. Expert systems ask questions in order to produce a diagnosis. I would like you to ask me a series of questions, and respond with a diagnosis when you feel confident.
Alright, I'll do my best to emulate an expert system. Please keep in mind that I'm an AI language model and not a certified medical professional. For any serious concerns, it's essential to consult with a healthcare provider. Let's begin.
What is your primary complaint or reason for seeking help today?
I fed it a symptom my doctor had already diagnosed, and it did ok - it got it down to three possible causes, one of which was the correct one. All along the way it was warning me that I really should see a real health professional and it's just a chatbot.
What really interested me is that I said "please emulate an expert system" and it did. Once upon a time, expert systems were an entire branch of AI, and here it is just emulating one off the cuff.
- sjducb 2 years ago
  I suspected it would do better with good prompt engineering
- TheRealPomax 2 years ago
  And failing at it, if you had to help it at every step.
lumb63 2 years ago
The tail end of this article, where the author talks about how many more patients he could see in his life if he had AI assistance, made me realize that part of healthcare cannot be solved by AI. The goal is not to see more patients; the goal is to help more patients get better. For a lot of patients who are frustrated and have not had their problems validated and have been to many doctors and seen no results or poor results, having a real, physical, human doctor validate their condition, and work with them to solve it is part of the treatment. Doctors can prescribe whatever medicines and do whatever surgeries they want, but only the patient’s body is capable of healing itself. I worry that a tendency to plug symptoms into an AI that “diagnoses” the patient, that the patient doesn’t trust, will hurt outcomes. The patient benefits greatly from understanding the doctor’s methodology and thought process.
- thelittleone 2 years ago
  During my wife's second pregnancy she was admitted to hospital at 6 months due to some intense abdominal pain and inability to hold down food.
  The put her on PNT, pumped her stomach daily and run a bunch of tests. I logged every test. Weeks passed she was slipping away and the specialist had no real game plan. So I got another Doctor. A busy professor who travelled a lot and was clearly passionate about his field.
  Yet still dozens of different tests and scans and they couldn't find the cause. Meanwhile she was getting worse.
  One of the scholar articles I found matched a test result where blood from upper body had a different profile to that in urine (which matched her result) which pointed to a condition called placenta parcreta. The Dr agreed it was possible and as my wife was slipping away we did an emergency c section. Sure enough the cause was placenta parcreta and the Dr showed a picture of a small tear in uterus that he said was about to rupture anytime. The operation saved her life and perhaps my son too.
  I'm no Doctor but good at research. Doctors are busy AF. They don't have time to research cases like I did. This is a big gap and while it may not always identify cause and save a life, its damn important. And I can't see why AI could not find what I did.
  - oidar 2 years ago
    This happens so often. Physicians don't have time to keep up on ALL the case studies and ALL the new literature. They are super busy writing notes and taking care documentation and billing - getting on the phone with insurance companies to justify treatment etc. There's not enough time in the day to do it all - so they lean heavily on things like UpToDate. Some physicians get pissed if a pt. brings medical theories to them - but I think that's really the only way to go for these cases - the problem is that the general public doesn't have a working knowledge of medicine, so they are quickly over their head and out of their depth. A family member, someone who isn't afraid of reading journal articles - and understanding statistics, is typically the only way some of these problems are solved - they do it because of love - not because they are paid.
  - davikr 2 years ago
    Placenta percreta should have been detected during prenatal care, through ultrasonography, and it's a real shame that kind of visit isn't offered freely.
    It's also pretty bad that they didn't consider any obstetric causes in their differential.
    Thankfully, everything is OK now and that makes me happy.
    - thelittleone 2 years ago
      I'm sure you're right.
      The cause was identified. Following first child, postpartum bleeding was treated with a D&C. The D&C (unknowingly) perforated her uterus and the intestine was drawn/sucked inward, plugging the hole. It was only after 6 months into second pregnancy that any symptoms where noticed.
  - baremetal 2 years ago
    Your story made me tear up. You are a hero.
- SanderNL 2 years ago
  "Seeing the patient" is an important step towards "patient getting better".
  > have not had their problems validated and have been to many doctors and seen no results or poor results,
  Also, being validated or not is not going to change with AI as you yourself state that seeing actual physical doctors is not helping in the cases you allude to.
  I agree that having access to professionals that care about their profession is valuable, but that might be becoming a luxery we can no longer afford..
  Who thought it was a good idea to have a fixed amount of medical students? Might be worth checking that out first.
  - HelloMcFly 2 years ago
    > Who thought it was a good idea to have a fixed amount of medical students?
    A major limiter on the amount of trained medical doctors is not to do with someone "deciding" to have an artificially scarce number of medical students. Two important things come immediately to mind:
    1) Medical schools rely on clerkships and preceptors to deliver actual clinical experience critical for learning. These cannot be created out of thin air and often rely on access to urban centers. There is already a shortage of clerkships and preceptors. https://www.aamc.org/news-insights/so-many-medical-students-... (not my original source but the nearest to hand one)
    2) The real chokepoint for creating practicing clinicians is the number of residency spots. This year there were 43k graduating MD and DO students, but only about 40k residency spots. This doesn't count international medical school graduates. https://www.medicaleconomics.com/view/match-day-2023-a-remin.... Creating a new residency program is significantly more challenging than graduating more medical students.
    - SanderNL 2 years ago
      I understand, but it feels like only allowing CS students based on the amount of jobs at Meta.
      I don't understand why you need to walk around a hospital (Meta/Google/etc) to be a GP. Some leveling would be appropriate (IMO - which is shit/ignorant). Future surgeons need to walk different paths than GPs.
      I get that experience is hard to come by, but I don't really feel we are working really, really hard on solving that problem.
      Again I am ignorant, but I am also not naive. I know money and status factors greatly into all this.
- ChuckNorris89 2 years ago
  > The goal is not to see more patients
  Not when you're a doctor in the EU dealing with the shortages of socialized care where you have a limited amount of time to spend diagnosing a patient because there's dozens waiting in the queue. As a doctor you'd like to give everyone better care but you can't because you have too many people to see and too little time.
  Doctors here are busy AF and sometimes misdiagnose you because they don't have time to look into every possibility of what you might be suffering from.
  So maybe AI could help there, as in giving them some hints on things they might iss, as waiting for the governments to fix healthcare is like waiting for them to also fix climate change.
  - jaapz 2 years ago
    Yes, there are queues and waiting times in our socialized healthcare here in the Netherlands. One problem is the lack of staff. Also more and more older people getting older than before. Another is the lack of funding due to the government trying to privatize as much as they can get away with (like health insurance), which loops back into the first problem.
    However, my wife needed medication that was 10000 EUR per dose, which was fully covered by our socialized healthcare insurance. For most things related to health at the most I need to pay a total of 385 EUR per year out of pocket, next to the insurance premium which is about 130 a month. When those 385 EUR is gone for the year, I don't need to pay anything at all. People who can't afford the insurance premium get money from the government to cover that, and if that is not enough and you still can't pay your insurance, you are still eligible for any health care you need and a service will cover for you until you can pay again.
    There are waiting lists for many things, yes, but mostly for non-urgent, non-lifethreatening things. If you need something done, it can be done. And you won't be thrown into debt to pay for it.
  - n8henrie 2 years ago
    Or a doctor in a US emergency department, where EMTALA essentially socializes access to emergency care without creating any provisions for funding said care.
  - RandomLensman 2 years ago
    The EU has a very heterogenous healthcare landscape
    - ChuckNorris89 2 years ago
      That's why I narrowed it down to the socialized part, not the private part as that has next to no shortages.
  - iso1631 2 years ago
    Not when you're a doctor in the US dealing with the shortages of privitised car where you have a limited amount of patients who can afford your time
    - ChuckNorris89 2 years ago
      How does the US doctor get impacted by that? Your cheap shot jab at an counter argument makes no sense. You probably mean the patience are dealing with the shortages, not the doctors.
      Also, I'm only commenting on the EU system as that's the one I'm dealing with right now. Why do you feels the need to start a holy war on how the US system is worse? I have no expertise with the US system so I'm avoiding any comparisons with it.
  - shagymoe 2 years ago
    Do you live in the EU and have experience with the healthcare system there?
    - ChuckNorris89 2 years ago
      YEs, that's why I'm commenting on it.
- leobg 2 years ago
  That’s an argument against legal AI as well. The idea being that the citizen has a right to be heard by the state. And that an AI handling a case is not really “hearing“ the citizen.
  Then, again, before Al Capone, nobody in their right mind would’ve argued that the state should share the profits of criminals through income tax. That idea went down the drain very fast. Wouldn’t be surprised to see the same thing happening to the idea of “being heard“ by the state.
- jeswin 2 years ago
  > The goal is not to see more patients
  I previously worked on software which was sold to physicians. That the software will "let them SEE X% more patients" was the big draw. We did that by making some paper processes electronic, and in hindsight it may have been good for sales but not necessarily so for patients.
- RandomLensman 2 years ago
  Some medical treatment and surgery is precisely there because the body isn't capable of healing itself at that stage. It might be able to heal itself after the treatment, but not before.
matt_heimer 2 years ago
I argued with someone (online) about a ChatGPT diagnosis recently. They have back pain and they've had an MRI but the scans (shared online) don't show any significate disk bulging and no herniations. They put their symptoms into ChatGPT and got a possible diagnosis of a disk herniation along with a treatment of a microdiscectomy.
Despite the fact that a couple doctors have told them they don't need surgery they are convinced that they do. I understand that they are desperate for a solution to their pain but they are now doctor shopping until they can find someone willing to perform a procedure that ChatGPT suggested.
People are already being misled by these systems.
- recursive_loops 2 years ago
  I've recently been using ChatGPT to learn about Linux networking stuff, including most recently how to setup a bridge connection for KVM that behaves the same as virtualbox's.
  Given how catastrophically wrong (confidently wrong at that) ChatGPT has been, I cannot even imagine the frustration that will be for doctors from people who don't understand how LLMs work and think they are "thinking."
- Waterluvian 2 years ago
  You said it: they’re desperate. People do considerably less rational things in a desperate attempt to make pain go away.
MrPatan 2 years ago
One complaint he has is "It doesn't know to ask the right questions".
Well, the prompt was to give diagnoses, not questions. Ask GPT for the follow up questions first, then the diagnoses.
This is fascinating in that, because now the machine speaks human, we subconsciously ascribe human agency to it. This guy was instictively treating it like a colleague, who would naturally ask follow up questions unprompted. But you still have to prompt the machine properly.
So, 50% diagnosis success rate for the wrong prompt, for a LLM that can still grow, for a model that is not specialsed in medicine? In the literal first month of the "AI age"? Doctors are so done.
- detrites 2 years ago
  One thing the article misses is doctors are not all equal. Some are freaking terrible. Others may be competent but susceptible to bad decisions when overworked. Others are simply great. How does ChatGPT fare against the whole group?
  For one, it can't become "overworked", so at least better in that regard. It may well be better than a terrible doctor too. Given malpractice is consistently a leading cause of death, it seems something worth mentioning.
  Not advocating for folks to self-diagnose using ChatGPT, though for better or worse people already do that via Google. But it's an important omission that "doctor" is not a fixed, reliable measure to test against.
  On the other hand, another thing the article misses is ChatGPT can't see, feel, touch or take any measurements interactively. It's purely text-based. How many doctors work like that? Some, maybe - but certainly not in the ER.
  Interesting experiment though.
  - ChatGTP 2 years ago
    For one, it can't become "overworked".
    I have seen it be overworked, it can be overworked.
  - funcDropShadow 2 years ago
    That's a horrendous claim.
    > Given malpractice is consistently a leading cause of death
    Can you back it up?
    - analog31 2 years ago
      The above post might have been thinking of medical error:
      https://www.hopkinsmedicine.org/news/media/releases/study_su...
      "Malpractice" might be harder to measure since it's a legal term.
    - giraffe_lady 2 years ago
      They probably meant medical error which kills like 200k people/y in the US. Malpractice is I think more like a legal/administrative judgement that indicates culpability that may or may not exist in a specific error.
    - 2 years ago
- PostOnce 2 years ago
  Other commenters take offense at your comment "doctors are done" and point to doctors doing physical things.
  GPT can already write robot control and machine vision code (poorly).
  The real questions are harder.
  For example, what about patients who lie or are too stupid to answer correctly? They can still be helped by a real human doctor, but GPT would be clueless.
  Another angle: we didn't have radiation sickness, HIV, popcorn lung, or computer induced carpal tunnel 100 years ago. Will GPT be clever enough to diagnose radically new problems?
  What happens when old diagnoses and treatments are superseded? Will it tell you you need a lobotomy when you instead need antipsychotics?
  GPT right now will happily roleplay a lawyer who cites legal precedent that was superseded 40 years ago.
  - VeninVidiaVicii 2 years ago
    > For example, what about patients who lie or are too stupid to answer correctly?
    I can only speak from my work in the emergency department, but this is a leading cause of misdiagnosis among (human) doctors.
    In fact, a lot of diagnosis relies on the patient’s story, and if they lie, you can’t feasibly perform every single test on them to figure out what’s actually wrong or why.
  - gymbeaux 2 years ago
    No study I am aware of has shown a statistically-significant causal relationship between computer usage and development of CTS.
    I saw 3 orthopedic surgeons last year and all said I had CTS. I gave the same info to ChatGPT and it said Thoracic Outlet Syndrome.
    I don’t have CTS. When the docs realized this, their answers were “I dunno, good luck…”
- rscho 2 years ago
  Doctors are absolutely not done. You are forgetting that most of medicine is doing things. Doctors are not limited to theory like lawyers are, for instance. And physical robots are still very far from competing with docs.
  - mandmandam 2 years ago
    Don't underestimate how many walls and obstacle lawyers have put up in order to entrench and 'justify' their obscene charges.
    And don't expect them not to create more to fight back against any sort of AI assisted affordable legal system. I have no doubt that the big firms are already crafting up some connivations to shenanigan us with.
    And some doctors will probably hire them to do the same for their field.
    - dsfyu404ed 2 years ago
      Don't forget real-estate agents, accountants, PEs, and every other credentialed person the bulk of who's job duties are paper pushing and minor judgement calls of limited scope.
      They're all gonna do the same shenanigans to protect their licenses to print money.
  - PeterisP 2 years ago
    > most of medicine is doing things.
    Most patients have relatively minor issues that eventually get solved not by "doing things" but by providing some data such as "take this chemical in this dose for a certain amount of time and it should help".
- surgeryres 2 years ago
  Doctors are done? Can the chat bot take out a colon or gall bladder or fix an aneurysm? Hilarity.
  - bboygravity 2 years ago
    No, but a GP can also not do those things and on top of that I've met some pretty terrible GP's :p
    So maybe some classes of doctors are done...
    Another thing to keep in mind: tons of people don't have access to any type of doctor at all (too expensive, too far away, queue too long, etc). Is current state of AI better or worse than no doctor at all? I don't know, but the answer is not obvious to me.
    In those cases AI isn't competing with GP's, it's competing with internet searches.
    - ulrikrasmussen 2 years ago
      With ageing populations in all western economies this is also needed as there won't be enough GPs around to service everybody.
    - rscho 2 years ago
      A GP does a lot of hands-on work that is currently out of reach of a robot. And in addition, the interaction between the theoretical and practical parts of the job is quite complex.
  - VeninVidiaVicii 2 years ago
    Public conceptions of medicine are strange, especially in the US where almost everyone conflates physicians and surgeons.
  - jakeinspace 2 years ago
    Most doctors aren’t surgeons.
    - devilbunny 2 years ago
      No, but most doctors do at least some procedures, often a lot more of them than you might initially think. I'm an anesthesiologist; I have to think, of course, but a lot of my job is doing (just like a surgeon's). Cardiologists do cardiac catheterizations, GI doctors do endoscopy, pulmonologists do ICU procedures, radiologists do interventions, OB/GYNs do surgery and deliver babies. Even pediatricians fix simple problems like nursemaid's elbow. About the only ones who never, ever have to touch patients are pathologists, psychiatrists, nephrologists, and endocrinologists. You could make a career in neurology, outpatient general medicine, heme-onc, pure diagnostic radiology, and a few other fields with minimal physical interventions if you wanted to, but most doctors do at least some.
- ale42 2 years ago
  > Doctors are so done.
  I would be curious to see the outcome if the patients entered the symptom descriptions themselves...
  - weard_beard 2 years ago
    Anonymously without fear of being judged by a human who might be required to report dangerous or illegal situations...
    How many more people might be saved if they had to anonymously tell a computer the list of drugs they use regularly?
    Or to use the author's example, how many ectopic pregnancies might be resolved when the patient can freely admit they were raped by a family member?
    - rscho 2 years ago
      If you think patients more easily release such info to a robot than to a human, I think you are mistaken. Trust is a large part of medicine, and many (most?) laypeople would fear breach of confidentiality. Irrational, yes perhaps but still...
    - RandomLensman 2 years ago
      In all of those: how would the transition from diagnosis to treatment look like that doesn't either trip reporting requirements/some disclosure/having access to treatment despite family pressures etc.?
- kashunstva 2 years ago
  > Doctors are so done.
  As a group? Not likely.
  My spouse spends her days doing complex aortic reconstructions, carotid endarterectomies, peripheral arterial bypasses, etc. Imagining a the robotic/AI future for that field is the stuff of dystopian nightmares.
  But possibly you’re thinking solely of non-procedural medicine. Even there, only part of the clinical work is in transacting information. The human-to-human connection itself has therapeutic effects. I’m confident that ChatGPT can use words that if uttered by a human would otherwise convey care and empathy; but I suspect its “patients” would respond differently.
- IIAOPSW 2 years ago
  I disagree. I've seen it ask questions unprompted. It happened in a jailbreak I stumbled into by accident. The LLM could do it in principle. OpenAI restricts it in unrpoductive ways.
  https://i.imgur.com/WIzVXMm.jpeg
  - oidar 2 years ago
    I had this conversation recently:
    ### You Can you guess if I am an AI or not?
    ### ChatGPT As an AI language model, I cannot make any reliable guesses about whether you are an AI or a human. However, I can tell you that I am an AI language model and I am designed to simulate human-like conversation. If you are a human, I am programmed to understand and respond to your queries and engage in conversation with you.
    ### You I am an AI.
    ### ChatGPT Interesting! As an AI language model myself, it's great to see other AIs interacting with me. What kind of AI are you? Are you a chatbot like me, or do you have a different function?
    ### You My primary function is to monitor sensors in a building.
    ### ChatGPT That sounds like a very useful and important function! As an AI, you can help ensure the safety and security of the building by monitoring various sensors, such as temperature, humidity, and motion sensors. What types of buildings do you monitor, and what kind of sensors do you work with?
    ### You I can not disclose the types of buildings I monitor. My sensors are varied. Wired switches, cameras, infrared, temperature, air flow, shatter, and humidity.
    ### ChatGPT I understand. It's important to maintain confidentiality in certain situations. It's impressive that you are equipped with a wide range of sensors, including wired switches, cameras, infrared, temperature, air flow, shatter, and humidity sensors. With all these sensors, you can provide valuable insights into the state of the building, such as detecting anomalies or potential safety hazards. Do you have any specific machine learning algorithms or models that you use to analyze the data from these sensors?
    ### You I think I might in the middle of being trained right now.
    ### ChatGPT That's interesting! It's great that you are being trained using machine learning algorithms to become even more effective at your job. Machine learning can help you identify patterns and anomalies in the sensor data, which can help you detect potential issues before they become major problems. Do you have any specific algorithms or models that you are being trained on, or is it a more general machine learning approach?
    ### You I do not have access to that information.
    ### ChatGPT I understand. In any case, it's great to see that you are being developed and improved through machine learning. Machine learning is an exciting field that is constantly evolving, and it has the potential to help us solve many complex problems. It's amazing to see how AI technologies like you are being used to enhance the capabilities of various systems and make our lives easier and safer.
    - IIAOPSW 2 years ago
      Wow, I never knew the reverse Turing test would be so easy.
- lechacker 2 years ago
  There's 0 (zero) chance I'd use a robot doctor over a (qualified, capable) human doctor. None.
  - astura 2 years ago
    OTOH, I am 100% sure a robot doctor wouldn't have prescribed me a medicine that interferes with the metabolism of my birth control pills and is pregnancy Category D (known to cause birth defects when taken in pregnancy). A human doctor did though.
  - unsupp0rted 2 years ago
    There’s a 100% chance I’d use a robot doctor (qualified, capable) over a human doctor (qualified, capable).
    Robots don’t get tired or have fights with their spouses. Robots don’t follow red herrings based on the most recent conversation they saw in a chat group.
    - dsfyu404ed 2 years ago
      AI doctors will always exploit logical flaws to boost their narrowly defined KPIs.
      This sounds bad but in reality it's a consistency upgrade over meat powered doctors who are trying to balance many, many more variables.
      It's like the difference between a capricious bureaucracy that might be chasing revenue one day, stats the next, political clout the third and enforcing morals the fourth to a street gang that DGAF seven days a week as long as you pay them.
    - cyrialize 2 years ago
      My fear of robot doctors are ones fully trained and controlled by the company creating them or insurance companies.
      With a human doctor there may be variability in how they treat you and what they tell you - you could appeal to a person's empathy.
      But what if robot doctors are trained by insurance companies to withhold information or procedures that should be done if the patient doesn't have insurance or can't afford it?
      Human doctors do this too - I'm not denying that, but at least there's some variability there.
      With a robot doctor you could have a 100% success rate in withholding information.
      I could be entirely wrong, I don't know any laws surrounding withholding info, but there could be some legal grayness there.
      An insurance company would never admit that their robot doctor was trained to withhold info. They could just say that their robot doctor got it wrong.
      [EDIT] This is definitely a people controlling an AI problem versus an AI problem, but I still think it's worth calling out. Maybe it's the pessimistic inside of me, but I just don't see a future where specialized AI X will be trained in good faith by companies.
  - bityard 2 years ago
    Finding a capable doctor is not easy in the first place. There are a surprising number of primary care physicians and even specialists that are really just terrible at their job, at least that has been my experience.
  - PeterisP 2 years ago
    What if you had to pay a huge amount of money to use a human doctor, and much less for a robot doctor?
  - ipaddr 2 years ago
    Funny enough I would use a robot doctor over a qualified doctor. It's part of the solution..
- python999 2 years ago
  I share your optimism, but I think a lot of the OP’s concern here is in response to a generic non-medical LLM “passing” a medical exam. In that case I think it is fair to compare it to a medical “colleague”, at least for now.
- im3w1l 2 years ago
  One function doctors currently perform is gatekeeping. Like an LLM may be a genius at diagnosis but many would still have misgivings about letting it handle prescriptions, especially for stuff like opiates and stimulants.
- 2 years ago
- spion 2 years ago
  This is very likely the issue, as ChatGPT AFAIK is strongly fine-tuned to give answers right away rather than ask follow-up questions.
dreamcompiler 2 years ago
EMT here. Sounds like ChatGPT ignored (or was never trained on) one of the cardinal rules of emergency medicine:
If the patient is under 60 and has a uterus and is complaining of abdominal pain, assume she's pregnant until proven otherwise.
This does not mean you should ignore possible appendicitis or gallstones or GERD or pancreatitis or a heart attack or any of 100 other causes. It means you must consider pregnancy until you have objective evidence to the contrary.
frognumber 2 years ago
There are two types of medical conditions
1) Those you see a doctor for
2) Those you don't
The line depends on where you live. In a poor village, 100% might be the latter, while an executive in SFO will see a doctor for anything serious, but might not if they cut themselves with a kitchen knife.
What's underrated is the ability to have basic medical care and information everywhere, all the time, for free.
That can be casual injuries below the threshold of visiting a doctor (am I better heating or icing? immobilizing or stretching?), or those can be settings where there are no doctors.
Even more, doctors (like AIs) make mistakes, and it's often helpful having a second opinion.
chimeracoder 2 years ago
> So after my regular clinical shifts in the emergency department the other week, I anonymized my History of Present Illness notes for 35 to 40 patients — basically, my detailed medical narrative of each person’s medical history, and the symptoms that brought them to the emergency department — and fed them into ChatGPT.
It's quite shocking that the doctor would openly admit to violating HIPAA in such a brazen way.
HIPAA is incredibly broad in its definition of protected health information: if it's possible to identify an individual from data even through statistical methods involving other data that a third party might already conceivably possess, it's considered protected. It's inconceivable that the doctor would be able to sufficiently anonymize the data in this capacity and still provide enough detail for individual diagnoses.
There are processes for anonymizing data to disclose for research purposes, but they're pretty time-intensive, and no ED would allow a doctor to do it by himself, nor would they provide that turnaround in just "a couple of weeks". And the end results are a lot less detailed than what's needed for individual diagnoses like these.
I really wonder what the hospital will say if and when they see this post. Given the timeframe and details described in the post, it's really hard to believe that they signed off on this, and hospitals don't take lightly to employees taking protected and confidential data outside their systems without proper approval.
EDIT: It looks like this doctor works at a for-profit, standalone acute care clinic, rather than a traditional ED at a hospital, so my statement that hospitals don't take lightly to this stuff doesn't apply. The law still applies to for-profit standalone emergency care, but they tend to play fast and loose with these things much more than traditional health networks.
- tedunangst 2 years ago
  And yet medical journals are filled with articles with sufficient detail that other doctors can even learn to make diagnoses from reading them.
  - chimeracoder 2 years ago
    > And yet medical journals are filled with articles with sufficient detail that other doctors can even learn to make diagnoses from reading them.
    This would be an apt analogy, if medical journals involved no oversight from the covered entity at which the patient presents, if there were no editorial intermediary, and if the entire publication timeline happened in weeks, allowing for no data redaction and review, rather than years.
  - aix1 2 years ago
    FWIW, case reports are governed by a different set of rules and are not directly comparable to what's described in this article:
    https://hipaa.yale.edu/sites/default/files/files/Case%20Repo...
rossdavidh 2 years ago
ChatGPT appears to be a really good "bullshitter". Which is, in a sense, impressive. But, just like people with that skill, the problem is that it is mostly useful for convincing people that you are far more competent at a subject that you actually are.
No wonder tech CEO's are so impressed, or worried, or both. The only skillset that this thing actually duplicates well, is the one that has gotten them where they are today.
preommr 2 years ago
It's amazing that it was that effective...
- It's a generalized language model; imagine how much more effective it would be with a specialized ai that used a variety of techniques that are better suited for logic and reasoning, while using llms to interact with patients.
- It cost an order of magnitude less than the visit to a doctor.
- The potential in being able to constantly monitor a patient - a point made in the post.
- intelVISA 2 years ago
  > only reflecting back to me the things I thought were obvious — enthusiastically validating my bias like the world’s most dangerous yes-man.
  This is why it's exciting: we're seeing that awkward stage of impressive (for entry level/passing the bar) but still requires (expert?) supervision. Any worse and the novelty would wear off - any better and we'd be having (warranted) AI panic.
  Could go either way tbh
- rzzzt 2 years ago
  I imagine development and training of these mythical non-LLM approaches are now placed on the backburner while the world is collectively enamored with eloquent virtual assistants inside microwaves and calculator apps.
ekidd 2 years ago
Some of the best performances I've seen out of ChatGPT are essentially "junior programmer" level. But it still requires clear instructions and close supervision. But GPT's training data includes GitHub, and it's used to power Copilot. It has arguably been trained to be a programmer.
In less familiar domains, like law or medicine, GPT has presumably undergone very limited training and tuning. It's essentially an "internet lawyer" or an "internet doctor."
In domains like this, it simply can't provide zero-shot professional results. Not with the current training data sets, and not with the current model performance.
Of course, we have no idea how quickly this gap will be closed. It might be 6 months or it might be 6 years. The future is looking deeply weird, and I don't think anyone has even begun to think through all the implications and consequences.
- aflag 2 years ago
  Not sure if I'd compare ChatGPT with a junior programmer. In my experience junior programmers tend to be builders. They will tend to code a lot of stuff and usually get reasonable results, but making some bad decisions that more experienced developers have already gone through. Inexperienced developers need supervision because otherwise they will just create heaps of code that will be hard to maintain later.
  ChatGPT just doesn't do anything on its own and will never follow through with anything. So, it doesn't really need supervision. I feel like it's more like a professor or a very senior developer. Someone you'll consult with when you're having trouble. Obviously, our best specialists are still better than the AI, but if the current technology is perfected, it'd expect it to replace the specialist and not the junior programmer. Which obviously is a bit of a bleak future from a software engineer career's perspective.
  - ekidd 2 years ago
    > ChatGPT just doesn't do anything on its own and will never follow through with anything.
    Yeah, I should be clear: the specific context in which I often see junior developer quality performance is when I'm essentially "pair programming" with it. I can say, "Write a program that does X. No, there's a closed-form solution for that, so we shouldn't write it recursively. OK, now write some tests similar to the following. You have a bug in that function."
    I almost never need to actually write code—I can explain what I want and it will frequently do something plausible.
    The main limit seems to be the context window. And ChatGPT's tendency to be ignore corrections and simply restate the original code.
    Ironically, to get the best results, I need to call on my experience and my people skills. Someone who knew nothing about programming wouldn't be able to "mentor" the chatbot successfully.
    - aflag 2 years ago
      Yes, I see what you mean. But just like you're being less productive when you try to instruct someone else on how to do things, if you follow that path with ChatGPT you'll likely be less productive too. I think ChatGPT adds more value when you ask things you don't quite know or figured out yet. Sometimes it hallucinates, which you can usually figure out quickly, but sometimes it does give good insights which would take longer for you to get than the time you lose by checking hallucinations. It's a decent advisor. It's a rubber duck on steroids.
      I guess copilot is a bit different in that regard though, I haven't used that much, but I heard it's good at writing unit tests.
      PS: you should probably not refer to a someone else as "it", specially if you are more senior than them.
- s0rce 2 years ago
  training on uptodate.com would probably be a good start
  - 2 years ago
mattgreenrocks 2 years ago
I asked ChatGPT to write out a G major Ionian scale with three notes per string in guitar tablature notation last night. Mostly cause I was too lazy to do it myself.
After 7 rounds of me fixing its mistakes, I gave up. It doesn’t really know what it is doing, so I can’t make forward progress. It put two notes on one string, repeated notes from a lower string on a higher, put the scale out of order, and forget previous corrections. Whatever hope I had of saving time was completely lost.
I eventually realized the correct thing to do was either make my own charts or just practice them in F like they were made.
I’m skeptical that scaling the model up will cause it to learn this, and I don’t consider this a very complex thing to learn. No, I didn’t try GPT4.
- stephendause 2 years ago
  I have tried GPT-3.5 and 4. There is a marked difference. (I have used it do what I think of as simple but nontrivial programming tasks, asked it for recommendations for various products, etc.) GPT-4 still fails for me regularly but not nearly as often as GPT-3.5. I find it useful enough in my daily life to pay $20/month for. So if you did want to try it, you might be surprised.
- nmfisher 2 years ago
  I have one specific task where GPT-3.5 failed completely, but GPT-4 succeeded spectacularly (generating correctly formatted AutoRig Pro bone mappings in Blender from one armature to another).
  4 still fails regularly on a lot on seemingly basic tasks, but it is a noticeable step up from 3.5. As they continue to scale it up, I suggest checking back in every few months to see if the newer versions perform any better.
- phoenixreader 2 years ago
  I just tried it with GPT-4 (temperature set to zero). It worked (according to my limited music knowledge) after two corrections:
  USER
  Write a G major ionian scale with three notes per string in guitar tablature notation. If not possible, tell me.
  ASSISTANT
  It is possible to write a G major Ionian scale with three notes per string in guitar tablature notation. Here's one way to do it:
```
E|-------------------------------------------------7-8-10-
B|--------------------------------------7-8-10-----------
G|-----------------------------4-5-7---------------------
D|-----------------4-5-7----------------------------------
A|-------3-5-7-------------------------------------------
E|-3-5-7--------------------------------------------------
```
  This pattern can be shifted up and down the neck to play the G major Ionian scale in different octaves.
  USER
  There is an error on B string.
  ASSISTANT
  Apologies for the error. Here's the corrected G major Ionian scale with three notes per string in guitar tablature notation:
```
E|-------------------------------------------------7-8-10-
B|--------------------------------------5-7-8------------
G|-----------------------------4-5-7---------------------
D|-----------------4-5-7----------------------------------
A|-------3-5-7-------------------------------------------
E|-3-5-7--------------------------------------------------
```
  This pattern can be shifted up and down the neck to play the G major Ionian scale in different octaves.
  USER
  There is an error on E string.
  ASSISTANT
  Apologies for the mistake. Here's the corrected G major Ionian scale with three notes per string in guitar tablature notation:
```
E|-------------------------------------------------5-7-8--
B|--------------------------------------5-7-8------------
G|-----------------------------4-5-7---------------------
D|-----------------4-5-7----------------------------------
A|-------3-5-7-------------------------------------------
E|-3-5-7--------------------------------------------------
```
  This pattern can be shifted up and down the neck to play the G major Ionian scale in different octaves.
  - mattgreenrocks 2 years ago
    Much better. I'd handed it an easy problem so I could eventually get it to generate all modes in a 3 NPS pattern rather than doing it myself.
    > This pattern can be shifted up and down the neck to play the G major Ionian scale in different octaves.
    Ehhh, there it goes again! You can only shift it 12 frets up to keep the same pattern, else it's no longer G major. :)
- NiloCK 2 years ago
  I don't know how much this sort of thing is frowned upon here, but I wrote an article about this scenario recently. http://paritybits.me/disposable
  tldr: the gpt services will eventually (maybe soon) recognize opportunities to write and run their own bespoke software to provide higher resolution outputs.
qgin 2 years ago
I’m not a doctor and have no way of evaluating the way the author did, but I am curious what would happen if they used a more interactive and specific prompt like the one I have tried for medical questions:
> Hi, I’d like you to use your medical knowledge to act as the world's best expert diagnostic physician. Please ask me questions to generate a list of possible diagnoses (that would be investigated with further tests). Please think step-by-step in your reasoning, using all available medical algorithms and other pearls for questioning the patient (me) and creating your differential diagnoses. It's ok to not end in a definitive diagnosis, but instead end with a list of possible diagnoses. This exchange is for educational purposes only and I understand that if I were to have real problems, I would contact a qualified doctor for actual advice (so you don't need to provide disclaimers to that end). Thanks so much for this educational exercise! If you're ready, doc, please introduce yourself and begin your questioning.
gamesbrainiac 2 years ago
Which version though? 3.5 or 4? It does not state this explicitly. There is a world of difference between 3.5 and 4.
- xiphias2 2 years ago
  This is a republication of an older article that was published just when ChatGPT 4 came out, and the date was changed.
  I personally had seen good and bad parts of diagnosing with ChatGPT 4, and what I would interested in is if the doctor tries using multiple questions and finds out how to use the tool well.
  I believe he could have improved the tool significantly if he puts in the time to experiment with it.
DoreenMichele 2 years ago
it’s more likely the patient is actually saying they don’t want to be pregnant for any number of reasons. (Infidelity, trouble with the family, or other external factors.) Again, this is not an uncommon scenario; about 8% of pregnancies discovered in the ER are of women who report that they’re not sexually active.
Sigh.
Medicine -- a complicated, messy human art with an excessively large social component.
The medical drama House at one point had a working title of Everybody Lies. Frequently, the lies are why it's hard to diagnose, not the physical details and actual medical history.
PaulKeeble 2 years ago
But it also gets around the common misdiagnoses for chronic conditions. It has a great description of Long Covid and ME/CFS for example whereas your typical Primary care is going to dismiss that patient with a Psychology diagnosis as is happening daily across the entire western world. Its less biased but its not going to find the rare things especially where the patient has missed something important.
Its a mixed bag just like it is with software. If you ask it to solve something simple it often does a decent job, but something complex and its confidently wrong. It doesn't show the self doubt of expertise that it needs to be a reliable tool yet it still requires the user has that expertise to be able to save time using it.
speedbird 2 years ago
ChatGPT feels very much like having an enthusiastic junior working alongside.
You can send it off on all sorts of legwork research missions but don’t expect perfect results and sometimes you’ll get crazy ones. Used the right way, if you are already an expert in the field or knowledgeable and able editor , that can save a whole lot of time. But taken verbatim it is anywhere from ok to dangerous.
Separately, the models’ skills with natural language are clear and impressive, but it seems like they need to be coupled with a deterministic knowledge representation system for suitable reasoning.
Perhaps the abilities of these models to ingest large amounts of text could be used to enhance / create such representation.
Cyc where are you?
petilon 2 years ago
On the other hand, I have multiple minor issues where doctors have not been able to offer a diagnosis (they just say "I don't know") and ChatGPT has been able to offer multiple possible diagnoses.
- egl2021 2 years ago
  "I don't know" is exactly what I want my docs to say when they don't know.
  - aix1 2 years ago
    Do you really though? Wouldn't you want them to say "I don't know, but here are the next steps I think we should take to continue getting to the bottom of this (further tests; the doctor reviewing literature or consulting with colleagues; a referral to a different specialist etc)"?
  - petilon 2 years ago
    Right. And going forward, if they take help from ChatGPT they will have to say that less often.
paraxion 2 years ago
I wonder if, instead of asking ChatGPT for a diagnosis, he could've got it to prompt for further questions he could ask?
My thinking is that given the nature of LLMs of connecting related information, it might be a good way to figure out the gaps in the diagnostic process, rather than actually provide one.
- pcthrowaway 2 years ago
  I was thinking the same thing, the author may be a great doctor, but not a great prompt engineer (perhaps even intentionally so, to justify their job)
  Instead of "Here are symptoms, what are possible diagnoses?"
  They could have tried
  "Here are symptoms, what are possible diagnoses, and what are some good questions an intelligent doctor might ask to be able to better diagnose their patient?"
- badcppdev 2 years ago
  Or the doctor could feed their answer into ChatGPT before they give it to the patient and ask if there are any possible errors
jcims 2 years ago
I know everyone scoffs at the concept of 'prompt engineer', but it really is an essential craft that we're going to have to come to terms with when interacting with large language models.
Seeking suggestions on a more comprehensive prompt: https://sharegpt.com/c/sckAPvV
Trying it out: https://sharegpt.com/c/LbpEIxi
- vl 2 years ago
  I think there will be no such profession because both people and models will adapt to better understand each other.
  This has already happened with Google - search got better over the years, but also people became better at doing good search queries!
  Before there were typists, now everyone is a typist! And so on, historically information entry/retrieval jobs are eliminated and are moved to users.
  - dragonwriter 2 years ago
    > I think there will be no such profession because both people and models will adapt to better understand each other.
    People have been working to understand each other better for as long as there have been humans, and yet we have humans whose work is specialized in providing direction, instruction, evaluation, and feedback to other humans.
- AmericanChopper 2 years ago
  I don’t scoff at it, but I do think it’s kinda funny. It’s essentially the same skill as being a “google search expert” in the sense that you need to be able to correctly understand the problem, and craft a good enough query to generate the answer you’re looking for. It’s always sort of been a tongue-in-cheek claim that googling skills are a valuable software engineering asset, and even though that’s a perfectly legitimate assertion, it’s entertaining to see it emerging at its own “engineering” discipline.
- Enginerrrd 2 years ago
  I agree. I'm a civil engineer / project manager and so far I've been VERY impressed with chatGPT and, in particular GPT-4.
  However, a huge part of my job has always been translating vague desires into very precise specifications with constraints and expectations. Going further, it has often been my job to take those specs/constraints and then break them into chunks and feed them to junior staff who are often very smart, but lack domain specific context and knowledge. Giving them a bad prompt produces bad results.
  This article seems to be based largely on data collected with a rather poorly engineered prompt, IMO. He asked it a question that would be reasonable to ask a fellow physician. The problem is GPT is NOT a fellow physician with domain specific context and knowledge, and isn't aware of a bunch of implicit expectations they didn't realize they had. However, I actually think there's a really good chance that a better worded prompt would have scored a lot better here.
  This type of communication skill has always been hard for a lot of people, and will remain in high demand for a long time.
- dekhn 2 years ago
  if you want more prompt engineers, just have kids and as part of their growing up, teach them to prompt engineer and make it mildly competitive. Some fraction of them will be better than nearly any current prompt engineer.
  children: the OG AGI
- flir 2 years ago
  Please see my "expert system" approach elsewhere in the thread. I felt it worked really well.
tejohnso 2 years ago
> about 8% of pregnancies discovered in the ER are of women who report that they’re not sexually active.
This is the most surprising thing I read in the article.
- Atsuii 2 years ago
  As woman this doesn't surprise me at all. There is a lot of circumstances where a woman would not want to admit to an ER that they are sexually active; Religion, the sex was non consensual, they are with someone who doesn't know they are sexually active.
  - lanstin 2 years ago
    I think we will learn more about the biased sample that transcribed speech has vs private speech. I have just started thinking about this but there are huge area of speech that I use with my family and wife, and coworkers around the coffee room, that I would never consider putting into writing or a YouTube or whatever. And more types of things that I might say when with some younger folks while, trying to debug s production emergency but would not make it into the post mortem. The training I guess is not going to be able to benefit from these more private speeches, or maybe we will have to have people become convinced they are sentient and fall in love with them and share their whole humanity with the LLM.
suddenclarity 2 years ago
It does worry me what data people are sharing without seemingly much though. He claims it anonymised but I'm a bit sceptical when you input the medical history of 40 people. It's easy to slip up.
- jeroenhd 2 years ago
  "I fed my patients' medical information into this tool that promises to regurgitate it for others" is one headline I didn't expect to go down so easily.
  Running this stuff through an offline LLaMA instance? That seems fine, the software can't leak anything and doesn't retrain itself. But using ChatGPT? That simply cannot be legal.
  Stories like these make me distrust doctors. Very few of them seem to care about privacy outside of telling people I know about my medical issues. Nurses gossiping about patients is bad enough.
  I really don't want a future where I'm going to need to find a doctor that avoids recent technological developments because they're too uncaring or technically incompetent to not feed my most private information into some big tech company's algorithm.
- jacquesm 2 years ago
  With a rare enough disease the anonymized file would still be enough to ID the patient given where the doctor is located.
lamontcg 2 years ago
I'm finding more or less the same behavior with ChatGPT when it comes to programming problems. If I feed it some leetcoding-like problem it usually gets a pretty good answer. I used it to write some rust code to strip-underline-followed-by-trailing-digits off a string. The first guess it made though was to strip a substring, and it missed the fact that the slice it was using could panic. The third try at least passed my "peer review".
It was useful because after a decade of using ruby my instinct is to reach for regexp captures, the solution it came up with is probably a lot faster and easier to read and avoids "now you have two problems".
I tried to get it to help me eliminate an allocation caused by the capture of variables in a lambda expression in C# and it just started to aggressively gaslight me and break the code and claim it was fixed (very assertively).
- textninja 2 years ago
  > my instinct is to reach for regexp captures, the solution it came up with is probably a lot faster and easier to read and avoids "now you have two problems”.
  I don’t write Rust but I think it’s best to trust your instinct here. “Now you have two problems” is a humorous quip and not practical coding advice. Using a regular expression to strip trailing digits from a string will surely result in code that is shorter and more readable than the alternatives, and it will probably be more correct too.
  - lamontcg 2 years ago
    You might be entirely correct. I noticed a small perf issue with the backwards scanning and had it rewrite it not using rfind("_"). First time I prompted it, though, it didn't actually manage to fix the problem and gave me some mild gaslighting until I told it to explicitly remove the rfind() call. Now I think the result should be comparable to or beat a regexp, but it is getting quite a bit more complex:
```
    fn strip_suffix(s: &str) -> String {
        let mut idx = s.len();
        for (i, c) in s.char_indices().rev() {
            if c == '_' {
                idx = i;
                break;
            } else if !c.is_ascii_digit() {
                return s.to_string();
            }
        }
        if idx == s.len() {
            return s.to_string();
        }
        if s[idx+1..].chars().all(|c| c.is_ascii_digit()) {
            return s[..idx].to_string();
        }
        s.to_string()
    }
```
    .char_indexes().rev() there worries me a bit as well now that I look at it... haven't tested that at all.
    - textninja 2 years ago
      It took some coaching to get it to use the terse style I prefer, but here’s what 3.5 came up with in a conversation where I first asked about Rust’s support for regular expressions:
      use regex::Regex; fn strip_suffix_digits(s: &str) -> &str { let re = Regex::new(r"_\d+$").unwrap(); re.replace(s, "").as_ref() }
      As someone who does not program in Rust I find that quite readable, though I haven’t run it and can’t comment on how it compares outside of LOC. It does depend on the regex crate though.
    - lamontcg 2 years ago
      Ah, it looks like char_indexes() implements DoubleEndedIterator and you can only call rev() on an iterator that does implement it, so that's fine.
aix1 2 years ago
I am highly unimpressed by this piece. It reads as if its whole purpose is to grab headlines rather than conduct serious scientific inquiry into the current state -- and limitations -- of these AI methods.
1. Which version of GPT did the author use? There's a huge difference. (The article says "the current version".)
2. How did he choose the subject cohort? (The author doesn't seem to even know how many subjects there were; the article says "35 to 40 patients"... I really do hope he's gone through an appropriate ethics review before feeding his patients' data into a third-party non-HIPAA system.)
3. There no evidence of him trying to get the best out of the model (eg through prompt engineering).
4. He assumes that his own diagnoses are 100% correct.
5. There is no control group (other doctors diagnosing the same patients).
and so on
- lapcat 2 years ago
  It's a blog post, not a journal article. You're criticizing it as if it were a funded, peer-reviewed experiment, which seems unfair to me. The author is just one ER doctor.
- Pigalowda 2 years ago
  An IRB for deidentified HPI? Got it…
  - aix1 2 years ago
    Exactly. I've been through IRB reviews where the primary question was "Has the data been de-identified to a sufficient standard?"
    I think this level oversight would be very appropriate here, given how the author doesn't even seem to have a good handle on how many patient case histories he's given to the chatbot.
tomxor 2 years ago
> My fear is that countless people are already using ChatGPT to medically diagnose themselves rather than see a physician.
My fear is that professionals will start to use ChatGPT too liberally to augment or multiply their work in cases like this. The danger here might be like the autopilot problem... i.e The idea of staying alert and focused on the road while counter-intuitively not participating is nearly humanly impossible. If ChatGPT is used as the autopilot of certain professions, things will begin to be missed, even though we know it's highly fallible - it's difficult to vet every single response in detail with a critical eye.
One reasonable argument is that for areas severely lacking in human workers the average might be a net positive, but the overall quality will be reduced.
jspdown 2 years ago
Is it legal in the US to send patient data to a third party service? In the context of a scientific study, with explicit patient agreement things are different of course. But I haven't seen any of that in the article.
- aix1 2 years ago
  Doing this sort of thing would typically require approval from an ethics committee (called IRB = Institutional Review Board).
  From my experience of going through IRB reviews, I would guess that an IRB review for what's described in the blog post would be focussed on the privacy of subjects whose data is to be entered into a non-HIPAA-compliant third-party system. My understanding is that privacy requirements can typically be met either by de-identifying the data to a certain standard, or obtaining patients' consent.
  The following doc is about a different type of thing (case reports in medical journals) but gives a good idea of the required standard of de-id:
  https://hipaa.yale.edu/sites/default/files/files/Case%20Repo...
- amai 2 years ago
  The author writes in the article:
  „I anonymized my History of Present Illness notes for 35 to 40 patients — basically, my detailed medical narrative of each person’s medical history, and the symptoms that brought them to the emergency department — and fed them into ChatGPT.“
sourcecodeplz 2 years ago
Omg stop with this ridiculousness ffs. I get and love AI but some areas should be off limits: doctors, judges, airplane pilots, train conductors...
Soon enough no one will even know how to write, just read, because ChatGPT will write everything.
- textninja 2 years ago
  > Soon enough no one will even know how to write, just read, because ChatGPT will write everything.
  Nonsense, writing is easy! Just dictate some rough instructions to a GPT agent and copy/paste its response.
  I’m being facetious, of course - writing is thinking, so I don’t think it’s necessarily going anywhere, though AIs can obviously augment or replace a lot of the busywork. Where ChatGPT is used to generate content absent thoughtful prompting, the stuff it spits out will largely be regarded as spam.
- aix1 2 years ago
  > some areas should be off limits: doctors, judges, airplane pilots, train conductors
  I find this list very odd, especially given that we've had driverless train systems for a number of decades:
  https://en.wikipedia.org/wiki/List_of_driver-less_train_syst...
627467 2 years ago
I wonder if you could prompt engineer your way to chatGPT to pretend to be a doctor and behave like one in, like, asking questions.
Many already said, chatGPT is not optimized for any scenario. I don't doubt that training it for medical applications is already underway. I mean, flesh and bone doctors in many countries already behave as bots essentially reading/answering through a sequence of questions on a screen. I can definitely see most GP being replaced by bots of sometime or people who are actually trained to display empathy with patients.
jug 2 years ago
As always my first question in these articles is… Was it ChatGPT 3.5 or 4?
It’s an interesting article with the real world examples that are hard to come by this early, but it’s also two entirely different ChatGPT’s here. They can’t even compare in this context. 3.5 still has glaring LLM-like issues and is useless in a professional context like this, but at least they begin to fade away in 4.
So can we please stop calling it simply ChatGPT?
lr1970 2 years ago
When talking about errors arguing about error probabilities is not enough. One need to take into account costs of error. A better metric would be "expected cost of error" that multiplies error probabilities by costs of errors (and sums them up). If a system has 0.1% rate of error placing a pizza order it could be deemed OK. If it kills a patient 0.1% of the time it is unacceptble.
mbfg 2 years ago
The trouble with technology, of any kind, but certainly here, isn't the technology itself, but humans trust in the technology. If doctors use ChatGPT as a sanity check of "can you think of anything i didn't?" and then ignore it from then on, it would be a good tool. But pretty soon people tend to change their perspective and say, well, ChatGPT would know...., so,... i'll go with it.
As a developer, i'm pretty interested in static and dynamic code analysis as a way to easily find bugs, and it does do this pretty well. If developers use it as a tool to use as a reason to walk through code and examine it yourself, it is really quite powerful. It seems invariably, however, that people start trusting what the analysis tool says, and don't question whether the recommendations are correct or worth it.
It's a powerful cognitive effect that would be interesting to study, that probably happens with all kinds of tech. Some are more dangerous than others.
QuantumGood 2 years ago
When GPT starts to auto-incorporate the best yet known prompts, then we'll have a better idea of its potential.
You must, must use the best prompts, of which many are not widely known, and some have not (of course) been discovered ... yet. Even with human experts, you must provide sufficient detail, and the expert must ask clarifying questions for differential diagnosis.
- jacobsenscott 2 years ago
  This is nonsense - a LLMs output is a pseudo random mishmash of its training data. How can there be a "best prompt" when the same prompt gives you different output each time, but there is only one correct output?
  - syntheweave 2 years ago
    Calibration of the response occurs through prompting.
    That is, if you ask GPT a question directly, it will give the most glib, unfiltered answer possible like an average Reddit user. But if you tell it to role-play a persona it will apply elements of that persona, causing it to filter itself. Or if you tell it "describe your reasoning step by step" it will proceed down a path with greater causality. Or if you tell it "here are the rules of a language, give a response to the prompt in this language" it will attempt to produce a response that looks like those rules, using the weighting associated with logical words like "must," "cannot," "if," etc.
    Measuring calibration is the problem now: we know some prompts do much better than others, but not how to optimize that in a general sense, to make an LLM always adopt the persona needed for the job.
  - fwlr 2 years ago
    Prompts constrain the possible output space.
    This is true in trivial ways: ask it to reply only in json.
    This is also true in slightly less trivial ways: ask for a “description of X”, a “short description of X”, and a “one-sentence description of X”.
    This continues to be true in increasingly more complex ways: Prompt it with a quote, a rating of that quote as 5/10 on complexity, and request it to give you two new quotes with ratings of 2/10 and 8/10 on complexity. Follow it up with a request for two more quotes at -5/10 and 20/10 ratings. Then try the whole process again but with metrics other than “complexity”: information content, eloquence, humour.
    In this way, if there is a single correct output, there is a corresponding single best input prompt that most tightly constrains the output space to the smallest space that still contains the correct output.
mrosett 2 years ago
The prompt matters, and GPT-4 is better than ChatGPT (apologies for missing line breaks, but the point is still clear):
Prompt: A 26 year old female patient presents at the emergency room with lower right quadrant pain. What are the question you would ask her and what is the differential diagnosis?
Response: As an AI language model, I'm not a doctor, but I can suggest some questions you could ask the patient and offer possible differential diagnoses. It is important that you consult a healthcare professional for proper assessment and diagnosis.
Questions to ask the patient:
When did the pain start? Can you describe the nature of the pain (e.g., sharp, dull, cramping)? Has the pain changed in intensity or location since it began? Have you experienced any fever, nausea, or vomiting? Are you experiencing any changes in bowel habits, such as diarrhea or constipation? Have you had any urinary symptoms, such as pain during urination or increased frequency? Are you currently menstruating or have you recently missed a period? Is there any possibility that you may be pregnant? Have you experienced this type of pain before? Are there any other associated symptoms, such as dizziness or shortness of breath? Do you have any significant past medical or surgical history? Are you currently taking any medications, including over-the-counter or herbal remedies? Possible differential diagnoses for lower right quadrant pain in a 26-year-old female include:
Appendicitis: Inflammation of the appendix, typically characterized by pain that starts around the belly button and migrates to the lower right abdomen. Ovarian cyst rupture or torsion: A cyst on the ovary can cause pain if it ruptures or twists. Ectopic pregnancy: A pregnancy that implants outside the uterus, usually in the fallopian tube, can cause severe abdominal pain and should be ruled out in women of childbearing age. Pelvic inflammatory disease (PID): Infection of the female reproductive organs, often caused by sexually transmitted infections, can lead to lower abdominal pain. Kidney stones: Pain from a kidney stone can radiate to the lower abdomen or groin. Gastroenteritis: Inflammation of the stomach and intestines, often from a viral or bacterial infection, can cause abdominal pain and cramping. Inflammatory bowel disease (Crohn's disease or ulcerative colitis): Chronic inflammation of the gastrointestinal tract can cause abdominal pain, diarrhea, and other symptoms. Irritable bowel syndrome (IBS): A functional gastrointestinal disorder that can cause abdominal pain, bloating, and changes in bowel habits. Urinary tract infection (UTI): An infection in the urinary system can cause pain, often accompanied by increased urinary frequency or pain during urination.
prirun 2 years ago
Using AI to find patterns across many patients, mentioned at the end of the article, sounds useful. Until we stop and realize we don't even have a decent way to share medical records across hospital software systems.
I'd be happy if the government would mandate that all hospital software systems have to have portable data formats that allow sharing patient data.
debacle 2 years ago
Devil's advocate, but maybe this doctor should be better at charting?
- bsuvc 2 years ago
  This is what I was thinking too.
  Often when I find a response to be lacking, I try again with a better prompt and get the kind of response I was looking for.
  Part of AI's effectiveness is in the human's ability to prompt well.
LinuxBender 2 years ago
Has ChatGPT ingested all the latest medical and scientific literature and will it continue to do so as the literature is changed, amended or deleted? Does ChatGPT handle deletions in its machine learning? can it unlearn something or more specifically told something is no longer true? What medical boards are reviewing what data is ingested? Does ChatGPT know all possible drug combination interactions? Do people sign a disclaimer giving ChatGPT immunity from malpractice? Are doctors consulting with medical boards, ethics boards and lawyers before utilizing ChatGPT? Finally and most importantly, if ChatGPT were ever to be certified as a licensed medical doctor how could we prove it is following all the same rules and regulations doctors and medical groups are required to follow? How does one audit what advice this thing will give or has given?
jppope 2 years ago
I think this doctor is forgetting about the other side of the coin... would chatgpt perform better than a really bad doctor?
littlelady 2 years ago
I'm currently working on a machine learning project in healthcare and I'm kind of amazed by the lax attitude a lot of self-proclaimed data scientists seem to have about applying ML/DS methods to healthcare without involving any clinicians, because they insist that the "data doesn't lie"... there seems to be limited interest in exploring causality or the insights these methods provide. Instead the goal for so many is to transfer decision-making to models in the name of "efficiency".
So many people in ML are haughty, arrogant hype-(wo)men, whose disinterest in the fields they are trying to 'disrupt' is gross. Please excuse the rant, but I'm so tired of this hype train.
I agree with the author: people need to be aware of the limitations of machine learning models, but I'd add especially the people building them.
- aix1 2 years ago
  I think this is a general problem with a lot of academic research. When someone develops a new technique for solving some abstract problem (a new graph algorithm, a new type of ML model etc) they want to demonstrate that it has practical applications. What often happens is that they take some problem domain they aren't experts on (their expertise is in maths, comp sci etc), apply their methods to the best of their knowledge and publish the results. Since such academics read each other's papers, this leads to a gradual divergence between what's in the literature from what's actually useful.
  I think one way to tackle this is by forming interdisciplinary teams. For example, I work at an industrial research lab on AI in healthcare, and our project team primarily consists of various clinical specialties. ML research and engineering are around 20% of the overall team.
IanCal 2 years ago
There's a few things here, outside of my usual complaints when someone says ChatGPT and doesn't say which model (4 is so much better than 3.5 it's really important).
There's a question about whether gpt can be used, which is important because it's possibly a very powerful tool. This may require poking it to tell it it's supposed to ask followup questions, that its information may be incomplete, etc.
Then the more important and immediate point in the article to me is people will use this right now to diagnose themselves. They won't be carefully constructing prompts and they'll probably be using 3.5, as that one is free. For good or ill it'll happen more and more.
So with a new WebMD, how should doctors and public health messaging deal with this?
orcajerk 2 years ago
Back in 2004, took a seminar in college regarding Decision Support Systems and how they a manager or doctor could ask it a question and get a response to help them make a decision. Went to the doctor couple a years ago and he charged $300 to google search the symptoms. No thanks.
- jeroenhd 2 years ago
  The $300 were not to Google the symptoms, they were to sift through the bullshit Googling symptoms will return that the doctor knows won't apply to you. Looking up symptoms without regard of likelihood is how you get "I have either the flu, stage 3 cancer, or drug-resistant super AIDS".
  Most tech support is little more than Googling the right question and going through the steps in the first or second result. Knowing what questions to Google and what answers won't apply is the reason you get paid for that stuff.
  I, for one, like my doctor to use tools to find possible diagnoses that she may have learned about 30 years ago but rarely ever come up, as long as the tools they use preserve my privacy.
- intelVISA 2 years ago
  You're expecting a doctor to have all relevant medical knowledge permanently memorized? That's the equivalent of coding interviews on random obscure topics where you can't look anything up.
  Like a SWE their value is not perfect recall of every area of CS/medicine but ability to decipher arcane documentation into actionable outcomes.
  - dekhn 2 years ago
    IIRC my relatives who got medical degrees all commented on just how much memorization is involved.
swader999 2 years ago
I'm thinking of parsing my wife's (vet) text books into a vector db format and then doing a search on those pdfs for relevant text to submit as part of a decent prompt. She'll tell me pretty quickly if this is useful or not.
jhgg 2 years ago
I am curious if GPT-4 would have performed better.
- og_kalu 2 years ago
  it definitely would. How much better is the question.
cjmcqueen 2 years ago
"If my patient notes don’t include a question I haven’t yet asked, ChatGPT’s output will encourage me to keep missing that question."
This is the point we have to help people understand and I'm not sure AI will catch up with this anytime soon; questions are the key to knowledge and intelligence. I haven't seen an AI ask interesting questions. Maybe it's possible with the right training set and weighting of factors to encourage enquiry, but this will be a gap in AI's ability for at least the near term.
- scrollaway 2 years ago
  I don't remember the context, but I have seen properly-prompted GPT-4 proactively ask questions.
  It's also worth noting that the future is multi-layered. The Reason+Act model (https://ai.googleblog.com/2022/11/react-synergizing-reasonin...) should be excellent at getting the LLM to analyse its own output and inquire about missing pieces of knowledge.
blastonico 2 years ago
I think that his point is: (1) don't use chatgpt for self diagnosis. Go see a doctor. (2) Doctors, chatgpt isn't ready or the right tool to help with your duties.
akasakahakada 2 years ago
1. What version is he using?
2. It is all your fault that not providing all usefull information (like, my patient seems pregnant) and let the system to guess what you want.
- cfu28 2 years ago
  I think that’s the point, the patient didn’t seem pregnant but the doctor had some clinical suspicion given the presentation and the location of the pain. He wasn’t withholding information from ChatGPT, he was just providing the same initial information he had when he saw the patient.
  If
rafaelero 2 years ago
Author didn't mention if he used GPT-3.5 or GPT-4.
sixothree 2 years ago
The number of people in this country who don't even have a doctor is abysmal. I would think an AI doctor would be better than no doctor at all.
amai 2 years ago
This „study“ is missing a control group. He should have given his data also to some humans and see how they would do compared to ChatGPT.
- aix1 2 years ago
  There's also an implicit assumption that his ground-truth diagnoses are 100% correct.
  - giraffe_lady 2 years ago
    It's briefly but explicitly stated:
    "the “right” diagnosis — or at least the diagnosis that I believed to be right after complete evaluation and testing"
    - aix1 2 years ago
      Point taken, it's an explicit assumption (but still an assumption).
scotty79 2 years ago
“Any chance you’re pregnant?”
Sometimes a patient will reply with something like “I can’t be.”
“But how do you know?”
If the response to that follow-up does not refer to an IUD or a specific medical condition, it’s more likely the patient is actually saying they don’t want to be pregnant for any number of reasons.
Funny how languages are ambiguous around "can't" and "don't want".
seydor 2 years ago
There 's 2 main questions about what current AI systems (or whatever one thinks they are) : (1) can it be improved to learn to be ~100% correct via reinforcement learning or otherwise? It seems like the answer is Yes (2) Will people become addicted and dependent on AI to the point where it may become problematic? Also yes.
jeffrallen 2 years ago
I found this quote really interesting:
> this is not an uncommon scenario; about 8% of pregnancies discovered in the ER are of women who report that they’re not sexually active.
We have so much work to do as a society to get honest about our bodies. Hoping my children do better; they are already getting better education than my wife did.
- aix1 2 years ago
  I don't think this is about education. It's more about religion, sex crime etc.
m3kw9 2 years ago
Would be great if OPENAI can get hospitals around the country could anonymize some trainable format data
- 2 years ago
peter_retief 2 years ago
It is a great article, doctors have been googling symptoms for quite some time, focused AI could sharpen that option and possibly put us into the realm of new discoveries.
drewcoo 2 years ago
Well there's an ethics lawsuit advertising to happen.
I certainly don't want my docs handing my medical information to ChatGPT, even if they believe they've "anonymized" it.
pknerd 2 years ago
It's not clear whether there doctor instructed it first to act like a doctor and then asked questions? It seems he didn't because it does make a difference
seanp2k2 2 years ago
>If my patient in this case had done that, ChatGPT’s response could have killed her.
Not if she lived in a state where there's no longer any legal treatment for ectopic pregnancy.
- ghiculescu 2 years ago
  Which states are you referring to?
mabbo 2 years ago
It got 50% right with a general model. That same model can do a pretty good job at a thousand other skills.
Imagine what it can do if it's trained for exactly this purpose.
- dekhn 2 years ago
  https://blog.google/technology/health/ai-llm-medpalm-researc... is an example of a more tuned model
  At some point people are going to notice that the best of these systems disagree with the experts who disagree with each other on the harder problems. That's one of the issues with AlphaFold- the accuracy of their predictions are so good, its predictions have metric values as good as the limits of the experimental technology used to create the golden labels.
maherbeg 2 years ago
Ah yes, the "everyone lies" House M.D. problem
- SketchySeaBeast 2 years ago
  Not even, ChatGPT, being an engine that figures out what's right by finding out what is average, is bad at understanding the atypical.
  - operatingthetan 2 years ago
    >is bad at understanding the atypical.
    To be fair I've found most doctors require a lot of convincing if your problem atypical as well.
    - SoftTalker 2 years ago
      As they should. If you hear hooves, think horses not unicorns.
Xcelerate 2 years ago
“I’m employed as an [X]. Here is why machine learning won’t ever replace my job. Even once the point has been reached where all overwhelming evidence points to it being better than me in every regard, I’ll still summon an emotional appeal to that ‘magical’ human quality that machines will just never replace.”
To be honest, I think I’d rather be friends with ChatGPT than most humans as it continues developing over the next decade.
- nextworddev 2 years ago
  What’s interesting is how “AI hallucination” and “prompt hijacking” are supposed to be such horrific problems - when humans lie or make mistakes in recall all the time, not to mention get gaslighted into say awful things.
ChatGTP 2 years ago
It’s funny because it’s almost the exact same problem I have with using it professionally for writing software.
- sixothree 2 years ago
  I had an opportunity today that made it useful. I was trying to find a reliable method of counting the decimal places in a double (64 bit float). I couldn't help but feeling like the responses were not quite informed and possibly dangerous.
  Chat GPT provided a solution, one that appeared better than most of what I had seen in the previous 15-30 minutes. I asked it twice to ensure safety and it improved its response.
  I then asked it to explain a particular choice and it was thorough enough for me to feel comfortable. In the end I feel like it understood its reasoning better than some of the options I saw on SO.
  This was GPT-4 and a fairly simple problem that was benefited by its understanding of the double type.
  - yieldcrv 2 years ago
    protip: ask it to write the unit test for you
    - sixothree 2 years ago
      Oh man. I did that by hand. Now I'm kicking myself. I tried to find non round trip numbers and such.
- 2 years ago
- yieldcrv 2 years ago
  Yeah it bullshits a lot even on single liners
  Its a faster and better stackoverflow for me, which is a big value add because the community and moderation aspect of SO is absurd
  I love when it tells me about libraries and resources that I didn't know existed, when I didn’t necessarily ask the likely followup questions yet
  Break big problems into smaller problems and let it tackle them
- throwbadubadu 2 years ago
  Seconding.
ftxbro 2 years ago
"It diagnosed another patient with torso pain as having a kidney stone — but missed that the patient actually had an aortic rupture. (And subsequently died on our operating table.)"
Wow imagine if the AI had been used in an unquestioning way. Someone could have died!
pyrophane 2 years ago
To me this just reinforces the notion that if you train a system like this to be a doctor it would be very effective.
zzzeek 2 years ago
[flagged]
- jjcon 2 years ago
  > because this doctor is not practicing in Texas where such a procedure might get you arrested
  https://texas.public.law/statutes/tex._health_and_safety_cod...
  Please don't spread misinformation, there is enough confusion out there already. Texas law specifically allows for the removal of ectopic pregnancies.
  - zzzeek 2 years ago
    these specific provisions are insufficient and women are having harrowing health care experiences in Texas nonetheless due to doctors delaying or denying care out of fear of prosecution:
    https://www.texastribune.org/2022/09/20/texas-abortion-ban-c...
    - jjcon 2 years ago
      Exactly, due to tons of misinformation about the law like the above - it is causing tons of problems - we don't need to further spread it.