ChatGPT can pass US medical licence exams, study claims

AI-generated answers showed ‘new, non-obvious and clinically valid’ insights in tests usually taken by students after years of study

February 9, 2023



Answers generated by artificial intelligence can pass the examinations needed to be granted a medical licence in the US, a new study has claimed.

Researchers said OpenAI’s software ChatGPT scored at or around the 60 per cent threshold in the series of three tests that make up the US Medical Licensing Exam (USMLE) with “coherent” responses that “contained frequent insights”.

Achieving a pass in the “notoriously difficult” assessments – usually taken by medical students after at least two years of study – was seen as a “milestone” for the development of AI tools that could have wide-reaching implications for medical education, according to the study’s authors.

But other academics questioned the validity of the findings, published in an open access journal, and called the study a publicity stunt for the healthcare company that backed the researchers involved.

Author Tiffany Kung – a clinical fellow in anaesthesia at Massachusetts General Hospital, part of Harvard Medical School – and colleagues used 350 questions from the June 2022 USMLE, incorporating most medical disciplines from biochemistry to diagnostic reasoning.

Their paper found that, after indeterminate responses were removed, ChatGPT scored between 52.4 per cent and 75 per cent across the exams, which usually have a pass threshold of around 60 per cent.


They add that ChatGPT also demonstrated 94.6 per cent concordance across all its responses and produced at least one significant insight – defined as “something that was new, non-obvious and clinically valid” – for 88.9 per cent of its responses.

These were higher scores than those achieved by another AI chatbot, PubMedGPT, which had been trained exclusively on biomedical domain literature. It scored 50.8 per cent on an older dataset of USMLE-style questions.

The authors note that the sample size of questions used was relatively small but feel their study provides “a glimpse of ChatGPT’s potential to enhance medical education, and eventually, clinical practice”.

A preprint of the article circulated on social media had listed ChatGPT as an author as the researchers had asked it to “synthesise, simplify and offer counterpoints to drafts in progress”. The chatbot’s citation was removed ahead of final publication, but Dr Kung stressed that it had “contributed substantially to the writing of [our] manuscript”.

Reacting to the study, Peter Bannister, executive chair of the Institution of Engineering and Technology, said ChatGPT “continues to demonstrate an impressive ability to generate logical content in numerous settings” and the results “serve to highlight the limitations of written tests as the only way of assessing performance in complex and multidisciplinary professions such as medicine”.

“While the results may be of great interest, the study has important limitations that call for caution,” warned Lucía Ortiz de Zárate Alcarazo, a pre-doctoral researcher in the ethics and governance of artificial intelligence at the Autonomous University of Madrid.

“We will have to wait and see what results are obtained when ChatGPT is applied to a larger number of questions and, in turn, is trained with a larger volume of data and more specialised content,” she said.

Ms Ortiz de Zárate Alcarazo added that the results had only been evaluated by two doctors and further studies would need to employ a larger number of qualified evaluators to be able to endorse the findings.

Collin Bjork, senior lecturer in science communication at Massey University, said the claim that ChatGPT could pass the exams was “overblown and should come with a lengthy series of asterisks”.

He noted that all but one of the authors work for Ansible Health, a Silicon Valley-based healthcare start-up that would soon be likely to need more investment capital. “The media splash from this well-timed journal article will certainly help fund their next round of growth,” Dr Bjork said.

He added that claims about the insight shown by the chatbot were “misleading” due to the “vague” definition used by researchers for what constituted this. Claims that AI would one day be able to teach medicine were “naive”, Dr Bjork said. “How can an unaware learner distinguish between true and false insights, especially when ChatGPT only offers ‘accurate’ answers on the USMLE a little more than half the time?”

tom.williams@timeshighereducation.com


Reader's comments (2)

#1 Submitted by Mapcar on February 9, 2023 - 9:23pm

Anyone reading the actual paper with a medical background will realise that there is zero visibility on the MCQ sample that ChatGPT is supposed to have successfully answered. Most likely, there is a strong bias towards those not requiring differential diagnosis or pathophysiological reasoning, namely those for which the answer exists in near-literal form in one of the corpora crawled by the LLM.

#2 Submitted by 色盒直播3000 on February 10, 2023 - 12:08pm

So back to viva voce exams, in person, with no external links or practical exams in labs?