Researchers at Cornell University, the University of Washington, the University of Waterloo and the Allen Institute for Artificial Intelligence have developed a benchmark to measure how often AI chatbots give confident responses that are not supported by their training data, a phenomenon known as hallucinating.

Using the benchmark, the researchers tested 15 different large language models (LLMs), including ChatGPT, Llama and Command R. Even the best models produced hallucination-free text in only 35% of cases, showing that LLM output is not very reliable.
For topics without a Wikipedia page, the probability of hallucinating was higher, because many models were trained on data from that site. The researchers therefore deliberately ensured that 50% of their questions could not be answered using Wikipedia. As a result, their findings differ from the reliability claims made by AI companies.
Subject matter also had an influence: the models gave more correct answers on topics such as geography and computer science, while questions about celebrities and finance proved difficult for the LLMs. No model did well across all subjects.
Claude 3 Haiku gave the most fact-based answers, but mainly because it declined to answer in 28% of cases. With those refusals excluded, OpenAI's models turn out to be the most reliable.
AI chatbots' answers can sound very convincing, but they are not always accurate: chatbots can make things up or rely on incorrect information. Especially as AI chatbots take on a growing role in society, it is important to realize how unreliable their answers can be.
AI hallucinations could become dangerous if people accept chatbot answers as true. "Policies and regulations should be developed to ensure that human experts are always involved in the process of verifying and validating the information generated by generative AI models," said lead researcher Wenting Zhao.
