If the tech industry’s top AI models had superlatives, Microsoft-backed OpenAI’s GPT-4 would be best at math, Meta’s Llama 2 would be most middle of the road, Anthropic’s Claude 2 would be best at knowing its limits and Cohere AI would receive the title of most hallucinations — and most confident wrong answers.
That’s all according to a Thursday report from researchers at Arthur AI, a machine learning monitoring platform.
The research comes at a time when misinformation stemming from artificial intelligence systems is more hotly debated than ever, amid a boom in generative AI ahead of the 2024 U.S. presidential election.
It’s the first report “to take a comprehensive look at rates of hallucination, rather than just sort of ... provide a single number that talks about where they are on an LLM leaderboard,” Adam Wenchel, co-founder and CEO of Arthur, told CNBC.