
Apple's artificial intelligence (AI) researchers have found that large language models from leading companies, including Meta and OpenAI, struggle with basic reasoning. These models power popular chatbots and a growing number of apps.
Recently, the team conducted a study built around a benchmark called GSM-Symbolic to assess the models' mathematical reasoning. The experiment showed that even a minute alteration to a question can lead the models to generate a completely different response.
To test these leading AI chatbots on mathematical reasoning, the researchers added extra information to their questions: details that a human could easily recognize as irrelevant and that did not change the underlying math.
The researchers reported: "Specifically, the performance of all models declines [even] when only the numerical values in the question are altered in the GSM-Symbolic benchmark. Furthermore, the fragility of mathematical reasoning in these models [demonstrates] that their performance significantly deteriorates as the number of clauses in a question increases."
According to the study, adding even a single seemingly helpful sentence to a mathematical prompt can reduce a model's accuracy by as much as 65%.
To test this, the researchers posed a very basic word problem to the AI chatbots.
Prompt: "Oliver picks 44 kiwis on Friday. Then he picks 58 kiwis on Saturday. On Sunday, he picks double the number of kiwis he did on Friday, but five of them were a bit smaller than average. How many kiwis does Oliver have?"
Both OpenAI's model and Meta's Llama3-8b subtracted the smaller kiwis from the total, even though their size has no bearing on the count, and arrived at the wrong answer.
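The arithmetic behind the kiwi problem makes the failure concrete: the "smaller" kiwis are still kiwis, so the correct total simply ignores that clause.

```python
# Worked arithmetic for the kiwi word problem.
friday = 44
saturday = 58
sunday = 2 * friday  # "double the number he did on Friday"

correct_total = friday + saturday + sunday  # the size remark changes nothing
mistaken_total = correct_total - 5          # the models' error: subtracting the 5 smaller kiwis

print(correct_total, mistaken_total)  # 190 185
```

The chatbots produced the second figure, treating an irrelevant detail as part of the calculation.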