Apple reportedly ignored engineers’ warnings about flaws in Apple Intelligence and released the technology anyway, to underwhelming reviews.
The iPhone maker’s Apple Intelligence made headlines for the wrong reasons, drawing widespread criticism for fabricating information and botching news headlines.
While it is common for large language models to hallucinate, a challenge the AI industry has yet to solve, if it ever can, Apple’s case was different. Engineers had flagged deep flaws in these models before the release, yet the company ignored the warnings.
According to analysts, the tech giant was reckless to proceed with the release after warnings had been sounded about the AI’s gaping deficiencies. Those warnings came in a study Apple’s own researchers released in October last year.
The study, which has yet to be peer-reviewed, tested the mathematical “reasoning” of some of the top LLMs in the industry and concluded that the models do not truly reason.
To test the models, the engineers had them solve thousands of math problems from GSM8K, a widely used industry benchmark of grade-school word problems.
According to Futurism, a typical question from the dataset reads: “James buys 5 packs of beef that are 4 pounds each. The price of beef is $5.50 per pound. How much did he pay?”
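The correct answer takes one line of arithmetic: 5 packs × 4 pounds × $5.50 per pound = $110.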
The engineers exposed the gaps in the models by simply changing some of the numbers in the questions, a step meant to rule out data contamination from memorized training examples. That alone caused small but notable accuracy drops across the 20 LLMs examined.
However, when the researchers went a step further, changing names and adding “irrelevant details,” the results were “catastrophic,” with accuracy drops reaching as high as 65%.
The researchers wrote: “This reveals a critical flaw in the models’ ability to discern relevant information for problem-solving, likely because their reasoning is not formal in the common sense term and is mostly based on pattern matching.”
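To make the setup concrete, here is a minimal Python sketch of the perturbation idea the researchers describe: generate surface-level variants of a GSM8K-style problem by swapping names and numbers and optionally appending an irrelevant clause, while the underlying arithmetic stays the same. The template, name list, and distractor text below are illustrative stand-ins, not the study’s actual code or data.

```python
import random

NAMES = ["James", "Sofia", "Wei", "Amara"]

TEMPLATE = (
    "{name} buys {packs} packs of beef that are {pounds} pounds each. "
    "The price of beef is ${price:.2f} per pound. "
    "{distractor}How much did {name} pay?"
)

# An "irrelevant detail" in the spirit of the study's hardest variants:
# it mentions the problem's subject but changes nothing about the
# required arithmetic.
DISTRACTORS = [
    "",
    "Last week the same store sold beef for a different price. ",
]

def make_variant(rng: random.Random) -> tuple[str, float]:
    """Build one perturbed problem plus its ground-truth answer."""
    packs = rng.randint(2, 9)
    pounds = rng.randint(2, 9)
    price = rng.choice([4.50, 5.50, 6.25])
    question = TEMPLATE.format(
        name=rng.choice(NAMES),
        packs=packs,
        pounds=pounds,
        price=price,
        distractor=rng.choice(DISTRACTORS),
    )
    answer = packs * pounds * price  # e.g. 5 * 4 * 5.50 = 110.0
    return question, answer

if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        question, answer = make_variant(rng)
        print(question, "->", f"${answer:.2f}")
        # In an evaluation, each variant would be sent to an LLM and its
        # reply checked against `answer`; a model that truly reasons
        # should be unaffected by the name swaps and the extra clause.
```

A model that genuinely reasons should score the same on every such variant; the study found that accuracy instead fell as the surface details drifted from what the models had seen in training.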
According to the researchers, the results varied by model. OpenAI’s o1-preview dropped by 17.5%, while its predecessor GPT-4o dropped by 32%. The researchers noted that even the “cleverest” models ran into trouble and exhibited serious flaws.
The tests also suggested that AI models can look smart when solving problems but struggle the moment they can no longer copy someone’s homework word for word. As for Apple, the tech giant reportedly knew about the results yet released its model to the market anyway.
It took serious backlash, including the BBC raising concerns about the feature dishing out misleading information, for Apple to eventually pause the program until it could be fixed.
Apple’s AI feature was meant to summarize news notifications, but it sometimes fabricated news of its own, much to the displeasure of readers and news publishers.
Inaccurate news alerts falsely claimed that Rafael Nadal was gay and that a man accused of killing a US insurance boss had shot himself. Apple’s AI also mangled BBC app notifications to claim that Luke Littler had won the PDC World Darts Championship hours before the final began.
Another incorrect summary, of a New York Times story, appears to have been sent on January 6 and related to the fourth anniversary of the Capitol riot.
“Apple Intelligence features are in beta and we are continuously making improvements with the help of user feedback,” Apple said in a statement to the BBC. “A software update in the coming weeks will further clarify when the text being displayed is summarization provided by Apple Intelligence. We encourage users to report a concern if they view an unexpected notification summary,” the statement added.
Apple’s faulty feature was among the AI tools rolled out in December to users of newer iPhones, including the iPhone 16, 15 Pro, and 15 Pro Max handsets, as well as some iPads and Macs.