On Tuesday, researchers from Stanford University and the University of California, Berkeley published a research paper that purports to show changes in GPT-4's outputs over time. The paper fuels a common but unproven belief that the AI language model has grown worse at coding and composition tasks over the past few months. Some experts aren't convinced by the results, but they say that the lack of certainty points to a larger problem with how OpenAI handles its model releases.
In a study titled "How Is ChatGPT's Behavior Changing over Time?" published on arXiv, Lingjiao Chen, Matei Zaharia, and James Zou cast doubt on the consistent performance of OpenAI's large language models (LLMs), specifically GPT-3.5 and GPT-4. Using API access, they tested the March and June 2023 versions of these models on tasks like math problem-solving, answering sensitive questions, code generation, and visual reasoning. Most notably, GPT-4's ability to identify prime numbers reportedly plunged from an accuracy of 97.6 percent in March to just 2.4 percent in June. Strangely, GPT-3.5 showed improved performance over the same period.
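The prime-number benchmark boils down to scoring a model's yes/no answers against ground truth. The following is a minimal sketch of that kind of evaluation; the `model_answers` data here is hypothetical stand-in data, not actual GPT-4 output, and the real study queried the models through the OpenAI API.

```python
def is_prime(n: int) -> bool:
    """Ground-truth primality check by trial division."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def accuracy(model_answers: dict[int, bool]) -> float:
    """Fraction of numbers where the model's yes/no answer matches truth."""
    correct = sum(1 for n, ans in model_answers.items() if ans == is_prime(n))
    return correct / len(model_answers)


# Hypothetical failure mode: a model that answers "not prime" for everything
# would still score well on mostly-composite inputs, which is why the mix of
# test numbers matters when interpreting a headline accuracy figure.
answers = {n: False for n in [17077, 17078, 17079, 17080]}
print(accuracy(answers))  # 17077 is prime, so 3 of 4 answers are correct: 0.75
```

A degenerate always-"no" responder scoring 75 percent here illustrates why critics of the paper argued that the choice of test numbers (and how answers are parsed) can dominate the reported accuracy.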
This study comes on the heels of people frequently complaining that GPT-4 has subjectively declined in performance over the past few months. Popular theories about why include OpenAI "distilling" models to reduce their computational overhead in a quest to speed up output and save GPU resources, fine-tuning (additional training) to reduce harmful outputs that may have unintended effects, and a smattering of unsupported conspiracy theories, such as OpenAI reducing GPT-4's coding capabilities so more people will pay for GitHub Copilot.