Nathan Labenz challenges the idea that AI progress is flatlining, arguing that the perception of diminishing returns is misleading. The advances between GPT-4 and GPT-5 are substantial, he argues, but the steady stream of incremental updates has made the scale of that progress harder to recognize.
The cost of AI models is dropping dramatically: GPT-5 is roughly 95% cheaper than GPT-4, a trajectory that suggests prices will keep falling.
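As a rough sanity check on that figure, here is a minimal arithmetic sketch; the per-million-token prices are illustrative assumptions, not numbers quoted in the episode:

```python
# Illustrative API prices in USD per 1M input tokens (assumptions,
# not official figures): GPT-4 near launch vs. GPT-5.
gpt4_price = 30.00   # assumed
gpt5_price = 1.25    # assumed

reduction = 1 - gpt5_price / gpt4_price
print(f"price reduction: {reduction:.0%}")  # ~96%, consistent with the ~95% claim
```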
The capability overhang in AI is immense: many people who only know ChatGPT are unaware of what tools like Codex can already do.
Despite efforts to encode rules into AI models, unexpected outcomes still occur. OpenAI shapes model behavior by exposing it to curated training examples, but if a user rewords a prompt even slightly, the model can drift outside that training distribution and respond in ways no human chose.
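One way to see this fragility in practice is to send paraphrases of the same question and compare the answers. Below is a minimal sketch using the OpenAI Python SDK; the model name and prompts are illustrative choices, not taken from the episode:

```python
# Minimal sketch: probe how slight rewording shifts a model's response.
# Assumes the OpenAI Python SDK is installed and OPENAI_API_KEY is set.
from openai import OpenAI

client = OpenAI()

paraphrases = [
    "Summarize the risks of releasing open-weight models.",
    "Briefly, what are the dangers of open-weight model releases?",
    "What could go wrong if model weights are published openly?",
]

for prompt in paraphrases:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # minimize sampling noise, to isolate the wording effect
    )
    print(prompt)
    print("->", resp.choices[0].message.content[:120], "\n")
```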
Open source AI models like GPT-OSS are beneficial, but releasing the weights means giving up control over how they are modified and used.
An OpenAI update to GPT-4o went overboard on flattery, showing that AI doesn't always follow its system prompt. This isn't like a toaster or an obedient genie; it's something weirder and more alien.
Nathan Labenz discusses the troubled launch of GPT-5, where early technical issues made a poor first impression. The model router was broken, so queries defaulted to a less capable model, which fueled the negative perceptions.
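That failure mode is easy to picture with a toy router. A minimal sketch, with hypothetical names and a crude length-based difficulty heuristic standing in for whatever classifier OpenAI actually uses:

```python
# Minimal sketch of a capability router with a fallback path.
# All names here are hypothetical; this is not OpenAI's implementation.

def estimate_difficulty(query: str) -> float:
    """Toy stand-in for a learned difficulty classifier."""
    return min(len(query.split()) / 50.0, 1.0)

def route(query: str, router_healthy: bool = True) -> str:
    if not router_healthy:
        # Failure mode described at the GPT-5 launch: when routing
        # breaks, every query falls through to the weaker default.
        return "small-fast-model"
    if estimate_difficulty(query) > 0.5:
        return "large-reasoning-model"
    return "small-fast-model"

long_query = "Prove that sqrt(2) is irrational, step by step, with full rigor. " * 5
print(route(long_query))                        # large-reasoning-model
print(route(long_query, router_healthy=False))  # small-fast-model
```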
OpenAI's decision to focus on smaller models rather than scaling up reflects where progress is currently fastest: post-training and the reasoning paradigm are delivering more improvement than simply increasing model size.
Nathan Labenz argues that while AI might be perceived as slowing down, the leap from GPT-4 to GPT-5 is comparable to the leap from GPT-3 to GPT-4. The perception of stagnation, he believes, comes from the incremental releases between major versions, which dulled the impact of the advance.
GPT-4.5 achieved a 65% score on the SimpleQA benchmark, a significant leap from the roughly 50% scored by the o3 models. SimpleQA measures recall of esoteric facts, so the jump reflects a real gain in factual knowledge.
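For context on how such a score is computed, SimpleQA grades short factual answers against a gold answer. The simplified sketch below uses exact-match string grading with invented data; the real benchmark uses a model-based grader and also tracks unattempted questions:

```python
# Simplified SimpleQA-style scoring: exact match against gold answers.
# The Q/A pairs are invented for illustration; the real benchmark
# uses a grader model rather than string matching.
examples = [
    {"gold": "1887", "pred": "1887"},
    {"gold": "Marie Curie", "pred": "Marie Curie"},
    {"gold": "Lake Baikal", "pred": "the Caspian Sea"},
]

correct = sum(e["pred"].strip().lower() == e["gold"].strip().lower() for e in examples)
print(f"accuracy: {correct / len(examples):.0%}")  # 67% on this toy set
```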