At Waymark, we're experimenting with reinforcement fine-tuning on open-source models like Qwen. Even if fine-tuning yields improvements, we might still opt for commercial models like GPT-5 for ease of operation and upgrades.
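For context, here is a minimal sketch of what reinforcement fine-tuning an open-weight model can look like, using Hugging Face TRL's GRPO trainer with a small Qwen checkpoint. The toy prompt dataset and length-based reward below are illustrative placeholders, not Waymark's actual task or reward design:

```python
# Hypothetical sketch of reinforcement fine-tuning (RFT) with TRL's GRPOTrainer.
# The dataset and reward function are placeholders, not Waymark's real setup.
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# Toy prompt set; a real run would use a task-specific prompt distribution.
train_dataset = Dataset.from_dict(
    {"prompt": ["Write a one-line product tagline for a bakery."] * 64}
)

def reward_conciseness(completions, **kwargs):
    # Placeholder reward: prefer completions near 20 words.
    # A production reward would score actual task success.
    return [-abs(20 - len(c.split())) for c in completions]

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",  # small open-weight stand-in
    reward_funcs=reward_conciseness,
    args=GRPOConfig(output_dir="qwen-rft"),
    train_dataset=train_dataset,
)
trainer.train()
```

The trade-off in the summary above follows directly: a loop like this gives control over the reward and the weights, while a commercial API shifts the operational and upgrade burden to the provider.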
Recent advances in AI include gold-medal performance at the International Mathematical Olympiad by pure reasoning models, without access to external tools. This marks a significant leap from what GPT-4 could accomplish in mathematics and underscores how quickly AI capabilities are progressing.
Nathan Labenz discusses the challenges of the GPT-5 launch, noting that early technical problems created a poor first impression: the model router was broken, causing queries to default to a less capable model, which fueled negative perceptions.
Nathan Labenz argues that while AI progress may be perceived as slowing down, the leap from GPT-4 to GPT-5 is significant, comparable to the leap from GPT-3 to GPT-4. He attributes the perception of stagnation to the incremental releases between major versions, which may have dulled the impact of the cumulative advance.
Vishal Misra reflects on the pace of development in LLMs, noting that GPT-3 felt like a nice parlor trick, but with advances like ChatGPT and GPT-4, the technology has become polished and far more capable.