There is a convergence happening among large language models (LLMs): no lab is building a sustainable technical advantage, because one AI can quickly reverse engineer the capabilities of another.
Most of the compute consumed by large language models (LLMs) is spent during deployment rather than during learning, which is inefficient: the models only learn during a separate training phase, so all that deployment compute contributes nothing to further learning.
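The claim can be made concrete with the standard back-of-the-envelope approximations (training compute ~ 6·N·D FLOPs, inference ~ 2·N FLOPs per generated token). The model size and token counts below are illustrative assumptions, not figures from the source:

```python
# Back-of-the-envelope FLOPs comparison, using the common rules of thumb:
# training ~ 6 * N * D, inference ~ 2 * N per generated token,
# where N = parameter count and D = training tokens.
def training_flops(n_params: float, n_train_tokens: float) -> float:
    """Approximate total training compute."""
    return 6 * n_params * n_train_tokens

def inference_flops(n_params: float, n_served_tokens: float) -> float:
    """Approximate cumulative deployment compute."""
    return 2 * n_params * n_served_tokens

# Hypothetical model: 70e9 parameters trained on 2e12 tokens.
N, D = 70e9, 2e12
train = training_flops(N, D)

# Deployment compute overtakes training compute once the model has
# served 3x its training tokens (6*N*D == 2*N*tokens -> tokens = 3*D).
break_even_tokens = train / (2 * N)
print(f"break-even at {break_even_tokens:.2e} served tokens")
```

For a widely deployed model, served tokens can dwarf the training set, which is why lifetime compute skews toward inference.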
Even with a trillion parameters, LLMs cannot memorize the entire matrix of possible prompts and responses. Instead, they interpolate between their training data and the new prompt to generate a response, behaving more like Bayesian models than stochastic parrots.
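The lookup-versus-interpolation distinction can be sketched with a toy bigram model. This is an illustrative analogy, not how an LLM is actually implemented: a pure lookup table (a "stochastic parrot") assigns probability zero to any continuation it never saw, while a Bayesian posterior predictive (here, a uniform Dirichlet prior, i.e. Laplace smoothing) interpolates between observed counts and the prior, so novel continuations still get probability mass:

```python
from collections import Counter, defaultdict

# Tiny training corpus and bigram counts.
corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def p_next(prev: str, nxt: str, alpha: float = 1.0) -> float:
    """Posterior predictive P(next | prev) under a Dirichlet(alpha) prior.

    With alpha = 0 this degenerates into a pure lookup table that
    assigns 0 to unseen bigrams; with alpha > 0 it interpolates.
    """
    c = counts[prev]
    return (c[nxt] + alpha) / (sum(c.values()) + alpha * len(vocab))

print(p_next("the", "cat"))  # seen in training: relatively high
print(p_next("cat", "dog"))  # never seen, yet still > 0
```

The point of the analogy: the model's response to a novel prompt is shaped by, but not copied from, its training data.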
Large language models do not build a world model because they predict human responses rather than actual world events.
Large language models do not build a model of the world; they mimic what people say, which is not the same as predicting what will happen in the world.
The ultimate LLM (Large Language Model) will likely be about a billion parameters, showing the efficiency of compressing human knowledge into small models.
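To put "about a billion parameters" in perspective, a quick sizing calculation (the precisions listed are common storage formats, not anything specified in the claim) shows how small such a model would be:

```python
# Rough sizing of a 1B-parameter model at common numeric precisions.
n_params = 1_000_000_000
sizes = {}
for name, bytes_per_param in [("fp32", 4), ("fp16", 2), ("int8", 1)]:
    sizes[name] = n_params * bytes_per_param / 1e9  # gigabytes
    print(f"{name}: {sizes[name]:.1f} GB")
```

At one byte per parameter that is roughly 1 GB, small enough to run on a phone, which is what makes the compression claim striking if it holds.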
Current LLMs do not develop true world models; they build models of what a human would say next, relying on human-derived concepts.
The capabilities of large language models (LLMs) have improved but have not fundamentally changed. Like the iPhone, early iterations were groundbreaking, while recent releases have been incremental refinements rather than breakthroughs.
The most impactful models for understanding LLMs, according to Martin, are those created by Vishal Misra. His work, including a notable talk at MIT, explores how LLMs reason and also offers reflections on human reasoning.