Vishal Misra's work on understanding LLMs is profound. He has developed models that reduce the high-dimensional space of LLM representations to a geometric manifold, making it possible to predict where reasoning can move within that space. This mirrors how humans simplify a complex universe into manageable forms in order to reason about it.
Even with a trillion parameters, an LLM cannot represent the full matrix of possible prompts and responses. Instead, it interpolates from its training data to answer new prompts, behaving more like a Bayesian model than a stochastic parrot.
When given a prompt, an LLM treats the context as new evidence and computes a Bayesian posterior distribution over continuations. This lets it generate plausible responses even to prompts it has never encountered before.
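A minimal sketch of this update, assuming a toy vocabulary and hypothetical latent "topics" (none of these numbers or names are from the talk; the point is only the prior × likelihood → posterior → predictive chain):

```python
# Toy illustration of "prompt as evidence": each prompt token updates a
# prior over latent topics, and the next-token distribution is the
# posterior predictive mixture of per-topic token distributions.
import numpy as np

vocab = ["paris", "london", "cat", "dog"]   # hypothetical toy vocabulary
topics = ["geography", "pets"]              # hypothetical latent topics

# P(token | topic): each row sums to 1 (made-up numbers for illustration)
likelihood = np.array([
    [0.45, 0.45, 0.05, 0.05],   # geography
    [0.05, 0.05, 0.45, 0.45],   # pets
])
prior = np.array([0.5, 0.5])    # belief over topics before seeing the prompt

def posterior_predictive(prompt_tokens):
    """Update topic beliefs with each prompt token (Bayes rule), then mix
    the per-topic token distributions to get the next-token distribution."""
    belief = prior.copy()
    for tok in prompt_tokens:
        belief *= likelihood[:, vocab.index(tok)]   # prior x likelihood
        belief /= belief.sum()                      # normalize to a posterior
    return belief @ likelihood                      # posterior predictive

print(posterior_predictive(["paris"]))        # mass shifts to geography tokens
print(posterior_predictive(["cat", "dog"]))   # mass shifts to pet tokens
```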
The ultimate LLM will likely be only about a billion parameters, a sign of how efficiently human knowledge can be compressed into small models.
LLMs form Bayesian manifolds during training. While traversing these manifolds they confidently generate coherent output, but they veer into 'confident nonsense' when they stray off them.
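One crude proxy for "straying off the manifold" (an assumption of this sketch, not Misra's actual construction) is distance from the training data in embedding space: coherent generations tend to land near regions the model has seen, while far-away points are where confident nonsense becomes likely.

```python
# Toy off-manifold detector: flag outputs whose embedding is far from
# every training embedding. Random vectors stand in for real embeddings.
import numpy as np

rng = np.random.default_rng(0)
train_embeddings = rng.normal(size=(1000, 64))   # stand-in for training-data embeddings

def off_manifold_score(embedding, k=5):
    """Mean distance to the k nearest training embeddings;
    a large value suggests the output has left the manifold."""
    dists = np.linalg.norm(train_embeddings - embedding, axis=1)
    return np.sort(dists)[:k].mean()

near = train_embeddings[0] + 0.1 * rng.normal(size=64)   # close to training data
far = 10.0 * rng.normal(size=64)                          # far from it

print(off_manifold_score(near))   # small: likely coherent
print(off_manifold_score(far))    # large: risk of confident nonsense
```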
Current LLMs do not develop true world models; they build models of what a human would say next, relying on human-derived concepts.
LLMs can perform few-shot learning by forming the right posterior distribution over tokens from the examples provided in a prompt. The mechanism is the same whether the task is framed as in-context learning or as plain continuation.
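A sketch of that equivalence, using Hugging Face transformers with GPT-2 (the model choice and the "Country -> Capital" prompt format are ours, for illustration): a few-shot prompt is just a longer context fed to the exact same next-token machinery as any continuation.

```python
# Few-shot prompting vs. plain continuation: one and the same forward pass.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def next_token_distribution(prompt):
    """One forward pass -> softmax over the vocabulary for the next token."""
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]
    return torch.softmax(logits, dim=-1)

plain = "The capital of France is"                    # plain continuation
few_shot = "Spain -> Madrid\nItaly -> Rome\nFrance ->"  # in-context examples

for prompt in (plain, few_shot):
    probs = next_token_distribution(prompt)
    top = torch.topk(probs, 3)
    print([tok.decode(int(i)) for i in top.indices], top.values.tolist())
```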
At the core of every LLM, regardless of its complexity or training method, is the construction of a probability distribution over the next token. Given a prompt, the model selects the next token from this distribution, appends it, and repeats the process iteratively.
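That loop is all generation is. A self-contained sketch, again using GPT-2 as a stand-in model (the prompt, temperature, and length are arbitrary choices):

```python
# Iterative decoding: predict a next-token distribution, sample, append, repeat.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

ids = tok("Once upon a time", return_tensors="pt").input_ids
for _ in range(20):                                    # generate 20 tokens
    with torch.no_grad():
        logits = model(ids).logits[0, -1]              # next-token logits
    probs = torch.softmax(logits / 0.8, dim=-1)        # temperature 0.8
    next_id = torch.multinomial(probs, num_samples=1)  # sample, don't argmax
    ids = torch.cat([ids, next_id.view(1, 1)], dim=1)  # append and repeat

print(tok.decode(ids[0]))
```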
The most impactful models for understanding LLMs, according to Martin, are those created by Vishal Misra. His work, including a notable talk at MIT, explores not only how LLMs reason but also offers reflections on human reasoning.