When solving problems, LLMs benefit from a 'chain of thought' approach. By breaking down tasks into smaller, familiar steps, they reduce prediction entropy and increase confidence in the final answer.
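As a rough illustration of the entropy claim, the sketch below compares the Shannon entropy of a diffuse next-token distribution (answering a hard question in one leap) with that of a much sharper distribution (a small, familiar intermediate step). The probabilities are made-up numbers chosen only to illustrate the idea.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a discrete distribution."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# Hypothetical next-token distributions (illustrative numbers only):
# answering in one leap leaves many plausible tokens...
direct_answer = [0.30, 0.25, 0.20, 0.15, 0.10]
# ...while each small, familiar intermediate step is nearly deterministic.
per_step = [0.90, 0.05, 0.03, 0.02]

print(f"direct answer entropy: {entropy(direct_answer):.2f} bits")
print(f"per-step entropy:      {entropy(per_step):.2f} bits")
```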
Even with a trillion parameters, LLMs cannot represent the entire matrix of possible prompts and responses. They interpolate based on training data and new prompts to generate responses, acting more like Bayesian models than stochastic parrots.
LLMs are criticized for lacking a true world model because they predict human responses rather than actual events.
When given a prompt, an LLM treats the context as new evidence and computes a Bayesian posterior distribution over likely continuations. This lets the model generate plausible responses even to prompts it has never encountered before.
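A minimal sketch of this Bayesian reading, using a toy set of hypotheses about what kind of text the prompt comes from: the prior stands in for what was learned in training, the prompt acts as evidence, and the posterior over hypotheses induces a distribution over next tokens. Every hypothesis name and probability here is an illustrative assumption, not something a real model exposes.

```python
import numpy as np

# Toy hypotheses about what kind of text the prompt comes from (illustrative).
hypotheses = ["recipe", "math proof", "movie review"]
prior = np.array([0.4, 0.3, 0.3])          # prior learned from "training data"

# Likelihood of the prompt "Preheat the oven to" under each hypothesis (made up).
likelihood = np.array([0.60, 0.01, 0.05])

# Bayes' rule: posterior is proportional to likelihood times prior.
posterior = likelihood * prior
posterior /= posterior.sum()

# Each hypothesis induces its own next-token distribution (toy values).
next_token_given_h = {
    "recipe":       {"350": 0.7, "the": 0.2, "therefore": 0.1},
    "math proof":   {"350": 0.1, "the": 0.3, "therefore": 0.6},
    "movie review": {"350": 0.1, "the": 0.7, "therefore": 0.2},
}

# Marginalize over hypotheses to get the predictive next-token distribution.
tokens = ["350", "the", "therefore"]
predictive = {
    t: sum(posterior[i] * next_token_given_h[h][t] for i, h in enumerate(hypotheses))
    for t in tokens
}
print({t: round(p, 3) for t, p in predictive.items()})
```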
LLMs develop deep representations of the world because their training objective, accurately predicting the next token across enormously varied human text, incentivizes them to do so.
LLMs are trained on vast amounts of human-generated data, an inelastic and hard-to-scale resource, which makes this approach an inefficient use of compute.
Recursive self-improvement in LLMs is not possible without additional information. Even with multiple LLMs interacting, they can't generate new information beyond their training set.
LLMs can perform few-shot learning by forming the appropriate posterior distribution over next tokens from the examples provided in the prompt. The underlying process is the same whether it is framed as in-context learning or as plain text continuation.
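The "it is just continuation" point is visible in how a few-shot prompt is assembled: the examples and the new query are concatenated into one string, and the model simply continues it. The sketch below uses the Hugging Face transformers library; the model name ("gpt2") and the translation task are illustrative choices, not anything from the original claims.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Few-shot examples and the new query are concatenated into a single prompt;
# the model sees one string to continue (task and examples are illustrative).
prompt = (
    "English: cat -> French: chat\n"
    "English: dog -> French: chien\n"
    "English: bird -> French:"
)

tokenizer = AutoTokenizer.from_pretrained("gpt2")   # small model, for illustration
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer(prompt, return_tensors="pt")
# Continuing the prompt uses the same next-token prediction as any other text.
output = model.generate(**inputs, max_new_tokens=3, do_sample=False)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:]))
```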
At the core of every LLM, regardless of its size or training method, is the production of a distribution over the next token. Given a prompt, the model selects the next token from this distribution, appends it to the context, and repeats the process iteratively.
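A sketch of that loop: repeatedly ask the model for logits over the next token, turn them into a distribution with softmax, sample one token, append it, and continue. The model ("gpt2") and the prompt are the same kind of illustrative choices as above.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")    # illustrative small model
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tokenizer("The capital of France is", return_tensors="pt")["input_ids"]

for _ in range(10):                                   # generate 10 tokens
    with torch.no_grad():
        logits = model(ids).logits[0, -1]             # logits for the next position
    probs = torch.softmax(logits, dim=-1)             # distribution over the vocabulary
    next_id = torch.multinomial(probs, num_samples=1)   # sample one token from it
    ids = torch.cat([ids, next_id.view(1, 1)], dim=1)   # append and repeat

print(tokenizer.decode(ids[0]))
```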