When solving problems, LLMs benefit from a 'chain of thought' approach. By breaking down tasks into smaller, familiar steps, they reduce prediction entropy and increase confidence in the final answer.
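As a rough illustration, the toy calculation below compares the Shannon entropy of two hypothetical next-token distributions: one for answering a multiplication directly, and one after the intermediate steps have been written out. The numbers are invented for illustration, not taken from any real model:

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a discrete distribution."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # ignore zero-probability tokens
    return -np.sum(p * np.log2(p))

# Hypothetical next-token distributions over five candidate answers.
# Asked "What is 17 * 24?" in one shot, probability mass is spread out...
direct = [0.30, 0.25, 0.20, 0.15, 0.10]
# ...but after spelling out "17*24 = 17*20 + 17*4 = 340 + 68", the final
# step is a familiar addition and the distribution sharpens.
stepwise = [0.92, 0.04, 0.02, 0.01, 0.01]

print(f"entropy (direct):   {entropy(direct):.2f} bits")    # ~2.23 bits
print(f"entropy (stepwise): {entropy(stepwise):.2f} bits")  # ~0.54 bits
```

Lower entropy on the final step is exactly the "increased confidence" the chain-of-thought framing describes.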
Even with a trillion parameters, LLMs cannot represent the entire matrix of possible prompts and responses. Instead, they interpolate between the training data and the new prompt to generate responses, behaving more like Bayesian models than stochastic parrots.
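A back-of-the-envelope calculation makes the gap concrete. With an assumed vocabulary of 50,000 tokens and prompts just 100 tokens long (both figures are illustrative), the number of distinct prompts already dwarfs a trillion parameters:

```python
import math

# Back-of-the-envelope: the space of possible prompts dwarfs any parameter
# count. The vocabulary size and context length are assumed, order-of-
# magnitude figures, not taken from any particular model.
vocab_size = 50_000   # typical BPE vocabulary
context_len = 100     # a short prompt, far below real context windows
params = 10**12       # "a trillion parameters"

distinct_prompts_log10 = context_len * math.log10(vocab_size)
print(f"distinct prompts ~ 10^{distinct_prompts_log10:.0f}")  # ~10^470
print(f"parameters       ~ 10^{math.log10(params):.0f}")      # 10^12
# No model can tabulate ~10^470 rows; it has to generalize from training data.
```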
When an LLM is given a prompt, it uses the context as new evidence to compute a Bayesian posterior distribution over next tokens. This allows the model to generate likely responses even to prompts it has never encountered before.
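A minimal sketch of that reading, with made-up hypotheses and probabilities: treat the prompt's tokens as evidence, and apply Bayes' rule to a prior over what kind of text is being completed:

```python
import numpy as np

# Toy Bayesian view of prompting, under assumed (invented) numbers:
# hypotheses about what kind of text we are completing, a prior learned
# from "training", and per-hypothesis likelihoods of the observed context.
hypotheses = ["recipe", "math proof", "news article"]
prior = np.array([0.3, 0.2, 0.5])

# P(context "let epsilon > 0" | hypothesis), illustrative values only.
likelihood = np.array([0.001, 0.60, 0.01])

# Bayes' rule: posterior ∝ prior × likelihood, then normalize.
posterior = prior * likelihood
posterior /= posterior.sum()

for h, p in zip(hypotheses, posterior):
    print(f"P({h} | context) = {p:.3f}")
# Nearly all mass shifts to "math proof" even though this exact prompt was
# never seen during training: the evidence updates the prior.
```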
The matrix abstraction models an LLM as a gigantic matrix in which each row corresponds to a possible prompt, each column to a vocabulary token, and each entry to the probability of that token coming next. Despite its astronomical size, this matrix is sparse: most entries are zero, and most rows correspond to prompts that will never occur.
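A miniature version of the abstraction, with invented prompts and probabilities, shows why sparse storage is the natural fit:

```python
import numpy as np
from scipy.sparse import csr_matrix

# A tiny prompt × token matrix. Rows index prompts, columns index
# vocabulary tokens, entries are next-token probabilities. All prompts
# and probabilities here are made up for illustration.
vocab = ["Paris", "London", "blue", "4", "the"]
prompts = [
    "The capital of France is",
    "The sky is",
    "2 + 2 =",
]

dense = np.array([
    [0.95, 0.05, 0.0,  0.0,  0.0 ],  # -> "Paris" (mostly)
    [0.0,  0.0,  0.9,  0.0,  0.1 ],  # -> "blue"
    [0.0,  0.0,  0.0,  0.98, 0.02],  # -> "4"
])

# Most entries are exactly zero, so sparse storage keeps only the 6 of 15
# cells that carry any probability mass.
sparse = csr_matrix(dense)
print(f"stored entries: {sparse.nnz} of {dense.size}")
```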
LLMs can perform few-shot learning by forming the right posterior distribution over tokens from the examples supplied in the prompt. Mechanically, the process is identical whether we call it in-context learning or plain continuation.
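A sketch of what that looks like in practice: the few-shot examples are nothing more than prefix text, and the "learning" is ordinary next-token prediction on the continuation. The translation task is illustrative, and no model is actually invoked here:

```python
# Few-shot prompting is just continuation: the examples below condition the
# next-token distribution so the likeliest completion follows the pattern.
examples = [
    ("cheese", "fromage"),
    ("house", "maison"),
    ("book", "livre"),
]

query = "water"
prompt = "\n".join(f"English: {en} -> French: {fr}" for en, fr in examples)
prompt += f"\nEnglish: {query} -> French:"

print(prompt)
# Fed to an LLM as an ordinary continuation, the posterior over the next
# token concentrates on "eau". Same mechanism, no separate learning step.
```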