When solving problems, LLMs benefit from a 'chain of thought' approach. By breaking down tasks into smaller, familiar steps, they reduce prediction entropy and increase confidence in the final answer.
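As a rough illustration, the toy calculation below compares the Shannon entropy of two hypothetical next-token distributions: one for answering a multiplication directly, and one after the intermediate steps have been written out. The numbers are invented for illustration, not taken from any real model:

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a discrete distribution."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # ignore zero-probability tokens
    return -np.sum(p * np.log2(p))

# Hypothetical next-token distributions over five candidate answers.
# Asked "What is 17 * 24?" in one shot, probability mass is spread out...
direct = [0.30, 0.25, 0.20, 0.15, 0.10]
# ...but after spelling out "17*24 = 17*20 + 17*4 = 340 + 68", the final
# step is a familiar addition and the distribution sharpens.
stepwise = [0.92, 0.04, 0.02, 0.01, 0.01]

print(f"entropy (direct):   {entropy(direct):.2f} bits")    # ~2.23 bits
print(f"entropy (stepwise): {entropy(stepwise):.2f} bits")  # ~0.54 bits
```

Lower entropy on the final step is exactly the "increased confidence" the chain-of-thought framing describes.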
Even with a trillion parameters, LLMs cannot represent the entire matrix of possible prompts and responses. Instead, they interpolate between the training data and the new prompt to generate responses, behaving more like Bayesian models than stochastic parrots.
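A back-of-the-envelope calculation makes the gap concrete. With an assumed vocabulary of 50,000 tokens and prompts just 100 tokens long (both figures are illustrative), the number of distinct prompts already dwarfs a trillion parameters:

```python
import math

# Back-of-the-envelope: the space of possible prompts dwarfs any parameter
# count. The vocabulary size and context length are assumed, order-of-
# magnitude figures, not taken from any particular model.
vocab_size = 50_000   # typical BPE vocabulary
context_len = 100     # a short prompt, far below real context windows
params = 10**12       # "a trillion parameters"

distinct_prompts_log10 = context_len * math.log10(vocab_size)
print(f"distinct prompts ~ 10^{distinct_prompts_log10:.0f}")  # ~10^470
print(f"parameters       ~ 10^{math.log10(params):.0f}")      # 10^12
# No model can tabulate ~10^470 rows; it has to generalize from training data.
```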
When an LLM is given a prompt, it uses the context as new evidence to compute a Bayesian posterior distribution over next tokens. This allows the model to generate likely responses even to prompts it has never encountered before.
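A minimal sketch of that reading, with made-up hypotheses and probabilities: treat the prompt's tokens as evidence, and apply Bayes' rule to a prior over what kind of text is being completed:

```python
import numpy as np

# Toy Bayesian view of prompting, under assumed (invented) numbers:
# hypotheses about what kind of text we are completing, a prior learned
# from "training", and per-hypothesis likelihoods of the observed context.
hypotheses = ["recipe", "math proof", "news article"]
prior = np.array([0.3, 0.2, 0.5])

# P(context "let epsilon > 0" | hypothesis), illustrative values only.
likelihood = np.array([0.001, 0.60, 0.01])

# Bayes' rule: posterior ∝ prior × likelihood, then normalize.
posterior = prior * likelihood
posterior /= posterior.sum()

for h, p in zip(hypotheses, posterior):
    print(f"P({h} | context) = {p:.3f}")
# Nearly all mass shifts to "math proof" even though this exact prompt was
# never seen during training: the evidence updates the prior.
```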
The matrix abstraction models an LLM as a gigantic matrix in which each row corresponds to a possible prompt, each column to a vocabulary token, and each entry to the probability of that token coming next. Despite its astronomical size, this matrix is sparse: most entries are zero, and most rows correspond to prompts that will never occur.
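A miniature version of the abstraction, with invented prompts and probabilities, shows why sparse storage is the natural fit:

```python
import numpy as np
from scipy.sparse import csr_matrix

# A tiny prompt × token matrix. Rows index prompts, columns index
# vocabulary tokens, entries are next-token probabilities. All prompts
# and probabilities here are made up for illustration.
vocab = ["Paris", "London", "blue", "4", "the"]
prompts = [
    "The capital of France is",
    "The sky is",
    "2 + 2 =",
]

dense = np.array([
    [0.95, 0.05, 0.0,  0.0,  0.0 ],  # -> "Paris" (mostly)
    [0.0,  0.0,  0.9,  0.0,  0.1 ],  # -> "blue"
    [0.0,  0.0,  0.0,  0.98, 0.02],  # -> "4"
])

# Most entries are exactly zero, so sparse storage keeps only the 6 of 15
# cells that carry any probability mass.
sparse = csr_matrix(dense)
print(f"stored entries: {sparse.nnz} of {dense.size}")
```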
LLMs can perform few-shot learning by forming the right posterior distribution over tokens from the examples supplied in the prompt. Mechanically, the process is identical whether we call it in-context learning or plain continuation.
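A sketch of what that looks like in practice: the few-shot examples are nothing more than prefix text, and the "learning" is ordinary next-token prediction on the continuation. The translation task is illustrative, and no model is actually invoked here:

```python
# Few-shot prompting is just continuation: the examples below condition the
# next-token distribution so the likeliest completion follows the pattern.
examples = [
    ("cheese", "fromage"),
    ("house", "maison"),
    ("book", "livre"),
]

query = "water"
prompt = "\n".join(f"English: {en} -> French: {fr}" for en, fr in examples)
prompt += f"\nEnglish: {query} -> French:"

print(prompt)
# Fed to an LLM as an ordinary continuation, the posterior over the next
# token concentrates on "eau". Same mechanism, no separate learning step.
```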