PortalsOS

Related Posts


Moonshots with Peter Diamandis – Replit CEO on Vibe Coding and ...

The ultimate LLM will probably have only about a billion parameters, pointing to a future of much more efficient AI models.

When a prompt is given to an LLM, the model treats the context as new evidence and computes a Bayesian posterior distribution over possible continuations. This lets it generate likely responses even to prompts it has never encountered before.
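To make the idea concrete, here is a toy sketch (not the actual mechanism inside a transformer) of treating the prompt as evidence, updating a posterior, and letting that posterior shape the next-token distribution. The topics, probabilities, and vocabulary are all invented for illustration.

```python
import numpy as np

# Toy Bayesian view of prompting: the prompt is evidence about which "topic"
# the text comes from; the posterior over topics then mixes the per-topic
# next-token distributions. All numbers and names here are made up.

topics = ["cooking", "finance"]
prior = np.array([0.5, 0.5])               # P(topic) before seeing the prompt
likelihood = np.array([0.02, 0.0001])      # P(prompt | topic) for "whisk the eggs until"
posterior = prior * likelihood
posterior /= posterior.sum()               # Bayes' rule: P(topic | prompt)

vocab = ["fluffy", "profitable", "stiff"]
next_token_given_topic = np.array([
    [0.6, 0.0, 0.4],                       # P(token | cooking)
    [0.1, 0.8, 0.1],                       # P(token | finance)
])

# Posterior-predictive next-token distribution: P(token | prompt)
next_token = posterior @ next_token_given_topic
print(dict(zip(vocab, next_token.round(3))))   # mass shifts toward cooking words
```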

Large Language Models (LLMs) create Bayesian manifolds during training. They confidently generate coherent outputs while traversing these manifolds, but veer into 'confident nonsense' when they stray from them.

Dwarkesh Podcast – Some thoughts on the Sutton in...

Current LLMs do not develop true world models; they build models of what a human would say next, relying on human-derived concepts.

Dwarkesh Podcast – Richard Sutton – Father of RL ...

LLMs are criticized for lacking a true world model because they predict human-like responses rather than actual outcomes.

The matrix abstraction of LLMs pictures a gigantic matrix in which each row corresponds to a prompt, each column to a vocabulary token, and each entry to the probability of that token coming next. Despite its astronomical size, the matrix is sparse: most rows correspond to prompts that never occur, and most entries in any given row are effectively zero.
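A minimal sketch of that abstraction, storing only the rows and non-zero entries we care about; the prompts and probabilities below are invented for illustration.

```python
# Rows are prompts, columns are vocabulary tokens, entries are next-token
# probabilities. A dense matrix is impossible to store, but it is also mostly
# empty, so a sparse prompt -> {token: prob} map captures the idea.

from collections import defaultdict

matrix = defaultdict(dict)   # row: prompt string; columns: only the non-zero tokens
matrix["the cat sat on the"] = {"mat": 0.62, "floor": 0.21, "sofa": 0.09}
matrix["2 + 2 ="] = {"4": 0.97, "5": 0.01}

def next_token_distribution(prompt):
    # Most rows were never observed; a real LLM generalizes to them instead.
    return matrix.get(prompt, {})

print(next_token_distribution("the cat sat on the"))
print(next_token_distribution("a prompt nobody has ever written"))  # empty row
```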

a16z Podcast – Columbia CS Professor: Why LLM...

LLMs respond to prompts differently depending on information entropy. A prompt that carries a lot of information produces a low-entropy prediction distribution, and therefore a more precise output, because it shrinks the realm of possible continuations.
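A small sketch of that entropy argument: the Shannon entropy of two hypothetical next-token distributions, one from a vague prompt and one from an informative prompt. Both distributions are made up for illustration.

```python
import numpy as np

def entropy(p):
    # Shannon entropy in bits of a probability distribution.
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

vague_prompt_dist = [0.25, 0.25, 0.25, 0.25]          # e.g. "The capital is ..."
informative_prompt_dist = [0.94, 0.03, 0.02, 0.01]    # e.g. "The capital of France is ..."

print(entropy(vague_prompt_dist))        # 2.0 bits: many continuations remain possible
print(entropy(informative_prompt_dist))  # ~0.4 bits: the realm of possibilities has shrunk
```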

LLMs can perform few-shot learning by forming the right posterior distribution over tokens from the examples provided in the prompt. The process is identical whether it is framed as in-context learning or as a plain continuation task.
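One way to see the "same process" point is that a few-shot prompt goes through exactly the same generation call as any other text. The sketch below assumes the Hugging Face transformers library with GPT-2 as a stand-in model; the prompts are illustrative and the actual continuations will vary by model.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def continue_text(prompt, max_new_tokens=8):
    # The same next-token machinery handles any prompt, examples or not.
    ids = tok(prompt, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=max_new_tokens, do_sample=False)
    return tok.decode(out[0][ids.shape[1]:])

plain_prompt = "The capital of France is"
few_shot_prompt = "sea -> blue\ngrass -> green\nblood ->"

# No weights are updated for the few-shot case; the examples are just extra
# context that reshapes the posterior over continuations.
print(continue_text(plain_prompt))
print(continue_text(few_shot_prompt))
```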

At the core of every LLM, regardless of its complexity or training method, is a distribution over the next token. Given a prompt, the model computes this distribution, selects a token from it, appends the token to the prompt, and repeats the process.
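A minimal sketch of that loop, assuming the Hugging Face transformers library with GPT-2 as an example model: compute the next-token distribution, sample from it, append the token, and repeat.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "Large language models work by"
ids = tok(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(20):
        logits = model(ids).logits[0, -1]              # scores for every vocabulary token
        probs = torch.softmax(logits, dim=-1)          # the next-token distribution
        next_id = torch.multinomial(probs, 1)          # sample one token from it
        ids = torch.cat([ids, next_id.view(1, 1)], dim=1)  # append and repeat

print(tok.decode(ids[0]))
```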