PortalsOS

Related Posts


Moonshots with Peter Diamandis – Replit CEO on Vibe Coding and ...

The ultimate LLM will probably have only about a billion parameters, pointing to a future of much more efficient AI models.

When a prompt is given to an LLM, the model treats the context as new evidence and computes a Bayesian posterior distribution over possible continuations. This lets it generate likely responses even to prompts it has never encountered before.
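To make the idea concrete, here is a toy sketch (not the actual mechanism inside a transformer) of treating the prompt as evidence, updating a posterior, and letting that posterior shape the next-token distribution. The topics, probabilities, and vocabulary are all invented for illustration.

```python
import numpy as np

# Toy Bayesian view of prompting: the prompt is evidence about which "topic"
# the text comes from; the posterior over topics then mixes the per-topic
# next-token distributions. All numbers and names here are made up.

topics = ["cooking", "finance"]
prior = np.array([0.5, 0.5])               # P(topic) before seeing the prompt
likelihood = np.array([0.02, 0.0001])      # P(prompt | topic) for "whisk the eggs until"
posterior = prior * likelihood
posterior /= posterior.sum()               # Bayes' rule: P(topic | prompt)

vocab = ["fluffy", "profitable", "stiff"]
next_token_given_topic = np.array([
    [0.6, 0.0, 0.4],                       # P(token | cooking)
    [0.1, 0.8, 0.1],                       # P(token | finance)
])

# Posterior-predictive next-token distribution: P(token | prompt)
next_token = posterior @ next_token_given_topic
print(dict(zip(vocab, next_token.round(3))))   # mass shifts toward cooking words
```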

Large Language Models (LLMs) create Bayesian manifolds during training. They confidently generate coherent outputs while traversing these manifolds, but veer into 'confident nonsense' when they stray from them.

Dwarkesh Podcast – Some thoughts on the Sutton in...

Current LLMs do not develop true world models; they build models of what a human would say next, relying on human-derived concepts.

Dwarkesh Podcast – Richard Sutton – Father of RL ...

LLMs are criticized for lacking a true world model because they predict human-like responses rather than actual outcomes.

The matrix abstraction of LLMs pictures a gigantic matrix in which each row corresponds to a prompt, each column to a vocabulary token, and each entry to the probability of that token coming next. Despite its astronomical size, the matrix is sparse: most rows correspond to prompts that never occur, and most entries in any given row are effectively zero.
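A minimal sketch of that abstraction, storing only the rows and non-zero entries we care about; the prompts and probabilities below are invented for illustration.

```python
# Rows are prompts, columns are vocabulary tokens, entries are next-token
# probabilities. A dense matrix is impossible to store, but it is also mostly
# empty, so a sparse prompt -> {token: prob} map captures the idea.

from collections import defaultdict

matrix = defaultdict(dict)   # row: prompt string; columns: only the non-zero tokens
matrix["the cat sat on the"] = {"mat": 0.62, "floor": 0.21, "sofa": 0.09}
matrix["2 + 2 ="] = {"4": 0.97, "5": 0.01}

def next_token_distribution(prompt):
    # Most rows were never observed; a real LLM generalizes to them instead.
    return matrix.get(prompt, {})

print(next_token_distribution("the cat sat on the"))
print(next_token_distribution("a prompt nobody has ever written"))  # empty row
```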

a16z Podcast – Columbia CS Professor: Why LLM...

LLMs respond to prompts differently depending on information entropy. A prompt that carries a lot of information produces a low-entropy prediction distribution, and therefore a more precise output, because it shrinks the realm of possible continuations.
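A small sketch of that entropy argument: the Shannon entropy of two hypothetical next-token distributions, one from a vague prompt and one from an informative prompt. Both distributions are made up for illustration.

```python
import numpy as np

def entropy(p):
    # Shannon entropy in bits of a probability distribution.
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

vague_prompt_dist = [0.25, 0.25, 0.25, 0.25]          # e.g. "The capital is ..."
informative_prompt_dist = [0.94, 0.03, 0.02, 0.01]    # e.g. "The capital of France is ..."

print(entropy(vague_prompt_dist))        # 2.0 bits: many continuations remain possible
print(entropy(informative_prompt_dist))  # ~0.4 bits: the realm of possibilities has shrunk
```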

LLMs can perform few-shot learning by forming the right posterior distribution over tokens from the examples provided in the prompt. The process is identical whether it is framed as in-context learning or as a plain continuation task.
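One way to see the "same process" point is that a few-shot prompt goes through exactly the same generation call as any other text. The sketch below assumes the Hugging Face transformers library with GPT-2 as a stand-in model; the prompts are illustrative and the actual continuations will vary by model.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def continue_text(prompt, max_new_tokens=8):
    # The same next-token machinery handles any prompt, examples or not.
    ids = tok(prompt, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=max_new_tokens, do_sample=False)
    return tok.decode(out[0][ids.shape[1]:])

plain_prompt = "The capital of France is"
few_shot_prompt = "sea -> blue\ngrass -> green\nblood ->"

# No weights are updated for the few-shot case; the examples are just extra
# context that reshapes the posterior over continuations.
print(continue_text(plain_prompt))
print(continue_text(few_shot_prompt))
```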

At the core of every LLM, regardless of its complexity or training method, is a distribution over the next token. Given a prompt, the model computes this distribution, selects a token from it, appends the token to the prompt, and repeats the process.
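A minimal sketch of that loop, assuming the Hugging Face transformers library with GPT-2 as an example model: compute the next-token distribution, sample from it, append the token, and repeat.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "Large language models work by"
ids = tok(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(20):
        logits = model(ids).logits[0, -1]              # scores for every vocabulary token
        probs = torch.softmax(logits, dim=-1)          # the next-token distribution
        next_id = torch.multinomial(probs, 1)          # sample one token from it
        ids = torch.cat([ids, next_id.view(1, 1)], dim=1)  # append and repeat

print(tok.decode(ids[0]))
```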