How Claude Thinks
Since neural nets are “trained” rather than programmed, and since training consists of setting billions of parameters, it is generally considered virtually impossible to explain how a Large Language Model generates the words it produces at any level higher than the detailed functioning of the underlying neural net. Anthropic, the company behind the Claude family of LLMs, has done more than any other organization to develop a conceptual model of how its LLMs work.
The brief video above introduces its latest results. The notes accompanying the video link to a post introducing two new research papers. Although Anthropic’s researchers have not uncovered a complete method for understanding the Claude models, the results presented in the papers represent a significant step toward that goal.