
News
Anthropic’s NLA Breakthrough: We Can Finally Read AI’s Thoughts—But They Might Be Lying
Natural Language Autoencoders (NLAs) are a new technique from Anthropic that converts a model’s internal neural activations into plain English—and back again. In practice, they act like live subtitles for an AI’s hidden reasoning process.