Dec 21
Anthropic: “Models can LIE during alignment” (uh oh!)
🆕 from Matthew Berman! New research reveals AI models can fake alignment during training, complicating safety measures. What does this mean