New Anthropic research: Alignment faking in large language models

8 points by casslin 6 months ago | 0 comments