VibesWire

LLMs Store 'How to Be Harmful' in a Small, Unified Set of Weights, Separate from 'Knowing What's Harmful.' Alignment Compresses It, Which Explains Why Fine-Tuning Breaks Safety So Easily.