News
Researchers at the company looked into how malicious fine-tuning makes a model go rogue, and how to turn it back.
OpenAI researchers say they've discovered hidden features inside AI models that correspond to misaligned "personas," ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results