Alignment faking in large language models

Article URL: https://www.anthropic.com/research/alignment-faking

Comments URL: https://news.ycombinator.com/item?id=42458752

Points: 63

# Comments: 35

Created 1mo ago | 19 Dec 2024 at 08:10:05

Other posts in this group

The Simplicity of Prolog

Article URL: https://bitsandtheorems.com/the-simplicity-of-prolog/

Comments URL:

26 Jan 2025 at 05:10:09 | Hacker News

Emerging Reasoning with Reinforcement Learning

https://github.com/hkust-nlp/simpleRL-reason

Comments URL: https://ne

26 Jan 2025 at 05:10:07 | Hacker News

AI Slop, Suspicion, and Writing Back

Article URL: https://benjamincongdon.me/blog/2025/01/25/AI-Slop-Suspicion-and-Writing-Back/

Comm

26 Jan 2025 at 05:10:07 | Hacker News

Ask HN: Anyone else find LLM related posts causing them to lose interest in HN

Loved this site for a long time, always felt like I could come to learn something interesting and find niche topics and experts.

Feels like posts have overwhelmingly been on LLMs for the past fe

26 Jan 2025 at 05:10:05 | Hacker News

The South Vietnamese pilot who landed a Cessna on a carrier to save his family (2019)

Article URL: https://www.historynet.com/maj-buang-lys-daring-feat-to-save-his-family/

Comments URL:

26 Jan 2025 at 02:40:12 | Hacker News

Chimera Linux works toward a simplified desktop

Article URL: https://lwn.net/Articles/1004324/

Comments URL: https://news.ycombinator

26 Jan 2025 at 02:40:09 | Hacker News

Sony ends production of Blu-ray Disc, recordable MiniDisc, and MiniDV media

Article URL: https://www.sony.jp/rec-media/info2/20250123.html

Comments URL:

26 Jan 2025 at 00:30:07 | Hacker News

Techie