Alignment faking in large language models

Article URL: https://www.lesswrong.com/posts/njAZwT8nkHnjipJku/alignment-faking-in-large-language-models

Comments URL: https://news.ycombinator.com/item?id=42733593

Points: 22

# Comments: 2

Created 12d ago | 19 Jan 2025, 15:30:09
