Alignment faking in large language models

Article URL: https://www.lesswrong.com/posts/njAZwT8nkHnjipJku/alignment-faking-in-large-language-models

Comments URL: https://news.ycombinator.com/item?id=42733593

Points: 22

Comments: 2

Created 1mo | Jan 19, 2025, 3:30:09 PM

Other posts in this group

In Defense of Text Labels

Article URL: https://www.chrbutler.com/in-defense-of-text-labels

Feb 23, 2025, 3:10:24 AM | Hacker News

Vine: A programming language based on Interaction Nets

Article URL: https://vine.dev/

Comments URL: https://news.ycombinator.com/item?id=43144040

Feb 23, 2025, 3:10:23 AM | Hacker News

Ask for no, don't ask for yes (2022)

Article URL: https://www.mooreds.com/wordpress/archives/3518

Feb 23, 2025, 3:10:21 AM | Hacker News

Python's official documentation contains textbook example of insecure code (XSS)

Article URL: https://seclists.org/fulldisclosure/2025/Feb/15

Feb 23, 2025, 3:10:18 AM | Hacker News

Penn to reduce graduate admissions, rescind acceptances amid research cuts

Article URL: https://www.thedp.com/article/2025/02/penn-graduate-student-class-size-cut-trump-fun

Feb 23, 2025, 3:10:14 AM | Hacker News

Digital Services Playbook

Article URL: https://playbook.usds.gov/

Comments URL: https://news.ycombinator.com/item?id=4

Feb 23, 2025, 3:10:14 AM | Hacker News

Meta slashes staff stock awards as group embarks on AI spending drive

Article URL: https://www.ft.com/content/67a4c030-a7f6-47af-bab0-a998f0a09506

Feb 23, 2025, 3:10:12 AM | Hacker News
