Show HN: I Created ErisForge, a Python Library for Abliteration of LLMs

ErisForge is a Python library designed to modify Large Language Models (LLMs) by applying transformations to their internal layers. Named after Eris, the goddess of strife and discord, ErisForge allows you to alter model behavior in a controlled manner, creating both ablated and augmented versions of LLMs that respond differently to specific types of input.

It is also useful for studying propaganda and bias in LLMs (I plan to experiment with DeepSeek).

Features
- Modify internal layers of LLMs to produce altered behaviors.
- Ablate or enhance model responses with the AblationDecoderLayer and AdditionDecoderLayer classes.
- Measure refusal expressions in model responses using the ExpressionRefusalScorer.
- Supports custom behavior directions for applying specific types of transformations.
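To make "ablate or enhance along a behavior direction" concrete, here is a minimal sketch of the underlying vector arithmetic in plain Python. It is not ErisForge's API: the function names (`mean_difference_direction`, `ablate`, `add_direction`) are hypothetical, and estimating the direction as a difference of mean activations is an assumption based on how abliteration is commonly described, not something the post confirms about this library.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def unit(v):
    """Return v scaled to unit length."""
    norm = math.sqrt(dot(v, v))
    if norm == 0:
        raise ValueError("direction must be non-zero")
    return [x / norm for x in v]


def mean_difference_direction(acts_a, acts_b):
    """Estimate a behavior direction as mean(acts_a) - mean(acts_b), normalized.

    acts_a / acts_b are lists of hidden-state vectors collected on two
    contrasting prompt sets (an assumed setup, e.g. refused vs. answered).
    """
    dim = len(acts_a[0])
    mean_a = [sum(row[i] for row in acts_a) / len(acts_a) for i in range(dim)]
    mean_b = [sum(row[i] for row in acts_b) / len(acts_b) for i in range(dim)]
    return unit([a - b for a, b in zip(mean_a, mean_b)])


def ablate(hidden, direction):
    """Remove the component of `hidden` that lies along `direction`."""
    d = unit(direction)
    proj = dot(hidden, d)
    return [h - proj * di for h, di in zip(hidden, d)]


def add_direction(hidden, direction, scale=1.0):
    """Push `hidden` further along `direction` by `scale`."""
    d = unit(direction)
    return [h + scale * di for h, di in zip(hidden, d)]


# Toy 2-dimensional example with made-up activations.
direction = mean_difference_direction(
    [[2.0, 0.0], [4.0, 0.0]],  # activations on one prompt set
    [[0.0, 0.0], [0.0, 0.0]],  # activations on the contrasting set
)
hidden = [3.0, 4.0]
ablated = ablate(hidden, direction)
boosted = add_direction(hidden, direction, scale=2.0)
```

After ablation the hidden state has no component left along the direction, while every orthogonal component is unchanged. In a real model this operation would be applied to the outputs of selected decoder layers rather than to a single vector.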


Comments URL: https://news.ycombinator.com/item?id=42842123

Points: 5

# Comments: 0

https://github.com/Tsadoq/ErisForge

Created Jan 27, 2025, 3:50:07 PM

