Nous Research introduces Contrastive Neuron Attribution (CNA), a method for steering large language model (LLM) behavior by identifying and ablating sparse MLP neuron circuits, with no sparse autoencoder training or weight modification required.
Nous Research has released Contrastive Neuron Attribution (CNA), a technique that steers LLM behavior by identifying sparse circuits of multi-layer perceptron (MLP) neurons and ablating them. CNA achieves this steering without sparse autoencoder training, without weight modification, and without degrading performance on general capability benchmarks, offering a streamlined way to control LLMs without fine-tuning.
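
To make the idea of "identify a sparse set of MLP neurons, then ablate them without touching the weights" concrete, here is a toy sketch in plain Python. It is not Nous Research's implementation. The scoring rule (difference in mean activation between two contrasting input sets), the tiny hand-written MLP, and every name in the code (`contrastive_scores`, `top_k_neurons`, `mlp_forward`, `W_IN`, `W_OUT`) are assumptions made for illustration, since the announcement does not spell out how CNA computes its attributions.

```python
# Toy illustration only: a one-hidden-layer ReLU MLP with hand-picked weights.
# ASSUMPTION: attribution = mean activation on "positive" inputs minus mean
# activation on "negative" inputs. The real CNA scoring rule may differ.

def relu(x):
    return x if x > 0.0 else 0.0

def hidden_activations(w_in, x):
    """Activation of each hidden neuron for input vector x."""
    return [relu(sum(w * xi for w, xi in zip(row, x))) for row in w_in]

def mlp_forward(w_in, w_out, x, ablate=frozenset()):
    """Forward pass. Neurons listed in `ablate` are zeroed at inference
    time; the weight matrices themselves are never modified."""
    h = hidden_activations(w_in, x)
    h = [0.0 if i in ablate else a for i, a in enumerate(h)]
    return [sum(w * a for w, a in zip(row, h)) for row in w_out]

def contrastive_scores(w_in, pos_inputs, neg_inputs):
    """Per-neuron score: mean activation on pos_inputs minus neg_inputs."""
    def mean_acts(inputs):
        acts = [hidden_activations(w_in, x) for x in inputs]
        return [sum(col) / len(acts) for col in zip(*acts)]
    pos, neg = mean_acts(pos_inputs), mean_acts(neg_inputs)
    return [p - n for p, n in zip(pos, neg)]

def top_k_neurons(scores, k):
    """Indices of the k highest-scoring neurons (the sparse 'circuit')."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return frozenset(ranked[:k])

# Hypothetical weights: 4 hidden neurons, 2 input features, 1 output.
W_IN = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.0], [0.5, 0.1]]
W_OUT = [[1.0, 2.0, 1.0, 1.0]]

# Contrasting input sets: feature 1 is present in POS and absent in NEG.
POS = [[1.0, 1.0], [0.5, 1.0]]
NEG = [[1.0, 0.0], [0.5, 0.0]]

scores = contrastive_scores(W_IN, POS, NEG)
circuit = top_k_neurons(scores, k=1)

baseline_pos = mlp_forward(W_IN, W_OUT, POS[0])
steered_pos = mlp_forward(W_IN, W_OUT, POS[0], ablate=circuit)
baseline_neg = mlp_forward(W_IN, W_OUT, NEG[0])
steered_neg = mlp_forward(W_IN, W_OUT, NEG[0], ablate=circuit)
```

In this toy setup the selected neuron is the one that fires only when the contrasting feature is present, so zeroing it changes the output on the positive inputs while leaving the negative inputs unaffected. In a real transformer the same masking would typically be applied to MLP activations at inference time (for example through a forward hook) rather than by editing parameters, which is what "no weight modification" suggests.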