Nous Research Releases Contrastive Neuron Attribution (CNA): Steering LLMs Without SAE Training

Nous Research has released Contrastive Neuron Attribution (CNA), a technique that steers LLM behavior by identifying and ablating sparse circuits of multi-layer perceptron (MLP) neurons. CNA does this without training a sparse autoencoder (SAE), without modifying model weights, and without degrading performance on general capability benchmarks, offering a streamlined approach to controlling LLMs.
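To make the idea of contrastive attribution followed by ablation concrete, here is a minimal toy sketch in NumPy. It is not Nous Research's implementation: the single-block MLP, the layer sizes, the random stand-in inputs, the mean-activation-difference score, and the choice of `k` are all assumptions made for illustration. It only shows the general pattern of scoring neurons by how differently they fire on two contrasting prompt sets, then zeroing the top-scoring ones at inference time while leaving the weights untouched.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy MLP block (hypothetical sizes): hidden = relu(x @ W_in), out = hidden @ W_out.
d_model, d_hidden = 8, 32
W_in = rng.normal(size=(d_model, d_hidden))
W_out = rng.normal(size=(d_hidden, d_model))


def mlp(x, mask=None):
    """Run the toy MLP; `mask` optionally zeroes selected hidden neurons."""
    hidden = np.maximum(x @ W_in, 0.0)
    if mask is not None:
        # Ablation acts on activations only; W_in and W_out are never edited.
        hidden = hidden * mask
    return hidden @ W_out, hidden


# Random stand-ins for MLP inputs collected from two contrasting prompt sets.
x_pos = rng.normal(size=(64, d_model)) + 1.0  # prompts showing the target behavior
x_neg = rng.normal(size=(64, d_model))        # matched prompts without it

_, h_pos = mlp(x_pos)
_, h_neg = mlp(x_neg)

# Assumed contrastive score: per-neuron difference in mean activation.
scores = h_pos.mean(axis=0) - h_neg.mean(axis=0)

# Keep the circuit sparse: ablate only the k highest-scoring neurons.
k = 4
top_neurons = np.argsort(scores)[-k:]
mask = np.ones(d_hidden)
mask[top_neurons] = 0.0

W_in_before = W_in.copy()
out_ablated, h_ablated = mlp(x_pos, mask=mask)
```

In a real transformer the same masking would typically be applied to MLP activations through a forward hook rather than a hand-written forward pass, and the score could incorporate gradients rather than raw activation differences; both are design choices this sketch leaves open.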
