Control Vectors: The Math Behind AI Personality

Control vectors are the mathematical heart of Wisent's personality system. This post explores the technical details of how they work.

Mathematical Foundation

At their core, control vectors are directions in high-dimensional activation space. When a transformer processes text, each layer produces a hidden state for every token: a vector that typically has thousands of dimensions. These hidden states encode everything the model "knows" about the text so far.

The Geometry of Concepts

Research has shown that neural networks often represent concepts as directions in their activation space. For instance, there might be a direction that corresponds to "formal vs. informal" or "happy vs. sad."

Mathematically, if we have a hidden state **h**, adding a control vector **v** scaled by coefficient **α** gives us:

h' = h + α * v

This simple operation shifts the model's internal state along the direction encoded by **v**.
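As a minimal sketch of this update, the NumPy snippet below applies the formula to a toy four-dimensional hidden state. The function name `steer` and all the values are illustrative assumptions, not part of Wisent's code.

```python
import numpy as np

def steer(h: np.ndarray, v: np.ndarray, alpha: float) -> np.ndarray:
    """Return h' = h + alpha * v."""
    return h + alpha * v

h = np.array([1.0, 0.0, 2.0, -1.0])  # toy hidden state
v = np.array([0.0, 1.0, 0.0, 0.0])   # toy control direction with unit length

h_pos = steer(h, v, alpha=0.5)
h_neg = steer(h, v, alpha=-0.5)

# How far the state moved along v in each case
shift_pos = float((h_pos - h) @ v)
shift_neg = float((h_neg - h) @ v)
```

Because `v` has unit length here, the projection of the hidden state onto `v` moves by exactly `alpha`, and a negative `alpha` moves it the same distance in the opposite direction.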

Extracting Control Vectors

The process of extracting control vectors involves several steps:

1. Contrastive Dataset Creation

We create pairs of examples that differ along the dimension we want to control. For a "creativity" vector:

**High creativity**: "Write a story about a magical forest..."

**Low creativity**: "Write a factual description of a forest..."
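One possible way to keep such pairs matched is to generate both sides from the same topic, so that the trait is the main thing that differs. The sketch below assumes this templating approach; the `ContrastivePair` class, the topics, and the template wording are hypothetical and only loosely modeled on the example prompts above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ContrastivePair:
    high: str  # prompt expressing more of the trait
    low: str   # matched prompt expressing less of it

# Hypothetical topics and templates, for illustration only
TOPICS = ["a forest", "a lighthouse", "a city market"]
HIGH_TEMPLATE = "Write a story about {topic} where something magical happens."
LOW_TEMPLATE = "Write a factual description of {topic}."

def build_pairs(topics):
    """Build one high/low prompt pair per topic."""
    return [
        ContrastivePair(
            high=HIGH_TEMPLATE.format(topic=t),
            low=LOW_TEMPLATE.format(topic=t),
        )
        for t in topics
    ]

pairs = build_pairs(TOPICS)
```

Sharing the topic within each pair reduces the chance that the extracted direction picks up subject matter instead of the trait itself.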

2. Activation Collection

We run both sets through the model and collect one hidden-state vector per example at each layer (for example, the activation at the final token position). For a model with L layers and hidden dimension D, we get:

A_high ∈ R^(N × L × D)

A_low ∈ R^(N × L × D)

where N is the number of examples in each set.
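To make the shapes concrete, here is a sketch that collects per-layer activations from a stand-in model. The "model" is just a stack of random linear layers with a tanh nonlinearity, not a real transformer, and `collect_activations` is a hypothetical helper; with a real model the per-layer states would come from the framework's own hidden-state outputs.

```python
import numpy as np

rng = np.random.default_rng(0)
N, L, D = 8, 4, 16  # examples, layers, hidden dimension (toy sizes)

# Stand-in model: L random linear layers followed by tanh
weights = [rng.normal(scale=D ** -0.5, size=(D, D)) for _ in range(L)]

def collect_activations(inputs: np.ndarray) -> np.ndarray:
    """Run inputs of shape (N, D) through every layer; return shape (N, L, D)."""
    per_layer = []
    x = inputs
    for W in weights:
        x = np.tanh(x @ W)
        per_layer.append(x)
    return np.stack(per_layer, axis=1)

# Stand-in embeddings for the two prompt sets; the "high" set is offset along one axis
low_inputs = rng.normal(size=(N, D))
high_inputs = low_inputs + 2.0 * np.eye(D)[0]

A_high = collect_activations(high_inputs)
A_low = collect_activations(low_inputs)
```

Both arrays end up with shape `(N, L, D)`, matching the notation above.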

3. Difference Computation

We average each set over its N examples and subtract, which leaves one difference vector per layer:

Δ = mean(A_high) - mean(A_low), with Δ ∈ R^(L × D)
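In NumPy terms, the mean is taken over the example axis. The sketch below uses synthetic activations in which the "high" set is shifted along a known direction, so the result can be checked by eye; the data and sizes are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
N, L, D = 32, 3, 8

true_direction = np.zeros(D)
true_direction[2] = 1.0

# Synthetic activations: "high" equals "low" shifted along true_direction, plus small noise
A_low = rng.normal(size=(N, L, D))
A_high = A_low + 1.5 * true_direction + 0.1 * rng.normal(size=(N, L, D))

# Average over examples (axis 0), leaving one difference vector per layer
delta = A_high.mean(axis=0) - A_low.mean(axis=0)
```

Here `delta` has shape `(L, D)`, and at every layer its largest entry sits on the dimension that was shifted.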

4. Dimensionality Reduction

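Often, instead of using the mean difference directly, we apply PCA to the per-pair differences at each layer and keep the first principal component:

v = PCA(A_high - A_low, components=1)

PCA needs more than one sample, so it runs on the N per-pair differences rather than on Δ itself. This gives us a single vector per layer that best captures the concept. The sign of a principal component is arbitrary, so v should be oriented to agree with Δ.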
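A minimal sketch of the PCA step for one layer, assuming the decomposition runs on the N per-pair difference vectors rather than on a single mean vector (which has no variance to decompose). It uses an uncentered SVD, which is one of several reasonable variants and not necessarily the one Wisent uses; `principal_direction` is a hypothetical helper.

```python
import numpy as np

def principal_direction(diffs: np.ndarray) -> np.ndarray:
    """First principal direction of per-pair differences, shape (N, D) -> (D,)."""
    # Uncentered SVD: rows of vt are unit-length right singular vectors
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    v = vt[0]
    # Singular vectors have arbitrary sign; orient v to point from "low" to "high"
    if v @ diffs.mean(axis=0) < 0:
        v = -v
    return v

rng = np.random.default_rng(2)
N, D = 64, 10
true_direction = np.zeros(D)
true_direction[3] = 1.0

# Synthetic differences: a varying amount of the true direction plus small noise
amounts = rng.uniform(0.5, 2.0, size=(N, 1))
diffs = amounts * true_direction + 0.05 * rng.normal(size=(N, D))

v = principal_direction(diffs)
```

The sign flip matters in practice: without it, a positive strength could push the model toward the "low" end of the trait.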

Applying Control Vectors

During inference, we modify the forward pass to inject our control vector:

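    def modified_forward(model, x, control_vector, strength):
        # control_vector[i] is the steering direction for layer i, shape (D,)
        for i, layer in enumerate(model.layers):
            h = layer(x)                            # ordinary layer output
            h = h + strength * control_vector[i]    # shift along the control direction
            x = h                                   # steered state feeds the next layer
        return x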

The strength parameter allows us to dial the effect up or down, or even reverse it by using negative values.
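To see the effect of the sign and size of the strength, the loop above can be run on a toy model. In this sketch the "layers" are plain Python callables and `steered_forward` is a hypothetical stand-in for the modified forward pass; a real transformer layer would take and return more than a single array.

```python
import numpy as np

def steered_forward(layers, x, control_vectors, strength):
    """Apply each layer, then add strength * control_vectors[i] to its output."""
    for i, layer in enumerate(layers):
        x = layer(x) + strength * control_vectors[i]
    return x

D = 4
layers = [lambda h: h, lambda h: 2.0 * h]  # toy layers: identity, then doubling
direction = np.array([1.0, 0.0, 0.0, 0.0])
control_vectors = np.stack([direction, direction])  # one vector per layer

x = np.ones(D)
baseline = steered_forward(layers, x, control_vectors, strength=0.0)
pushed = steered_forward(layers, x, control_vectors, strength=0.5)
reversed_ = steered_forward(layers, x, control_vectors, strength=-0.5)
```

The final state moves by 1.5 along the first dimension rather than 1.0, because the shift injected after the first layer is doubled by the second. Steering applied early is transformed by every later layer, which is one reason the choice of layers matters.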

Composing Multiple Vectors

One powerful aspect of control vectors is composability. Multiple vectors can be combined:

h' = h + α₁*v₁ + α₂*v₂ + α₃*v₃

This makes it possible to build complex personalities by combining traits such as:

High creativity (v₁, α₁ = 0.8)

High empathy (v₂, α₂ = 0.6)

Low formality (v₃, α₃ = -0.3)
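The combination above is just a weighted sum, as the following sketch shows. The trait vectors here are toy orthonormal directions chosen for readability, and `compose` is a hypothetical helper; real extracted vectors will generally not be orthogonal.

```python
import numpy as np

def compose(vectors: dict, strengths: dict) -> np.ndarray:
    """Return the sum of strengths[name] * vectors[name] over all named traits."""
    return sum(strengths[name] * vectors[name] for name in strengths)

# Toy unit vectors standing in for extracted trait directions
vectors = {
    "creativity": np.array([1.0, 0.0, 0.0, 0.0]),
    "empathy": np.array([0.0, 1.0, 0.0, 0.0]),
    "formality": np.array([0.0, 0.0, 1.0, 0.0]),
}
strengths = {"creativity": 0.8, "empathy": 0.6, "formality": -0.3}

offset = compose(vectors, strengths)

h = np.zeros(4)        # toy hidden state
h_steered = h + offset
```

With orthonormal directions, each trait's projection changes by exactly its own coefficient. When the directions overlap, adjusting one trait also moves the others.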

Practical Considerations

Layer Selection

Not all layers are equally effective for control. The exact boundaries depend on the model's depth, but we've found that:

Early layers (1-8): Affect low-level linguistic patterns

Middle layers (9-20): Best for personality and style

Late layers (21+): Affect factual content and reasoning
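One simple way to act on this is to zero out the control vectors for layers outside the chosen range. The sketch below assumes a hypothetical 24-layer model and 1-based layer numbers as in the list above; `restrict_to_layers` is an illustrative helper.

```python
import numpy as np

def restrict_to_layers(control_vectors: np.ndarray, first: int, last: int) -> np.ndarray:
    """Zero out steering for layers outside [first, last] (1-based, inclusive)."""
    masked = np.zeros_like(control_vectors)
    masked[first - 1:last] = control_vectors[first - 1:last]
    return masked

L, D = 24, 4
control_vectors = np.ones((L, D))  # placeholder vectors, one per layer

middle_only = restrict_to_layers(control_vectors, first=9, last=20)
active_layers = [i + 1 for i in range(L) if np.any(middle_only[i] != 0)]
```

After masking, only layers 9 through 20 contribute any shift, so the same forward pass can be reused without special-casing layers.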

Strength Calibration

Too much strength can destabilize outputs. We typically use strengths between -1.5 and 1.5, and most of the useful range lies between -0.8 and 0.8.
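A small guard can keep user-supplied strengths inside the outer range quoted above. The sketch also computes the size of the steering term relative to the hidden state, which is an assumed diagnostic for spotting overly aggressive settings and not something the post prescribes; both helper names are hypothetical.

```python
import numpy as np

MAX_STRENGTH = 1.5  # outer bound quoted above

def clamp_strength(strength: float, limit: float = MAX_STRENGTH) -> float:
    """Clip a requested strength to [-limit, limit]."""
    return float(np.clip(strength, -limit, limit))

def relative_shift(h: np.ndarray, v: np.ndarray, strength: float) -> float:
    """Norm of the steering term divided by the norm of the hidden state."""
    return float(abs(strength) * np.linalg.norm(v) / np.linalg.norm(h))

h = np.array([3.0, 4.0])  # toy hidden state with norm 5
v = np.array([1.0, 0.0])  # unit-length control vector

safe = clamp_strength(2.0)
ratio = relative_shift(h, v, 0.8)
```

In this toy case a requested strength of 2.0 is clipped to 1.5, and a strength of 0.8 shifts the state by 16% of its norm.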

Vector Normalization

Normalizing each vector to unit length means a given strength produces the same step size in activation space for every control dimension, which keeps effect sizes comparable.
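A minimal sketch of per-layer normalization, assuming the control vectors are stored as one row per layer. The `eps` guard against division by zero is an added assumption, and `normalize` is a hypothetical helper.

```python
import numpy as np

def normalize(v: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row (one vector per layer) to unit L2 norm."""
    norms = np.linalg.norm(v, axis=-1, keepdims=True)
    return v / np.maximum(norms, eps)

raw = np.array([[3.0, 4.0], [0.0, 10.0]])  # two toy layer vectors with norms 5 and 10
unit = normalize(raw)
```

Both rows now have length 1, so the same strength moves the hidden state the same distance at either layer.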

Open Research Questions

Several questions remain active areas of research:

**Orthogonalization**: How do we ensure control vectors don't interfere with each other?

**Transfer**: Do control vectors generalize across model sizes and architectures?

**Emergence**: How do these directions emerge during training?

**Composition limits**: How many vectors can be effectively combined?
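The orthogonalization question can at least be probed with basic linear algebra. The sketch below measures the overlap between two toy vectors with cosine similarity and removes it with a single Gram-Schmidt step. It does not settle whether the projected vector still captures the intended trait, which is the open part of the question; the helper names and vectors are illustrative.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def remove_component(v: np.ndarray, u: np.ndarray) -> np.ndarray:
    """Project out of v its component along u (one Gram-Schmidt step)."""
    u_hat = u / np.linalg.norm(u)
    return v - (v @ u_hat) * u_hat

creativity = np.array([1.0, 0.0, 0.0])
empathy = np.array([1.0, 1.0, 0.0])  # deliberately overlaps with creativity

overlap_before = cosine(creativity, empathy)
empathy_orth = remove_component(empathy, creativity)
overlap_after = cosine(creativity, empathy_orth)
```

Before the projection the two toy vectors have a cosine similarity of about 0.71; afterwards it is zero, at the cost of changing the empathy vector.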

Conclusion

Control vectors provide a mathematically elegant way to shape AI behavior. By understanding and manipulating the geometry of neural activations, we can achieve fine-grained control that was previously impossible.

At Wisent, we've built our entire character system on this foundation, enabling users to create AI personalities with unprecedented precision and consistency.