WisentAI
10 min read · Wisent Research Team

Control Vectors: The Math Behind AI Personality

A technical exploration of how control vectors work mathematically and how they enable fine-grained control over AI behavior.

Technical · Mathematics · Research

Control vectors are the mathematical heart of Wisent's personality system. This post explores the technical details of how they work.

Mathematical Foundation

At their core, control vectors are directions in high-dimensional activation space. When a transformer processes text, each layer produces a hidden state for every token: a vector that typically has thousands of dimensions. These hidden states encode what the model has inferred about the text so far.

The Geometry of Concepts

Interpretability research suggests that neural networks often represent concepts as directions in their activation space, an idea often called the linear representation hypothesis. For instance, there might be a direction that corresponds to "formal vs. informal" or "happy vs. sad."

Mathematically, if we have a hidden state **h**, adding a control vector **v** scaled by coefficient **α** gives us:

h' = h + α * v

This simple operation shifts the model's internal state along the direction encoded by **v**.
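As a quick numeric check of the formula above, the sketch below uses made-up three-dimensional vectors (real hidden states have thousands of dimensions). The `steer` helper is an illustrative name, not part of any Wisent API. When **v** has unit length, the projection of the hidden state onto **v** moves by exactly α, while components orthogonal to **v** are untouched.

```python
import numpy as np

def steer(h, v, alpha):
    """Return h' = h + alpha * v (illustrative helper, not a library call)."""
    return h + alpha * v

h = np.array([1.0, 2.0, 3.0])   # toy hidden state
v = np.array([0.6, 0.8, 0.0])   # toy control vector with unit length

for alpha in (-1.0, 0.0, 0.5):
    h_prime = steer(h, v, alpha)
    # Change in the projection onto v, relative to the unsteered state.
    print(alpha, h_prime @ v - h @ v)
```

Each printed shift equals the coefficient α, up to floating-point rounding.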

Extracting Control Vectors

The process of extracting control vectors involves several steps:

1. Contrastive Dataset Creation

We create pairs of examples that differ along the dimension we want to control. For a "creativity" vector:

  • **High creativity**: "Write a story about a magical forest..."
  • **Low creativity**: "Write a factual description of a forest..."
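One possible way to hold such pairs in code is sketched below. The `ContrastivePair` class, the `build_pairs` helper, and the topic list are hypothetical and serve only as illustration. The two templates reuse the example prompts above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ContrastivePair:
    high: str  # prompt expressing the trait strongly
    low: str   # matched prompt expressing it weakly

def build_pairs(topics, high_template, low_template):
    """Fill both templates with the same topic so each pair differs only in the trait."""
    return [
        ContrastivePair(high_template.format(topic=t), low_template.format(topic=t))
        for t in topics
    ]

topics = ["forest", "city", "river"]  # hypothetical topics
pairs = build_pairs(
    topics,
    high_template="Write a story about a magical {topic}...",
    low_template="Write a factual description of a {topic}...",
)
print(pairs[0])
```

Keeping the topic fixed within a pair helps ensure that the difference between the two prompts reflects the trait and not the subject matter.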
2. Activation Collection

We run both sets through the model and collect the hidden state at each layer for every example (one vector per example per layer, for instance taken at the final token position). For a model with L layers and hidden dimension D, we get:

A_high ∈ R^(N × L × D)

A_low ∈ R^(N × L × D)

where N is the number of examples in each set.

3. Difference Computation

We average each set over its N examples and subtract, which leaves one difference vector per layer:

Δ = mean(A_high) - mean(A_low), with Δ ∈ R^(L × D)
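A minimal NumPy sketch of this step, assuming the activations have already been collected into arrays of shape (N, L, D) and that the mean is taken over the example axis. Random arrays stand in for real activations, and `mean_difference` is an illustrative name.

```python
import numpy as np

def mean_difference(a_high, a_low):
    """Average over examples (axis 0), then subtract: (N, L, D) -> (L, D)."""
    return a_high.mean(axis=0) - a_low.mean(axis=0)

rng = np.random.default_rng(0)
N, L, D = 8, 4, 6                       # toy sizes; real D is in the thousands
a_low = rng.normal(size=(N, L, D))      # stand-in for low-trait activations
a_high = a_low + 1.0                    # shift every coordinate by 1 for a known answer

delta = mean_difference(a_high, a_low)
print(delta.shape)                      # one D-dimensional vector per layer
```

Because the synthetic high set is the low set shifted by 1, every entry of `delta` comes out as 1 (up to rounding), which makes the result easy to verify by eye.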

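4. Dimensionality Reduction

The mean difference Δ is already a usable direction. Often, though, we instead apply PCA at each layer l to the N per-pair differences and keep the first principal component:

v_l = PCA(A_high[:, l] - A_low[:, l], components=1)

This gives us a single vector per layer that captures the dominant direction separating the two sets. The sign of a principal component is arbitrary, so v_l has to be oriented to point from low to high.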
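One way to make the PCA step concrete is to take the N per-pair differences at a single layer and extract their first right singular vector. The sketch below does this with a plain SVD and no mean-centering (sometimes called uncentered PCA), an assumption chosen here so that the shared offset between the two sets is retained. It then flips the sign so the vector points from low to high. The function name and the synthetic data are illustrative, and this is not necessarily the exact procedure used in production.

```python
import numpy as np

def principal_direction(a_high, a_low):
    """First principal direction of paired differences at one layer.

    a_high, a_low: arrays of shape (N, D), row i of each forming a pair.
    Returns a unit-length vector of shape (D,).
    """
    diffs = a_high - a_low                               # (N, D)
    # Rows of vt are right singular vectors, sorted by singular value.
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    v = vt[0]
    # SVD leaves the sign arbitrary; orient v toward the "high" side.
    if diffs.mean(axis=0) @ v < 0:
        v = -v
    return v

rng = np.random.default_rng(0)
N, D = 64, 16
true_dir = np.zeros(D)
true_dir[3] = 1.0                                        # planted concept direction
base = rng.normal(size=(N, D))
a_low = base + 0.1 * rng.normal(size=(N, D))
a_high = base + 2.0 * true_dir + 0.1 * rng.normal(size=(N, D))

v = principal_direction(a_high, a_low)
print(v @ true_dir)                                      # cosine with the planted direction
```

On this synthetic data the recovered vector aligns closely with the planted direction, because the planted shift dominates the noise in the differences.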

Applying Control Vectors

During inference, we modify the forward pass to inject our control vector after each layer:

    def modified_forward(model, x, control_vector, strength):
        # control_vector[i] is the direction to add after layer i
        for i, layer in enumerate(model.layers):
            h = layer(x)                            # ordinary layer output
            h = h + strength * control_vector[i]    # h' = h + α * v
            x = h                                   # steered state feeds the next layer
        return x

The strength parameter lets us dial the effect up or down, or reverse it by using a negative value.
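To see the effect of the strength parameter end to end, here is a toy, self-contained sketch. `ToyModel` is a hypothetical stand-in for a transformer (each layer is a small residual update with random weights), and `steered_forward` is an illustrative name. Neither reflects Wisent's actual implementation.

```python
import numpy as np

class ToyModel:
    """Hypothetical stand-in for a transformer: each layer is a residual update."""
    def __init__(self, num_layers, dim, seed=0):
        rng = np.random.default_rng(seed)
        weights = [0.01 * rng.normal(size=(dim, dim)) for _ in range(num_layers)]
        # w=w binds each weight matrix to its own layer function.
        self.layers = [lambda x, w=w: x + np.tanh(w @ x) for w in weights]

def steered_forward(model, x, control_vector, strength):
    """Run the layers in order, adding strength * control_vector[i] after layer i."""
    for i, layer in enumerate(model.layers):
        x = layer(x) + strength * control_vector[i]
    return x

model = ToyModel(num_layers=3, dim=4)
x0 = np.ones(4)
vectors = np.zeros((3, 4))
vectors[:, 0] = 1.0                      # same unit direction at every layer

baseline = steered_forward(model, x0, vectors, strength=0.0)
pushed = steered_forward(model, x0, vectors, strength=0.8)
pulled = steered_forward(model, x0, vectors, strength=-0.8)
print(pulled[0], baseline[0], pushed[0])
```

A strength of zero reproduces the unmodified forward pass. Positive and negative strengths move the output in opposite directions along the steered coordinate.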

Composing Multiple Vectors

One powerful aspect of control vectors is composability. Multiple vectors can be combined by adding them, each with its own coefficient:

h' = h + α₁*v₁ + α₂*v₂ + α₃*v₃

This lets us build complex personalities by combining traits such as:

  • High creativity (v₁, α₁ = 0.8)
  • High empathy (v₂, α₂ = 0.6)
  • Low formality (v₃, α₃ = -0.3)
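The combination above is a weighted sum, which can be sketched in a few lines of NumPy. The three trait directions here are hypothetical orthonormal toy vectors, and `compose` is an illustrative name. Real trait vectors would come from the extraction procedure and are generally not orthogonal.

```python
import numpy as np

def compose(h, vectors, alphas):
    """h' = h + sum_k alphas[k] * vectors[k]; vectors has shape (K, D)."""
    return h + np.asarray(alphas) @ np.asarray(vectors)

h = np.zeros(4)                  # toy hidden state
vectors = np.eye(4)[:3]          # v1, v2, v3 as toy orthonormal directions
alphas = [0.8, 0.6, -0.3]        # creativity, empathy, formality

h_prime = compose(h, vectors, alphas)
print(h_prime)                   # values 0.8, 0.6, -0.3, 0.0
```

Because addition is commutative, the order in which the vectors are applied within a layer does not matter.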
Practical Considerations

Layer Selection

Not all layers are equally effective for control. We've found that:

  • Early layers (1-8): Affect low-level linguistic patterns
  • Middle layers (9-20): Best for personality and style
  • Late layers (21+): Affect factual content and reasoning
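One simple way to act on this is to zero out the control vector outside a chosen layer range, so that only those layers are steered. The sketch below assumes a per-layer array of shape (L, D) and the 1-based layer numbering used in the list above. `restrict_to_layers` is an illustrative name, and the 24-layer size is a made-up example.

```python
import numpy as np

def restrict_to_layers(control_vector, first, last):
    """Keep vectors for 1-based layers first..last (inclusive); zero the rest."""
    masked = np.zeros_like(control_vector)
    masked[first - 1:last] = control_vector[first - 1:last]
    return masked

cv = np.ones((24, 8))                                   # toy vectors for a 24-layer model
middle_only = restrict_to_layers(cv, first=9, last=20)  # steer middle layers only
print(np.flatnonzero(middle_only.any(axis=1)) + 1)      # 1-based indices of active layers
```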
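Strength Calibration

Too much strength pushes hidden states away from the range the model was trained on and can destabilize outputs. We typically use strengths between -1.5 and 1.5, and the most effective values usually fall between -0.8 and 0.8.

Vector Normalization

Normalizing vectors to unit length keeps effect sizes consistent across control dimensions, so a given strength produces a shift of the same magnitude regardless of which vector is applied.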
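A short sketch of per-layer normalization, assuming the control vector is stored as an array of shape (L, D). The `eps` guard against all-zero rows and the function name are illustrative additions.

```python
import numpy as np

def normalize_per_layer(control_vector, eps=1e-12):
    """Scale each layer's vector to unit L2 norm; eps avoids division by zero."""
    norms = np.linalg.norm(control_vector, axis=-1, keepdims=True)
    return control_vector / np.maximum(norms, eps)

cv = np.array([[3.0, 4.0],
               [0.0, 0.5]])                 # two toy layers with different magnitudes
unit = normalize_per_layer(cv)
print(np.linalg.norm(unit, axis=-1))        # both norms equal 1 after scaling
```

After normalization, the strength coefficient alone determines how far the hidden state moves.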

Open Research Questions

Several questions remain active areas of research:

  • **Orthogonalization**: How do we ensure control vectors don't interfere with each other?
  • **Transfer**: Do control vectors generalize across model sizes and architectures?
  • **Emergence**: How do these directions emerge during training?
  • **Composition limits**: How many vectors can be effectively combined?
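On the orthogonalization question, a common first diagnostic is the cosine similarity between control vectors: values near zero suggest little overlap, while values near ±1 suggest that two vectors push along nearly the same axis. The sketch below also shows classical Gram-Schmidt as one possible way to remove overlap. It assumes linearly independent vectors, and whether orthogonalized vectors still steer the intended traits is exactly the open question. All names and data are illustrative.

```python
import numpy as np

def cosine_matrix(vectors):
    """Pairwise cosine similarities between the rows of a (K, D) array."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    return unit @ unit.T

def gram_schmidt(vectors):
    """Orthonormalize rows in order; assumes they are linearly independent."""
    basis = []
    for v in vectors:
        # Subtract the components along the directions already in the basis.
        w = v - sum((v @ b) * b for b in basis)
        basis.append(w / np.linalg.norm(w))
    return np.stack(basis)

vectors = np.array([[1.0, 0.0, 0.0],
                    [1.0, 1.0, 0.0]])       # two overlapping toy directions
before = cosine_matrix(vectors)
after = cosine_matrix(gram_schmidt(vectors))
print(before[0, 1], after[0, 1])
```

Gram-Schmidt depends on the ordering: the first vector is kept as is and later ones are altered, so the choice of which trait to preserve is itself a design decision.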
Conclusion

Control vectors provide a mathematically elegant way to shape AI behavior. By understanding and manipulating the geometry of neural activations, we can achieve fine-grained control that was previously impossible.

At Wisent, we've built our entire character system on this foundation, enabling users to create AI personalities with unprecedented precision and consistency.
