K-Steering: How It Works

This page covers installation, a quickstart walkthrough, and guidance for running larger models.

Installation

Clone the Repository

git clone https://github.com/withmartian/k-steering.git

Prerequisites

  • Python 3.12 or higher
  • uv: Fast Python package installer and resolver

To install uv, follow the Astral installation guide.

Install Dependencies

For now, we recommend running K-Steering locally from the repository root:

cd k-steering
uv sync

This creates the environment and installs the required dependencies.

Quick Start

Try it in Google Colab

You can explore K-Steering without any local setup using the Colab notebook, which includes installation, training, and inference examples.

API Usage

See the examples/ directory for complete scripts that train different steering models.

K-Steering (Non-Linear Steering)

This example shows how to use K-Steering to guide a language model's behavior by training lightweight steering classifiers and applying them during inference.

1. Load Required Modules

from k_steering.steering.config import SteeringConfig
from k_steering.steering.k_steer import KSteering

2. Select a Base Model

# Hugging Face model to be steered
MODEL_NAME = "unsloth/Llama-3.2-1B-Instruct"

3. Configure Steering

Define which layers are used to train and apply steering.

steering_config = SteeringConfig(
    train_layer=1,          # Layer used to train the steering classifier
    steer_layers=[1, 3],    # Layers where steering is applied
)

4. Task and Generation Settings

TASK_NAME = "debates"       # e.g. "debates" or "tones"
MAX_NEW_TOKENS = 100        # Maximum number of tokens to generate
MAX_SAMPLES = 10            # Maximum number of samples for training

GENERATION_KWARGS = {
    "max_new_tokens": MAX_NEW_TOKENS,
    "temperature": 1.0,
    "top_p": 0.9,
}
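The temperature and top_p settings control how tokens are drawn at each step: temperature rescales the logits before softmax, and top_p (nucleus sampling) keeps only the smallest set of tokens whose cumulative probability reaches 0.9. A minimal numpy sketch of that filtering (illustrative only, not the library's or Hugging Face's sampler):

```python
import numpy as np

def sample_filter(logits, temperature=1.0, top_p=0.9):
    """Temperature-scale logits, then keep the smallest set of tokens
    whose cumulative probability reaches top_p (nucleus filtering)."""
    scaled = logits / temperature
    probs = np.exp(scaled - np.max(scaled))   # stable softmax
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]           # tokens from most to least likely
    cum = np.cumsum(probs[order])
    cutoff = np.searchsorted(cum, top_p) + 1  # number of tokens to keep
    filtered = np.zeros_like(probs)
    filtered[order[:cutoff]] = probs[order[:cutoff]]
    return filtered / filtered.sum()          # renormalize over kept tokens

probs = sample_filter(np.array([2.0, 1.0, 0.2, -1.0]), temperature=1.0, top_p=0.9)
```

With top_p=0.9 the long tail of unlikely tokens is zeroed out, which curbs degenerate output while still allowing variety among plausible continuations.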

5. Initialize K-Steering

Wrap the base model with K-Steering.

steer_model = KSteering(
    model_name=MODEL_NAME,
    steering_config=steering_config,
)

6. Train Steering Classifiers

Train steering classifiers on task-specific data. Remove max_samples to use the full dataset.

steer_model.fit(
    task=TASK_NAME,
    max_samples=MAX_SAMPLES,
)

7. Generate Steered Outputs

prompts = [
    "Are political ideologies evolving in response to global challenges?"
]

output = steer_model.get_steered_output(
    prompts,
    target_labels=["Empirical Grounding"],     # Behaviors to encourage
    avoid_labels=["Straw Man Reframing"],      # Behaviors to suppress
    generation_kwargs=GENERATION_KWARGS,
)

print(output)
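Conceptually, the non-linear steering above nudges hidden activations in a direction that raises the trained classifier's logits for the target labels and lowers them for the avoided ones. A toy sketch of that idea with a linear classifier head (the function name, dimensions, and single-step update are assumptions for illustration; this is not the library's code):

```python
import numpy as np

def steer(h, W, target_idx, avoid_idx, alpha=0.5):
    """One gradient step on activation h: for a linear head with logits W @ h,
    d(logit_target - logit_avoid)/dh = W[target] - W[avoid]."""
    grad = W[target_idx] - W[avoid_idx]
    return h + alpha * grad

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 8))   # toy 3-label linear classifier over 8-dim activations
h = rng.normal(size=8)        # a hidden activation to steer
h2 = steer(h, W, target_idx=0, avoid_idx=1)
```

The step size alpha trades off steering strength against fluency: larger steps push the activation further off the model's natural manifold.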

Large Model Setup

The table below provides approximate GPU memory requirements for transformer models at different parameter scales. Use it to estimate what can run on free Colab versus what needs larger hardware.

| Model Size | Params | FP16 VRAM (Inference) | 4-bit VRAM (Inference) | Recommended GPU | Colab Free Feasible? |
|---|---|---|---|---|---|
| Tiny | 100M-300M | ~0.5-1 GB | ~0.3-0.5 GB | Any GPU | Yes |
| Small | 500M-1B | ~2-3 GB | ~1-1.5 GB | T4 / L4 | Yes |
| Medium | 2B-3B | ~5-7 GB | ~2-3 GB | T4 (tight) / L4 | No |
| Upper-Mid | 7B | ~14-16 GB | ~4-6 GB | L4 / A100 | No |
| Large | 13B | ~26-28 GB | ~8-10 GB | A100 40GB | No |
| Very Large | 30B | ~60+ GB | ~18-22 GB | Multi-GPU | No |
| Frontier | 70B | ~140+ GB | ~35-40 GB | Multi A100/H100 | No |
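The FP16 and 4-bit columns follow directly from parameter count times bytes per parameter (2 bytes at FP16, 0.5 bytes at 4-bit), plus overhead for the KV cache and activations. A quick back-of-envelope helper (a sketch; real usage varies with context length and implementation):

```python
def vram_gb(params_billions, bits_per_param):
    """Rough weight-memory estimate: parameters x bytes per parameter.
    Real usage adds KV cache and activations, so treat this as a floor."""
    bytes_total = params_billions * 1e9 * bits_per_param / 8
    return bytes_total / 1e9   # decimal GB, matching the table's rough figures

print(vram_gb(7, 16))   # 7B weights at FP16: roughly 14 GB
print(vram_gb(7, 4))    # the same model 4-bit quantized: roughly 3.5 GB
```

This is why a 7B model that overflows a free Colab T4 (16 GB) at FP16 can still fit comfortably once quantized to 4 bits.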