K-Steering

K-Steering is a lightweight toolkit for steering the outputs of large language models without fine-tuning the base model.

K-Steering controls generation by applying steering vectors at inference time. It supports parameter sweeps for finding effective steering strengths and works with both Hugging Face and local datasets.

Whether you're experimenting with interpretability, alignment, or controllable generation, K-Steering is designed to keep the workflow simple and modular.

Getting Started

Start with the GitHub repository for installation details and examples, or go straight to the guides listed under Documentation.

Key Features

Inference-Time Steering: Apply steering vectors at specific layers without fine-tuning the base model.
Non-Linear Steering: Compose and apply steering vectors across different layers using learned classifiers.
Parameter Sweeps: Search for optimal steering strengths using built-in evaluation utilities.
Flexible Datasets: Use predefined tasks, Hugging Face datasets, or local CSV, JSON, and DataFrame sources.
Reproducible Experiments: Use SteeringConfig to keep training and evaluation runs consistent.
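
To make the first and third features concrete, here is a minimal sketch of the simplest additive form of steering and of a strength sweep. It is not the K-Steering API: the names apply_steering, sweep_alpha, and score are hypothetical, the vectors are toy values, and plain Python lists stand in for a model's hidden activations.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def apply_steering(hidden: Sequence[float],
                   steering_vector: Sequence[float],
                   alpha: float) -> Vector:
    """Return hidden + alpha * steering_vector, computed element-wise.

    Hypothetical helper: alpha plays the role of the steering strength.
    """
    if len(hidden) != len(steering_vector):
        raise ValueError("hidden state and steering vector must have the same size")
    return [h + alpha * v for h, v in zip(hidden, steering_vector)]


def sweep_alpha(hidden: Sequence[float],
                steering_vector: Sequence[float],
                alphas: Sequence[float],
                score: Callable[[Vector], float]) -> float:
    """Return the candidate alpha whose steered activation scores highest."""
    return max(alphas,
               key=lambda a: score(apply_steering(hidden, steering_vector, a)))


# Toy values (assumptions for illustration only, not real model activations).
hidden = [0.5, -1.0, 2.0]
steering_vector = [1.0, 0.0, -1.0]
target = [2.5, -1.0, 0.0]


def score(h: Vector) -> float:
    """Toy evaluation: negative squared distance to a target activation."""
    return -sum((x - t) ** 2 for x, t in zip(h, target))


best_alpha = sweep_alpha(hidden, steering_vector, [0.0, 1.0, 2.0, 4.0], score)
steered = apply_steering(hidden, steering_vector, best_alpha)
print(best_alpha, steered)
```

In a real run, the hidden state would come from a chosen layer of the model and the score from an evaluation of the generated text. The sketch only shows how a steering strength scales a vector and how a sweep picks among candidate strengths.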

Documentation

Core Concepts explains the steering model, dataset format, and core API surface.
How It Works covers installation, quickstart usage, and practical hardware guidance.