Advanced Features
Martian offers a range of features designed to improve efficiency, flexibility, and security when working with AI models.
Cost-Optimized Routing
Martian helps you reduce model usage costs through efficient routing strategies, without sacrificing accuracy or quality.
To access these reduced rates, append :cheap to any model name to enable cost-optimized routing. For example:
- openai/gpt-4.1-nano:cheap: Maintains GPT-4.1 nano quality at reduced cost
- anthropic/claude-sonnet-4-20250514:cheap: A cost-effective Claude Sonnet 4 alternative
For detailed pricing information, see our Available Models page.
Python Example
import openai
client = openai.OpenAI(
base_url="https://api.withmartian.com/v1",
api_key=MARTIAN_API_KEY,
)
completion = client.chat.completions.create(
model="openai/gpt-5:cheap",
messages=[
{
"role": "user",
"content": "What are some ways to save money on a trip to Mars?"
}
]
)
Streaming Responses
To enable real-time response streaming, set stream=True when creating the completion request.
OpenAI Python Example
import openai
oai_client = openai.OpenAI(
base_url="https://api.withmartian.com/v1",
api_key=MARTIAN_API_KEY
)
stream = oai_client.chat.completions.create(
model="openai/gpt-4.1-mini",
messages=[{"role": "user", "content": "What is Olympus Mons?"}],
stream=True
)
for chunk in stream:
if chunk.choices[0].delta.content:
print(chunk.choices[0].delta.content, end="")
Anthropic Python Example
import anthropic
anth_client = anthropic.Anthropic(
base_url="https://api.withmartian.com/v1",
api_key=MARTIAN_API_KEY
)
stream = anth_client.messages.create(
model="anthropic/claude-sonnet-4-20250514",
max_tokens=1024,
messages=[{"role": "user", "content": "Write a sonnet to Mars (the god, not the planet)."}],
stream=True
)
for chunk in stream:
if chunk.type == "content_block_delta" and chunk.delta.type == "text_delta":
print(chunk.delta.text, end="")
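cURL Example
Streaming also works over plain HTTP. The sketch below assumes Martian follows the standard OpenAI streaming contract, where setting "stream": true in the request body returns the response as data: server-sent events terminated by data: [DONE]:
curl https://api.withmartian.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $MARTIAN_API_KEY" \
  -d '{
    "model": "openai/gpt-4.1-mini",
    "messages": [{"role": "user", "content": "What is Olympus Mons?"}],
    "stream": true
  }'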
Caching
Martian supports Anthropic's message caching for POST /v1/messages. This enables you to cache input tokens on the first call to the endpoint and reuse them in subsequent calls, reducing response times and inference costs.
To use caching, set a cache breakpoint with "cache_control": { "type": "ephemeral" } after the content of a prompt block. In subsequent requests, the entire prompt up to the cache_control breakpoint must be identical to the previously cached prompt.
Important: Anthropic will not cache fewer than 1024 tokens (or fewer than 2048 tokens for Haiku models).
See Anthropic's Prompt caching documentation for additional information on how caching works.
cURL Example
First Request - Sets up the cache:
curl https://api.withmartian.com/v1/messages \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $MARTIAN_API_KEY" \
-d '{
"model": "anthropic/claude-opus-4-20250514",
"max_tokens": 100,
"system": [
{
"type": "text",
"text": "You are an AI assistant tasked with analyzing literary works. Your goal is to provide insightful commentary on facts, specifications, and other information.\n"
},
{
"type": "text",
"text": "<the entire contents of Space! : The Universe As You have Never Seen It Before>",
"cache_control": { "type": "ephemeral" }
}
],
"messages": [
{
"role": "user",
"content": "What is the largest planet?"
}
]
}'
Response from First Request:
The response includes usage metadata showing cache creation:
{
"id": "msg_XXXxXxXXxXxxXX",
"type": "message",
"role": "assistant",
"model": "anthropic/claude-opus-4-20250514",
"content": [
{
"type": "text",
"text": "According to the information provided, Jupiter is the largest planet in our solar system..."
}
],
"stop_reason": "end_turn",
"stop_sequence": null,
"usage": {
"input_tokens": 27,
"cache_creation_input_tokens": 1279,
"cache_read_input_tokens": 0,
"cache_creation": {
"ephemeral_5m_input_tokens": 1279,
"ephemeral_1h_input_tokens": 0
},
"output_tokens": 45,
"service_tier": "standard"
}
}
Notice that cache_creation_input_tokens: 1279 indicates 1,279 tokens were cached successfully.
Second Request - Uses cached content:
# The system prompt must be identical to use the cache
curl https://api.withmartian.com/v1/messages \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $MARTIAN_API_KEY" \
-d '{
"model": "anthropic/claude-opus-4-20250514",
"max_tokens": 100,
"system": [
{
"type": "text",
"text": "You are an AI assistant tasked with analyzing literary works. Your goal is to provide insightful commentary on facts, specifications, and other information.\n"
},
{
"type": "text",
"text": "<the entire contents of Space! : The Universe As You have Never Seen It Before>",
"cache_control": { "type": "ephemeral" }
}
],
"messages": [
{
"role": "user",
"content": "Which planet is closest to the sun?"
}
]
}'
Response from Second Request:
The response shows cache reuse via cache_read_input_tokens:
{
"id": "msg_YYYyYyYYyYyyYY",
"type": "message",
"role": "assistant",
"model": "anthropic/claude-opus-4-20250514",
"content": [
{
"type": "text",
"text": "Mercury is the planet closest to the Sun, orbiting at an average distance of..."
}
],
"stop_reason": "end_turn",
"stop_sequence": null,
"usage": {
"input_tokens": 21,
"cache_creation_input_tokens": 0,
"cache_read_input_tokens": 1279,
"cache_creation": {
"ephemeral_5m_input_tokens": 0,
"ephemeral_1h_input_tokens": 0
},
"output_tokens": 42,
"service_tier": "standard"
}
}
Notice that cache_read_input_tokens: 1279 indicates the cached tokens were reused, saving both time and cost.
Python Example
import anthropic
anth_client = anthropic.Anthropic(
base_url="https://api.withmartian.com/v1",
api_key=MARTIAN_API_KEY
)
# Define a common system instruction that will be used repeatedly
system_instruction = [
{
"type": "text",
"text": "You are an AI assistant tasked with analyzing literary works. Your goal is to provide insightful commentary on facts, specifications, and other information.\n",
},
{
"type": "text",
"text": "<the entire contents of 'Space! : The Universe As You've Never Seen It Before'>",
"cache_control": {"type": "ephemeral"}
}
]
# First interaction - the system instruction will be cached
message1 = anth_client.messages.create(
model="anthropic/claude-opus-4-20250514",
max_tokens=100,
system=system_instruction,
messages=[{"role": "user", "content": "What is the largest planet?"}],
)
# Check the usage to confirm cache creation
print("First request usage:")
print(f" cache_creation_input_tokens: {message1.usage.cache_creation_input_tokens}")
print(f" cache_read_input_tokens: {message1.usage.cache_read_input_tokens}")
print(f" input_tokens: {message1.usage.input_tokens}")
print(f" output_tokens: {message1.usage.output_tokens}")
# Output:
# First request usage:
# cache_creation_input_tokens: 1280
# cache_read_input_tokens: 0
# input_tokens: 15
# output_tokens: 53
# Subsequent interaction - the cached system instruction will be reused
message2 = anth_client.messages.create(
model="anthropic/claude-opus-4-20250514",
max_tokens=100,
system=system_instruction, # Must be identical
messages=[{"role": "user", "content": "Which planet is closest to the sun?"}],
)
# Check the usage to confirm cache hit
print("\nSecond request usage:")
print(f" cache_creation_input_tokens: {message2.usage.cache_creation_input_tokens}")
print(f" cache_read_input_tokens: {message2.usage.cache_read_input_tokens}")
print(f" input_tokens: {message2.usage.input_tokens}")
print(f" output_tokens: {message2.usage.output_tokens}")
# Output:
# Second request usage:
# cache_creation_input_tokens: 0
# cache_read_input_tokens: 1280
# input_tokens: 16
# output_tokens: 42
The usage object shows the cache metrics:
- cache_creation_input_tokens: Tokens written to the cache (only in the first request)
- cache_read_input_tokens: Tokens read from the cache (in subsequent requests)
- Regular input_tokens and output_tokens are counted separately
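As a quick sanity check, you can inspect these fields to classify each response. Here is a minimal sketch building on the Python example above (report_cache_usage is a hypothetical helper, not part of the Anthropic SDK):
def report_cache_usage(usage):
    # Hypothetical helper: classify a response as a cache write, hit, or miss
    if usage.cache_creation_input_tokens > 0:
        return f"cache write: {usage.cache_creation_input_tokens} tokens cached"
    if usage.cache_read_input_tokens > 0:
        return f"cache hit: {usage.cache_read_input_tokens} tokens reused"
    return "cache miss: prompt too short, or changed before the breakpoint"

print(report_cache_usage(message1.usage))  # cache write: 1280 tokens cached
print(report_cache_usage(message2.usage))  # cache hit: 1280 tokens reused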
Tool Use (Function Calling)
Tool use, also known as function calling, lets you define external functions that the language model can invoke to:
- Retrieve real-time data
- Interact with other services
- Execute tasks beyond its own knowledge base
OpenAI Format
cURL Example
curl https://api.withmartian.com/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $MARTIAN_API_KEY" \
-d '{
"model": "openai/gpt-4.1-nano",
"max_tokens": 1024,
"tools": [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get current weather for a location",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string"
}
}
}
}
}
],
"messages": [
{
"role": "user",
"content": "What'\''s the weather in San Francisco?"
}
]
}'
Python Example
import openai
oai_client = openai.OpenAI(
base_url="https://api.withmartian.com/v1",
api_key=MARTIAN_API_KEY
)
response = oai_client.chat.completions.create(
model="openai/gpt-4.1-nano",
max_tokens=1024,
tools=[
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get current weather for a location",
"parameters": {
"type": "object",
"properties": {
"location": {"type": "string"}
}
}
}
}
],
messages=[
{
"role": "user",
"content": "What's the weather in San Francisco?"
}
]
)
print(response.choices[0].message.tool_calls)
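The model's reply only requests the call; your code must execute the function and send the result back in a follow-up request. Below is a sketch of that second step using the standard OpenAI tool-calling flow; the weather lookup is stubbed, since get_weather is a function you would implement yourself:
import json

tool_call = response.choices[0].message.tool_calls[0]
args = json.loads(tool_call.function.arguments)

# Run your own implementation of the tool (stubbed here)
weather = {"location": args["location"], "temperature_c": 18, "condition": "foggy"}

followup = oai_client.chat.completions.create(
    model="openai/gpt-4.1-nano",
    messages=[
        {"role": "user", "content": "What's the weather in San Francisco?"},
        response.choices[0].message,  # the assistant turn containing the tool call
        {
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": json.dumps(weather),  # your tool's output, serialized
        },
    ],
)
print(followup.choices[0].message.content)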
Anthropic Format
cURL Example
curl https://api.withmartian.com/v1/messages \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $MARTIAN_API_KEY" \
-d '{
"model": "anthropic/claude-3-5-sonnet-20241022",
"max_tokens": 1024,
"tools": [
{
"name": "get_weather",
"description": "Get current weather for a location",
"input_schema": {
"type": "object",
"properties": {
"location": {
"type": "string"
}
}
}
}
],
"messages": [
{
"role": "user",
"content": "Use the get_weather tool to check San Francisco weather"
}
]
}'
Python Example
import anthropic
anth_client = anthropic.Anthropic(
base_url="https://api.withmartian.com/v1",
api_key=MARTIAN_API_KEY
)
response = anth_client.messages.create(
model="anthropic/claude-sonnet-4-20250514",
max_tokens=1024,
tools=[
{
"name": "get_weather",
"description": "Get current weather for a location",
"input_schema": {
"type": "object",
"properties": {
"location": {"type": "string"}
}
}
}
],
messages=[
{
"role": "user",
"content": "What's the weather in San Francisco?"
}
]
)
for item in response.content:
if item.type == "tool_use":
print("Tool name:", item.name)
print("Input args:", item.input)
List Available Models
You can programmatically retrieve the complete list of models supported by Martian using the /v1/models endpoint. This is useful for:
- Building dynamic model selection interfaces
- Discovering newly added models
- Checking model availability
- Validating model names before making requests
Endpoint
GET https://api.withmartian.com/v1/models
Response Format
The endpoint returns a JSON object with a data array containing all available models:
{
"object": "list",
"data": [
{
"id": "openai/gpt-4.1-nano",
"object": "model",
"created": 1686935002,
"owned_by": "openai"
},
{
"id": "anthropic/claude-sonnet-4-20250514",
"object": "model",
"created": 1686935002,
"owned_by": "anthropic"
}
// ... more models
]
}
Python Example
import requests

response = requests.get(
    "https://api.withmartian.com/v1/models",
    # Pass your API key, matching the cURL example below
    headers={"Authorization": f"Bearer {MARTIAN_API_KEY}"},
)
# The actual list of models is under the "data" key
models = response.json()["data"]
# Iterate through models
for model in models:
print(f"Model ID: {model['id']}")
print(f"Provider: {model['owned_by']}")
print("---")
JavaScript Example
const response = await fetch('https://api.withmartian.com/v1/models', {
  // Pass your API key, matching the cURL example below
  headers: { 'Authorization': `Bearer ${MARTIAN_API_KEY}` }
});
const data = await response.json();
// The actual list of models is under the "data" key
const models = data.data;
// Iterate through models
models.forEach(model => {
console.log(`Model ID: ${model.id}`);
console.log(`Provider: ${model.owned_by}`);
console.log('---');
});
Filtering Models
You can filter the results to find specific models:
import requests

response = requests.get(
    "https://api.withmartian.com/v1/models",
    headers={"Authorization": f"Bearer {MARTIAN_API_KEY}"},
)
models = response.json()["data"]
# Filter for OpenAI models only
openai_models = [m for m in models if m["owned_by"] == "openai"]
for model in openai_models:
print(model["id"])
cURL Example
curl https://api.withmartian.com/v1/models \
-H "Authorization: Bearer $MARTIAN_API_KEY"
The /v1/models endpoint returns the complete list of supported models. For detailed pricing and feature information, visit the Available Models page or the Martian Dashboard.