Advanced Features
Martian offers a range of features designed to improve efficiency, flexibility, and security when working with AI models.
Cost-Optimized Routing
Martian helps you reduce model usage costs through efficient routing strategies, without sacrificing accuracy or quality.
To access these reduced rates, append :cheap to any model name to enable cost-optimized routing. For example:
- openai/gpt-4.1-nano:cheap: Maintains GPT-4.1 nano quality at reduced cost
- anthropic/claude-sonnet-4-20250514:cheap: A cost-effective Claude Sonnet 4 alternative
For detailed pricing information, see our Available Models page.
Python Example
import openai
client = openai.OpenAI(
base_url="https://api.withmartian.com/v1",
api_key=MARTIAN_API_KEY,
)
completion = client.chat.completions.create(
model="openai/gpt-5:cheap",
messages=[
{
"role": "user",
"content": "What are some ways to save money on a trip to Mars?"
}
]
)
Streaming Responses
To enable real-time response streaming, set stream=True when creating the completion request.
OpenAI Python Example
import openai
oai_client = openai.OpenAI(
base_url="https://api.withmartian.com/v1",
api_key=MARTIAN_API_KEY
)
stream = oai_client.chat.completions.create(
model="openai/gpt-4.1-mini",
messages=[{"role": "user", "content": "What is Olympus Mons?"}],
stream=True
)
for chunk in stream:
if chunk.choices[0].delta.content:
print(chunk.choices[0].delta.content, end="")
Anthropic Python Example
import anthropic
anth_client = anthropic.Anthropic(
base_url="https://api.withmartian.com/v1",
api_key=MARTIAN_API_KEY
)
stream = anth_client.messages.create(
model="anthropic/claude-sonnet-4-20250514",
max_tokens=1024,
messages=[{"role": "user", "content": "Write a sonnet to Mars (the god, not the planet)."}],
stream=True
)
for chunk in stream:
if chunk.type == "content_block_delta" and chunk.delta.type == "text_delta":
print(chunk.delta.text, end="")
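cURL Example
Streaming also works over plain HTTP. The sketch below assumes Martian follows the standard OpenAI streaming contract, where setting "stream": true in the request body returns the response as data: server-sent events terminated by data: [DONE]:
curl https://api.withmartian.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $MARTIAN_API_KEY" \
  -d '{
    "model": "openai/gpt-4.1-mini",
    "messages": [{"role": "user", "content": "What is Olympus Mons?"}],
    "stream": true
  }'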
Caching
Martian supports Anthropic's message caching for POST /v1/messages. This enables you to cache input tokens on the first call to the endpoint and reuse them in subsequent calls, reducing response times and inference costs.
To use caching, set a cache breakpoint with "cache_control": { "type": "ephemeral" } after the content of a prompt block. In subsequent requests, the entire prompt up to the cache_control breakpoint must be identical to the previously cached prompt.
Important: Anthropic will not cache fewer than 1024 tokens (or fewer than 2048 tokens for Haiku models).
See Anthropic's Prompt caching documentation for additional information on how caching works.
cURL Example
First Request - Sets up the cache:
curl https://api.withmartian.com/v1/messages \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $MARTIAN_API_KEY" \
-d '{
"model": "anthropic/claude-opus-4-20250514",
"max_tokens": 100,
"system": [
{
"type": "text",
"text": "You are an AI assistant tasked with analyzing literary works. Your goal is to provide insightful commentary on facts, specifications, and other information.\n"
},
{
"type": "text",
"text": "<the entire contents of Space! : The Universe As You have Never Seen It Before>",
"cache_control": { "type": "ephemeral" }
}
],
"messages": [
{
"role": "user",
"content": "What is the largest planet?"
}
]
}'
Response from First Request:
The response includes usage metadata showing cache creation:
{
"id": "msg_XXXxXxXXxXxxXX",
"type": "message",
"role": "assistant",
"model": "anthropic/claude-opus-4-20250514",
"content": [
{
"type": "text",
"text": "According to the information provided, Jupiter is the largest planet in our solar system..."
}
],
"stop_reason": "end_turn",
"stop_sequence": null,
"usage": {
"input_tokens": 27,
"cache_creation_input_tokens": 1279,
"cache_read_input_tokens": 0,
"cache_creation": {
"ephemeral_5m_input_tokens": 1279,
"ephemeral_1h_input_tokens": 0
},
"output_tokens": 45,
"service_tier": "standard"
}
}
Notice that cache_creation_input_tokens: 1279 indicates 1,279 tokens were cached successfully.
Second Request - Uses cached content:
# The system prompt must be identical to use the cache
curl https://api.withmartian.com/v1/messages \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $MARTIAN_API_KEY" \
-d '{
"model": "anthropic/claude-opus-4-20250514",
"max_tokens": 100,
"system": [
{
"type": "text",
"text": "You are an AI assistant tasked with analyzing literary works. Your goal is to provide insightful commentary on facts, specifications, and other information.\n"
},
{
"type": "text",
"text": "<the entire contents of Space! : The Universe As You have Never Seen It Before>",
"cache_control": { "type": "ephemeral" }
}
],
"messages": [
{
"role": "user",
"content": "Which planet is closest to the sun?"
}
]
}'
Response from Second Request:
The response shows cache reuse via cache_read_input_tokens:
{
"id": "msg_YYYyYyYYyYyyYY",
"type": "message",
"role": "assistant",
"model": "anthropic/claude-opus-4-20250514",
"content": [
{
"type": "text",
"text": "Mercury is the planet closest to the Sun, orbiting at an average distance of..."
}
],
"stop_reason": "end_turn",
"stop_sequence": null,
"usage": {
"input_tokens": 21,
"cache_creation_input_tokens": 0,
"cache_read_input_tokens": 1279,
"cache_creation": {
"ephemeral_5m_input_tokens": 0,
"ephemeral_1h_input_tokens": 0
},
"output_tokens": 42,
"service_tier": "standard"
}
}
Notice that cache_read_input_tokens: 1279 indicates the cached tokens were reused, saving both time and cost.
Python Example
import anthropic
anth_client = anthropic.Anthropic(
base_url="https://api.withmartian.com/v1",
api_key=MARTIAN_API_KEY
)
# Define a common system instruction that will be used repeatedly
system_instruction = [
{
"type": "text",
"text": "You are an AI assistant tasked with analyzing literary works. Your goal is to provide insightful commentary on facts, specifications, and other information.\n",
},
{
"type": "text",
"text": "<the entire contents of 'Space! : The Universe As You've Never Seen It Before'>",
"cache_control": {"type": "ephemeral"}
}
]
# First interaction - the system instruction will be cached
message1 = anth_client.messages.create(
model="anthropic/claude-opus-4-20250514",
max_tokens=100,
system=system_instruction,
messages=[{"role": "user", "content": "What is the largest planet?"}],
)
# Check the usage to confirm cache creation
print("First request usage:")
print(f" cache_creation_input_tokens: {message1.usage.cache_creation_input_tokens}")
print(f" cache_read_input_tokens: {message1.usage.cache_read_input_tokens}")
print(f" input_tokens: {message1.usage.input_tokens}")
print(f" output_tokens: {message1.usage.output_tokens}")
# Output:
# First request usage:
# cache_creation_input_tokens: 1280
# cache_read_input_tokens: 0
# input_tokens: 15
# output_tokens: 53
# Subsequent interaction - the cached system instruction will be reused
message2 = anth_client.messages.create(
model="anthropic/claude-opus-4-20250514",
max_tokens=100,
system=system_instruction, # Must be identical
messages=[{"role": "user", "content": "Which planet is closest to the sun?"}],
)
# Check the usage to confirm cache hit
print("\nSecond request usage:")
print(f" cache_creation_input_tokens: {message2.usage.cache_creation_input_tokens}")
print(f" cache_read_input_tokens: {message2.usage.cache_read_input_tokens}")
print(f" input_tokens: {message2.usage.input_tokens}")
print(f" output_tokens: {message2.usage.output_tokens}")
# Output:
# Second request usage:
# cache_creation_input_tokens: 0
# cache_read_input_tokens: 1280
# input_tokens: 16
# output_tokens: 42
The usage object shows the cache metrics:
- cache_creation_input_tokens: Tokens written to the cache (only in the first request)
- cache_read_input_tokens: Tokens read from the cache (in subsequent requests)
- Regular input_tokens and output_tokens are counted separately
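As a quick sanity check, you can inspect these fields to classify each response. Here is a minimal sketch building on the Python example above (report_cache_usage is a hypothetical helper, not part of the Anthropic SDK):
def report_cache_usage(usage):
    # Hypothetical helper: classify a response as a cache write, hit, or miss
    if usage.cache_creation_input_tokens > 0:
        return f"cache write: {usage.cache_creation_input_tokens} tokens cached"
    if usage.cache_read_input_tokens > 0:
        return f"cache hit: {usage.cache_read_input_tokens} tokens reused"
    return "cache miss: prompt too short, or changed before the breakpoint"

print(report_cache_usage(message1.usage))  # cache write: 1280 tokens cached
print(report_cache_usage(message2.usage))  # cache hit: 1280 tokens reused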
Tool Use (Function Calling)
Tool use, also known as function calling, lets you define external functions that the language model can invoke to:
- Retrieve real-time data
- Interact with other services
- Execute tasks beyond its own knowledge base
OpenAI Format
cURL Example
curl https://api.withmartian.com/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $MARTIAN_API_KEY" \
-d '{
"model": "openai/gpt-4.1-nano",
"max_tokens": 1024,
"tools": [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get current weather for a location",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string"
}
}
}
}
}
],
"messages": [
{
"role": "user",
"content": "What'\''s the weather in San Francisco?"
}
]
}'
Python Example
import openai
oai_client = openai.OpenAI(
base_url="https://api.withmartian.com/v1",
api_key=MARTIAN_API_KEY
)
response = oai_client.chat.completions.create(
model="openai/gpt-4.1-nano",
max_tokens=1024,
tools=[
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get current weather for a location",
"parameters": {
"type": "object",
"properties": {
"location": {"type": "string"}
}
}
}
}
],
messages=[
{
"role": "user",
"content": "What's the weather in San Francisco?"
}
]
)
print(response.choices[0].message.tool_calls)
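The model's reply only requests the call; your code must execute the function and send the result back in a follow-up request. Below is a sketch of that second step using the standard OpenAI tool-calling flow; the weather lookup is stubbed, since get_weather is a function you would implement yourself:
import json

tool_call = response.choices[0].message.tool_calls[0]
args = json.loads(tool_call.function.arguments)

# Run your own implementation of the tool (stubbed here)
weather = {"location": args["location"], "temperature_c": 18, "condition": "foggy"}

followup = oai_client.chat.completions.create(
    model="openai/gpt-4.1-nano",
    messages=[
        {"role": "user", "content": "What's the weather in San Francisco?"},
        response.choices[0].message,  # the assistant turn containing the tool call
        {
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": json.dumps(weather),  # your tool's output, serialized
        },
    ],
)
print(followup.choices[0].message.content)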
Anthropic Format
cURL Example
curl https://api.withmartian.com/v1/messages \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $MARTIAN_API_KEY" \
-d '{
"model": "anthropic/claude-3-5-sonnet-20241022",
"max_tokens": 1024,
"tools": [
{
"name": "get_weather",
"description": "Get current weather for a location",
"input_schema": {
"type": "object",
"properties": {
"location": {
"type": "string"
}
}
}
}
],
"messages": [
{
"role": "user",
"content": "Use the get_weather tool to check San Francisco weather"
}
]
}'
Python Example
import anthropic
anth_client = anthropic.Anthropic(
base_url="https://api.withmartian.com/v1",
api_key=MARTIAN_API_KEY
)
response = anth_client.messages.create(
model="anthropic/claude-sonnet-4-20250514",
max_tokens=1024,
tools=[
{
"name": "get_weather",
"description": "Get current weather for a location",
"input_schema": {
"type": "object",
"properties": {
"location": {"type": "string"}
}
}
}
],
messages=[
{
"role": "user",
"content": "What's the weather in San Francisco?"
}
]
)
for item in response.content:
if item.type == "tool_use":
print("Tool name:", item.name)
print("Input args:", item.input)
List Available Models
You can programmatically retrieve the complete list of models supported by Martian using the /v1/models endpoint. This is useful for:
- Building dynamic model selection interfaces
- Discovering newly added models
- Checking model availability
- Validating model names before making requests
Endpoint
GET https://api.withmartian.com/v1/models
Response Format
The endpoint returns a JSON object with a data array containing all available models:
{
"object": "list",
"data": [
{
"id": "openai/gpt-4.1-nano",
"object": "model",
"created": 1686935002,
"owned_by": "openai"
},
{
"id": "anthropic/claude-sonnet-4-20250514",
"object": "model",
"created": 1686935002,
"owned_by": "anthropic"
}
// ... more models
]
}
Python Example
import requests

response = requests.get(
    "https://api.withmartian.com/v1/models",
    # Pass your API key, matching the cURL example below
    headers={"Authorization": f"Bearer {MARTIAN_API_KEY}"},
)
# The actual list of models is under the "data" key
models = response.json()["data"]
# Iterate through models
for model in models:
print(f"Model ID: {model['id']}")
print(f"Provider: {model['owned_by']}")
print("---")
JavaScript Example
const response = await fetch('https://api.withmartian.com/v1/models', {
  // Pass your API key, matching the cURL example below
  headers: { 'Authorization': `Bearer ${MARTIAN_API_KEY}` }
});
const data = await response.json();
// The actual list of models is under the "data" key
const models = data.data;
// Iterate through models
models.forEach(model => {
console.log(`Model ID: ${model.id}`);
console.log(`Provider: ${model.owned_by}`);
console.log('---');
});
Filtering Models
You can filter the results to find specific models:
import requests

response = requests.get(
    "https://api.withmartian.com/v1/models",
    headers={"Authorization": f"Bearer {MARTIAN_API_KEY}"},
)
models = response.json()["data"]
# Filter for OpenAI models only
openai_models = [m for m in models if m["owned_by"] == "openai"]
for model in openai_models:
print(model["id"])
cURL Example
curl https://api.withmartian.com/v1/models \
-H "Authorization: Bearer $MARTIAN_API_KEY"
The /v1/models endpoint returns the complete list of supported models. For detailed pricing and feature information, visit the Available Models page or the Martian Dashboard.