Cost Controls

The model router exposes 4 parameters than can be used to control costs.

  • max_cost - The maximum cost of the total request in USD. Allows you to specify an upper bound on what you are willing to pay for the request.

  • max_cost_per_million_tokens - The maximum cost of each 1 million tokens in the request, in USD. Allows you to specify a minimum efficiency for your models.

  • model - The set of models from which we should be able to route. Allows you to select only models within a certain cost range.

  • willingness_to_pay - A parameter specifying the values of getting better output, measured in dollars. A value of 0.1, for example, indicates that each 10% improvement in performance is 10 cents. If this parameter is not set, it defaults to infinity, which indicates that we should optimize only for performance.

Each of these controls can be useful in different scenarios, and multiple methods of controlling cost and be used simultaneously.

Last updated