Evaluation Metrics

How we chose the metrics to benchmark providers

We selected the following metrics for our leaderboard:

  • Cost: the price of input and output tokens, expressed in dollars per million tokens

  • Rate Limit: the number of requests per second (RPS) or requests per minute (RPM) a provider accepts before it returns a rate limit error

  • Throughput: measured in tokens per second, calculated as throughput = total output tokens / latency

  • TTFT: time to first token, measured in seconds
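The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the leaderboard's actual benchmarking code: the function names are hypothetical, each streamed chunk is assumed to count as one token, latency is assumed to be the total request time in seconds, and HTTP 429 is assumed to be the status a provider uses for rate limit errors.

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def cost_usd(input_tokens: int, output_tokens: int,
             input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost of one request, given prices in dollars per million tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


def throughput_tps(total_output_tokens: int, latency_s: float) -> float:
    """throughput = total output tokens / latency, in tokens per second."""
    if latency_s <= 0:
        raise ValueError("latency must be positive")
    return total_output_tokens / latency_s


def measure_stream(
    chunks: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[Optional[float], float, int]:
    """Consume a streamed response and return (ttft_s, latency_s, n_tokens).

    Assumption: every chunk yielded by the stream is exactly one token.
    """
    start = clock()
    ttft: Optional[float] = None
    n_tokens = 0
    for _ in chunks:
        if ttft is None:
            # Time to first token: elapsed time when the first chunk arrives.
            ttft = clock() - start
        n_tokens += 1
    latency = clock() - start
    return ttft, latency, n_tokens


def requests_before_rate_limit(status_codes: Iterable[int]) -> int:
    """Count responses received before the first rate limit error.

    Assumption: the provider signals a rate limit with HTTP 429.
    """
    count = 0
    for code in status_codes:
        if code == 429:
            break
        count += 1
    return count
```

For example, with hypothetical prices of $0.50 per million input tokens and $1.50 per million output tokens, `cost_usd(1_000_000, 500_000, 0.5, 1.5)` returns `1.25`. The `clock` parameter exists only so the timing logic can be tested with a fake clock; in a real benchmark the default `time.perf_counter` would be used.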

Here are a few use cases where each of these metrics matters:

  • Cost: While cost is relevant to most users, it becomes especially important for users who pass large amounts of data through an LLM for data cleaning or summarization.

  • Rate Limit: This is important for applications that have peak hours of operation or many concurrent sessions, such as AI-powered customer support.

  • Throughput: The speed of inference is relevant for many use cases, but it is particularly important for live applications, such as AI-powered search or online Q&A.

  • TTFT: Time to first token is most important for live chat applications, such as online tutoring or character-based chats, because we want to minimize how long the user waits before the first word is streamed.
