Evaluation Metrics
How we chose the metrics to benchmark providers
We selected the following metrics for our leaderboard:
Cost: the price of input and output tokens, expressed in dollars per million tokens
Rate Limit: how many requests per second (RPS) or requests per minute (RPM) can be sent before a rate limit error is returned
Throughput: in terms of tokens per second, calculated as
throughput = total output tokens / latency
TTFT: time to first token, measured in seconds
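As a rough illustration of the two timing metrics above, the sketch below measures TTFT and throughput over a streamed response. It is not the leaderboard's actual harness: the names `StreamMetrics` and `measure_stream` are hypothetical, latency is assumed to mean the total time from sending the request to receiving the last token, and each streamed item is assumed to be exactly one token (real providers may stream multi-token chunks).

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class StreamMetrics:
    ttft_s: float        # time to first token, in seconds
    latency_s: float     # assumed: total time from request start to last token
    output_tokens: int   # number of streamed tokens

    @property
    def throughput_tps(self) -> float:
        # throughput = total output tokens / latency
        return self.output_tokens / self.latency_s


def measure_stream(
    tokens: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> StreamMetrics:
    """Consume a token stream and record TTFT, latency, and token count.

    Assumes each item yielded by `tokens` is one output token.
    """
    start = clock()
    first_token_at = None
    count = 0
    for _ in tokens:
        if first_token_at is None:
            first_token_at = clock()  # timestamp of the first streamed token
        count += 1
    end = clock()
    if first_token_at is None:
        raise ValueError("stream produced no tokens")
    return StreamMetrics(
        ttft_s=first_token_at - start,
        latency_s=end - start,
        output_tokens=count,
    )
```

In practice, `tokens` would be the iterator returned by a provider's streaming API, and `measure_stream` should be called immediately before the request is sent so that network and queueing time count toward TTFT. The `clock` parameter exists only so the timing logic can be checked with a deterministic clock.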
Here are a few use cases where each of these metrics matters:
Cost: While cost is relevant to most users, it becomes especially important for users who want to pass large amounts of data through an LLM for data cleaning or summarization purposes.
Rate Limit: This is important for applications that have peak hours of operation or many concurrent sessions, such as AI-powered customer support.
Throughput: The speed of inference is relevant for many use cases, but it is particularly important for live applications, such as AI-powered search or online Q&A.
TTFT: Time to first token is most important for live chat applications, such as online tutoring or character-based chats, where the goal is to minimize how long the user waits before the first word is streamed.
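To make the cost metric concrete for the bulk-processing use case above, the following sketch converts per-million-token prices into a dollar estimate for one job. The function name `estimate_cost_usd` and the prices in the usage note are hypothetical, chosen only for illustration; they are not taken from any provider's price list.

```python
def estimate_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Estimate the dollar cost of a job from token counts.

    Prices are in dollars per million tokens, with input and output
    tokens priced separately.
    """
    total = (
        input_tokens * input_price_per_million
        + output_tokens * output_price_per_million
    )
    return total / 1_000_000
```

For example, a summarization job that sends 2,000,000 input tokens and receives 500,000 output tokens, at assumed prices of $0.50 and $1.50 per million tokens respectively, would cost `estimate_cost_usd(2_000_000, 500_000, 0.50, 1.50)`, which is $1.75.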