This post explains how outcome-based pricing works at Hyperfusion, why we believe usage-based pricing is fundamentally misaligned with how teams actually build AI products, and what the real-world economics look like.
Per-token pricing is simple to understand, which is why it became the industry default. But a simple billing model does not make the bill simple for the team paying it.
Consider a production chatbot handling 50,000 conversations per day. With per-token billing, your cost is a function of conversation length, which you do not fully control. Users ask longer questions during certain hours. Your system prompts grow as you add features. A prompt engineering improvement that makes responses more detailed also makes them more expensive. The cost is a moving target.
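To make the moving target concrete, here is a minimal sketch of how a per-token bill depends on conversation shape. It assumes the full conversation history and the system prompt are re-sent on every turn, which is common for chat workloads but not universal. The token counts and the price per million tokens are illustrative placeholders, not any provider's actual rates; only the 50,000 conversations per day comes from the example above.

```python
from dataclasses import dataclass


@dataclass
class ConversationShape:
    turns: int                  # user/assistant exchanges per conversation
    system_prompt_tokens: int   # assumed to be re-sent with every request
    user_tokens_per_turn: int
    reply_tokens_per_turn: int


def tokens_per_conversation(shape: ConversationShape) -> tuple[int, int]:
    """Return (input_tokens, output_tokens) for one conversation.

    Assumption: each request carries the system prompt plus the whole
    history so far, so input tokens grow with every turn.
    """
    input_tokens = 0
    history = 0
    for _ in range(shape.turns):
        input_tokens += shape.system_prompt_tokens + history + shape.user_tokens_per_turn
        history += shape.user_tokens_per_turn + shape.reply_tokens_per_turn
    output_tokens = shape.turns * shape.reply_tokens_per_turn
    return input_tokens, output_tokens


def monthly_cost(shape: ConversationShape, conversations_per_day: int,
                 usd_per_m_input: float, usd_per_m_output: float,
                 days: int = 30) -> float:
    """Monthly bill in USD under simple per-token pricing."""
    inp, out = tokens_per_conversation(shape)
    per_conversation = inp / 1e6 * usd_per_m_input + out / 1e6 * usd_per_m_output
    return per_conversation * conversations_per_day * days


# Hypothetical shapes: the same chatbot before and after the system prompt
# grows and replies become more detailed.
baseline = ConversationShape(turns=6, system_prompt_tokens=400,
                             user_tokens_per_turn=40, reply_tokens_per_turn=120)
grown = ConversationShape(turns=6, system_prompt_tokens=1200,
                          user_tokens_per_turn=40, reply_tokens_per_turn=200)

for name, shape in [("baseline", baseline), ("grown", grown)]:
    # 0.20 USD per million tokens is a placeholder price, not a quote.
    cost = monthly_cost(shape, conversations_per_day=50_000,
                        usd_per_m_input=0.20, usd_per_m_output=0.20)
    print(f"{name}: ${cost:,.0f} per month")
```

Traffic is identical in both scenarios. The difference in the bill comes entirely from prompt and reply length, which are exactly the parameters that drift as a product evolves.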
For a team in Bucharest or Dubai building AI products for clients, this creates a practical problem: you cannot give your client a fixed price for the AI component of their product because you do not know what your own costs will be. You end up either padding your margins heavily (making you uncompetitive) or absorbing the risk yourself (making you unprofitable in busy months).
The principle is straightforward. You describe the job you need done. We give you a fixed price. If the job takes more compute than expected, we absorb the overrun. If it takes less, we keep the difference. Both sides have clear incentives: you get budget certainty, and we are incentivised to run infrastructure as efficiently as possible.
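The risk transfer described above can be written down in a few lines. This is only an illustrative sketch: `settle_month` is a hypothetical helper, and the dollar amounts are made up to show the three cases (compute on target, overrun, underrun). It does not describe Hyperfusion's actual billing system.

```python
def settle_month(fixed_price_usd: float, actual_compute_cost_usd: float) -> dict:
    """Settle one month under a fixed-price agreement.

    The customer always pays the agreed price. Any gap between that price
    and the real compute cost lands on the provider's margin.
    """
    return {
        "customer_pays": fixed_price_usd,
        "provider_margin": fixed_price_usd - actual_compute_cost_usd,
    }


# Hypothetical figures: one agreed price, three possible compute outcomes.
agreed_price = 5_000
for label, compute_cost in [("on target", 4_000), ("overrun", 6_000), ("underrun", 3_000)]:
    result = settle_month(agreed_price, compute_cost)
    print(f"{label}: customer pays ${result['customer_pays']:,}, "
          f"provider margin ${result['provider_margin']:,}")
```

The customer's line is the same in every case. Only the provider's margin moves, which is why the provider has a direct incentive to run the workload efficiently.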
In practical terms, this means:
The pricing calculator at hyperfusion.io/pricing lets you model these scenarios interactively. You define your workload parameters and see the fixed price before committing to anything.
Suppose your team runs a customer support chatbot built on Llama 3 8B. You handle approximately 30,000 conversations per day, with an average of 8 turns per conversation, and you need sub-200ms time-to-first-token latency.
With a per-token provider, your monthly cost might range from $4,000 to $9,000 depending on conversation lengths, seasonal traffic patterns, and whether you need to scale up for peak hours. You will not know the actual number until the invoice arrives.
On Hyperfusion, you describe this workload to the sizing wizard, which recommends the appropriate H100 configuration and gives you a fixed monthly price. That price covers the full throughput range, including peak hours. If your traffic doubles for a week, the autoscaler handles it and the price does not change.
Outcome-based pricing works well for workloads with reasonably predictable patterns. If your usage is highly unpredictable (you genuinely have no idea whether you will process 1,000 or 10 million requests next month), a usage-based model might initially make more sense because it scales down as well as up.
That said, most production workloads are more predictable than teams assume. If you have been running for more than a month, your traffic patterns give you enough signal to define a workload profile for fixed pricing. And the financial certainty of knowing your costs in advance is usually worth more than the theoretical flexibility of usage-based billing.
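As a rough illustration of turning traffic history into a workload profile, the sketch below summarises a list of daily request counts. The function name, the choice of statistics (mean, nearest-rank 95th percentile, peak, coefficient of variation), and the 30-day minimum are assumptions made for this example. They are not the inputs the Hyperfusion sizing wizard or pricing calculator actually asks for.

```python
import statistics


def workload_profile(daily_requests: list[int], min_days: int = 30) -> dict:
    """Summarise daily request counts into a simple workload profile."""
    n = len(daily_requests)
    if n < min_days:
        raise ValueError(f"need at least {min_days} days of history, got {n}")

    ordered = sorted(daily_requests)
    # Nearest-rank 95th percentile, using integer arithmetic for ceil(0.95 * n).
    rank = (95 * n + 99) // 100
    mean = statistics.fmean(daily_requests)

    return {
        "mean_per_day": mean,
        "p95_per_day": ordered[rank - 1],
        "peak_per_day": ordered[-1],
        # Peak-to-mean ratio: how much headroom the busiest day needs.
        "peak_to_mean": ordered[-1] / mean,
        # Coefficient of variation: a low value suggests predictable traffic.
        "coefficient_of_variation": statistics.pstdev(daily_requests) / mean,
    }


# Hypothetical history: five weeks with quieter weekends and one busy day.
history = ([30_000] * 5 + [22_000] * 2) * 5
history[17] = 48_000

for key, value in workload_profile(history).items():
    print(f"{key}: {value:,.2f}")
```

A low coefficient of variation and a modest peak-to-mean ratio indicate the kind of workload that is straightforward to describe for fixed pricing. A very high ratio is a sign that usage-based billing may still be the safer starting point.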
The pricing calculator does not require an account. Go to hyperfusion.io/pricing, describe your workload, and see what the fixed price would be. Compare it against your current provider's bill. The numbers tend to speak for themselves.