Deep Dive: Execution and Advanced Capabilities

Written by Hyperfusion | Nov 21, 2025 11:39:46 PM

Executing the request and tracking outcomes

A single query is ideal for standalone questions. Requests are executed via openai.ChatCompletion.create, where the library handles networking and retries.

The API returns a JSON response that provides key metrics for calculating usage and ROI:

  • Model and response: The response confirms the specific model used. Also, the choices array contains the AI-generated content and the finish_reason.
  • Outcome-based metrics: The usage object tracks crucial token statistics: prompt_tokens (input), completion_tokens (output), and total_tokens (total usage). This metric is the mechanism used for our Outcome-Based Pricing.

 

Migration and outcome-based workloads

The true value of outcome-based pricing is unlocked by frictionless adoption

This section details how we engineered our platform to eliminate the barriers for a seamless migration, ensuring that technical compatibility directly translates into economic certainty.

  • Frictionless migration via OpenAI standard: Our API is designed for developers transitioning from the OpenAI API. We achieve maximum developer velocity because the existing OpenAI Python SDK code remains functional.
    To migrate, simply update your base URL to https://api.hyperfusion.io/v1 and use your Hyperfusion API key.
  • The technical engine of outcome-based pricing: The economic model relies on the ability to track and charge based on the discrete units of work completed.
    The API response explicitly tracks total_tokens used. This verifiable metric is the mechanism used to calculate the fixed price for an AI task, eliminating the cost uncertainty of resource consumption.

Advanced Use Cases

This section details the specialized capabilities required for building sophisticated, high-performance AI agents.

Tool calling

We support tool calling for building applications that integrate external logic.
This capability allows agents to execute external functions, such as fetching real-time data or accessing proprietary APIs. With this, you can enable the creation of sophisticated multi-step agentic workflows.

High-throughput execution

Our API is designed for applications requiring high throughput and efficient scaling:

Asynchronous execution

We support asynchronous execution to run multiple independent queries in parallel. This capability is ideal for applications requiring high throughput, such as batch processing or multi-user services.

Streaming responses

Streaming enables developers to receive partial results in real-time as the model generates them. This is essential for eliminating perceived latency in chatbots and interactive applications, enhancing the end-user experience.