Request Structure and Outcome Metrics

Written by Hyperfusion | Nov 21, 2025 11:47:55 PM

This page is a quick-start guide detailing the fundamental request structure needed to start using the API immediately.

Request payload construction and role management

The core of any request is the payload, which defines the model and the conversational roles to guide the AI.

Model specification

The payload must include the model (specifying the LLM, e.g., qwen/qwen3-32b).

Role-based messaging

Every message in the request must include a distinct role (system, user, or assistant) and corresponding content.

System role: Used to set the model's initial behavior, tone, or persona. This acts as a fixed instruction for all subsequent interactions.
User role: Represents the user’s input or question. This role drives the conversation forward.
Assistant role: Represent the model’s generated response. Including this role in the messages array enables long-running conversations and maintains context across multiple steps.

Sending queries

The following section details the primary methods for issuing API requests.

Running single queries

Single queries are ideal for standalone questions or commands that do not require conversation history. The user role is employed to send a direct request to the model.

Implementation strategy: Supported implementations include OpenAI Py SDK, direct Python, and cURL.

Maintaining context (Long-running conversations)

To preserve context across multi-step interactions, the assistant role is employed. This allows the model to reference its previous responses, enabling a coherent flow for long-running conversations.

Implementation strategy: Supported implementations include OpenAI, direct Python, and cURL.

Model behavior customization (System role)

The system role allows us to define the model’s behavior, tone, or persona. This is essential for tailoring responses to specific use cases (e.g., a friendly travel advisor, a strict code reviewer, or a creative storyteller).

Implementation strategy: Supported implementations include OpenAI, direct Python, and cURL.

Real-time output (Streaming)

We enable streaming to receive partial results in real-time as the model generates them. This approach is ideal for long responses, interactive applications, or when incremental progress display is required.

Implementation strategy: Supported implementations include OpenAI, direct Python, and cURL.

High-throughput execution (Async support)

Asynchronous programming is used to execute multiple independent queries in parallel. This is ideal for applications requiring high throughput, such as batch processing, chatbots, or multi-user services.

Implementation Strategy: Supported implementations include OpenAI, direct Python, and cURL.

Summary of Hyperfusion’s Capabilities

Capability	Core Technical Benefit
Single queries	Ideal for standalone Q&A using the user role.
Maintaining context	Preserves conversation history for multi-step interactions (assistant role).
Model behavior customization	Defines model persona/tone for specific use cases (system role).
Real-time output (Streaming)	Eliminates perceived latency by delivering partial results instantly.
High-throughput execution (Async support)	Executes multiple independent queries in parallel for high-scale applications.

View full post