
Building AI Workflows in Rust — Infinite in Faculty


There is a trap that catches most AI integrations in their first week of life. You wire up a call to an LLM, the demo works beautifully, and you ship it. Then the feature requests arrive. Users want real-time feedback, not a ten-second spinner. The product team wants the AI to return data your backend can actually process, not a paragraph of prose it has to parse. Someone asks why you are locked into one provider. And eventually, the most exciting request: can the AI look something up?

Each of these is a reasonable ask. Together they represent the jump from “AI prototype” to “AI application.” Here is how we made that jump, entirely in Rust.

Act I: The Provider Lock-in Problem #

The first version of the integration was straightforward — instantiate an OpenAI client, send a prompt, return the string. It worked. It also meant that every piece of business logic was implicitly coupled to one provider’s API shape, authentication convention, and error types.

The fragility showed up quietly at first. A rate limit here, a model deprecation notice there. The real problem was structural: we had no way to route requests to a fallback provider, no way to A/B test models, no way to respond to pricing changes without touching core application code.

We tried the obvious fix first: wrapping the OpenAI client in our own struct and hoping we would never need to change it. That held for about three months. Then a new model came out on a competing platform at half the cost, and suddenly every layer of the application needed surgery. We had not abstracted the provider — we had just hidden it one level deeper.

The solution is to treat the AI provider the way you would treat a database: abstract it behind an interface, and let the rest of the application talk to the interface. In the Rust ecosystem, rig-rs does exactly this — it standardizes how our application communicates across providers, applying what is effectively the abstract factory pattern.

use rig::providers::{openai, gemini};
use rig::completion::Prompt;
use std::env;

pub async fn generate_greeting(provider_choice: &str) -> Result<String, Box<dyn std::error::Error>> {
    let response = match provider_choice {
        "openai" => {
            let client = openai::Client::from_env();
            let agent = client.agent("gpt-4o").build();
            agent.prompt("Say a quick hello!").await?
        },
        "gemini" => {
            let client = gemini::Client::from_env();
            let agent = client.agent("gemini-1.5-flash").build();
            agent.prompt("Say a quick hello!").await?
        },
        _ => return Err("Unsupported provider".into()),
    };

    Ok(response)
}

The match arm is the only place in the codebase that knows which provider is active. Everything downstream — streaming, structured extraction, tool calling — speaks the same rig vocabulary regardless. Swapping providers becomes a configuration change, not a refactor.
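One way to make that configuration change concrete is to pull the choice out of the call site entirely. A minimal sketch, assuming a `resolve_provider` helper of our own (not part of rig) that validates a configured value before any client is built:

```rust
/// Providers this application knows how to build. Anything else is
/// rejected up front rather than failing deep inside a request.
/// (Illustrative helper, not part of rig.)
const SUPPORTED: &[&str] = &["openai", "gemini"];

pub fn resolve_provider(configured: Option<&str>) -> Result<&str, String> {
    // Absent configuration falls back to a default provider.
    let choice = configured.unwrap_or("openai");
    if SUPPORTED.contains(&choice) {
        Ok(choice)
    } else {
        Err(format!("unsupported provider: {choice}"))
    }
}
```

`generate_greeting` can then be handed `resolve_provider(config_value)?`, and a deployment flips providers by changing one configuration value rather than touching application code.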

Act II: The Blank Screen Problem #

With the abstraction in place, the next complaint came from users. The AI feature felt slow, they said, even when the total response time was acceptable. The issue was perception: a ten-second wait with no feedback feels broken. A ten-second wait where text appears word by word feels fast.

The first instinct was to throw a loading spinner at it. That made it worse. A spinner tells users “something is happening” but gives no signal of how much is happening, or whether the model has stalled. Support tickets arrived asking if the feature was broken. It was not broken — it was just silent.

This is not a new problem. It is the same reason search engines show partial results and terminals print output incrementally. Streaming does not make the AI faster — it makes the wait legible.

LLMs generate tokens sequentially. Streaming delivers those tokens to the client as they are produced rather than buffering the full response. On the Rust backend (using Axum), we use Server-Sent Events: a long-lived HTTP connection over which we push each chunk the moment the model yields it.

use axum::{
    response::sse::{Event, Sse},
    routing::get,
    Router,
};
use futures::stream::{Stream, StreamExt};
use rig::providers::openai;
use std::convert::Infallible;

async fn stream_ai_response() -> Sse<impl Stream<Item = Result<Event, Infallible>>> {
    let client = openai::Client::from_env();
    let agent = client.agent("gpt-4o").build();

    // unwrap keeps the demo short; production code should surface this
    // error to the client instead of panicking in the handler
    let mut stream = agent.stream_prompt("Write a short story about Rust.").await.unwrap();

    let sse_stream = async_stream::stream! {
        while let Some(chunk) = stream.next().await {
            if let Ok(text) = chunk {
                yield Ok(Event::default().data(text));
            }
        }
    };

    Sse::new(sse_stream)
}

pub fn app() -> Router {
    Router::new().route("/api/chat/stream", get(stream_ai_response))
}

No buffering, no waiting. The rig agent returns an async stream; we map that stream directly into Axum’s SSE response type. The client starts seeing tokens in milliseconds. The model may still take ten seconds to finish, but the interaction feels alive from the first word.
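For context, the wire format behind `Sse` is simple enough to sketch by hand. Per the SSE specification, each event arrives as a `data:` line terminated by a blank line. The function below is illustrative, not Axum's internal code, and assumes single-line tokens:

```rust
/// Illustrative framing of one single-line SSE event: the shape a browser's
/// EventSource receives for each token. (Multi-line payloads are split
/// across several `data:` lines; real serialization is handled by Axum.)
pub fn sse_frame(token: &str) -> String {
    format!("data: {token}\n\n")
}
```

On the client, `new EventSource("/api/chat/stream")` reassembles these frames and fires one message event per token.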

Act III: The Parsing Nightmare #

Text generation solved the user-facing problem. It created a new one on the backend.

The next feature required the AI to return structured data — a list of products, each with a name, price, and category — so a downstream pipeline could process it. The first attempt used prompt engineering: “respond only in JSON, with the following fields.” It worked maybe eighty percent of the time. The other twenty percent of the time, the model added a preamble, used a slightly different field name, or wrapped the JSON in a markdown code block.

We tried adding more instructions to the prompt. “Return only valid JSON.” “Do not include any explanation.” “Wrap your answer in a code block.” Each instruction helped with one failure mode and introduced another. The model is not a JSON serializer and cannot be reliably coached into acting like one. We were solving the wrong problem.

Writing a robust parser for LLM free-form output is the kind of work that sounds easy and takes weeks. The solution is to stop asking the model to be well-behaved and start giving it no other option.

Rust’s type system, serde, and schemars turn this into a solved problem. Define a struct, derive the right traits, and rig’s extractor sends the schema to the model as a hard constraint. The model must produce JSON that matches it.

use rig::providers::openai;
use serde::Deserialize;
use schemars::JsonSchema;

#[derive(Deserialize, JsonSchema, Debug)]
struct ProductList {
    products: Vec<Product>,
}

#[derive(Deserialize, JsonSchema, Debug)]
struct Product {
    name: String,
    description: String,
    price: f64,
    category: String,
}

pub async fn generate_product_catalog() {
    let client = openai::Client::from_env();
    let extractor = client.extractor::<ProductList>("gpt-4o").build();

    let result: ProductList = extractor
        .extract("Generate a list of 3 breakfast cereals.")
        .await
        .expect("Failed to generate or parse JSON");

    for product in result.products {
        println!("{}: ${} - {}", product.name, product.price, product.category);
    }
}

#[derive(JsonSchema)] generates the JSON Schema that rig sends to the model as a constraint. If the model hallucinates a field that does not exist, or returns a string where f64 is expected, serde catches it at deserialization. The type error surfaces as a handled Result, not a runtime panic buried in production logs at 2am.

Act IV: The AI That Could Not Look Anything Up #

The last problem was the most interesting. The application was working well for everything the model already knew. Then came the requests: can it tell me the current weather? Can it check our internal pricing database? Why does it not know what happened last week?

The temptation was to stuff the context window with raw data and hope the model could reason over it. For small datasets that worked. For anything live — a database query that changes by the minute, an API that requires authentication — it collapsed immediately. Context windows are not databases. They are expensive, slow, and stale the moment you fill them.

The model’s training cutoff is a hard wall. For real-time data you need the model to reach out and touch the world. This is tool calling: the model identifies what it needs, halts generation, requests a function execution, and incorporates the result into its final answer. It is also where the title earns itself. A model that can call arbitrary functions becomes, in Hamlet’s phrase, infinite in faculty — its effective knowledge bounded only by the tools you register.

The security model matters here. The model decides when to call a tool; your Rust code controls what actually runs. The model never executes anything directly.

use rig::providers::openai;
use rig::tool::Tool;
use rig_derive::tool;

#[tool(description = "Get the current weather for a specific city")]
fn get_weather(city: String) -> String {
    // In production this makes an HTTP call to a weather API
    format!("The weather in {} is currently 22°C and sunny.", city)
}

pub async fn run_weather_agent() {
    let client = openai::Client::from_env();

    let agent = client.agent("gpt-4o")
        .preamble("You are a helpful weather assistant.")
        .tool(get_weather)
        .build();

    let response = agent.prompt("What is the weather like in Paris today?").await.unwrap();

    println!("Agent: {}", response);
}

The #[tool] macro generates the schema that rig exposes to the model. The model sees the tool’s name and description — never the Rust source. The HTTP call, the database query, the authorization check — all of that happens inside your function, under your control. The framework handles the execution loop: detect tool call request, run the function, feed the result back to the model, resume generation.

The Shape of the Full System #

These four capabilities compose. A production AI feature in this application now:

  1. Selects a provider based on user tier and latency requirements (abstraction)
  2. Streams tokens to the client as they are generated (streaming)
  3. Forces a typed intermediate representation for pipeline stages that need structured data (extraction)
  4. Allows the model to call internal APIs for real-time context (tool calling)

rig-rs threads through all of it with a consistent agent API. What makes Rust a good fit for this is the same thing that makes it uncomfortable for rapid prototyping: the type system is strict. Structured outputs are verified at the type level. Tool schemas are generated from function signatures. Streaming errors are typed rather than string exceptions. The compiler catches entire categories of AI integration bugs before they reach users.
