# State Management and Tool Calling in Rust — Two Households, Both Alike in Dignity
The previous post covered provider abstraction, streaming, and tool calling as isolated capabilities. Each worked in its own right. What we did not show was what happens when they are combined inside a real frontend — when the AI state needs to live somewhere the browser cannot touch, when the UI needs to stay responsive while the model is thinking, and when structured outputs need to feed directly into reactive components.
That is where the architecture gets interesting, and where the mistakes pile up before the right shape emerges.
## Act I: The API Route Sprawl
The initial approach was straightforward: explicit Axum routes for each AI operation. /api/chat for conversation, /api/extract for structured output, /api/tools for the agent with tool access. Each route had a handler, each handler had a corresponding fetch call in the Leptos frontend, each fetch call had its own error mapping.
This was manageable with three routes. By the time we had seven it was a maintenance problem. Adding a new AI capability meant writing the same boilerplate in four places: the route definition, the handler, the client-side fetch, and the error type bridge. A change to the request shape meant touching all four simultaneously and hoping nothing drifted out of sync.
The escape is Leptos Server Functions. Instead of manually defining the network boundary, we annotate a regular Rust function and let the framework generate both sides — the server-side handler and the client-side RPC call — from a single definition.
```rust
use leptos::*;

// Compiles into the backend binary only.
// Automatically generates an RPC stub for the Wasm client.
#[server(ContinueConversation, "/api")]
pub async fn continue_conversation(prompt: String) -> Result<String, ServerFnError> {
    // Surface a missing key as a server error rather than panicking the handler.
    let api_key = std::env::var("OPENAI_API_KEY")
        .map_err(|_| ServerFnError::ServerError("OPENAI_API_KEY not set".into()))?;
    let response = call_llm(prompt, api_key).await?;
    Ok(response)
}
```
The #[server] macro ensures the function body only exists in the server binary. API keys, database handles, internal service calls — none of it is visible to the Wasm client. The client calls continue_conversation as if it were a local async function. The network round-trip is invisible.
The boilerplate drops to one place. Adding a new AI capability is now one annotated function.
## Act II: One State to Rule Them All (and the Problems That Followed)
With server functions handling the network boundary, we turned to state. The first approach was the obvious one: put everything in a single RwSignal. Message history, loading state, current input, the AI’s internal context — all of it in one place, shared across components.
It worked for a while. Then the conversation history grew. We were serialising fifty kilobytes of message history into every server function call, because the client held all of it and the server needed all of it. Response latency crept up. Then a user opened two tabs and the history forked. Then we realised the full conversation context was sitting in the browser’s memory, readable to any extension or injected script.
The fix we reached for — moving everything to the server — created the opposite problem. The UI had no local state at all, so every keystroke triggered a round-trip before the interface could respond. Loading indicators flickered. Optimistic updates were impossible.
The right answer is a strict separation: AI state lives on the server, UI state lives on the client, and server functions are the narrow channel between them.
AI state — the full conversation history, internal metadata, anything the model needs — lives in the Axum backend, stored in an Arc<RwLock<Vec<Message>>> or an external store. It never travels to the browser wholesale.
UI state — loading spinners, the current input value, the rendered chat bubbles — lives in Leptos RwSignal primitives, scoped to the component tree that needs them.
```rust
#[derive(Clone)]
pub struct ChatUIState {
    pub messages: RwSignal<Vec<String>>,
    pub is_loading: RwSignal<bool>,
}

#[component]
pub fn ChatInterface() -> impl IntoView {
    let ui_state = expect_context::<ChatUIState>();

    let submit_prompt = create_action(move |input: &String| {
        let input_clone = input.clone();
        async move {
            ui_state.is_loading.set(true);
            // The server function manages AI state.
            // We only receive the new response, not the full history.
            if let Ok(response) = continue_conversation(input_clone).await {
                ui_state.messages.update(|msgs| msgs.push(response));
            }
            ui_state.is_loading.set(false);
        }
    });

    // A component must return a view; a minimal one wires up the action.
    view! {
        <button on:click=move |_| submit_prompt.dispatch("Hello".to_string())>
            "Send"
        </button>
        <Show when=move || ui_state.is_loading.get()>
            <p>"Thinking..."</p>
        </Show>
    }
}
```
Each call returns only the new response. The client appends it to the local message list. The model’s full context stays on the server. The two households — AI state and UI state — remain separate, and the application is more reliable for it.
## Act III: Structured Output Inside a Server Function
In the previous post we covered structured extraction: deriving JsonSchema, building an extractor, deserialising the result into a typed struct. What we did not show was how that structured output flows into a reactive UI component.
The naive approach is to return a JSON string from the server function and parse it on the client. This reintroduces the fragility we were trying to escape — the client has to assume a shape that only the server can guarantee. If the struct changes, the client breaks at runtime rather than at compile time.
The cleaner approach is to let serde carry the struct across the boundary. Leptos server functions serialise and deserialise return types automatically, so a Vec<ProductRecommendation> returned from a server function arrives on the client as a Vec<ProductRecommendation> — not a string.
```rust
use serde::{Deserialize, Serialize};
use schemars::JsonSchema;
use rig::providers::openai;

#[derive(Serialize, Deserialize, JsonSchema, Debug, Clone)]
struct ProductRecommendation {
    name: String,
    price: f64,
    category: String,
}

#[server(GenerateProducts, "/api")]
pub async fn generate_products(
    query: String,
) -> Result<Vec<ProductRecommendation>, ServerFnError> {
    let client = openai::Client::from_env();
    let extractor = client.extractor::<Vec<ProductRecommendation>>("gpt-4o").build();
    let products = extractor
        .extract(&format!("Recommend 3 products for: {}", query))
        .await
        .map_err(|_| ServerFnError::ServerError("Extraction failed".into()))?;
    Ok(products)
}
```
The Leptos component receives a Vec<ProductRecommendation> and renders it directly. If the struct changes — a field is renamed, a type is tightened — the compiler catches the mismatch before anything ships. The AI’s output is validated on the server; the UI works with clean typed data.
## Act IV: Tool Calling Inside a Server Function
Tool calling in the previous post was demonstrated as a standalone agent. The remaining question was how it fits inside the server function model — and whether the security boundary holds when the model is deciding which functions to invoke.
The answer is that server functions are exactly the right home for tool-equipped agents. The agent and its tools are defined entirely on the server. The Wasm client calls the server function and gets back a string. It never sees the tool definitions, the intermediate tool call requests, or the raw results. The model decides when to call a tool; Rust decides what that call actually does.
```rust
use rig_derive::tool;
use rig::providers::openai;

#[tool(description = "Get the current weather for a specific city")]
fn get_weather(city: String) -> String {
    format!("The weather in {} is currently 22°C and sunny.", city)
}

#[server(ChatWithTools, "/api")]
pub async fn ask_with_tools(prompt: String) -> Result<String, ServerFnError> {
    let client = openai::Client::from_env();
    let agent = client
        .agent("gpt-4o")
        .preamble("You are a helpful assistant. Use tools when necessary.")
        .tool(get_weather)
        .build();
    let response = agent
        .prompt(&prompt)
        .await
        .map_err(|e| ServerFnError::ServerError(e.to_string()))?;
    Ok(response)
}
```
The execution loop — detect tool call, run the function, feed the result back, resume generation — is entirely contained within the server function body. From the Leptos component’s perspective, it called a function and got a string back. The multi-step agent loop is an implementation detail it never sees.
## The Shape of the System
The four pieces compose into a clear layered architecture:
- Server functions replace the API route sprawl with a single, compiler-checked boundary
- Dual-state management keeps AI context on the server and UI responsiveness on the client
- Structured extraction flows across the boundary as typed Rust structs, not strings
- Tool-equipped agents run entirely on the server, the client receiving only the final answer
The Romeo and Juliet prologue describes two households, separate and dignified, whose entanglement causes the drama. The architecture here keeps the two households — client state and server state — deliberately apart, and the result is the absence of drama: no context leaking to the browser, no serialisation mismatches at runtime, no API routes drifting out of sync with their callers.