
Security, Anonymization, and Deployment in Rust — Once More unto the Breach

7 mins

The previous post hardened the application against internal failures — memory leaks, silent errors, flaky tests. This post is about external failures: what happens when real users, and real attackers, find the endpoint.

Going from local development to the public internet is Henry V’s breach — a gap in the wall that everything can pour through if you are not ready. We were not ready. The first production incident arrived within 48 hours.

Act I: The 50,000-Character Payload #

The Leptos input field had a maxlength="1000" attribute. That is a UI constraint. It took one curl command to bypass it entirely.

Someone — probably not malicious, possibly just testing — sent a 50,000-character payload directly to the /api/chat endpoint. It passed validation (we had none), reached the rig-rs layer, got forwarded to the AI provider, and timed out after thirty seconds with the SSE stream still open. The server held the connection. The client got nothing. The logs showed an API timeout with no indication of what had caused it.

Client-side validation is a user experience feature, not a security feature. Any constraint that exists only in the browser can be removed by anyone with a terminal. Server-side validation must be the single source of truth.

The validator crate applied to the Axum handler gives us enforceable constraints before anything reaches the AI layer:

use axum::{Json, http::StatusCode};
use serde::Deserialize;
use validator::Validate;

#[derive(Deserialize, Validate)]
pub struct ChatRequest {
    #[validate(length(
        min = 1,
        max = 1000,
        message = "Prompt must be between 1 and 1000 characters"
    ))]
    pub prompt: String,
}

pub async fn handle_chat(
    Json(payload): Json<ChatRequest>,
) -> Result<String, (StatusCode, String)> {
    if let Err(e) = payload.validate() {
        return Err((StatusCode::BAD_REQUEST, format!("Invalid input: {}", e)));
    }

    // Safe to process
    Ok("Processing...".to_string())
}

A failed validation returns a 400 before any work is done. The AI provider never sees the payload. The open connection never happens. And because the rules live in the struct definition, a new handler only needs one validate() call to pick up every constraint — the rules themselves cannot drift out of sync the way scattered manual if checks can. (Wrapping that call in a custom extractor goes further still, rejecting invalid payloads before the handler body runs at all.)

Act II: The $40 Stress Test #

Three days after launch, a single IP address sent twelve thousand requests in an hour. It was not a sophisticated attack — a simple loop hammering the endpoint. Each request was well within the payload limit, so validation passed on all of them. Each one forwarded a prompt to the AI provider, consumed tokens, and returned a response.

The API bill for that hour was forty dollars. The attacker spent nothing.

An unauthenticated, unthrottled endpoint connected to a token-billed API is a financial liability regardless of payload size. The middleware layer needs to enforce rate limits before requests reach the handler.

tower_governor integrates with Axum’s tower middleware stack to throttle by IP address. The critical production detail is where the state lives: tower_governor keeps its token buckets in process memory, so each container instance maintains its own independent allowance, and a user who routes requests across two instances effectively gets double the limit. The in-memory throttle still stops single-instance floods cheaply; for limits that must be consistent across the whole fleet, a shared Redis counter (via redis-rs) closes the gap.

use axum::{routing::post, Router};
use tower_governor::governor::GovernorConfigBuilder;
use tower_governor::key_extractor::SmartIpKeyExtractor;
use tower_governor::GovernorLayer;

pub fn app() -> Router {
    // Per-IP token buckets, held in this process's memory. Fleet-wide
    // consistency comes from the Redis-backed per-user quota, not from here.
    let governor_conf = Box::new(
        GovernorConfigBuilder::default()
            .per_second(10)       // replenish one token every 10 seconds
            .burst_size(5)        // allow bursts of up to 5 requests
            .key_extractor(SmartIpKeyExtractor)
            .finish()
            .unwrap(),
    );

    Router::new()
        .route("/api/chat", post(handle_chat))
        .layer(GovernorLayer {
            // GovernorLayer wants a &'static config; leaking a Box built
            // once at startup is the usual idiom
            config: Box::leak(governor_conf),
        })
}

Requests that exceed the burst limit receive a 429 before touching the handler. For per-user daily quotas — a free tier capped at one hundred requests per day — a Redis counter via redis-rs, keyed by user ID rather than IP, provides a limit that is shared across every container instance and survives a restart or redeployment.
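The quota logic itself is small. A sketch, with the counter store abstracted behind a trait so the production version can be Redis (an atomic INCR, plus an EXPIRE of 86 400 seconds on the first hit of the day) while the sketch runs in memory — the names here are illustrative, not from the real codebase:

```rust
use std::collections::HashMap;

/// Counter abstraction: in production this would be a Redis INCR,
/// which is atomic across all instances; here an in-memory map stands in.
trait CounterStore {
    /// Increment `key` and return the new count.
    fn incr(&mut self, key: &str) -> u64;
}

struct InMemoryStore(HashMap<String, u64>);

impl CounterStore for InMemoryStore {
    fn incr(&mut self, key: &str) -> u64 {
        let count = self.0.entry(key.to_string()).or_insert(0);
        *count += 1;
        *count
    }
}

/// True while the user is still within `limit` requests for the day.
fn within_daily_quota(store: &mut impl CounterStore, user_id: &str, limit: u64) -> bool {
    store.incr(&format!("quota:{}", user_id)) <= limit
}
```

Against Redis, the key also gets a 24-hour expiry on its first increment, so the window resets daily without any scheduled cleanup job.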

The twelve-thousand-request session would have gotten its five-request burst immediately, then one request every ten seconds — at most a few hundred of the twelve thousand requests would have reached the provider in that hour. Cost: under a cent.
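The back-of-the-envelope arithmetic, given the configuration above:

```rust
/// Upper bound on requests a single IP can get through in one hour with
/// a token bucket: the initial burst, plus one replenished token every
/// `replenish_secs` seconds.
fn max_requests_per_hour(burst: u64, replenish_secs: u64) -> u64 {
    burst + 3600 / replenish_secs
}
```

With burst_size(5) and a ten-second replenish interval, that is 5 + 360 = 365 requests — about three percent of the original flood.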

Act III: The Email Address in the Prompt #

A week in, a user pasted a message into the chat that contained their full name, phone number, email address, and a description of a medical issue they were asking about. The prompt went to the AI provider verbatim. We had no idea until we reviewed logs looking for something unrelated.

The immediate question was uncomfortable: how many users had done this, and what had they sent? The longer-term question was legal: GDPR and CCPA treat personally identifiable information as regulated data. Sending it to a third-party AI provider without explicit disclosure and a data processing agreement is a compliance exposure.

The fix is a redaction step between the validation layer and the AI provider. No regex-based PII detection is exhaustive, but catching the common patterns — emails, phone numbers, national ID formats — before they leave your server is a meaningful reduction in exposure. The external provider only ever sees the scrubbed text:

use regex::Regex;

fn anonymize_text(input: &str) -> String {
    // In production, compile these once (e.g. in a std::sync::LazyLock)
    // rather than on every call
    let email_re =
        Regex::new(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b").unwrap();
    let phone_re = Regex::new(r"\b\d{3}[-.]?\d{3}[-.]?\d{4}\b").unwrap();

    let no_emails = email_re.replace_all(input, "[EMAIL_REDACTED]");
    let safe_text = phone_re.replace_all(&no_emails, "[PHONE_REDACTED]");

    safe_text.to_string()
}

pub async fn secure_llm_call(raw_prompt: &str) {
    let safe_prompt = anonymize_text(raw_prompt);
    // safe_prompt is what rig-rs forwards to the provider
    println!("Sending to LLM: {}", safe_prompt);
}

This runs before the rig-rs call in the handler, after validation. The user’s original text is never logged; only the redacted version proceeds. For domains with heavier PII requirements — healthcare, finance — the regex set expands accordingly, and the redaction logic can be pulled into a middleware layer so it applies to every handler without needing to be called explicitly.

Act IV: Works on My Machine #

The deployment failures were quieter but persistent. The Leptos Wasm bundle compiled to a different artifact on CI than on development machines because the wasm-pack version differed. Environment variables containing API keys were committed to version control twice in one month — once in a .env file, once embedded in a config file that was supposed to be gitignored. Scaling was manual: when traffic spiked, someone SSH’d in and restarted the process.

Containerisation with Docker solves the build reproducibility problem by definition: the same Dockerfile produces the same artifact everywhere. A multi-stage build keeps the final image small — the compilation stage has the full Rust toolchain; the runtime stage has only the binary and the Wasm bundle:

# Stage 1: Build
FROM rust:1.77 AS builder
WORKDIR /app
# wasm-pack is not in the base image; install it so CI and dev builds agree
RUN cargo install wasm-pack --locked
COPY . .
RUN cargo build --release
RUN wasm-pack build --target web

# Stage 2: Runtime
FROM debian:bookworm-slim
WORKDIR /app
COPY --from=builder /app/target/release/server ./server
COPY --from=builder /app/pkg ./pkg
EXPOSE 3000
CMD ["./server"]

Secrets are injected at runtime by the deployment platform — Kubernetes secrets, Fly.io environment variables, Docker Compose .env files that are never committed — not baked into the image. The application reads OPENAI_API_KEY from the environment at startup and fails fast with a clear error if it is absent. There is nothing to accidentally commit.
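A minimal Docker Compose sketch of that runtime injection, assuming a two-service layout (service names and image tags are illustrative):

```yaml
# The key is interpolated from the host environment or an uncommitted
# .env file at `docker compose up`; it is never part of the image
services:
  server:
    build: .
    ports:
      - "3000:3000"
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
    depends_on:
      - redis
  redis:
    image: redis:7-alpine
```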

Horizontal scaling follows naturally: the Axum server is stateless (conversation history lives in Redis, not in process memory), so running multiple container instances behind a load balancer requires no application changes. The rate limiting buckets live in the same Redis instance the conversation state uses — a single shared store for both. The CDN serves the Wasm bundle from edge nodes; only the API traffic reaches the containers.

The Production Checklist #

The four layers compose into a sequence every incoming request passes through:

  1. Validation — the validator crate enforces payload constraints before any processing happens
  2. Rate limiting — tower_governor throttles by IP before requests reach the handler; redis-rs enforces per-user quotas in a store shared across all container instances
  3. PII redaction — regex scrubbing before the prompt leaves the server
  4. Containerised deployment — reproducible builds, runtime secrets injection, stateless horizontal scaling backed by Redis

None of these are AI-specific. They are the standard disciplines of any public-facing service, applied to an application where the cost of getting them wrong includes token bills, compliance exposure, and model behaviour that is impossible to audit after the fact.

Henry V’s rallying cry — “once more unto the breach” — is about courage in the face of a dangerous opening. The breach here is the gap between a working prototype and a public endpoint. The layers above are what you build before you step through it.
