Build a Fallback Model Before You Need One: A Copy-Paste Pattern So One Provider Outage Never Stops Your Agent

It happened todayThe morning that proved nobody is immune

On the morning of 2026-06-15, per status.claude.com, an incident titled "Claude Opus 4.8 — Elevated errors" ran roughly 06:20–08:56 UTC and is now resolved. For about two and a half hours, calls to that model could come back as errors. If you were chatting in an app, you shrugged and retried. But if you had an agent — a loop calling that one model with no alternative — that was ~2.5 hours of dead automation: jobs failing, queues backing up, customers staring at a spinner, and you finding out from a support ticket instead of a dashboard. This is the only incident I'll state as fact here, and I'm attributing it on purpose. The point isn't that one model had a bad morning. The point is that every model, on every provider, eventually will — and the only question that matters is whether your agent notices and routes around it, or just stops.

Resilience isn't a knock on any provider. The best-engineered API on earth still publishes a status page, because outages are a when, not an if. The teams that stay up aren't the ones who picked the perfect provider — they're the ones who assumed their provider would fail and built for it before they needed to.

Name the risk: SPOFA single model is a single point of failure

Here's the uncomfortable architecture truth: the quality of your model has nothing to do with the availability of your agent. You can pick the smartest model in the world, and if it's the only thing your loop can call, then its uptime is your uptime. One endpoint, one set of credentials, one region, one rate-limit pool — every one of those is a wire that, when cut, takes your whole automation down with it. That's a single point of failure (SPOF), and it's an availability risk regardless of how good the model is.

One endpoint = one failure domain. A 5xx, a timeout, a regional blip, a rate-limit spike, an expired key — any single one stops a single-model agent cold.
Interactive use hides it. A human in the loop retries, switches tabs, comes back later. An unattended agent loop has no human to improvise — it just throws and dies.
Recurring + unattended = highest blast radius. A scheduled job or a webhook-driven loop can fail silently every fire for hours before anyone notices.
Quality doesn't buy availability. "Best model" and "always available" are different axes. You want both, and only one of them is something you architect.

The fix isn't a better single model. It's making sure no single model can take you down — which means having somewhere to fall back to before the morning you need it.

Vendor-neutral, evergreenThe pattern: try a primary, fall back to a secondary

The fallback-model pattern is deliberately boring, which is why it works. You define a primary model and one or more fallbacks (ideally on a different provider, so a single provider's outage can't take out both). On every call you try the primary; if it returns a retryable failure — a 5xx, a timeout, or your circuit breaker is open — you fall back to the secondary so the loop keeps running. When the primary recovers, you prefer it again. That's the whole idea: no single provider is ever the only thing standing between your agent and a response.

Order your providers. Primary (your preferred model) → fallback 1 (a different provider, so one vendor's outage doesn't sink both) → optional fallback 2. Different provider matters more than different model.
Classify failures. Retryable/failover-worthy: timeouts, connection errors, 429s, 5xx, circuit-open. NOT failover-worthy: 400s/422s (bad request — same on every provider) and auth 401s for that key. Don't fail over a bug.
Retry the same provider briefly, THEN fail over. A quick retry with backoff absorbs a transient blip; if it keeps failing, move to the next provider rather than hammering a sick endpoint.
Wrap each provider in a circuit breaker. After N consecutive failures, mark it "open" and skip it entirely for a cool-down window — so you stop wasting time (and timeouts) on a provider that's clearly down.
Prefer the primary again after cool-down. When the breaker half-opens, send a trial request; if it succeeds, close it and route back to your preferred model. Failover should be temporary, not permanent.

Keep it vendor-neutral on purpose. The pattern shouldn't know or care which providers you use — it just knows "try the list in order, skip the ones that are failing, prefer the top of the list when it's healthy." That's what lets you swap providers later without rewriting your agent.

This is the reward — take itWire it into your agent loop (copy-paste)

Here is the pattern as something you can paste in today. First the language-agnostic shape, then a concrete example using a generic OpenAI-compatible client with two base URLs / two keys — because most providers expose an OpenAI-compatible endpoint, this same code points at almost any of them by changing a base URL. Retry-with-backoff and a tiny circuit breaker are built in; nothing here is tied to a single vendor.

Pseudocode — the shape of it:

providers = [primary, fallback1, fallback2]   # ordered; mix vendors

function complete(request):
  for p in providers:
    if breaker[p].is_open():         # skip providers we know are down
      continue
    for attempt in 1..MAX_RETRIES:
      try:
        resp = p.call(request, timeout=T)
        breaker[p].record_success()
        return resp                  # first healthy provider wins
      catch err:
        if not is_retryable(err):    # 400/422/401: don't fail over, it's a bug
          raise err
        breaker[p].record_failure()
        sleep(backoff(attempt))      # e.g. 0.5s, 1s, 2s + jitter
    # this provider exhausted its retries: fall through to the next provider
  raise AllProvidersFailed           # every provider down: alert + degrade gracefully

Python — generic OpenAI-compatible client, two providers:

import time, random
from openai import OpenAI  # any OpenAI-compatible SDK

PROVIDERS = [
    {"name": "primary",  "client": OpenAI(base_url=PRIMARY_URL,  api_key=PRIMARY_KEY),  "model": PRIMARY_MODEL},
    {"name": "fallback", "client": OpenAI(base_url=FALLBACK_URL, api_key=FALLBACK_KEY), "model": FALLBACK_MODEL},
]
MAX_RETRIES, TIMEOUT = 2, 30
_fails, _open_until = {}, {}

def _open(name):  # simple circuit breaker
    return time.time() < _open_until.get(name, 0)

def _retryable(e):
    code = getattr(e, "status_code", None)
    return code is None or code == 429 or code >= 500  # timeout/conn or 5xx: retry; other 4xx: don't

def complete(messages):
    for p in PROVIDERS:
        if _open(p["name"]):
            continue
        for attempt in range(MAX_RETRIES + 1):
            try:
                r = p["client"].chat.completions.create(
                    model=p["model"], messages=messages, timeout=TIMEOUT)
                _fails[p["name"]] = 0  # healthy again
                return r
            except Exception as e:
                if not _retryable(e):
                    raise  # bad request / auth: same on every provider, don't fail over
                _fails[p["name"]] = _fails.get(p["name"], 0) + 1
                if _fails[p["name"]] >= 3:
                    _open_until[p["name"]] = time.time() + 60  # cool down 60s, prefer primary after
                time.sleep(min(pow(2, attempt), 8) + random.random())  # backoff + jitter
    raise RuntimeError("all providers failed")  # alert here

TypeScript — same idea, ordered list + failover:

import OpenAI from "openai";
const PROVIDERS = [
  { name: "primary",  client: new OpenAI({ baseURL: PRIMARY_URL,  apiKey: PRIMARY_KEY }),  model: PRIMARY_MODEL },
  { name: "fallback", client: new OpenAI({ baseURL: FALLBACK_URL, apiKey: FALLBACK_KEY }), model: FALLBACK_MODEL },
];
const openUntil: Record<string, number> = {}, fails: Record<string, number> = {};
const retryable = (s?: number) => s === undefined || s === 429 || s >= 500;

export async function complete(messages: any[]) {
  for (const p of PROVIDERS) {
    if (Date.now() < (openUntil[p.name] ?? 0)) continue;       // breaker open: skip
    for (let attempt = 0; attempt <= 2; attempt++) {
      try {
        const r = await p.client.chat.completions.create({ model: p.model, messages });
        fails[p.name] = 0; return r;                            // first healthy provider wins
      } catch (e: any) {
        if (!retryable(e?.status)) throw e;                     // other 4xx: real bug, don't fail over
        fails[p.name] = (fails[p.name] ?? 0) + 1;
        if (fails[p.name] >= 3) openUntil[p.name] = Date.now() + 60000; // cool down, prefer primary after
        await new Promise(r => setTimeout(r, Math.min(Math.pow(2, attempt), 8)  1000 + Math.random()  1000));
      }
    }
  }
  throw new Error("all providers failed");                      // alert + degrade gracefully
}

Health-check / prefer-primary-again note. The breaker's cool-down is your health check: when the window expires you try the primary again first, so failover is temporary. If you want eager recovery, run a tiny periodic background ping to the primary and close its breaker early the moment it answers — so you spend as little time as possible on the fallback.

Two honest caveats. (1) The retryable/non-retryable split is the part to get right — failing over on a 400 just runs your bug twice. (2) Keep the call shape identical across providers (same messages, same tools schema) so the fallback is a drop-in; if a provider needs a different shape, normalise it behind the provider object, not in your loop.

Five minutes, onceThe resilience checklist

Run down this list before you call an agent production-ready. None of it is exotic; all of it is the difference between a two-and-a-half-hour outage you slept through and a blip your users never noticed.

Timeouts are set on every call. No unbounded waits — a hung request is an outage with no error. Pick a timeout shorter than your users' patience.
A secondary provider's keys are provisioned NOW. Not "when we need it." The key, the base URL, and a smoke-test call all working before the outage. You can't onboard a provider during one.
A circuit breaker per provider. Stop hammering a provider that's clearly down; skip it for a cool-down, then re-test. Prefer the primary again when it's healthy.
Alerting on failover + on all-providers-failed. You should learn from a notification, not a customer. Alert when you start using the fallback (early warning) and page when everything fails.
A periodic failover drill. Force the primary to fail in staging (bad key / blocked URL) and confirm the agent rides the fallback. An untested fallback is just a comment in your code.
Graceful degradation for total failure. When every provider is down, fail in a way that's safe — queue and retry, return a clear message, don't lose the work. Down is bad; losing data is worse.

The whole job is small and it's a one-time cost. Build the fallback on a calm afternoon, drill it once, and you've turned every future provider outage from an incident into a footnote.

Get the next drop

New AI build guides + the occasional bonus template. No spam, unsubscribe anytime.

By submitting you agree to our Privacy Policy & Terms. Unsubscribe anytime.

Frequently asked questions

Does adding a fallback double my cost?

No. You only call the fallback when the primary is failing — which is rare. In normal operation every request goes to your primary and the fallback costs nothing. The only ongoing cost is provisioning a second provider's keys (usually free until used) and a tiny optional health-check ping. You're paying near-zero for insurance that turns a multi-hour outage into a non-event.

How do I keep outputs consistent across two different models?

Keep the call shape identical (same system prompt, same messages, same tool/JSON schema) so the fallback is a drop-in. Constrain the output: ask for structured JSON and validate it the same way regardless of which model answered. Accept that a fallback response may be slightly different in style — during an outage, a slightly-different correct answer beats no answer. For anything strict, validate-and-repair the output rather than trusting either model blindly.

What about streaming responses?

Stream from whichever provider you've selected, but only commit to a provider once the stream actually starts. If the connection fails before the first token, fall over and start the stream fresh on the next provider. If it fails mid-stream, you generally restart the request on the fallback rather than trying to resume — so make the consuming side idempotent (don't act on partial output until the stream completes cleanly).

How do I actually test that failover works?

Force the failure on purpose. In staging, point the primary at a bad base URL or an invalid key, or use a mock that returns 503s, and confirm the agent rides the fallback and still completes. Then test recovery: restore the primary and confirm the breaker closes and routing returns to it. Put this drill on a schedule — an untested fallback path is the one that fails when you finally need it.

Should the fallback be a smaller or cheaper model?

It can be, and often should be on a different provider so a single vendor outage can't take out both. A capable-enough fallback that's available beats a perfect one that's down. The trade-off is your call: some teams fall back to a comparable model on another provider (consistency), others to a cheaper/smaller one (cost) and accept slightly lower quality during the outage window. Either is fine as long as the fallback can actually complete the task.

When should I fail over versus just retry the same provider?

Retry the same provider for a brief transient blip (a single timeout or 429) using a couple of attempts with backoff — most blips clear in milliseconds. Fail over to the next provider when retries are exhausted or the circuit breaker is open, i.e. the provider is sustainably unhealthy, not just briefly busy. Never fail over on a 400/422/401: those are bugs or auth problems that will fail identically on every provider.

Won't a circuit breaker make things worse if it trips wrongly?

Only if it's tuned badly. Set the failure threshold high enough that a single blip doesn't trip it (e.g. 3+ consecutive failures), keep the cool-down short (tens of seconds), and use a half-open trial request to re-test before fully closing. Tuned that way, the breaker only skips a provider that's genuinely down and quickly returns to it once it recovers — it reduces wasted timeouts, it doesn't cause outages.

Where should this live — in my app or in front of it?

Either works. In-app (the pattern in this guide) is the fastest to ship and keeps control in your code. A gateway/proxy in front of your app centralises failover for many services and keeps provider logic out of every codebase. Start in-app to get protected today; graduate to a shared gateway when more than one service needs the same resilience.

Sources · Claude Opus 4.8 — Elevated errors (incident, 2026-06-15, resolved) — Anthropic / Claude status page · Circuit Breaker pattern — Martin Fowler (the canonical write-up) · Exponential backoff and jitter — AWS Architecture Blog · OpenAI-compatible API surface (why one client can point at many providers) — OpenAI API reference

Build a Fallback Model Before You Need One: A Copy-Paste Pattern So One Provider Outage Never Stops Your Agent

It happened todayThe morning that proved nobody is immune

Name the risk: SPOFA single model is a single point of failure

Vendor-neutral, evergreenThe pattern: try a primary, fall back to a secondary

This is the reward — take itWire it into your agent loop (copy-paste)

Five minutes, onceThe resilience checklist

Get the next drop

Frequently asked questions

If you're going to run AI for others, not just yourself

Ready to run AI for others? Grab the AI Reseller Starter Kit

Build a Fallback Model Before You Need One: A Copy-Paste Pattern So One Provider Outage Never Stops Your Agent

It happened todayThe morning that proved nobody is immune

Name the risk: SPOFA single model is a single point of failure

Vendor-neutral, evergreenThe pattern: try a primary, fall back to a secondary

This is the reward — take itWire it into your agent loop (copy-paste)

Five minutes, onceThe resilience checklist

Get the next drop

Frequently asked questions

More free guides

If you're going to run AI for others, not just yourself

Ready to run AI for others? Grab the AI Reseller Starter Kit