Fallback Logic: When OpenAI Fails, Ollama Takes the Mic
I’ve become pretty reliant on OpenAI’s API for my projects, but one thing I’ve learned the hard way: always have a Plan B. Not long ago, I was testing Coach (my AI assistant) late at night when—boom—OpenAI’s service went down or I hit a usage cap. Suddenly, my “AI support system” was a silent brick. That didn’t sit well with me, so I got to work building some fallback logic. Now, when OpenAI fails, Ollama (a local AI model runner) steps in and takes the mic.
Here’s how it works: if a request to the OpenAI API times out or comes back empty, my system automatically spins up a local large-language model via Ollama. It’s basically an AI model that runs on my own machine (no internet required once it’s downloaded). The first time I set this up, it felt like rigging up a generator for when the power goes out. The local model isn’t as fancy or powerful as OpenAI’s latest and greatest, but it’s reliable and always there. I even configured it to keep the model “warm” for a couple of hours once activated, so it’s ready to answer follow-up questions without a long load time.
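The pattern above can be sketched in a few lines of Python. To be clear, this is an illustrative sketch, not my exact code: `ask_with_fallback`, `flaky_cloud_model`, and `local_ollama_model` are hypothetical names, and in the real setup the primary callable would wrap the OpenAI SDK while the fallback would POST to Ollama’s local HTTP API (where a `keep_alive` value like `"2h"` is what keeps the model warm).

```python
def ask_with_fallback(prompt, primary, fallback):
    """Try the cloud model first; on any error or an empty reply, go local."""
    try:
        reply = primary(prompt)
        if reply and reply.strip():
            return reply, "primary"
    except Exception:
        # Timeouts, rate limits, and outages all land here.
        pass
    return fallback(prompt), "fallback"


# Stand-in callables for illustration. A real fallback would send something
# like {"model": "llama3", "prompt": ..., "keep_alive": "2h"} to Ollama's
# local endpoint so the model stays loaded for follow-up questions.
def flaky_cloud_model(prompt):
    raise TimeoutError("OpenAI request timed out")

def local_ollama_model(prompt):
    return f"(local model) You asked: {prompt}"

reply, source = ask_with_fallback(
    "Any tips tonight?", flaky_cloud_model, local_ollama_model
)
print(source)  # "fallback" — the local model took the mic
```

The nice part of structuring it this way is that the fallback decision lives in one place: anything that makes the primary call fail (or return nothing useful) routes the same prompt to the backup without the caller having to care which voice answered.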
Building this fallback taught me a lot about resilience. We talk about graceful degradation in system design, and I got to implement it in a very tangible way. When the cloud AI fails, conversations don’t dead-end anymore; they just switch voices. As a user, all I might notice is a slightly slower response and maybe a shift in tone, but the interaction continues. And emotionally, it’s a relief. I no longer have to tiptoe around using my AI out of fear that I’ll run out of API credits or get throttled.
This experience reinforced something simple: expect things to break. Design for failure. Now that I’ve seen my backup AI save the day, I’m thinking about other single points of failure I can shore up—in my code and even in my life. After all, a good Plan B is worth its weight in gold (or in this case, saved sanity).