Vibe coding is what happens when you treat your AI assistant like an English-speaking compiler. You describe what you want, accept whatever looks plausible, and move on. You do not bother with much design, you skim the diff, and the tests are “green enough”.

OpenAI’s own Developer Experience team has started to push back on this style of writing software. In a recent Ctrl Alt Lead podcast episode, Katia Gil Guzman, one of the founding members of the team, warned that “fast-and-loose” prompting might be fine for hobby projects but is dangerous for enterprises. Her argument is simple: when casual prompts drive code generation, AI starts making architectural and security decisions on your behalf. Instead of the AI behaving like a structured teammate that follows clear rules, your systems gradually turn into black boxes.

This article is about the moment vibe coding stops being a fun way to hack a weekend project and starts touching your production stack. Short version: you can absolutely use AI in your workflow, but treating “vibes” as a development model for real systems is a bad idea.


What vibe coding actually looks like

Forget marketing language for a second. In a real team, vibe coding usually starts with a dev on a deadline.

They sit in Cursor or Copilot, open the repo, then type something like: “Build an admin dashboard so support can reset passwords, view user details, block accounts, using our existing auth and user service.” The assistant scaffolds a Next.js app, slaps in some routes, calls your APIs, maybe even wires up a basic login page. The build passes and the happy path runs clean, so it feels safe to ship.

From the outside it feels efficient. You barely touched the keyboard. The diff is big but not insane. You read a couple of key files, nod, commit, deploy. Support gets a link to the new admin panel, they are happy, you are a hero.

On that day, vibe coding feels like cheating in a good way. The risk is invisible because nothing has gone wrong yet.


The admin panel that was “internal”

That same “quick win” admin panel starts life as a throwaway internal tool. You deploy it behind a simple /admin path on your main app or on some admin.yourcompany.internal style subdomain. Auth is whatever the AI assistant picked up from nearby code. Maybe a hand-rolled JWT guard or a middleware that only checks a single role flag.
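To make that concrete, here is a minimal sketch of what such a guard often looks like, assuming a Next.js app with a JWT in a session cookie. The file layout, cookie name and claim names are made up for illustration; the point is what the code does not do.

```ts
// middleware.ts — hypothetical sketch of a vibe-coded admin guard.
import { NextRequest, NextResponse } from "next/server";

export function middleware(req: NextRequest) {
  const token = req.cookies.get("session")?.value;
  if (!token) {
    return NextResponse.redirect(new URL("/login", req.url));
  }

  // Hand-rolled "verification": decode the payload without checking the
  // signature, issuer, audience or expiry.
  const payloadPart = token.split(".")[1] ?? "";
  const payload = JSON.parse(
    atob(payloadPart.replace(/-/g, "+").replace(/_/g, "/"))
  );

  // The only check: a single role flag. No audit trail, no per-action
  // allowlist, no awareness of who can reach this host in the first place.
  if (payload.role !== "admin") {
    return NextResponse.redirect(new URL("/", req.url));
  }

  return NextResponse.next();
}

// Hypothetical matcher: everything under /admin.
export const config = { matcher: ["/admin/:path*"] };
```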

Nobody writes a spec for this thing. There is no documented threat model. No clear statement about who can access what or from where. The only “design” lives in the prompt history and the generated components.

Fast forward six months. The link to this panel has moved from a forgotten internal wiki into the support portal because it was “useful” and nobody saw a reason to treat it differently. Somewhere along the way an ingress change widened who can reach that host; everything kept working, so nobody looked twice.

Now you have a panel that talks directly to production APIs, runs with powerful credentials and has accumulated a bunch of handlers that nobody fully understands. You might still believe it is “internal only”. The internet disagrees.

This is the core problem. Vibe-coded components usually lack two basics and a safety check:

  • an explicit spec
  • a named owner who is responsible for it over time
  • a real review of data flows and trust boundaries

OpenAI’s warning about vibe coding is basically a polite way to say: you are letting AI make these decisions for you while your governance lags behind.


Why this keeps going wrong even when you try to be careful

You can tell the model to “write secure code” till your keyboard breaks. It will still not behave like a senior engineer who understands your system.

There are two structural reasons for that.

The model has no real concept of your architecture

At best the assistant sees the files you select plus some repo context. It does not understand that this admin panel sits on the same domain as customer traffic, that its host is reachable from outside your VPN or that your regulator expects audit logs for every password reset.

So it happily:

  • reuses weak patterns it finds locally
  • mixes infrastructure concerns into feature code
  • wires endpoints together in ways that look fine from the file’s point of view but break your trust boundaries

In the admin panel story, that turns into routes that skip proper role checks or rely on cookies intended for a different context.
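As a purely hypothetical illustration, one of those routes might look like the handler below: the presence of the customer-facing session cookie is treated as authorization, and the call to the user service runs with the panel’s own credentials. The route path, cookie name and environment variables are all invented for the sketch.

```ts
// app/api/admin/reset-password/route.ts — hypothetical vibe-coded handler.
import { NextRequest, NextResponse } from "next/server";

export async function POST(req: NextRequest) {
  // "Auth": the mere presence of the customer session cookie, because that
  // is the pattern the assistant found in nearby code.
  const session = req.cookies.get("session");
  if (!session) {
    return NextResponse.json({ error: "unauthenticated" }, { status: 401 });
  }

  // No role check, no audit entry. From this file's point of view it looks
  // fine; the trust boundary it crosses is invisible here.
  const { userId } = await req.json();

  // Runs with the panel's own powerful credentials
  // (USER_SERVICE_URL and SERVICE_TOKEN are made-up names).
  await fetch(`${process.env.USER_SERVICE_URL}/users/${userId}/reset-password`, {
    method: "POST",
    headers: { Authorization: `Bearer ${process.env.SERVICE_TOKEN}` },
  });

  return NextResponse.json({ ok: true });
}
```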

Fast code, weak oversight

With AI in the loop, implementation time collapses. Work that would have taken a week of design, coding and review now appears as a working branch after an afternoon of prompting. Most teams never update their review or security process to match that pace, so more and more code “looks fine” but has never had a serious check.

As output accelerates, pull requests get larger or land far more often (or both). Reviewers stop following the behavior end to end and focus on obvious breakage or style. The boring rules that should always hold (like who is allowed to call an endpoint or which data can cross a boundary) get less attention.

In that environment the admin panel gets waved through since nothing explodes on day one.

The same dynamic quietly pushes dependencies and configuration away from your standards. The assistant suggests whatever libraries it has seen in training. If those happen to match your stack you get lucky. If not, you get a slow creep of extra auth helpers, new HTTP clients, ad-hoc config patterns.

Over time the app starts to feel like a black box even to maintainers, since no one can clearly explain why certain packages or settings exist.

You can keep adding prompts that say “use our standard X”, but they do not fix the basic problem. The model does not know the rules of your house until you write them down in a form a human can defend.


So what do you do instead?

The answer is definitely not to “ban AI” in the development cycle: teams will keep using it anyway. The move is to stop treating vibes as a development model for production paths and switch to workflows where humans define the rules, then let AI work inside them.

For the admin panel example, the difference is simple. In the bad version, the only “design” lives inside a prompt and a handful of generated files. In a sane version, someone writes down what this component is allowed to do, which data it touches, who can use it, and how it is supposed to fit into the rest of your system. That spec becomes the contract the assistant has to follow and the reference reviewers use when they decide whether to ship or not.
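Here is a minimal sketch of what a slice of that contract can look like once it lives in code rather than in a prompt. Every name in it (the action list, the role table, the policy wrapper, the injected helpers) is an assumption for illustration, not a real library API.

```ts
// admin-policy.ts — hypothetical "spec as a contract" for the admin panel.
import { NextRequest, NextResponse } from "next/server";

// The written rules: what the panel may do, and which roles may do it.
export type AdminAction = "reset_password" | "view_user" | "block_account";
export type Actor = { id: string; role: "support" | "admin" };

const ALLOWED_ROLES: Record<AdminAction, Actor["role"][]> = {
  reset_password: ["support", "admin"],
  view_user: ["support", "admin"],
  block_account: ["admin"],
};

// Auth and audit plumbing is injected, so the policy itself stays one small
// reviewable file that mirrors the spec.
export function createAdminPolicy(deps: {
  verifyAdminSession: (req: NextRequest) => Promise<Actor | null>;
  audit: (entry: { actorId: string; action: AdminAction; at: string }) => Promise<void>;
}) {
  return async function withAdminPolicy(
    req: NextRequest,
    action: AdminAction,
    handler: (actor: Actor) => Promise<NextResponse>
  ): Promise<NextResponse> {
    const actor = await deps.verifyAdminSession(req);
    if (!actor || !ALLOWED_ROLES[action].includes(actor.role)) {
      return NextResponse.json({ error: "forbidden" }, { status: 403 });
    }
    // The spec says every action is audited, so the wrapper does it,
    // not each individual route.
    await deps.audit({ actorId: actor.id, action, at: new Date().toISOString() });
    return handler(actor);
  };
}
```

Every route then goes through the wrapper with an explicit action name, so a reviewer only has to check two things: that each handler uses the wrapper, and that the role table still matches the written spec.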

If you want concrete patterns for this kind of workflow, this article on BMAD walks through approaches that all start from the same idea: capture decisions, describe architecture in a way AI can actually consume, and treat code as the cheap part.

Boiled down, you need a small set of written rules. What an admin panel is allowed to do, which auth patterns are acceptable, which libraries are in bounds. Once those are clear, AI has something concrete to follow and humans have something concrete to review against.

If you give AI that structure and keep human review focused on whether the code follows your own rules, you keep most of the speed without leaving production in the hands of a black box.


Where vibe coding actually belongs

By now the pattern is pretty clear: vibe coding is useful when you are exploring, spiking ideas or wiring up tools you are happy to throw away.

It is a bad default for anything you plan to maintain or plug into the rest of your system, which is why even OpenAI’s own DX folks are now warning against it for serious work.

If you are running an AI native team, the practical stance is straightforward:

  • use vibe coding for spikes, experiments and small utilities you can delete without regret
  • for anything that is meant to live in your stack, start from a small spec with clear ownership, then treat the AI output as untrusted until a human with enough context has reviewed it

The admin panel is just one example. You can swap in a new billing service or a data export pipeline if you want even scarier, very real nightmares. The underlying pattern does not change.

You still get the speed and convenience of AI in your stack. You just stop pretending that vibes on their own are a valid development model once the code is something you intend to rely on.