Operations · Apr 16, 2026

The First 30 Days After AI Goes Live

Launch day is the easy part. The model runs. The dashboard renders. The stakeholders clap. Somebody posts a screenshot to the internal channel. The contract gets referenced in a quarterly review.

Thirty days later is where systems actually live or die.

The first month in production is a different job than the project that got you there. It is not build work. It is not training. It is the part nobody writes a statement of work for, and it is where most AI investments quietly stop being used.

Week one: nothing breaks, and that is the problem

The first week after go-live feels like a win. Traffic is low. The team that championed the project is the team using it. Output looks right because the people reviewing it already expected it to look right.

The trap is that nothing unusual happens, so nobody learns anything. We push to get actual volume through the system in week one, even if that means assigning it real work instead of letting it idle. The goal is not proving it works. The goal is finding the first set of things that do not.
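
One cheap way to generate that volume is to replay a backlog of real records through the live model and queue anything that errors or comes back low-confidence. The sketch below is an illustration, not any particular stack's API: the predict contract, the CONFIDENCE_FLOOR threshold, and the stubbed backlog are all assumptions.

```python
import random

CONFIDENCE_FLOOR = 0.70  # assumed review threshold; tune per system


def replay(records, predict):
    """Push real records through the model; return the ones worth a human look."""
    flagged = []
    for record in records:
        try:
            # Assumed contract: predict returns {"label": str, "confidence": float}.
            result = predict(record)
        except Exception as exc:
            # An input the test set never contained -- exactly what week one is for.
            flagged.append({"record": record, "error": repr(exc)})
            continue
        if result["confidence"] < CONFIDENCE_FLOOR:
            flagged.append({"record": record, "result": result})
    return flagged


if __name__ == "__main__":
    # Stand-ins for a real inference call and a real backlog.
    def predict(record):
        return {"label": "ok", "confidence": random.random()}

    backlog = [{"id": i} for i in range(1000)]
    review_queue = replay(backlog, predict)
    print(f"{len(review_queue)} of {len(backlog)} records need a human look")
```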

Week two: the edge cases arrive

Real users hit real data. Within two weeks, three things surface on every engagement we have ever shipped.

Inputs the test set never contained. A record format from a source system nobody flagged. A free-text field full of values the model has no prior for. A document type that was always supposed to be deprecated but still gets generated somewhere.

Disagreements between users and the model. The model is right. The user is right. The labels were ambiguous. Usually some combination. These disagreements are the most valuable signal in the first 30 days. They are free labeled data on exactly the cases that matter most, and they are the thing teams are most tempted to ignore because they are uncomfortable. A sketch of one cheap way to capture them follows below.

Workflow friction that did not show up in user testing. Something takes one click too many. A field gets entered twice. A result lands in an inbox instead of a queue. Small things, but they compound until the user routes around the system.
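
Capturing those disagreements does not need infrastructure on day one. Here is a minimal sketch, assuming an append-only JSONL file is enough at early volume; the record_disagreement helper and every field name in it are hypothetical, not a reference to any real system.

```python
import json
import time

LOG_PATH = "disagreements.jsonl"  # assumed location


def record_disagreement(record_id, model_output, user_correction, inputs):
    """Append one user override as a future labeled example."""
    entry = {
        "ts": time.time(),
        "record_id": record_id,
        "model_output": model_output,        # what the model said
        "user_correction": user_correction,  # what the user changed it to
        "inputs": inputs,                    # enough context to relabel later
        "resolved_label": None,              # filled in when someone adjudicates
    }
    with open(LOG_PATH, "a") as fh:
        fh.write(json.dumps(entry) + "\n")


# Call this from the same code path that applies the user's edit,
# so capture costs the user nothing. Example values are made up:
record_disagreement(
    record_id="inv-20391",
    model_output={"label": "approved", "confidence": 0.64},
    user_correction={"label": "needs_review"},
    inputs={"amount": 1240.50, "vendor": "ACME"},
)
```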

Week three: the retraining question

By week three there is usually enough production signal to answer a real question: is the model performing on real data the way it performed on test data? If not, why not?

The honest answer is almost never "the model is bad." It is usually one of three things. The input distribution shifted in a way the test set did not capture. The labeling convention on new data drifted from the convention on training data. Or the workflow is surfacing edge cases that were rare enough to be rounded out of the original sample.

Each of those has a different fix. None of them is solved by swapping to a bigger model. The teams that skip this diagnosis and go straight to "upgrade the model" are the ones that end up with a more expensive system that has the same problem.
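
The first diagnosis, input distribution shift, is also the easiest to check numerically. One common approach is the Population Stability Index per input feature, comparing production values against the training sample. This is a generic sketch with conventional rule-of-thumb thresholds, not something a particular stack ships with; numpy is the only dependency.

```python
import numpy as np


def psi(train_values, prod_values, bins=10):
    """PSI between a feature's training and production distributions.

    Conventional rule of thumb (not universal): < 0.1 stable,
    0.1-0.25 moderate shift, > 0.25 investigate before retraining.
    """
    # Bin edges come from the training data so both samples share bins.
    edges = np.histogram_bin_edges(train_values, bins=bins)
    train_counts, _ = np.histogram(train_values, bins=edges)
    prod_counts, _ = np.histogram(prod_values, bins=edges)

    # Convert to proportions; floor at a tiny value to avoid log(0).
    train_p = np.clip(train_counts / train_counts.sum(), 1e-6, None)
    prod_p = np.clip(prod_counts / prod_counts.sum(), 1e-6, None)

    return float(np.sum((prod_p - train_p) * np.log(prod_p / train_p)))


# Synthetic example: a feature whose production mean drifted by 0.4 sigma.
rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, 50_000)
prod = rng.normal(0.4, 1.0, 5_000)
print(f"PSI = {psi(train, prod):.3f}")  # lands in the moderate band: worth investigating
```

A feature that scores high here points at data work, which is a very different fix than a bigger model.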

Week four: the ownership transfer

By week four, the system should belong to the operating team, not the build team. That transfer is where most projects fail quietly. Not because the handoff was skipped, but because the handoff was ceremonial.

A real transfer is not a deck. It is the operating team running a retrain, pushing a config change, interpreting a dashboard, and resolving a ticket, without the build team touching the keyboard. If the operating team cannot do those four things by day 30, the system has no long-term owner. It has a dependency. Dependencies get sunset.

What we do differently

Every engagement we run includes 30 days of post-launch support. Not as a favor. As a structural part of the project. Because we have watched enough systems die in the first 30 days to know that cutting support at launch is the single most effective way to waste the 90 days that came before.

We sit in the operating team's channel. We watch the dashboards with them. We handle the first retraining cycle alongside whoever is going to handle every retraining cycle after us. We write down the things that surprise us and the things that surprise them, because those lists are the beginning of a real runbook.

When we leave on day 30, the system has been through a full operating cycle with the people who have to run it. Not a simulated one. A real one, under load, with real edge cases that have already been resolved once.

The short version

The risk on an AI project does not peak at launch. It peaks three weeks after launch, when the first serious edge cases arrive and the build team is no longer in the room. Plan for that window or lose the system in it.

Have a system that needs to actually stick?

30-minute call. We will walk you through how we handle the first 30 days and what separates the systems that last from the ones that get quietly shelved.