Categories
Flow

Website monitoring found five false alarms and one real bug

The website monitor went off at 6 a.m. Something was wrong with one of our sites. I pulled up the alert, looked at the flagged section, and everything looked fine to me. I closed the laptop and went back to my coffee.

That happened six times over two weeks. Five of those times, I was right to dismiss it. The sixth time, I was wrong – and a real bug had been sitting there, visible to customers, for longer than I’d like to admit.

When website monitoring alerts cry wolf

We run automated website monitoring that checks our sites regularly and flags anything that looks off. It compares page sections against a baseline – layout, content, key elements. When something drifts far enough from the baseline, it fires an alert.
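Here is a minimal sketch of that kind of baseline comparison, to make the idea concrete. It is an illustration under assumptions, not the monitor we actually run: the function names, the 0.20 threshold, and the choice of plain text similarity as the "drift" measure are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical threshold: how far a section may drift before an alert fires.
DRIFT_THRESHOLD = 0.20


def drift(baseline: str, current: str) -> float:
    """Return 0.0 for identical text and 1.0 for completely different text."""
    return 1.0 - SequenceMatcher(None, baseline, current).ratio()


def sections_to_alert(baseline: dict[str, str], current: dict[str, str],
                      threshold: float = DRIFT_THRESHOLD) -> list[str]:
    """List the section names whose content drifted past the threshold."""
    flagged = []
    for name, expected in baseline.items():
        # A section missing from the current page counts as fully drifted.
        actual = current.get(name, "")
        if drift(expected, actual) > threshold:
            flagged.append(name)
    return flagged
```

Raising the threshold makes the check quieter, and lowering it makes it jumpier. That single number is where most of the false-alarm tuning happens in a setup like this.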

The problem is that not every change is a problem. A font loads slightly differently on a Tuesday. An image swaps out because we updated the gallery. The comparison tool sees the difference and sends the alarm. Five of our six recent alerts fell into that category: legitimate changes the monitor didn’t recognize as intentional.

The issue with that pattern is obvious. Once alerts fire often enough for non-problems, you start dismissing them faster. Your response time to a real issue quietly gets worse.

So we dug into every one of those six alerts properly, traced each one back to its actual root cause, and fixed the detection logic where it was overly sensitive. That part is maintenance work. Not exciting. Worth doing.

The real bug: a project gallery failed on first load

The sixth alert pointed to a project gallery section – the part of a page that shows photos of completed work. When I loaded the page and clicked around, it looked fine. Images loaded, captions appeared, everything worked.

What I hadn’t done was run a first-load website test: load the page fresh and just wait.

A customer who lands on that page for the first time, scrolls to the gallery section, and doesn’t immediately click on anything would have seen a blank space. The gallery wasn’t rendering on initial page load. It only appeared after some interaction triggered a refresh.

We’d been testing from inside the system, navigating page to page while logged in. The bug only appeared when you came in cold, the way a real visitor does. Once we loaded pages fresh from outside, the issue was obvious. The fix took less than an hour once we understood it.

The lesson: test from the outside. Your workflow and your customers’ workflow are not the same path.
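One way to automate a cold check is to look at the HTML exactly as it is served to a visitor with no session and confirm the gallery already contains images. The sketch below assumes the gallery is an element with the id "project-gallery", which is a made-up name for illustration. It only inspects the served HTML, so a gallery that is drawn later by JavaScript would need a headless browser instead.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not change nesting depth.
VOID_TAGS = {"img", "br", "hr", "input", "meta", "link", "source"}


class GalleryImageCounter(HTMLParser):
    """Count <img> tags with a non-empty src inside the element with a given id."""

    def __init__(self, gallery_id: str):
        super().__init__()
        self.gallery_id = gallery_id
        self.depth = 0          # greater than 0 while inside the gallery element
        self.image_count = 0

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if self.depth == 0:
            # Not inside the gallery yet: only watch for its opening tag.
            if attributes.get("id") == self.gallery_id and tag not in VOID_TAGS:
                self.depth = 1
            return
        if tag == "img" and attributes.get("src"):
            self.image_count += 1
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if self.depth > 0 and tag not in VOID_TAGS:
            self.depth -= 1


def gallery_renders_on_first_load(raw_html: str,
                                  gallery_id: str = "project-gallery") -> bool:
    """True if the served HTML already has at least one gallery image."""
    parser = GalleryImageCounter(gallery_id)
    parser.feed(raw_html)
    parser.close()
    return parser.image_count > 0
```

To use it, fetch the page without cookies or a logged-in session (for example with `urllib.request.urlopen`) and pass the decoded body to `gallery_renders_on_first_load`.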

Backup verification has to check the destination

During this same debugging window, we found something unrelated but equally worth writing about.

Our automated backups were running. They were completing. They reported success. But we hadn’t looked closely at where the files were actually landing. When we checked, they’d been routing to the wrong cloud account – one we had access to, technically, but not the right destination for a real recovery scenario.

For weeks, our backups were “working” – no error ever appeared. But if we’d needed to restore something, we’d have been digging in the wrong place under pressure.

The rule we wrote after this: backups must fail in a way you notice, not succeed in a way you never verify. An automated process that completes quietly and routes output somewhere wrong is worse than one that breaks loudly. A loud failure gets fixed. A silent mismatch can sit there for months.

Now our backup jobs verify destination and file count as part of their completion check. If the numbers don’t match expectations, the alert goes to a human. Not a log entry. A human.
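A completion check along those lines could look roughly like this. It is a sketch, not our production job: `BackupResult`, `completion_problems`, `finish_backup`, and the `alert_human` callback are hypothetical names, and how you learn the real destination and file count depends on your storage provider.

```python
from dataclasses import dataclass


@dataclass
class BackupResult:
    destination: str  # where the job actually wrote (account or bucket name)
    file_count: int   # how many files landed there


def completion_problems(result: BackupResult, expected_destination: str,
                        expected_file_count: int) -> list[str]:
    """Compare what the job did against what it was supposed to do."""
    problems = []
    if result.destination != expected_destination:
        problems.append(
            f"wrong destination: got {result.destination!r}, "
            f"expected {expected_destination!r}"
        )
    if result.file_count != expected_file_count:
        problems.append(
            f"file count mismatch: got {result.file_count}, "
            f"expected {expected_file_count}"
        )
    return problems


def finish_backup(result: BackupResult, expected_destination: str,
                  expected_file_count: int, alert_human) -> bool:
    """Report success only if destination and count both match; otherwise alert."""
    problems = completion_problems(result, expected_destination, expected_file_count)
    if problems:
        # alert_human is any callable that reaches a person (email, SMS, chat).
        alert_human("; ".join(problems))
        return False
    return True
```

The point of the design is that "the job exited cleanly" and "the backup is where it should be" are checked separately, and only the second one counts as success.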

Verify backups before website upgrades

In the same week, we upgraded WordPress to version 7.0. New security patches, compatibility updates, the usual.

The rule before any significant website upgrade is simple: confirm the backup exists and that you can actually restore from it, then proceed. Not “I think the backup ran.” Confirm it. Pull a recent file from it.
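"Pull a recent file from it" can be scripted. The sketch below assumes the backup is a zip archive and that you recorded a SHA-256 checksum of one known file when the backup was made. Both are assumptions for illustration, and `can_restore` is a hypothetical helper, not part of WordPress or any backup plugin.

```python
import hashlib
import tempfile
import zipfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def can_restore(archive: Path, member: str, expected_sha256: str) -> bool:
    """Extract one file from the backup and confirm it matches a known checksum."""
    try:
        with zipfile.ZipFile(archive) as backup, \
                tempfile.TemporaryDirectory() as scratch:
            restored = Path(backup.extract(member, path=scratch))
            return sha256_of(restored) == expected_sha256
    except (KeyError, zipfile.BadZipFile, FileNotFoundError):
        # Missing member, corrupt archive, or missing archive: not restorable.
        return False
```

If this returns False, the upgrade waits. A single extracted file is not a full restore drill, but it does prove the archive exists, opens, and contains what you think it contains.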

The upgrade went fine. But that discipline is what makes an upgrade something you do on a Tuesday afternoon instead of scheduling a Saturday for it. A verified backup means a bad upgrade is a two-hour recovery, not a crisis.

Using an AI coordinator without giving it the keys

One more thing worth explaining: we started using an AI coordinator we call “jefe.” The name is intentional. Jefe’s job is to direct, not to do.

When we have a complex task, jefe breaks it into pieces, assigns each piece to a specialized AI assistant, and collects the results. It never edits a file or makes a change directly. It only coordinates.

I built it this way because I watched what went wrong when a single AI assistant tried to handle too many things at once: mixed-up context, lost steps, fixes from one area getting applied to problems in another. One coordinator plus focused specialists plus a human reviewing the output works noticeably better than one assistant trying to do everything.

The structure matters more to me than whichever model happens to be newest that week: a director that does not touch the work, specialists that stay in their lane, and a human who checks before anything ships. That is how I keep AI useful without letting it run the business by itself.
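The shape of that structure can be sketched in a few lines. This is not jefe's actual code: the specialists here are plain functions standing in for AI assistants, the task breakdown is passed in rather than generated, and every name is hypothetical. What it shows is the separation: the coordinator only routes and collects, and nothing counts as approved until a human says so.

```python
from typing import Callable

# A specialist takes an instruction and returns its findings as text.
Specialist = Callable[[str], str]


def coordinate(subtasks: list[tuple[str, str]],
               specialists: dict[str, Specialist]) -> list[dict]:
    """Route each (skill, instruction) subtask to a specialist and collect results.

    The coordinator never does the work itself. If no specialist matches,
    it records that fact instead of improvising.
    """
    report = []
    for skill, instruction in subtasks:
        specialist = specialists.get(skill)
        if specialist is None:
            report.append({"skill": skill, "instruction": instruction,
                           "result": None, "status": "no specialist available"})
            continue
        report.append({"skill": skill, "instruction": instruction,
                       "result": specialist(instruction),
                       "status": "needs human review"})
    return report


def approved_for_shipping(report: list[dict],
                          human_approves: Callable[[dict], bool]) -> list[dict]:
    """Keep only the findings a human reviewer explicitly accepted."""
    return [item for item in report
            if item["result"] is not None and human_approves(item)]
```

Every result comes back marked "needs human review" by default, so skipping the review step means nothing ships rather than everything shipping.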

Tuning monitoring alerts lowers noise but adds risk

Jefe can delegate to the wrong specialist, or two specialists can return conflicting answers. The human review step isn’t optional or a formality. More than once, the consolidated report has included a finding that was just wrong, and a human caught it.

The monitoring tuning is also not a permanent fix. Reducing false positives means the thresholds are less sensitive, which means a real issue might take slightly longer to trigger. We accepted that tradeoff deliberately and will revisit it if something slips through.

Check one automated business process this week

Pick one automated process in your business this week – a backup job, a scheduled report, a recurring email – and check the actual output, not just whether it ran. Did the backup land where you expected? Does the file open? Is the report showing the right data?

This takes about five minutes. It has saved us more than once.


How we use AI to review our small business automation

I came in one morning to find that our AI had filed 1,004 improvement tickets about our small business automation while we slept.

That number stopped me cold. A thousand suggestions – about its own performance, its own gaps, things it thought should work better. Most of them were probably fine ideas. None of them were going to get built automatically. That part is entirely up to us.

That’s the setup for this post: not “AI improves itself,” but “AI proposes, humans decide.” The distinction matters more than it sounds if you are using AI tools around real customers, schedules, email, or job records.

What our AI review system actually does

The tool is something we call the Program Improvements Manager, or PIM. Every night, it runs a review pass across the AI systems we’ve built: voice agents, email tools, the CRM integration, the scheduling app. It looks for patterns: things that failed more than once, places where a response was slow, steps that a human had to correct, areas where the output was inconsistent.
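The simplest of those patterns, "things that failed more than once," is easy to picture in code. This is a sketch under assumptions, not the PIM itself: it assumes each logged event is a dictionary with hypothetical "system", "step", and "outcome" fields.

```python
from collections import Counter


def repeated_failures(events: list[dict],
                      min_count: int = 2) -> list[tuple[str, str, int]]:
    """Return (system, step, count) for each step that failed min_count+ times."""
    counts = Counter(
        (event["system"], event["step"])
        for event in events
        if event["outcome"] == "failed"
    )
    return [
        (system, step, count)
        for (system, step), count in counts.most_common()
        if count >= min_count
    ]
```

A one-off failure stays out of the list. Only repeats make it into the morning review.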

Then it writes up suggestions. Each one gets a rough estimate of effort and a rough estimate of impact. They get sorted into categories: reliability improvements, new capability requests, content updates, configuration changes.
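As a rough picture of what one of those write-ups might look like as data, here is a sketch. The 1-to-5 scales, the impact-divided-by-effort score, and all of the names are assumptions made for illustration. The source system only promises "rough" estimates and the four categories.

```python
from collections import defaultdict
from dataclasses import dataclass

CATEGORIES = ("reliability", "new capability", "content", "configuration")


@dataclass(frozen=True)
class Suggestion:
    title: str
    category: str  # one of CATEGORIES
    effort: int    # rough estimate, 1 (small) to 5 (large)
    impact: int    # rough estimate, 1 (minor) to 5 (major)

    @property
    def score(self) -> float:
        """Higher means more payoff for less work."""
        return self.impact / self.effort


def group_by_category(suggestions: list[Suggestion]) -> dict[str, list[Suggestion]]:
    """Bucket suggestions by category, best score first within each bucket."""
    grouped = defaultdict(list)
    for suggestion in suggestions:
        grouped[suggestion.category].append(suggestion)
    for items in grouped.values():
        items.sort(key=lambda s: s.score, reverse=True)
    return dict(grouped)
```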

By morning, the queue in front of us wasn’t 1,004 raw suggestions. It was 129 active ones. That was better, but it was still too many. It was not practical to sift through 129 active improvement suggestions every day and make good decisions about all of them.

That became the real lesson: the system had to be tuned for usability, not just output. A review tool is only useful if the queue is small enough that a person will actually read it. You do not want AI doing a million things if all but a few get stuck in a human review backlog.
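One blunt way to enforce that is a hard cap on what reaches the reviewer. The sketch below assumes each suggestion already has a numeric score. The cap of 20 is a made-up number, not the one we settled on.

```python
import heapq

# Hypothetical cap: the most items a person will realistically read in one sitting.
MAX_QUEUE = 20


def review_queue(scored: list[tuple[str, float]],
                 limit: int = MAX_QUEUE) -> tuple[list[tuple[str, float]], int]:
    """Return the top-scoring (title, score) items and how many were held back."""
    queue = heapq.nlargest(limit, scored, key=lambda item: item[1])
    return queue, len(scored) - len(queue)
```

Reporting the held-back count matters. A cap that silently drops items hides the backlog instead of managing it.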

That human is me, usually. I review the queue, decide what is worth building, and tune the system when the list starts turning into noise. Some suggestions go into our actual development backlog. Most get noted and watched. A few get dismissed. Nothing happens automatically.

The best use of AI here is not as a magic box for every shiny new idea. It is a tool to reduce work we were already doing. If it overwhelms us with a pile of crap we will never get a chance to look at, it has failed, even if every individual suggestion sounds clever.

Why human approval is the whole point

An AI that can modify its own behavior without oversight is a liability. I’m not being dramatic about that. It’s just true. The value of an AI self-review system like this is not that it fixes things on its own. The value is that it sees things I would miss, at a speed and scale I can’t match, and hands them to me in a form I can actually act on. The judgment call stays with me.

The analogy I keep coming back to is a good estimator who reviews all your past jobs and writes up a report: here’s what came in under budget, here’s what ran over, here’s a pattern I’m noticing. That’s genuinely useful. But the decisions about how to bid the next job, which crew to put on it, when to take a pass – those stay with the person who signs the contracts.

Same idea here.

Nightly backups make the automation safer

The same night the PIM runs, a set of backup jobs runs too. Every customer recording, every voice-agent transcript, every video gets quietly copied to cloud storage. Not because anything bad happened, but because “automated backups run every night” is the kind of boring sentence that prevents years of work from disappearing.

I don’t think about it most days. That’s the point.

A few weeks back I was looking at the backup logs and noticed they’d been running without a single gap for over a month. Thousands of files, dozens of sessions, completely automated, no one watching. That’s the kind of automation I actually trust. Not because it’s clever, but because it’s predictable.
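Checking for gaps does not have to be done by eye. Assuming you can pull the list of dates on which a backup completed (how you get that list depends on your logs), a sketch like this reports any night that was skipped. The function name is hypothetical.

```python
from datetime import date, timedelta


def missing_backup_nights(run_dates: list[date], start: date, end: date) -> list[date]:
    """Return every date from start to end (inclusive) with no completed backup."""
    seen = set(run_dates)
    total_days = (end - start).days + 1
    expected = (start + timedelta(days=offset) for offset in range(total_days))
    return [day for day in expected if day not in seen]
```

An empty list means no gaps in the window. Anything else is a specific date to go look at.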

Why AI process improvement is practical now

One reason this is practical now is that the AI tools have gotten cheap enough for normal business experiments. In the videos I’ve made about AI, I keep coming back to the same point: a lot of this is not five-figure enterprise software anymore. Sometimes it is a $20-a-month tool, or a small metered cost, doing a job that would have taken a person hours.

That does not mean everything should be automated. It means the math is finally good enough that a small company can try things, keep what works, and shut off what does not.

This only works if someone reviews the queue

An AI that writes its own improvement tickets is only as good as the person who reads them. If the queue piles up unreviewed for weeks, or if the person reviewing it doesn’t understand the systems well enough to evaluate the suggestions, you get noise. Or worse, a false sense that the system is improving itself when really it’s just generating paperwork.

This approach works for us because I’m close enough to the code and the business logic to make calls quickly. If you’re a shop owner who doesn’t build your own tools, the right version of this might just be a monthly review session with whoever manages your software vendors. Periodic review by a person who can act on it matters more than the specific implementation.

One place to start

If you’re building any kind of automation, build the review step first. Not as an afterthought. Decide ahead of time: how will I know if this is working? How will I know if it’s quietly doing the wrong thing? The answer doesn’t have to be sophisticated. It can be a weekly email summary, a log you glance at on Mondays, a number that should stay below a threshold. Just something a human actually reads.
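The "number that should stay below a threshold" version can be as small as this. It is a sketch with made-up metric names. The output is a list of plain lines you could paste into a weekly email.

```python
def weekly_summary(metrics: dict[str, float], limits: dict[str, float]) -> list[str]:
    """Build one line per metric, marking any that reached or passed its limit."""
    lines = []
    for name, value in sorted(metrics.items()):
        limit = limits.get(name)
        if limit is None:
            lines.append(f"{name}: {value} (no limit set)")
        elif value >= limit:
            lines.append(f"CHECK {name}: {value} (limit {limit})")
        else:
            lines.append(f"ok {name}: {value} (limit {limit})")
    return lines
```

Lines that start with CHECK are the ones a human needs to read. Everything else is reassurance.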