Aquatic Artists custom waterfall
Categories
Flow

AI costs for small business: keep the meter under control

In early April, we noticed something in our usage logs: one background worker had been running a processing loop through the night, making AI calls far faster than it should have. By morning it had burned through a significant chunk of our weekly budget. The rest of the platform was fine, but we were suddenly rationing.

That’s the version of the AI cost problem that doesn’t show up in the demos. Not “AI is expensive to buy” – the tooling is pretty affordable now. The problem is that AI calls are metered like water or electricity, and without the right setup, a small business owner may not know the meter is running until the bill comes.

How AI pricing becomes a metered bill

Most people’s first experience with AI pricing is a subscription like ChatGPT Plus. That is the easy version to understand: you pay a flat monthly price for the ChatGPT web app, then use it inside the limits of that plan. At the time I’m writing this, OpenAI lists ChatGPT Plus at $20/month.

Business automation usually works differently. When software calls an AI model through an API, that usage is billed separately from any ChatGPT subscription. The bill is usually based on tokens. A token is a small piece of text. The text you send into the model is input tokens; the answer the model writes back is output tokens. OpenAI publishes API prices per million tokens, with separate rates for input and output.

That difference matters. Typing into ChatGPT feels like using a subscription. Wiring AI into a background worker feels more like turning on a meter. If a job sends a long customer history, a pile of email threads, or a database export, the input tokens can be large before the model writes a single word back. If the response is a long report, the output side grows too.

The problem is volume. If you have an agent that runs nightly and processes a hundred records, that’s a hundred calls. If that agent has a bug and processes the same records ten times each, that’s a thousand calls. If it runs into an error and retries in a tight loop, it can make ten thousand calls before anyone notices. At that point, “affordable” AI starts looking very different.
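The tight retry loop is where call counts explode. One common guard (not necessarily how our own workers are built) is to cap the number of retries and wait longer between each one. Here is a minimal Python sketch under that assumption; `call_with_retry` and `CallFailed` are hypothetical names, not part of any AI vendor’s SDK, and `call` stands in for whatever function actually makes the AI request.

```python
import time


class CallFailed(Exception):
    """Raised by a (hypothetical) model call when it fails."""


def call_with_retry(call, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Retry `call` a bounded number of times, doubling the wait each time.

    `call` is any zero-argument function that raises CallFailed on error.
    Returns (result, attempts_used). Re-raises after max_attempts so the
    failure is visible instead of turning into thousands of silent calls.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call(), attempt
        except CallFailed:
            if attempt == max_attempts:
                raise  # give up loudly instead of looping forever
            sleep(base_delay * 2 ** (attempt - 1))  # waits 1s, 2s, 4s, ...
```

With the defaults, a broken job makes at most four calls per record instead of retrying until someone notices. The `sleep` parameter is there so the waiting can be swapped out in tests.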

Put every AI call through one gateway

The single most useful thing we did was route all AI calls through one gateway – one piece of software that every request passes through before reaching the model. Think of it like a dispatcher at a trucking company. Nobody goes directly to the driver. Every job goes through the dispatcher, who logs it, assigns it, and tracks whether it got done.

With a front door like this, you can see at a glance which parts of your system are making calls and how many. You can set rate limits. You can stop a runaway worker without taking down everything else. You can also apply different rules to different types of work. That’s where the next few dials come in.
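To make the dispatcher idea concrete, here is a minimal gateway sketch in Python. It assumes every worker calls one function instead of talking to the vendor directly; the `Gateway` class and its methods are hypothetical, and the actual model call is passed in as `send` so nothing here depends on a real SDK. It covers three of the jobs described above: a per-worker call count, a log of every request, and a way to stop one worker without touching the others.

```python
from collections import Counter
from datetime import datetime, timezone


class Gateway:
    """Single entry point for every AI call (hypothetical sketch)."""

    def __init__(self, send):
        self.send = send                  # function that really calls the model
        self.calls_by_worker = Counter()  # running count per worker
        self.log = []                     # (timestamp, worker, task_type)
        self.stopped = set()              # workers switched off individually

    def stop(self, worker):
        """Block one worker; every other worker keeps running."""
        self.stopped.add(worker)

    def resume(self, worker):
        self.stopped.discard(worker)

    def call(self, worker, task_type, prompt):
        """Log and count the request, then forward it to the model."""
        if worker in self.stopped:
            raise PermissionError(f"worker {worker!r} is stopped at the gateway")
        self.calls_by_worker[worker] += 1
        self.log.append((datetime.now(timezone.utc), worker, task_type))
        return self.send(prompt)
```

Because blocked calls are rejected before they are counted or forwarded, stopping a runaway worker costs nothing further, and `calls_by_worker` answers the “who is making calls and how many” question at a glance.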

Three AI cost controls that actually move the meter

Route simple tasks to cheaper models. Not every job needs the most capable AI available. If you’re classifying whether an incoming email is junk or a real lead, a smaller, faster model costs a fraction of the price and works almost as well. We push classification, labeling, and formatting work to lighter models, and save the heavier ones for things that actually need them: drafting a proposal, generating a response that represents the company, handling a complex phone inquiry.
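Inside a gateway, this kind of routing can be as simple as a lookup table. The sketch below assumes two tiers; `LIGHT_MODEL` and `HEAVY_MODEL` are placeholder names, not real model IDs, and the task types mirror the examples above.

```python
LIGHT_MODEL = "small-fast-model"     # placeholder, not a real model ID
HEAVY_MODEL = "large-capable-model"  # placeholder, not a real model ID

ROUTES = {
    # Simple, high-volume work goes to the cheaper tier.
    "classify": LIGHT_MODEL,
    "label": LIGHT_MODEL,
    "format": LIGHT_MODEL,
    # Work that represents the company gets the stronger model.
    "draft_proposal": HEAVY_MODEL,
    "customer_response": HEAVY_MODEL,
    "phone_inquiry": HEAVY_MODEL,
}


def pick_model(task_type: str) -> str:
    """Return the model for a task; unknown tasks default to the cheap tier.

    Defaulting cheap means a new job has to be added to ROUTES on purpose
    before it can run on the expensive model.
    """
    return ROUTES.get(task_type, LIGHT_MODEL)
```

Defaulting to the cheap tier is a judgment call. Raising an error for unknown task types is the stricter alternative if you would rather every new job be reviewed before it runs at all.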

Batch multiple items per call. If you have twenty emails to screen, sending them one at a time means twenty calls. Sending them together in a single call with the right instructions means one. Not every task batches cleanly, but when it does, the savings are real.
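A minimal batching sketch, assuming the model is asked to return one numbered label per email. The helper names and the instruction wording are illustrative only; how well a model follows a batched instruction is something to check on your own data.

```python
def batch(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_batch_prompt(emails):
    """Build one prompt that asks for a label per numbered email.

    Numbering each item lets each line of the reply be matched back to
    the email it describes.
    """
    lines = [
        "For each numbered email below, answer with the number and either "
        "LEAD or JUNK, one per line.",
        "",
    ]
    for n, body in enumerate(emails, start=1):
        lines.append(f"{n}. {body}")
    return "\n".join(lines)
```

With twenty emails and a batch size of 20, `batch` returns a single chunk, so the screening job makes one call instead of twenty.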

Add circuit breakers. A circuit breaker is a rule that says: if this worker makes more than X calls in Y minutes, stop and alert me. It’s the equivalent of a breaker panel in an electrical system. When something goes wrong, it fails loudly and stops, rather than running quietly until the bill arrives.
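The breaker rule translates into code fairly directly. This is a minimal sketch, assuming a sliding window over recent call timestamps; `CircuitBreaker`, `allow`, and the `alert` callback are hypothetical names, and “Y minutes” is expressed in seconds. Once tripped, the breaker stays open until someone resets it, matching the “stop and alert me” behavior rather than quietly resuming.

```python
import time
from collections import deque


class CircuitBreaker:
    """Trip when a worker makes more than `max_calls` in `window_seconds`."""

    def __init__(self, max_calls, window_seconds, alert, clock=time.monotonic):
        self.max_calls = max_calls
        self.window = window_seconds
        self.alert = alert        # e.g. send an email or chat message
        self.clock = clock
        self.recent = deque()     # timestamps of calls inside the window
        self.tripped = False

    def allow(self):
        """Record one call attempt; return False (alerting once) if tripped."""
        if self.tripped:
            return False
        now = self.clock()
        # Drop timestamps that have slid out of the window.
        while self.recent and now - self.recent[0] >= self.window:
            self.recent.popleft()
        if len(self.recent) >= self.max_calls:
            self.tripped = True
            self.alert(f"breaker tripped: over {self.max_calls} calls "
                       f"in {self.window}s")
            return False
        self.recent.append(now)
        return True

    def reset(self):
        """A person decides the problem is fixed and re-enables the worker."""
        self.tripped = False
        self.recent.clear()
```

A worker would call `allow()` before each AI request and skip the request when it returns False. Injecting `clock` keeps the sketch testable without waiting real minutes.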

The return-on-investment math, honestly

I talk a lot about the value of these tools because the value is real. A $20-a-month AI account can do things that feel ridiculous compared with what software used to cost. But that does not mean AI is automatically cheap once you wire it into a business process.

Here is the kind of math I mean. As of this edit, OpenAI lists GPT-5.5 API pricing at $5 per million input tokens and $30 per million output tokens for standard short-context calls. Suppose a useful daily review job uses 500,000 tokens each run: 360,000 input tokens and 140,000 output tokens. That is $1.80 on the input side and $4.20 on the output side, or $6 per run. Run it twice a day and it is about $12/day. Over a 30-day month, that is $360.
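The same arithmetic as a small Python helper, assuming prices quoted per million tokens. The rates below are the ones quoted in this post; plug in whatever your vendor’s pricing page lists when you run it.

```python
def run_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Dollar cost of one run, with prices quoted per million tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


def monthly_cost(per_run, runs_per_day, days=30):
    """Scale a per-run cost to a month of scheduled runs."""
    return per_run * runs_per_day * days


per_run = run_cost(360_000, 140_000, 5, 30)
print(per_run)                   # 6.0
print(monthly_cost(per_run, 2))  # 360.0
```

Changing one argument at a time (longer output, a pricier model, an extra run per day) shows quickly which lever moves the monthly number most.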

$360/month may be a great deal if the work is worth more than that. If the review saves several hours of office time, catches missed leads, or prevents a real operational problem, the cost is easy to justify. The question is not whether the AI bill is zero. The question is whether the job it is doing is worth more than the meter it runs.

Now flip the example. A similar 500,000-token job on a more expensive model, or one that produces much longer output, can cost a lot more. Using GPT-5.5 Pro rates from the same pricing page, a 500,000-token run split evenly between input and output would be $7.50 for input and $45.00 for output, or $52.50 per run. Run that once a day and you are at $1,575/month. If nobody reads the report, approves the suggestions, or turns the output into a decision, the value is zero. You didn’t buy automation; you bought an expensive pile of unread text.

The honest return-on-investment math is not “AI saves X per month.” It is “AI saves X per month if you run it cleanly and use the output.” A single misbehaving worker, or a perfectly functioning worker that nobody looks at, can flip a sensible budget into an ugly one faster than you would expect. Cost discipline is part of the return, not a separate accounting chore.

What AI budget controls won’t do on their own

Circuit breakers and rate limits don’t configure themselves. Getting value out of a usage dashboard requires someone who understands the system well enough to know which numbers should concern them. For a small shop, that might mean a monthly review of your AI vendor’s billing dashboard, looking for any line that jumped unexpectedly.
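Looking for lines that jumped can be partly automated once you can export monthly costs per line item. A minimal sketch, assuming costs arrive as a simple name-to-dollars mapping; the 50% threshold and $5 floor are hypothetical defaults to tune, not recommendations.

```python
def flag_jumps(last_month, this_month, threshold=0.5, min_dollars=5.0):
    """Return billing lines whose cost jumped unexpectedly.

    last_month / this_month map a line item (worker, feature, or model)
    to dollars. A line is flagged if it grew by at least `min_dollars` and
    either grew by more than `threshold` (0.5 = 50%) or is new this month.
    """
    flagged = []
    for item, now in sorted(this_month.items()):
        before = last_month.get(item, 0.0)
        grew_by = now - before
        if grew_by < min_dollars:
            continue  # ignore small wobbles
        if before == 0 or grew_by / before > threshold:
            flagged.append((item, before, now))
    return flagged
```

The dollar floor keeps a $1 line that doubled from cluttering the list, so the review stays focused on the lines that can actually hurt the budget.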

Worth naming too: if you’re using AI tools you didn’t build yourself (which is most small businesses), you may not have direct access to the usage data. Ask your vendors. A good AI tool for a small business should show you, at minimum, call volume per month and whether there are any anomalies. If they can’t tell you that, it’s worth asking why.

What I’d do before adding the next AI tool

Before you add any new AI feature to your business, estimate how many times it will run in a day. If it runs once per customer inquiry and you get ten a day, that’s ten calls. If it runs on every incoming email and you get a hundred emails a day, that’s a hundred calls. Write the number down. Then check whether your AI vendor’s pricing makes that sustainable at the volume you’re actually planning for – not just the volume you have today.

That five-minute exercise has saved us from a few decisions that would have looked a lot different at the month-end invoice.
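That exercise fits in a few lines of Python. This sketch assumes token-based pricing; the per-call token counts in the example (1,000 in, 200 out) are hypothetical placeholders, and the $5/$30 rates are the ones quoted earlier in this post.

```python
def project_monthly(calls_per_day, tokens_in, tokens_out,
                    price_in_per_m, price_out_per_m, days=30):
    """Estimated monthly calls and dollars for one AI feature.

    tokens_in / tokens_out are per call; prices are per million tokens.
    """
    calls = calls_per_day * days
    dollars = calls * (tokens_in * price_in_per_m
                       + tokens_out * price_out_per_m) / 1_000_000
    return calls, dollars


# Today's volume versus the volume you are planning for.
print(project_monthly(100, 1_000, 200, 5, 30))  # (3000, 33.0)
print(project_monthly(300, 1_000, 200, 5, 30))  # (9000, 99.0)
```

Running it at both today’s volume and the planned volume makes the “is this sustainable” question concrete before anything is switched on.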


How we use AI to review our small business automation

I came in one morning to find that our AI had filed 1,004 improvement tickets about our small business automation while we slept.

That number stopped me cold. A thousand suggestions – about its own performance, its own gaps, things it thought should work better. Most of them were probably fine ideas. None of them were going to get built automatically. That part is entirely up to us.

That’s the setup for this post: not “AI improves itself,” but “AI proposes, humans decide.” The distinction matters more than it sounds if you are using AI tools around real customers, schedules, email, or job records.

What our AI review system actually does

The tool is something we call the Program Improvements Manager, or PIM. Every night, it runs a review pass across the AI systems we’ve built: voice agents, email tools, the CRM integration, the scheduling app. It looks for patterns: things that failed more than once, places where a response was slow, steps that a human had to correct, areas where the output was inconsistent.

Then it writes up suggestions. Each one gets a rough estimate of effort and a rough estimate of impact. They get sorted into categories: reliability improvements, new capability requests, content updates, configuration changes.

The queue we actually worked from didn’t hold all 1,004 raw suggestions. It held 129. That was better, but it was still too many. It was not practical to sift through 129 active improvement suggestions every day and make a good decision on each one.

That became the real lesson: the system had to be tuned for usability, not just output. A review tool is only useful if the queue is small enough that a person will actually read it. You do not want AI doing a million things if all but a few get stuck in a human review backlog.

That human is me, usually. I review the queue, decide what is worth building, and tune the system when the list starts turning into noise. Some suggestions go into our actual development backlog. Most get noted and watched. A few get dismissed. Nothing happens automatically.

The best use of AI here is not as a magic box for every shiny new idea. It is a tool to reduce work we were already doing. If it overwhelms us with a pile of crap we will never get a chance to look at, it has failed, even if every individual suggestion sounds clever.
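Keeping the queue small enough to read is the part worth sketching. This is not the PIM’s actual logic; it is a minimal Python sketch, assuming each suggestion carries the rough effort and impact estimates described above on a 1 to 5 scale. `Suggestion` and `daily_queue` are hypothetical names, and ranking by impact per unit of effort is one simple choice among many.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Suggestion:
    title: str
    category: str   # e.g. "reliability", "capability", "content", "config"
    effort: int     # rough estimate, 1 (small) to 5 (large)
    impact: int     # rough estimate, 1 (minor) to 5 (major)


def daily_queue(suggestions, limit=10):
    """Collapse duplicates, rank by impact per effort, cap what a human sees.

    Returns (queue, watch_list). Nothing past `limit` is deleted; it is
    parked in the watch list rather than shown every day.
    """
    unique = {}
    for s in suggestions:
        key = (s.category, s.title.strip().lower())
        # Keep the highest-impact copy of a repeated suggestion.
        if key not in unique or s.impact > unique[key].impact:
            unique[key] = s
    ranked = sorted(unique.values(), key=lambda s: (-s.impact / s.effort, s.title))
    return ranked[:limit], ranked[limit:]
```

The `limit` is the real tuning knob: set it to the number of items a person will actually read in one sitting, not the number the system can produce.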

Why human approval is the whole point

An AI that can modify its own behavior without oversight is a liability. I’m not being dramatic about that. It’s just true. The value of an AI self-review system like this is not that it fixes things on its own. The value is that it sees things I would miss, at a speed and scale I can’t match, and hands them to me in a form I can actually act on. The judgment call stays with me.

The analogy I keep coming back to is a good estimator who reviews all your past jobs and writes up a report: here’s what came in under budget, here’s what ran over, here’s a pattern I’m noticing. That’s genuinely useful. But the decisions about how to bid the next job, which crew to put on it, when to take a pass – those stay with the person who signs the contracts.

Same idea here.

Nightly backups make the automation safer

The same night the PIM runs, a set of backup jobs runs too. Every customer recording, every voice-agent transcript, every video gets quietly copied to cloud storage. Not because anything bad happened, but because “automated backups run every night” is the kind of boring sentence that prevents years of work from disappearing.

I don’t think about it most days. That’s the point.

A few weeks back I was looking at the backup logs and noticed they’d been running without a single gap for over a month. Thousands of files, dozens of sessions, completely automated, no one watching. That’s the kind of automation I actually trust. Not because it’s clever, but because it’s predictable.
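“Running without a single gap” is something a script can confirm instead of a person eyeballing logs. A minimal sketch, assuming you can pull the date of each completed backup run out of your logs; how you extract those dates depends on your log format, and `missing_backup_nights` is a hypothetical name.

```python
from datetime import date, timedelta


def missing_backup_nights(run_dates, start, end):
    """Return every date from start to end (inclusive) with no backup run.

    `run_dates` is any iterable of datetime.date values taken from the
    backup logs.
    """
    ran = set(run_dates)
    missing = []
    day = start
    while day <= end:
        if day not in ran:
            missing.append(day)
        day += timedelta(days=1)
    return missing
```

An empty list means no gaps in that range. Anything else is a specific night to go look at.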

Why AI process improvement is practical now

One reason this is practical now is that the AI tools have gotten cheap enough for normal business experiments. In the videos I’ve made about AI, I keep coming back to the same point: a lot of this is not five-figure enterprise software anymore. Sometimes it is a $20-a-month tool, or a small metered cost, doing a job that would have taken a person hours.

That does not mean everything should be automated. It means the math is finally good enough that a small company can try things, keep what works, and shut off what does not.

This only works if someone reviews the queue

An AI that writes its own improvement tickets is only as good as the person who reads them. If the queue piles up unreviewed for weeks, or if the person reviewing it doesn’t understand the systems well enough to evaluate the suggestions, you get noise. Or worse, a false sense that the system is improving itself when really it’s just generating paperwork.

This approach works for us because I’m close enough to the code and the business logic to make calls quickly. For a shop owner who doesn’t build their own tools, the right version of this might just be a monthly review session with whoever manages your software vendors. Periodic review by a person who can act on it matters more than the specific implementation.

One place to start

If you’re building any kind of automation, build the review step first. Not as an afterthought. Decide ahead of time: how will I know if this is working? How will I know if it’s quietly doing the wrong thing? The answer doesn’t have to be sophisticated. It can be a weekly email summary, a log you glance at on Mondays, a number that should stay below a threshold. Just something a human actually reads.
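As one example of “a number that should stay below a threshold,” here is a minimal sketch of a weekly summary builder. The metric names and limits are hypothetical; the point is a short plain-text report where anything over its limit stands out.

```python
def weekly_summary(metrics, limits):
    """Build a short plain-text summary a person can read in a minute.

    metrics: name -> this week's value.
    limits:  name -> value it should stay under (optional per metric).
    """
    lines = []
    for name in sorted(metrics):
        value = metrics[name]
        limit = limits.get(name)
        flag = "  <-- over limit" if limit is not None and value > limit else ""
        lines.append(f"{name}: {value}{flag}")
    return "\n".join(lines)


print(weekly_summary(
    {"failed_calls": 12, "ai_spend_usd": 48, "jobs_unscheduled": 0},
    {"failed_calls": 10, "ai_spend_usd": 100},
))
```

The resulting text can go into an email body or a log file; what matters is that a person reads it every week.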


Crew scheduling software for real-time updates

For a long time, the crew schedule lived in a spreadsheet. It worked because everyone knew where to look, but it was still a snapshot of a moving week.

The problem wasn’t the spreadsheet itself. The problem was everything that happened after the spreadsheet changed. A job moved. A crew member was unavailable. The weather pushed a pour to Thursday. Now the version someone saw earlier in the day might be wrong, and the confusion came from people working from different copies of the plan.

If you coordinate crews, you’ve lived this. It doesn’t matter if the schedule starts in a spreadsheet, a group text thread, or one person’s head. The problem is a schedule that changes often but does not update everywhere at once.

What we built for crew scheduling

We stood up a crew scheduling app this month that covers our install crews and job assignments. At its core it’s not complicated: jobs get entered, crew members are in the system with their availability, and the app assigns people to jobs based on who’s free and what skills a job needs.

The “auto-scheduling” part means the app can suggest a crew assignment – it looks at who’s available, who’s already booked, and what each job requires, and it proposes a lineup. You still review it and confirm. It’s not the app making the final call; it’s the app doing the tedious matching work so you’re reacting to a draft instead of building the whole thing from scratch.

That distinction matters. People hear “auto-scheduling” and picture software that runs the whole operation without them. That’s not what I’m describing. The version that actually works for us is: the software handles the obvious assignments, surfaces the conflicts, and leaves the edge cases for human judgment.
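The draft-then-review idea can be sketched as a simple matching pass. This is not how our app is built; it is a minimal first-fit sketch, assuming one crew member per job and availability tracked by weekday. `Crew`, `Job`, and `propose_schedule` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Crew:
    name: str
    skills: set
    available_days: set   # e.g. {"Mon", "Tue"}


@dataclass
class Job:
    name: str
    day: str
    needs: set            # skills the job requires


def propose_schedule(jobs, crews):
    """Draft one crew member per job; return (draft, conflicts).

    A crew member is eligible if they are free that day, not already
    booked that day in this draft, and have every skill the job needs.
    Jobs with no eligible crew go to `conflicts` for a person to resolve.
    """
    booked = set()            # (crew name, day) pairs already used
    draft, conflicts = {}, []
    for job in jobs:
        for crew in crews:
            if (job.day in crew.available_days
                    and (crew.name, job.day) not in booked
                    and job.needs <= crew.skills):
                draft[job.name] = crew.name
                booked.add((crew.name, job.day))
                break
        else:
            conflicts.append(job.name)
    return draft, conflicts
```

First-fit matching is fast but not optimal. If Ana (plumbing and stone) and Ben (stone only) are both free Monday, and a stone job comes before a plumbing job, Ana gets the stone job and the plumbing job lands in `conflicts`, even though swapping them would cover both. That is exactly the kind of edge case worth leaving to human judgment.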

I do not put much weight on average numbers for this kind of thing, because every operation runs differently. What I do know is simpler: if the schedule changes frequently and the current version is not visible to everyone who needs it, the business is going to keep paying for that with confusion and missed details.

The spreadsheet sync that made the switch painless

The main lesson from our rollout was simple: don’t throw out the old spreadsheet on day one.

We kept our Excel spreadsheet running alongside the new app for the first few weeks. The two systems sync in both directions – a change in the app shows up in the spreadsheet, and an update in the spreadsheet (for whoever hadn’t switched to the app yet) feeds back into the system.

That two-way sync changed the dynamic of the rollout completely. People who were nervous about the new tool didn’t have to trust it immediately. They could keep doing what they were doing while the rest of us used the app, and the schedule stayed consistent across both. Over time, as the app proved itself, the spreadsheet became the backup rather than the primary. By the time we stopped updating it manually, nobody noticed.
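One common way to reconcile two copies of a schedule is “most recent edit wins.” The post doesn’t describe our sync’s internals, so treat this as a minimal sketch under that assumption: rows are keyed by job name and carry a last-edited timestamp. `Row` and `merge` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Row:
    job: str
    crew: str
    day: str
    updated: datetime   # when this copy of the row was last edited


def merge(app_rows, sheet_rows):
    """Two-way merge keyed by job name; the most recent edit wins.

    The merged result would then be written back to both the app and the
    spreadsheet so they match again. On a timestamp tie, the app's copy
    wins because it is seen first.
    """
    merged = {}
    for row in list(app_rows) + list(sheet_rows):
        current = merged.get(row.job)
        if current is None or row.updated > current.updated:
            merged[row.job] = row
    return merged
```

This sketch does not handle deletions or two people editing the same job within seconds of each other; a real sync needs a rule for both.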

The principle generalizes. Almost any time you’re moving a team off a tool they’re used to, the hard part isn’t the software – it’s the trust. They’ve been burned by systems that launched with fanfare and then broke at the wrong moment. The way to earn that trust is to not ask them to abandon the old system as a condition of trying the new one. Let both run. Let the new one prove itself. Then the transition happens naturally.

What the scheduling app changed day-to-day

The clearest win is that the schedule reflects changes as they happen.

Before, the spreadsheet could be correct at 8 a.m. and stale by lunch. A job moved, availability changed, weather shifted the plan, and people were suddenly working from different versions of the week.

Now the contractor schedule is a shared live view. Crew members can check the current plan themselves. Changes are visible immediately. When a job pushes to the following week, one update propagates everywhere. The value is not that nobody asks questions anymore; it is that everyone starts from current information instead of an older spreadsheet.

Where auto-scheduling still needs a human

Auto-scheduling only knows what you put in it.

If a crew member updates their availability in the app, great – the system handles it. If someone mentions a change and nobody logs it, the schedule is wrong and the app doesn’t know. If a truck breaks down and no one updates the job status, the auto-assign will keep sending that truck out. Garbage in, garbage out – a rule that applies just as much to scheduling software as it does to everything else.

The other thing: scheduling software doesn’t fix a chaotic operation. If your jobs are poorly scoped or crew assignments change last-minute because of upstream planning problems, an app will reflect that chaos clearly. That can be useful – it surfaces problems that were previously invisible – but don’t expect the tool to solve problems that live in the process, not the calendar.

If your crew schedule changes faster than the spreadsheet

You don’t have to rebuild your whole operation to get value out of better scheduling tools. The useful first step is simple: make the current schedule visible to the people who need it, and make updates flow to one shared place quickly.

Start with two questions. Where is the current version of the schedule, and what happens when jobs or crew assignments change during the day? If the answer depends on someone remembering to reconcile a spreadsheet later, you’ve found the friction point. Even a basic app that lets more people see the latest schedule can reduce confusion before you add anything more advanced.

The spreadsheet got us by for a long time. I don’t miss trying to keep every moving part current by hand.