Garbage In, Garbage Out: A 30-Minute Data-Quality Check Before Any AI Project

12 June 2026 | Brett Alegre-Wood | 5 min read

Tags: AI data quality, data governance, SME AI adoption, AI readiness, clean data

TL;DR

Before you spend a penny on an AI project, run a 30-minute check on the data it will use. If the data is messy, AI doesn't fix it; it copies the mess faster and more confidently. A quick look at completeness, consistency, duplicates and freshness tells you whether to build now, clean first, or pick a different starting point.

Why does garbage in mean garbage out?

Because AI doesn't judge your data. It copies it. If half your customer records have no phone number, a follow-up agent can't follow up. If the same supplier appears three times under three spellings, your reporting will be wrong three different ways. The model isn't broken. It's doing exactly what it was told, on the information it was given.

This is the part the hype skips. Everyone talks about clever models and what they can do. Almost nobody talks about the boring stuff feeding them. And the boring stuff is where projects quietly fail.

I've watched it in my own businesses. At Darra Tyres, if the stock list says we have something we don't, no AI on earth turns that into a happy customer. At EzyTrac, a tenant record with the wrong email means every automated message goes into a void. The AI worked perfectly. The data let it down.

So before any build, I want to know one thing: is the data good enough that AI will augment the team, or bad enough that it'll just embarrass us at speed?

What exactly should you check in 30 minutes?

Four things: completeness, consistency, duplicates and freshness. You don't need a data scientist. You need the person who knows the business, a coffee, and half an hour.

Pick the one process you want AI to help with first. Just one. Maybe it's chasing quotes, or answering common customer questions, or flagging overdue invoices. Then find the data that process actually touches, usually one spreadsheet, one CRM export, or one folder.

Now go looking, by eye, for these four:

Completeness. Open the file and scan the columns that matter for this job. If you're automating follow-ups, how many rows have a missing email or phone number? Roughly what share is blank? You're not counting every cell; you're getting a feel. A few gaps are normal. Half the column empty is a warning.

Consistency. Is the same thing written the same way every time? "Ltd", "Limited" and "LTD." Dates as 03/04 in some rows and April in others. Statuses like "Paid", "paid" and "PAID-thanks". Humans read past this without noticing. AI treats each version as a different thing.

Duplicates. Is the same customer, order or supplier in there more than once? Two records for one person means two follow-ups, double-counted revenue, and a customer who thinks you don't know who they are.

Freshness. When was this last updated? A contact list nobody has touched in two years isn't a contact list, it's a history book. If the data is stale, AI will confidently act on things that are no longer true.
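The check is meant to be done by eye, but if the export is too long to scan, the same four questions can be roughed out in a few lines of Python. This is only a sketch under assumptions: the sample rows, the column names (name, email, status, last_updated), the one-year cut-off for "stale" and the fixed "today" date are all made up for illustration, so swap in whatever your own file uses.

```python
import csv
import io
from datetime import date, datetime

# Made-up CRM export. Column names and values are assumptions for illustration.
SAMPLE = """name,email,status,last_updated
Acme Ltd,info@acme.example,Paid,2026-05-01
Acme Limited,info@acme.example,paid,2023-01-15
Bolt & Co,,PAID-thanks,2026-04-20
Crane LTD.,sales@crane.example,Paid,2022-11-02
"""


def quick_check(rows, today, stale_after_days=365):
    """Return rough numbers for the four checks on one process's data."""
    if not rows:
        raise ValueError("no rows to check")

    # Completeness: what share of rows has no email at all?
    blank_share = sum(1 for r in rows if not r["email"].strip()) / len(rows)

    # Consistency: how many different ways is the status written?
    status_spellings = sorted({r["status"].strip() for r in rows})

    # Duplicates: which emails appear on more than one row?
    seen, duplicates = set(), set()
    for r in rows:
        email = r["email"].strip().lower()
        if email in seen:
            duplicates.add(email)
        elif email:
            seen.add(email)

    # Freshness: how many rows were last touched before the cut-off?
    stale = 0
    for r in rows:
        updated = datetime.strptime(r["last_updated"], "%Y-%m-%d").date()
        if (today - updated).days > stale_after_days:
            stale += 1

    return {
        "blank_email_share": blank_share,
        "status_spellings": status_spellings,
        "duplicate_emails": sorted(duplicates),
        "stale_rows": stale,
    }


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
# A fixed date keeps the example repeatable; use date.today() on real data.
report = quick_check(rows, today=date(2026, 6, 12))
for check, result in report.items():
    print(f"{check}: {result}")
```

The numbers don't replace your judgement. They just give you something concrete to look at when you decide whether a column feels mostly fine or clearly broken.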

How do you score it without overthinking?

Give each of the four a simple traffic light: green, amber or red. No spreadsheets, no formulas. Trust your read.

Green means it's mostly fine for this one job. Amber means there's a noticeable problem you can fix in a sitting. Red means the data can't be trusted for this purpose yet.

Here's the rule that keeps you honest. If everything's green, build: you're ready. If you've got ambers, fix those specific fields first, then build. If anything's red, stop and have a proper think before you spend money.
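If it helps to see that rule written down as logic, here is a minimal sketch. The function name decide and the wording of the three outcomes are my own labels, not part of any tool. The one thing to notice is the order: red is checked first, so a single red outranks any number of greens.

```python
VALID_LIGHTS = {"green", "amber", "red"}


def decide(lights):
    """Turn four traffic lights into one next step.

    `lights` maps each check name to "green", "amber" or "red".
    """
    colours = set(lights.values())
    unknown = colours - VALID_LIGHTS
    if unknown:
        raise ValueError(f"unexpected light(s): {sorted(unknown)}")

    # Any red wins, no matter how many greens sit beside it.
    if "red" in colours:
        return "stop and think before spending money"
    # No reds, but at least one amber: fix those fields first.
    if "amber" in colours:
        return "fix the amber fields, then build"
    return "build"


scores = {
    "completeness": "green",
    "consistency": "amber",
    "duplicates": "green",
    "freshness": "green",
}
print(decide(scores))
```

With the scores above, the one amber on consistency is enough to send you to the fix-first path.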

The trap is treating amber like green because you're keen to get going. Don't. An amber you ignore becomes a customer-facing mistake later, and those cost far more than the half-hour fix would have.

What do you do when something comes up red?

You've got three honest options, and all of them are fine. The mistake is pretending the red isn't there.

First, fix only what this project needs. You don't have to clean your entire business; that's a forever job nobody finishes. You only have to clean the data this one process touches. That's usually a column or two, not the whole database. Narrow scope is the whole point of doing the check.

Second, pick a different first project. If the data for quote-chasing is a mess but your invoice data is tidy, start with invoices. Win where you're already strong, build a bit of confidence, and come back to the messy area later with a track record behind you.

Third, treat the cleanup as the project. Sometimes the most valuable thing AI prompts you to do is finally sort out the data you've been ignoring for years. Tidy records pay you back in every report, every decision and every future automation, whether or not AI ever touches them.

None of these is a failure. A red light that you caught in 30 minutes is a small, cheap, early decision. A red light you find after the build is a refund conversation.

Why does this matter more for AI than for the old way?

Because a person fills the gaps without thinking, and AI doesn't. When your office manager sees "Limited" and "Ltd", they know it's the same firm. When they spot a blank phone field, they pick up the file and find the number. They quietly paper over your data-quality problems all day long, and you never see them.
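To make the "Limited" versus "Ltd" point concrete, here is a small sketch of what a program sees. The helper company_key and its one-entry suffix table are hypothetical and deliberately tiny; real matching needs more care than this. The point is only that the software has to be told what your office manager already knows.

```python
import re

# Hypothetical mapping of long suffixes to one agreed short form.
SUFFIXES = {"limited": "ltd"}


def company_key(name):
    """Reduce a company name to a rough comparison key."""
    # Lower-case, drop punctuation, then split into words.
    words = re.sub(r"[^\w\s]", "", name.lower()).split()
    # Swap known long suffixes for the agreed short form.
    return " ".join(SUFFIXES.get(word, word) for word in words)


names = ["Acme Ltd", "Acme Limited", "ACME LTD."]

# As raw text, these are three different suppliers.
print(len(set(names)))

# After normalising, they collapse into one.
print({company_key(n) for n in names})
```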

Take that human glue away and hand the job to a system, and every crack shows. That's not an argument against AI, it's an argument for checking first. Done properly, AI augments your team by taking the repetitive load off them. But it can only augment what's already sound. Point good automation at bad data and you've just built a faster way to be wrong.

That's also why this sits squarely in good governance, not just tidiness. Knowing what data your AI uses, and trusting it, is the foundation everything else stands on. Skip it and you're guessing. Run the 30-minute check and you're deciding with your eyes open.

If you'd like a hand running this check on your own business, we offer a free AI audit. We'll sit down, look at one real process and the data behind it, and tell you honestly whether it's ready, what to fix first, or where a better starting point might be. No pressure, no jargon, just a clear-eyed look before you commit to anything.

Live with passion & AI,

Brett

Frequently asked questions

How long does a basic data-quality check really take?

For one process and one or two data sources, about 30 minutes, enough to spot the obvious gaps, duplicates and inconsistencies before you commit to a build.

Do I need a data analyst to check my AI data quality?

No. An owner or manager who knows the business can run the first check by eye; you only need technical help once you've decided the data is worth cleaning and using.

What if my data is too messy to use right now?

That's a useful finding, not a failure. You either fix the few fields that matter for this one job, or pick a different first project where the data is already clean.

Will fixing data quality delay my AI project for months?

Usually not, because you only clean the specific data the first project touches, not your entire business. That scoping is the whole point of the check.

Does better data quality actually change the AI output that much?

Yes. AI repeats whatever patterns it's given, so cleaner inputs mean fewer wrong answers, fewer awkward customer moments and far less time spent correcting it later.

About the author

Brett Alegre-Wood

Brett is a four-time founder (Darra Tyres, Gladfish, EzyTrac, Anaboo) and the operator behind AIOS, Anaboo's AI Operating System. He writes from inside the build, installing AI in his own businesses first and reporting back what actually moves the numbers. Based between Singapore, the UK and Australia.