How to Clean Messy Data in Excel: A 9-Step Checklist
Data exported from a CRM, a web form, or someone else’s “system” almost never lands in Excel ready to use. The mistakes people make aren’t usually about knowing the right function — they’re about doing the steps in the wrong order, like removing duplicates before trimming spaces, which leaves “Smith” and “Smith ” sitting in your data as two different people. This checklist puts the nine cleanup steps in the order that actually works, with the exact tools for each one.
Before You Start: Two Ground Rules
Work on a copy. Right-click the sheet tab, choose Move or Copy, and tick Create a copy. Cleanup steps like Remove Duplicates and Find & Replace are destructive, and once you’ve saved over the original, there’s no getting it back.
Convert your range to a table with Ctrl + T. Tables auto-expand formulas, keep filters attached, and make it obvious where your data ends — which matters in step 1.
Step 1: Delete Blank Rows and Junk Columns
Imported files love to arrive with empty separator rows, repeated header rows, and a stray “Page 1 of 4” column from whatever report generated them. Clear these out first, because blank rows break sorting, filtering, and Ctrl + Shift + Down selection.
The fast method: select your data, press F5, click Special, choose Blanks, then right-click any highlighted cell and pick Delete > Entire Row. There are a few gotchas with partially blank rows, covered in our guide to deleting blank rows in Excel. While you’re at it, delete unused columns entirely rather than just hiding them.
Step 2: Trim Spaces and Strip Invisible Characters
This step comes before duplicate removal for a reason: "Acme Corp" and "Acme Corp " are different values to Excel, so trailing spaces hide duplicates and break VLOOKUP matches.
In a helper column, use:
=TRIM(A2)
The TRIM function removes leading and trailing spaces and collapses internal runs of spaces down to one. For data copied from web pages or databases, also wrap it with CLEAN, which strips non-printing characters like line feeds:
=TRIM(CLEAN(A2))
One trap TRIM won’t fix: the non-breaking space (character 160) that web pages use. Handle it with SUBSTITUTE first:
=TRIM(SUBSTITUTE(A2, CHAR(160), " "))
When the helper column looks right, copy it and paste back over the original with Paste Special > Values, then delete the helper.
Step 3: Remove Duplicates
Now that spacing is consistent, duplicates are actually findable. Before deleting anything, I like to see what I’m about to lose — highlight duplicates with conditional formatting (Home > Conditional Formatting > Highlight Cells Rules > Duplicate Values) and eyeball a few.
Then select the data and go to Data > Remove Duplicates. The dialog lets you choose which columns define a “duplicate” — checking only Email, for example, treats two rows with the same email but different names as dupes. Excel keeps the first occurrence and deletes the rest; if which row survives matters, sort first so the row you want to keep is on top. Full details in removing duplicates while keeping the first instance.
Step 4: Fix Dates Stored as Text
Text-dates are the sneakiest problem on this list because they look fine. The tell: real dates right-align by default and respond to number formatting; text-dates left-align and ignore it. If =ISNUMBER(A2) returns FALSE on a date cell, it’s text.
Three fixes, in order of preference:
| Situation | Fix |
|---|---|
| Consistent format Excel recognizes (e.g. “3/14/2026”) | Text to Columns trick: select the column, Data > Text to Columns > Finish (yes, just Finish — it re-parses every cell) |
| Recognizable but text-typed dates | =DATEVALUE(A2), then format the result as a date |
Odd formats like 20260314 | Rebuild with =DATE(LEFT(A2,4), MID(A2,5,2), RIGHT(A2,2)) |
We walk through each scenario in converting text to dates in Excel. Watch out for region mismatches — a file exported with DD/MM dates opened on an MM/DD system will silently swap days and months wherever both numbers are 12 or under. Text to Columns lets you declare the source format (choose DMY in step 3 of the wizard), which is the safe way to handle imports from other locales.
Step 5: Split Combined Columns
One value per column is the rule. “Jane Smith” in a single Name column, or “Chicago, IL 60601” in one Address field, blocks sorting by last name and filtering by state.
Two tools do this job:
- Text to Columns (Data > Text to Columns): pick Delimited, choose your separator (space, comma), done. Best for clean, consistent delimiters — see splitting names in Excel for the walkthrough, including the middle-name problem.
- Flash Fill (
Ctrl + E): type the first result by hand in the next column — “Jane” — then pressCtrl + Eand Excel infers the pattern for the rest. Flash Fill handles messy, inconsistent patterns that delimiters can’t, but always spot-check its output; it guesses, and it guesses silently.
Insert enough empty columns to the right first, or Text to Columns will overwrite whatever’s there without much warning.
Step 6: Find & Replace the Recurring Garbage
Every messy dataset has its signature junk: “N/A” strings, double hyphens, “(none)”, inconsistent abbreviations like “St.” vs “Street”. Press Ctrl + H and knock them out in bulk.
Three power moves worth knowing from our Find and Replace guide:
- Replace with nothing — leave the Replace box empty to delete a string everywhere.
- Match entire cell contents — under Options, this stops “N/A” replacements from mangling cells that legitimately contain those letters, like “DNA/AB”.
- Wildcards —
*matches any run of characters, so finding(*)and replacing with nothing strips every parenthetical. To find a literal asterisk, search for~*.
Limit the scope by selecting a range first; otherwise Replace All hits the whole sheet.
Step 7: Convert Numbers Stored as Text
Same disease as step 4, different organ. Text-numbers left-align, often show a green error triangle, and — the real damage — silently return 0 in SUM and fail to match in lookups.
The quickest fix for a visible batch: select the cells, click the warning icon, choose Convert to Number. For formula-based conversion, the VALUE function does it explicitly: =VALUE(A2). A classic trick that also works at scale: copy an empty cell, select the text-numbers, then Paste Special > Add — the math operation forces conversion in place.
One caution: ZIP codes, phone numbers, and product IDs with leading zeros should stay text. Converting 02134 to a number turns it into 2134. Decide column by column, not wholesale.
Step 8: Handle Blanks and Standardize Case
Blanks first. Decide what an empty cell means — unknown, zero, or “same as the row above” (common in report exports where the category only appears once per group). For the fill-down case: select the column, F5 > Special > Blanks, type = plus the up-arrow key, and press Ctrl + Enter to fill every blank with the value above it. Full steps in filling blank cells with a value.
Then case. “JOHN SMITH”, “john smith”, and “John Smith” should not coexist in one column. Excel has no Change Case button like Word — you use functions in a helper column:
| Function | Result |
|---|---|
=PROPER(A2) | John Smith |
=UPPER(A2) | JOHN SMITH |
=LOWER(A2) | john smith |
PROPER is right 95% of the time; it stumbles on “McDonald” (→ Mcdonald) and “IBM” (→ Ibm), so scan for those by eye. Paste values back over the originals as in step 2.
Step 9: Validate — and Keep It Clean
Cleaning is pointless if next week’s data entry re-pollutes the file. Two closing moves:
- Audit what’s there. Sort each key column and scan the top and bottom — typos and outliers (“$1,200,000” in a column of hundreds, a birthdate in 2031) cluster at the extremes. A quick
=COUNTIFper category against the expected list catches near-duplicate labels like “NY” vs “New York”. - Lock the door. Add data validation rules (Data > Data Validation) so future entries are constrained: dropdown lists for categories, date ranges for dates, whole-number limits for quantities. Then run Data > Data Validation > Circle Invalid Data once to flag existing cells that break the new rules.
The Checklist, Condensed
| # | Step | Primary tool |
|---|---|---|
| 1 | Delete blank rows / junk columns | F5 > Special > Blanks |
| 2 | Trim spaces, strip characters | TRIM, CLEAN, SUBSTITUTE |
| 3 | Remove duplicates | Data > Remove Duplicates |
| 4 | Fix text-dates | Text to Columns, DATEVALUE |
| 5 | Split combined columns | Text to Columns, Flash Fill |
| 6 | Bulk-fix recurring junk | Ctrl + H |
| 7 | Convert text-numbers | Convert to Number, VALUE |
| 8 | Fill blanks, standardize case | F5 trick, PROPER/UPPER/LOWER |
| 9 | Validate and protect | Data Validation |
Run it top to bottom and you’ll rarely have to circle back. If you clean the same export every week, it’s worth learning Power Query (Data > Get & Transform) — it records these steps once and replays them on every refresh.
FAQ
Why did Remove Duplicates miss obvious duplicates?
Almost always trailing spaces, non-breaking spaces, or case differences — “Acme ” and “acme” are distinct values to Excel. Run TRIM and a case function (step 2 and step 8) before Remove Duplicates, then run it again.
My dates look like dates but SORT and filters treat them wrong. Why?
They’re text. Test with =ISNUMBER(A2) — FALSE means text. Select the column and run Data > Text to Columns > Finish to force Excel to re-parse them, or use DATEVALUE. If days and months seem swapped, the file came from a different date locale; declare the source format (DMY/MDY) in step 3 of the Text to Columns wizard.
Should I clean data with formulas or do it in place?
Formulas (in a helper column) for anything you might need to re-check or undo — TRIM, PROPER, VALUE — then Paste Special > Values over the original. In-place tools (Remove Duplicates, Find & Replace, Text to Columns) are faster but destructive, which is why you work on a copy of the sheet.
Is there a way to automate this whole checklist?
Yes — Power Query. Import the file via Data > Get & Transform, apply trim, type changes, split, and dedupe steps once, and they’re recorded as a repeatable script. Every future export cleans itself with a single Refresh. For one-off files, the manual checklist is still faster.