Chris Landtiser

Clean Messy Data with Copilot in Excel

Turn a chaotic expense tracker into clean, analysis-ready data - using Microsoft 365 Copilot one prompt at a time.

~45 minutes
Intermediate
Excel for the web or desktop
Leave This Lab Your progress
0 of 8
Phase 1 — Assess & Structure
1

Open the File and Survey the Damage

~3 minutes

Open Q1_Project_Expenses_MESSY.xlsx in Excel. Before you touch anything, scroll top to bottom. You're looking for problems — and this file has plenty. Notice the merged title row across the top, the blank rows acting as section dividers between projects, dates in at least three different formats, dollar amounts that are sometimes numbers and sometimes text strings with dollar signs, abbreviations like "Consltng" and "Trvl" mixed in with proper category names, a vague "Col F" header, and a hand-typed total at the bottom that's more of a guess than a number.

Bottom of the messy spreadsheet showing mixed date formats, category abbreviations like Consltng and Trvl, dollar-formatted strings mixed with plain numbers, duplicate rows near the bottom at rows 73 through 75, and a red italic total reading ~$28,000?? in row 77

The bottom of the file — mixed date formats, "Consltng" and "Trvl" abbreviations, the 3 duplicate rows hiding at rows 73–75, and that red italic "~$28,000??" total.

Tip
Resist the urge to start fixing things manually. The point of this lab is to let Copilot do the heavy lifting. Diagnosis before treatment.
2

Remove Structural Clutter

~5 minutes

Click into any cell with data and open Copilot from the ribbon. Your first job is to clear out the structural junk that prevents Excel from treating this as a proper data table. The file has a merged title row at the top, a blank row after it, blank divider rows between project sections, a hand-typed TOTAL (approx) ~$28,000?? at the bottom, and vague headers ("Col F" for the approver column, and a completely blank header on the notes column). Tell Copilot to identify the structural issues and make sure the data starts with a proper header row.

SAMPLE PROMPT This spreadsheet has structural issues that prevent it from being a clean data table. Identify the issues and make sure the data starts with a proper header row.
Excel spreadsheet after Copilot structural cleanup. Row 1 now shows clean headers: Date, Project, Category, Amount, Submitted By, Approved By, Notes. The Copilot panel on the right shows Reasoned in 2 steps and confirms the spreadsheet has been successfully cleaned with a summary of structural issues identified and fixed.

Copilot removes the merged title, blank divider rows, and the hand-typed total — and renames the vague headers in one pass.

Checkpoint
Your header row should now read: Date, Project, Category, Amount, Submitted By, Approved By, Notes. No blank rows above, no blank rows between sections, no total row at the bottom. If that all checks out, the structure is clean.
  • Copilot doesn't detect the merged cell: Unmerge it manually first (Home > Merge & Center to toggle off), then re-prompt Copilot to find the remaining junk rows.
  • "I can't modify data outside a table": If your data isn't formatted as a table yet, select the full data range and press Ctrl + T to convert it, then try again.
  • Headers still say "Col F" or are blank: Copilot sometimes fixes these automatically when you ask for a proper header row. If not, manually rename column F to Approver and column G to Notes.
Phase 2 — Clean & Standardize
3

Standardize Dates

~5 minutes

The Date column is a mess. Three different formats are mixed throughout: short dates like 2/26/26, written-out dates like January 20, 2026, and ISO dates like 2026-01-20. Every one of these is stored as a text string, not an actual Excel date. Ask Copilot to convert them all to proper date values.

SAMPLE PROMPT The Date column has three different text formats mixed together. Convert all of them to proper Excel date values and format the column as YYYY-MM-DD.
Excel spreadsheet after date standardization. The Date column now shows consistent YYYY-MM-DD format throughout. The Copilot panel shows Reasoned in 2 steps and confirms all dates have been successfully converted. The Category and Amount columns remain uncleaned with mixed casing and dollar-formatted strings still visible.

All three date formats unified into proper Excel dates — notice the categories and amounts are still messy. One thing at a time.

Tip
ISO format (YYYY-MM-DD) sorts correctly as text and as dates, and eliminates the month/day confusion between US and international formats. It's the safest default for shared data.
  • Some dates didn't convert: The short format 1/20/26 can be ambiguous. If Copilot misinterprets any, specify in your prompt that all dates are in 2026 and the format is M/D/YY (US convention).
  • Dates show as numbers (like 46041): The conversion worked — those are Excel serial dates. Right-click the column, choose Format Cells, and apply a date format.
  • Copilot created a helper column instead of editing in place: That's fine. Copy the helper column values, paste them over the original Date column using Paste Special > Values, then delete the helper column.
4

Normalize Categories

~5 minutes

The Category column has 19 different values for what should be 5 categories. You'll find full words in mixed case (Travel, TRAVEL, travel), abbreviations (Trvl, Sftwre, Consltng, Ofc Supplies), and everything in between. Tell Copilot exactly which 5 values you want and let it figure out the mappings.

SAMPLE PROMPT The Category column has inconsistent naming. Standardize all values to exactly these 5 categories: "Consulting", "Office Supplies", "Software", "Training", "Travel".
Excel spreadsheet after category normalization. The Category column now shows clean, consistent values like Travel, Consulting, Software, Office Supplies, and Training. The Copilot panel shows Reasoned in 1 step and confirms the Category column has been successfully standardized with a summary. The Amount column still shows mixed formats with some dollar-sign strings visible.

19 variations collapsed into 5 clean categories — Copilot mapped the abbreviations automatically.

Checkpoint
Filter the Category column. You should see exactly 5 values: Consulting, Office Supplies, Software, Training, Travel. If you see anything else — an abbreviation that slipped through, a lingering typo — re-prompt with the specific value that needs mapping.
5

Clean the Amount Column

~5 minutes

The Amount column is a mix of three different data types. Some values are actual numbers (integers like 1550 or floats like 612.5). Others are plain text strings like "67.25". And about a third are dollar-formatted strings like "$3,200.00". Until they're all real numbers, no formula — and no Copilot summary — will give you an accurate total. This one doesn't need a complicated prompt.

SAMPLE PROMPT Clean up the Amount column to simple numeric values.
Excel spreadsheet after amount cleanup. The Amount column now shows right-aligned numeric values like 1,550.00, 1,200.00, 67.25, 612.50 throughout. The Copilot panel shows a green checkmark with Amount Column Cleanup Complete and begins listing Issues Found and Fixed.

All amounts converted to proper numbers — right-aligned, ready for math.

Note
A quick way to verify: select the full Amount column and check the status bar at the bottom of Excel. If it shows a Sum, all values are numeric. If it only shows a Count, some text values are still hiding in the column.
6

Find and Remove Duplicates

~5 minutes

This file has 3 duplicate rows buried near the bottom — the kind you'd never catch on a visual scan. They're exact copies of rows that appear earlier in the data, likely from a copy-paste accident. Ask Copilot to find them.

SAMPLE PROMPT Check this data for duplicate rows. Compare all columns except Notes to find exact matches. Show me which rows are duplicates and which rows they duplicate.
Excel spreadsheet with the Copilot panel showing a Duplicate Analysis with a green checkmark. Copilot confirms it found duplicate rows in the data and will display the information. The spreadsheet in the background shows clean headers and standardized values across all columns.

Three duplicates identified — each an exact match of an earlier row.

Advanced
In real-world data, duplicates aren't always exact. Two rows with the same date, project, and amount but different submitters could be a legitimate split expense or a copy-paste error. Always review flagged duplicates before deleting — Copilot finds them, but you make the call.
Phase 3 — Validate & Summarize
7

Build a Summary

~5 minutes

Your data is clean. Now make it useful. Ask Copilot to create a summary breakdown by project and category so you can see where the money actually went. This is also your chance to see how far off that hand-typed ~$28,000?? estimate really was.

SAMPLE PROMPT Create a summary table showing total spending by Project and Category. Include row totals for each project and a grand total. Also tell me: what is the actual total, and how far off was the original "~$28,000" estimate?
Copilot-generated summary table in columns I through O showing total spending by project and category. Atlas Redesign: $9,710. Beacon Migration: $8,095.74. Catalyst Launch: $9,882.98. Delta Compliance: $15,935.24. Grand Total row highlighted in dark blue: $43,623.96. Below the table, an Estimate vs. Actual Analysis shows Original Estimate ~$28,000, Actual Total $43,623.96, Difference $15,623.96, Variance 55.80%. The Copilot panel on the right shows the matching prompt.

The real numbers — $43,623.96 total, a 55.80% variance from the original "~$28,000" guess.

Checkpoint
You should have a clean dataset with 66 rows (after removing the 3 duplicates from the original 69), 7 properly labeled columns, consistent dates, 5 normalized categories, all-numeric amounts, and a summary that tells you exactly what was spent where.
The Full Send
8

One-Prompt Clean

~10 minutes

Open Q1_Project_Expenses_FULLSEND.xlsx — a fresh, untouched copy of the original messy file. This time, write one comprehensive prompt that handles everything you just did across Steps 2–7 in a single shot. You've already learned what the problems are and how Copilot solves each one. Now combine that knowledge into one clear instruction.

SAMPLE PROMPT Clean this expense spreadsheet completely:

1. Remove the merged title row, the blank row after it, all blank divider rows between sections, and the hand-typed TOTAL row at the bottom. Make sure the data starts with a proper header row.
2. Standardize all dates to YYYY-MM-DD format. The column has three text formats mixed together: M/D/YY, Month Day Year, and YYYY-MM-DD. Convert all to real Excel dates.
3. Normalize categories to exactly 5 values: Consulting, Office Supplies, Software, Training, Travel.
4. Convert all amounts to proper numbers by stripping dollar signs and commas from text strings.
5. Find and remove exact duplicate rows (comparing all columns except Notes).
6. Create a summary table showing total spending by Project and Category with a grand total.
Tip
The step-by-step approach from Phases 1 and 2 isn't wasted effort — it's how you learn what to ask for. The full-send prompt works because you already know the problems and their solutions. Specificity comes from experience.
  • Copilot only completes part of the list: Break it into two prompts — structural cleanup first (items 1-2), then data cleaning (items 3-6). Some tasks have dependencies that work better in sequence.
  • Results differ from your step-by-step version: Compare the two side by side. Minor differences in rounding or date formatting are expected. If a category mapping was missed, add a follow-up prompt for the specific value.
  • Copilot times out or throws an error: Long prompts occasionally hit processing limits. Try removing the summary request (item 6) and running it as a separate follow-up prompt after the cleanup completes.

Lab Complete!

You turned a messy expense tracker into clean, analysis-ready data - and discovered the real total was over $15,000 higher than the guess at the bottom. Then you did it again in one prompt. That's the Copilot workflow: learn the problems step by step, then automate the solution.