
The High Cost of Dirty Data and Its Consequences for CPOs and Their Teams

1 April, 2020 | By Susan Walsh

We all know what dirty data is, but it can be defined very differently depending on who you speak to. At its most basic level, dirty data is anything incorrect. In detail, it could be misspelt vendor names, incorrect invoice descriptions, missing product codes, inconsistent units of measure (e.g. ltr, l, litres), currency issues, duplicate invoices, or incorrectly or partially classified data.
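To make those categories concrete, here is a minimal pandas sketch of how a few of them might be detected in a spend extract. The column names and values are hypothetical, for illustration only:

```python
import pandas as pd

# Hypothetical spend extract; all column names and values are illustrative.
df = pd.DataFrame({
    "vendor": ["IBM", "I.B.M.", "ACME Cleaning", "ACME Cleaning"],
    "invoice_id": ["1001", "1002", "1003", "1003"],
    "unit": ["ltr", "l", "litres", "litre"],
    "amount": [12500.0, 12500.0, 300.0, 300.0],
})

# Inconsistent units of measure: map known variants to one canonical form.
unit_map = {"ltr": "litre", "l": "litre", "litres": "litre"}
df["unit"] = df["unit"].str.lower().map(unit_map).fillna(df["unit"])

# Duplicate invoices: the same invoice number appearing more than once.
duplicates = df[df.duplicated(subset="invoice_id", keep=False)]
print(duplicates)

# Misspelt vendor names: a crude near-duplicate check after stripping punctuation.
df["vendor_norm"] = df["vendor"].str.upper().str.replace(r"[^A-Z0-9]", "", regex=True)
print(df.groupby("vendor_norm")["vendor"].nunique())
```

None of this replaces human review; it simply surfaces candidates for someone to look at.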

Dirty data can affect everything and everyone, from the bottom to the very top of an organisation; we all have an impact on, and responsibility for, the data we work with. How many times have you noticed a small error and said nothing, or quietly hand-corrected an automated report just to get it out the door on time? It's too much hassle to find the right person to notify, or you raise a ticket but never get round to checking whether it's resolved.

These small errors can filter all the way up to the top of an organisation, where critical decisions are being made. It happens almost every day.

So, what are the consequences?

Reporting and decision making.

Dirty data can make a significant impact in these areas. Imagine receiving a dashboard from your team which might be used for cost savings, supplier negotiations, rationalisation or forecasting. What if I then told you that there was £25k of cleaning spend under IBM? I can already hear you saying, "that would never happen, it's so obvious". It is obvious, but I have worked on a data file with IBM classified as cleaning; it happens a lot more often than you think.

In that very dashboard you are using to make decisions, you'll see increased spend in your cleaning category and a decrease in your IT spend, which could affect discounts with your supplier, your forecast for the year, and monitoring of contract compliance with suppliers. It might also distort inventory reporting: you think you need more laptops, so you make unnecessary purchases.

And if you have tens or hundreds of thousands of rows, errors like this will occur multiple times within your data. In the wider organisation, they could affect demand planning, sales, marketing and financial decisions.

Technology implementation.

An area that's often neglected is data preparation, or cleansing, before the implementation of any new software or systems. By the time errors in the data are discovered, staff have lost faith in the software and are disengaged, claiming it doesn't work or that they don't trust it because "it's wrong".

At this point, it either costs a lot of money to fix and you hope staff will engage again, or the project is abandoned.  In either case, this can take months and cost tens of thousands of pounds/euros/dollars in abandoned software or reparation work.

AI and Automation.

As with technology implementation, this can potentially cause lots of problems. It's critical that the data is cleansed and prepared before being used for any type of AI or automation. Think back to the IBM example: each quarter the data is refreshed automatically with the cleaning classification, so that £25k becomes £50k, then £75k the following quarter. It's only when the value becomes significant that someone notices the issue, and by then, how many decisions have been based on this incorrect information?

How can this be fixed?

There is no quick fix, magic button, or software that can resolve these issues.  The first step is to improve data accuracy, and this has to be done by humans.  Get everyone at every level to engage and take responsibility for the organisation’s data. 

Not an easy task, but if your team understands the impact of the data they work on within the organisation, and that it's not just the responsibility of "Bob in the corner" or "the IT department", it can make a huge difference.

Consistency is also extremely important. Define rules and processes: classification is very subjective, and quite often there's more than one right answer. As long as everyone's working to the same standards, it's much easier to change if something turns out to be wrong later on.
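One lightweight way to make those shared standards explicit is a single, version-controlled rules table that every refresh applies. A minimal sketch, with hypothetical suppliers and categories:

```python
import pandas as pd

# Hypothetical agreed classification rules, kept in one shared table
# so everyone applies the same standard.
rules = pd.DataFrame({
    "supplier": ["IBM", "ACME CLEANING"],
    "category": ["IT", "Cleaning"],
})

spend = pd.DataFrame({
    "supplier": ["IBM", "ACME CLEANING", "UNKNOWN CO"],
    "amount": [25000.0, 300.0, 120.0],
})

# Apply the shared rules; anything without a rule is flagged for review
# rather than silently guessed.
classified = spend.merge(rules, on="supplier", how="left")
print(classified)
print(classified[classified["category"].isna()])
```

The design point is the single source of truth: if a rule later proves wrong, correcting it in that one table fixes every report that uses it.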

Maintain your data. If it's not maintained, it will slowly become unusable over time. A monthly or quarterly refresh, depending on the volume, is recommended to keep on top of any issues; otherwise you'll have to pay a large sum to fix the same issues all over again.

Spot check your data regularly.

How to spot check your data.

Checking within Excel is not easy, but these instructions can be followed by anyone, regardless of skill level, and will give you the basic tools needed to check any classified data set, whether it's been classified by your team or a 3rd-party supplier. (A rough scripted equivalent follows the steps below.)

  1. Select the data and create a pivot table. Choose Supplier Name (or Normalised, if available) and the levels of classification.
  2. Change the report layout to tabular form; this will list spend by supplier, by classification. From this you'll be able to pick out any lines that stand out.
  3. If you have a supplier with a large number of rows, you can view it separately by copying its data into a new tab and creating a new pivot table from that.
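If your data lives outside Excel, the same spot check can be scripted. A minimal pandas sketch; the file name and column names (supplier, level1, level2, amount) are assumptions for illustration:

```python
import pandas as pd

# Hypothetical classified spend file; adjust the path and columns to your data.
df = pd.read_csv("classified_spend.csv")

# Steps 1-2: the tabular pivot, spend listed by supplier and classification.
pivot = pd.pivot_table(
    df,
    index=["supplier", "level1", "level2"],
    values="amount",
    aggfunc="sum",
)
print(pivot)

# Step 3: view one high-volume supplier separately (the "new tab" step).
one_supplier = df[df["supplier"] == "IBM"]
print(pd.pivot_table(one_supplier, index=["level1", "level2"],
                     values="amount", aggfunc="sum"))
```

As in the Excel version, the output is something a human scans for lines that stand out, such as cleaning spend sitting under an IT supplier.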

Try it for yourself.

I've used clear, simple examples in this blog, but you will have many more subtle errors within your own data. Why not test this out? Using the tools above, spot check your classified data to check its accuracy.

And if it's not right? Do something now. And if your teams are busy, use a reliable 3rd party; it's that important.

Data accuracy is an investment, not a cost. Address the issues at the beginning: while it might seem like a costly exercise, you will undoubtedly spend less than if you have to resolve an issue further down the line with a time-consuming and costly data clean-up operation.

Susan Walsh is the founder of The Classification Guru
