The problem with the quality of your data

19 April 2012

Jan van Ansem

Principal consultant SAP Data Warehousing

Yes, there is a problem with your data. And I think you are aware of it. It is nothing to be ashamed of; it happens to the best. The question is: what are you going to do about it? Are you sticking your head in the sand and hoping the problem will disappear? (It won't.) Or are you only dealing with quality once the business starts complaining? (Too late: poor data quality is already costing you money.)

An organisation should understand what risk it is exposed to because of data inaccuracies. Once the risk is understood, you can start to pro-actively ensure that business users have reliable data for decision making in business-critical processes.

This blog will give you some ideas about how to assess the risk to your business. I will then explain scenarios in which you can get 'in control' of the quality of your data. But first, let's have a look at why your data is incorrect in the first place.

How do you end up with incorrect data?

First of all, your data might have been right at some point. For example, when it was entered, it was a true account of an actual entity or event. Then something happens (a customer moves house, an internal re-organisation takes place, your company's product catalogue is revised) and suddenly your data is out of date. Aging, at least in the context of Data Management, is a problem.

Another reason for having incorrect data is that the data was migrated from a previous system. And although the new system has all the bells and whistles and safeguards to prevent human errors, cleaning up the data from the old system has always been deemed 'too expensive'.

Finally, let's also look at 'human error'. People create wrong data. Even the best checks in the system will not stop users from entering incorrect data. They don't do it on purpose, honestly they don't. But it does happen. So there you are: three reasons for incorrect data. Two of these root causes, aging and human error, are events taking place on a day-to-day basis. This means that a one-off data cleansing activity will perhaps momentarily improve your data quality, but in the long term you can only guarantee an acceptable quality of data if you regularly check and cleanse your data.

How to measure 'good data'?

To measure the quality of your data you should look at how data is used in your business processes. A notorious cause for concern is customer address data. Obviously it is a problem if you have duplicate customers, incomplete addresses and invalid contact details for your customers. But to complete an order you will need more than that. If your address data is spotless but fulfilment struggles to complete an order because of incorrect product information then you will end up with an unhappy customer. So look at processes rather than tables. Define what information you need to complete a business critical process and set up business rules for the data used in this process. Where is it unacceptable to have duplicates, which fields have to be right at all costs and which are just 'nice to have' on a master data record? Once you have defined the rules, you can check how your data complies with the rules, and find out if 80% complies, or 99%.
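To make the idea of 'checking how your data complies with the rules' a bit more concrete, here is a minimal sketch in Python. It is not tied to any SAP tool; the two rules, the field names and the sample customer records are all invented for illustration, and in practice the rules would come from the business process you are analysing.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
Rule = Callable[[Record], bool]

# Hypothetical rules for a hypothetical "ship an order" process.
RULES: Dict[str, Rule] = {
    "postcode_present": lambda r: bool(r.get("postcode", "").strip()),
    "email_has_at_sign": lambda r: "@" in r.get("email", ""),
}


def compliance(records: List[Record], rules: Dict[str, Rule]) -> Dict[str, float]:
    """Return, per rule, the percentage of records that satisfy it."""
    if not records:
        return {name: 0.0 for name in rules}
    return {
        name: 100.0 * sum(1 for r in records if rule(r)) / len(records)
        for name, rule in rules.items()
    }


# Invented sample data: one record lacks a postcode, one has a malformed email.
customers = [
    {"id": "1", "postcode": "SW1A 1AA", "email": "a@example.com"},
    {"id": "2", "postcode": "", "email": "b@example.com"},
    {"id": "3", "postcode": "M1 1AE", "email": "c.example.com"},
    {"id": "4", "postcode": "EH1 1YZ", "email": "d@example.com"},
]

scores = compliance(customers, RULES)
for rule_name, pct in scores.items():
    print(f"{rule_name}: {pct:.1f}% compliant")
```

The point of the sketch is the shape, not the rules themselves: each rule is a named yes/no check on a record, and the outcome is a percentage per rule that you can compare against a target.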

Return on investment

Defining the targets has everything to do with risk assessment. How much is it going to cost if a process is delayed or goes wrong because of bad data? How likely is it that such an event will occur? And how much is it going to cost to decrease the likelihood of such an event? In other words, do the costs of dealing with the consequences of bad data exceed the costs of cleaning up the data in your system? If your data is only 80% right, then it is quite likely that implementing a data quality policy is going to save money. On the other hand, getting the last 1% spot on might be very expensive, and the risks related to this remaining inaccuracy might not justify the costs.
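The trade-off can be put into a back-of-the-envelope calculation. The sketch below assumes a very simple model (yearly loss = number of incidents times cost per incident) and every figure in it is made up; your own numbers, and probably a richer cost model, would have to come from the business.

```python
def expected_loss(incidents_per_year: float, cost_per_incident: float) -> float:
    """Expected yearly cost of bad data under the simple model above."""
    return incidents_per_year * cost_per_incident


def net_benefit(
    runs_per_year: int,
    error_pct_before: float,
    error_pct_after: float,
    cost_per_incident: float,
    cleansing_cost: float,
) -> float:
    """Yearly saving from fewer incidents minus the yearly cost of cleansing."""
    before = expected_loss(runs_per_year * error_pct_before / 100, cost_per_incident)
    after = expected_loss(runs_per_year * error_pct_after / 100, cost_per_incident)
    return (before - after) - cleansing_cost


# All figures below are invented for illustration.
# Case 1: data is only 80% right; cleansing brings errors from 20% down to 5%.
big_gap = net_benefit(10_000, 20, 5, cost_per_incident=25, cleansing_cost=20_000)

# Case 2: chasing the last 1% of errors, at a higher cleansing cost.
last_percent = net_benefit(10_000, 1, 0, cost_per_incident=25, cleansing_cost=30_000)

print(f"80% right -> 95% right: net benefit {big_gap:,.0f}")
print(f"99% right -> 100% right: net benefit {last_percent:,.0f}")
```

With these made-up numbers the first case comes out positive and the second negative, which is the pattern described above: the big gap pays for itself, the last percent may not.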

Where to start and what to aim for

As explained above, if you want to talk about the quality of your data you have to have the ability to measure the compliance of your data with defined data quality rules. This should not be a one-off exercise, but a repeatable process so changes in the data quality can be monitored over time. To do this, you need to have your rules defined in a system and be able to automatically check your dataset against the defined rules.
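As a rough illustration of what 'monitoring over time' could look like, the sketch below keeps one snapshot of compliance scores per run and reports the rules that got worse since the previous run. The `Snapshot` structure, the rule names, the dates, the scores and the tolerance of half a percentage point are all assumptions for the sake of the example, not features of any particular tool.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass(frozen=True)
class Snapshot:
    run_date: date
    scores: Dict[str, float]  # rule name -> percentage of compliant records


def regressions(history: List[Snapshot], tolerance: float = 0.5) -> Dict[str, float]:
    """Rules whose compliance fell by more than `tolerance` points
    between the last two runs, mapped to the (negative) change."""
    if len(history) < 2:
        return {}
    previous, latest = history[-2], history[-1]
    return {
        rule: latest.scores[rule] - previous.scores[rule]
        for rule in latest.scores
        if rule in previous.scores
        and previous.scores[rule] - latest.scores[rule] > tolerance
    }


# Invented monthly results for two hypothetical rules.
history = [
    Snapshot(date(2012, 3, 1), {"postcode_present": 92.0, "no_duplicate_customers": 97.5}),
    Snapshot(date(2012, 4, 1), {"postcode_present": 94.0, "no_duplicate_customers": 95.0}),
]

dropped = regressions(history)
for rule_name, change in dropped.items():
    print(f"{rule_name}: {change:+.1f} points since the previous run")
```

Because every run is stored rather than overwritten, the same history can feed a trend report for the business users as well as an alert for whoever owns the data.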

Below is a 'growth model' for data quality processes.


Ability to be pro-active

On the vertical axis you find the 'response time': the time that elapses between the moment incorrect data is created and the moment it is adequately dealt with. Many organisations still only act reactively when a problem arises, and deal with incidents on a case-by-case basis. One level up is where you know the rules, but checking the data is a project in itself. An improvement from there is when you have the ability to regularly assess your data quality. Ideally this is done by business users and not IT, and your business users are also empowered to remedy the data. This is the minimum level to aim for when you are thinking of improving the data quality in your organisation.

In my opinion, this is also a prerequisite before you can move on to the even more mature processes. At the next level you are able to capture a single image of the truth in a central repository (and have other systems make use of this information). Beyond that, there are workflows in place which allow users from different parts of the organisation to collaboratively create a new master data entry.

A strategy for all data

The other axis represents the scope, which is more or less equal to the number of entities for which you have defined data quality processes. I would like to re-iterate that it is important to use a holistic approach to data quality: look at processes, not at database tables. That said, it is impossible to deal with all the data problems in your system at once. Implement a framework which allows you to increase the scope over time, but start with a small number of entities, or even a small sub-selection of an entity: for example, customer data for a specific country instead of your complete, global customer database.

Infrastructure and tools

Once you have some ideas about where to start, who to involve, and what the ideal future will look like, you will start looking for tools to support your data quality processes. In recent years, SAP has put a lot of emphasis on data quality management, and now provides a complete set of tools to support data quality processes. In fact there is now such a large number of tools available that many find it hard to understand which tool should be used in what context, especially since some of the functionality of the different tools overlaps. To help you understand which tools are best to use in which processes, I have overlaid the previous diagram of the 'maturity' of DQ processes with the tools typically used for those processes.


Note that for some of the tools there is a dependency: Data Quality and Information Steward have to run on the Data Services 'platform'. Other tools, like Master Data Management, can be implemented without having Data Services. In my opinion, there is usually not much point in implementing Master Data Management or Master Data Governance without having the tools to get the basics right first.

Now is a good time to start

Well, it is never really a good time to start, is it? But your data quality issues will only become bigger problems if you don't start dealing with them. If you keep a few things in mind, your project will succeed and you will have happier business users, happier customers and a happier you. So, one more time, here are the key points to consider when you start a data quality project:

  • Look at processes, not system tables
  • Involve and empower the business users: they are responsible for data quality
  • Start with a long-term vision (and a landscape design to support it), but
  • Start with a small scope

Good luck with your project!
