Will the true data scientist please stand up?

28 July 2016

Jan van Ansem

Principal consultant SAP Data Warehousing

The internet has become a font of knowledge for most people, but this has not transformed each and every one of us into geniuses. Neither does providing specialist data discovery tools to the business turn all of us into data scientists. Why organisations think it is a good idea to provide a wide range of business users with state-of-the-art platforms for data discovery is beyond me, but it is happening. I expect that businesses fail to distinguish between ‘data discovery’ in its true sense, and ‘self-service analytics’. This may have catastrophic consequences. Luckily there is a way to keep business users, data experts and the IT department happy.

How does ‘data discovery’ differ from ‘self-service analytics’?

Data discovery and self-service analytics are both about combining data sources, identifying unexpected patterns, and gaining new insights. This covers a wide variety of skills:

Self-service analytics is done by business users who drill down to specific cross sections of data from their enterprise systems. They may combine subsets with data from other sources. This gives them a better understanding of what is happening in their business, and supports the decision-making process.

Data discovery is done by highly skilled data professionals who have vast amounts of data at their disposal, from the enterprise systems and other sources. They utilise reference data from market research institutions, data from various applications and data from online resources. Using complex algorithms and visual tools, they identify new relationships and are able to ascertain how reliable certain correlations are.

Both data discovery and self-service analytics are important to the business and need to be supported by central IT teams. However, they need to operate according to different rules. Here’s why.

Drowning in the data lake

A colleague at work recently said: “Giving ‘everyone’ access to ‘all’ data without further guidance mirrors the effect Spotify has on people who want to play music: They don’t know where to start or what to do with so much choice”. The problem is that whilst making the wrong choice of music can be annoying, it rarely has serious consequences. Using data the wrong way is a recipe for disaster.

On the internet people can find ‘hard evidence’ for any hypothesis. This is also true when untrained users get access to corporate datasets without rules and guidance. Patterns which don’t match with what someone wants to see are easily filtered out. Does a micro-level survey show an interesting correlation? Just zoom in and draw a graph without considering the further context. The result? The world is flat, the moon landing was filmed in a studio and sales of anti-depressants increase on days when England plays football. It is all written on the internet, or in this case the enterprise information systems, so it must be true.
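
To make the filtering trap concrete, here is a minimal sketch in Python. The figures are invented purely for illustration (they are not client data), and the names `weeks`, `pearson` and `cherry_picked` are my own. It shows how dropping the rows that contradict a hypothesis can turn no correlation at all into a perfect one.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical weekly figures: (football matches played, sales index).
# These numbers are made up for the example.
weeks = [(1, 1), (2, 2), (3, 3), (1, 3), (3, 1), (2, 2)]

# Correlation over the complete dataset.
full_r = pearson([m for m, _ in weeks], [s for _, s in weeks])

# "Zooming in": keep only the weeks that fit the story, i.e. filter out
# the two weeks where sales moved against the number of matches.
cherry_picked = [(m, s) for m, s in weeks if m == s]
cherry_r = pearson([m for m, _ in cherry_picked], [s for _, s in cherry_picked])

print(f"all weeks:       r = {full_r:.2f}")    # r = 0.00
print(f"filtered weeks:  r = {cherry_r:.2f}")  # r = 1.00
```

Nothing in the filtered graph looks wrong on its own, which is exactly why this kind of analysis needs someone trained to ask what was left out.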

Even when people are not deliberately manipulating data, things go horribly wrong. One of our clients has described its entire business with two or three complex ‘views’. Through these views, all business users had unfiltered access to virtually all enterprise data. The views were impractical to use (too many rows and columns) so users created their own views on top of these views. Now, there are tens of thousands of views, some of them showing reliable information, many of them producing incorrect results. Many of the users, after finding conflicting results in their reports, have now stopped using the system.

Supporting your users in a sustainable way

Data specialists and self-service analytics users are different user groups with different requirements. Here are four of the key areas where the requirements differ:

1. Tool selection

Below is a subset of the selection criteria used for Business Intelligence (BI) tool selection. By talking these through with the different stakeholders it will become evident that what is a key requirement for data discovery might be just a ‘nice-to-have’ for self-service analytics and vice versa.

[Image: table of BI tool selection criteria]

2. Data access and data security

Data specialists require access to all data. They are trained to use the data responsibly and will test the validity of their outcomes. 

Business users should have a governed analytical dataset as a starting position, with the flexibility to combine this data with ad hoc sources. 

3. Change management

Data specialists typically don’t drive the actual (IT) change process. They influence the organisation top-down, with changes in information systems following the traditional change management route. 

When business users arrive at new insights, they want to change their standard reports and implement new reports as soon as possible. This is where IT needs to take a strong position. The impact of a change on the entire BI infrastructure and user group should be considered. There might be requirements to change static PDF/Print reports, mobile reports and KPI dashboards. Communication needs to happen so different users don’t have to reinvent the wheel. Data discovery should be separated from enterprise BI, and there should be no shortcuts to quickly promote data discovery reports to a standard report.

4. Support

Data specialists are largely self-sufficient. This doesn’t mean the data specialists should be ignored by IT. In general, any IT change process, and especially a BI-centric one, will benefit from data specialists’ involvement.

Business users need a little more looking after. They benefit from education and guidance about the data in the system. Business users will also benefit from help with ad-hoc data provisioning.

Conclusion

Self-service analytics and data discovery are here to stay. I’ve provided some considerations as to how central IT can facilitate these processes in a cost-effective, supportive and responsible way. I hope this will help to make an ongoing success of self-service analytics and data discovery in your organisation. At least, I hope you now know how to identify the true data scientist in your user community.

 
