Gary Elliott

BI Consultant, Bluefin Solutions

Data Warehousing theory… Is a rethink required?

29 Mar 2011 Business Intelligence (BI)

For as long as I’ve been working in Business Intelligence and Warehousing, there have really been only two schools of thought on how to approach a Data Warehouse: Kimball or Inmon. At a high level, the key difference in approach is that Kimball proposes that we build from the ground up, while Inmon advocates a top-down approach. That’s clearly a generalised statement of the differences, and not a statement designed to inspire debate. I’ve seen both approaches in action and both have their pluses and minuses. The topic of debate here is not their differences, but whether the theory behind both approaches is still valid given current advances in data warehousing.

The core of their design is to provide an efficient method of data storage and retrieval. At the time of design, memory and storage were both expensive, which led to the use of Data Marts and aggregated data as a method of minimising the amount of data that is stored within a data warehouse.

This is the key problem with Data Marts: they are designed for aggregated data. The vast majority of users that I have spoken to during requirements gathering exercises respond with an all too familiar statement when asked what their requirements are: “we want to report on everything” (breadth). When asked how much of everything they would like to hold, the next response is “everything of everything” (depth). From a user perspective, that’s a fair enough statement. Why shouldn’t they be allowed to report on everything to make better business decisions?

To accommodate the requirement of breadth, star schemas evolve into snowflake schemas and multiple Data Marts are created. To store the level of depth required for reporting, either Data Marts hold line item data, resulting in massive fact tables and performance problems, or the data is stored in the ODS, with clever ways devised to drill from a Data Mart into the ODS.
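
As a rough illustration of that trade-off, the sketch below aggregates a handful of line items into a mart-style summary and then drills back to the detail behind one summary row. Everything in it is an assumption made for illustration: the sample rows, the column layout, and the function names `build_mart` and `drill_through` are hypothetical and do not come from any particular warehouse product.

```python
from collections import defaultdict

# Hypothetical line-item rows as they might sit in an ODS:
# (order_id, product, region, amount)
ods_line_items = [
    (1001, "widget", "north", 20.0),
    (1002, "widget", "north", 35.0),
    (1003, "gadget", "north", 15.0),
    (1004, "widget", "south", 50.0),
    (1005, "gadget", "south", 10.0),
    (1006, "gadget", "south", 30.0),
]


def build_mart(line_items):
    """Aggregate line items to one row per (product, region)."""
    totals = defaultdict(float)
    for _order_id, product, region, amount in line_items:
        totals[(product, region)] += amount
    return dict(totals)


def drill_through(line_items, product, region):
    """Return the line items behind one aggregated mart row."""
    return [
        row for row in line_items
        if row[1] == product and row[2] == region
    ]


# The mart holds fewer rows than the ODS, at the cost of losing depth.
mart = build_mart(ods_line_items)

# Depth is recovered only by going back to the line items.
detail = drill_through(ods_line_items, "widget", "north")
```

The mart answers summary questions cheaply, but any question about an individual order forces a trip back to the detailed store, which is the drill-through pattern described above.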

Although not perfect, the theory has held up well considering the barriers faced. The barriers that led to the design of data warehousing are rapidly falling. Data storage and memory are rapidly decreasing in cost, and have been for a long time. What’s more relevant is the maturity of new models and approaches to business intelligence and warehousing. To name a few, in recent times, we have seen the rise and maturity of:

These new models and approaches do not clearly fit into the theory of data warehousing as we know it. My question is, should the theory of Kimball and Inmon be updated for the modern advances in data warehousing or do we need a fresh new approach to data warehousing?




Comments

Dan Keeley 06 Sep 2011

Good article - exactly what I was looking for. Kimball is certainly looking old, but perhaps the new columnar databases are its saviour? Columnar databases seem to be built precisely for Kimball-style data.

But with "Big data" as well (i.e. Hadoop etc.) and data behind web services, perhaps the underlying architecture no longer matters, and what will become more important will be some sort of standardised metadata language to access the data regardless of the underlying platform/methodology/implementation?

Geoff Warriss 31 Mar 2011

Hi Gary,

Great read and good thinking. It's strange, as HANA/in-memory seems to be set to solve all our 'problems' around data warehousing insecurities, but then real time has been achievable for many years. Sybase has been driving the financial sector, with a key use within Global Markets, where real time really is real time!
I discussed this with one of my friends who works in trading at a financial institution. He said Sybase was an interesting acquisition by SAP, but also commented that a data warehouse will always be needed and the principles of Inmon & Kimball will still be adhered to, as order is and always will be required, irrespective of in-memory. I agree.

Thanks Geoff.

Emma Moss 30 Mar 2011

Hi Gary

This is a great article and very, very NOW!

Traditionally the art of BI has been data warehousing - modelling to achieve efficiency, aggregation, minimal use of space etc. But with the advent of tools such as HANA, many of these concerns are addressed in one fell appliance!!

HANA uses in-memory technology and has been designed to allow "real, real time". This is a huge paradigm shift for the skills required to deliver BI. There is going to be increasingly less emphasis on modelling and more on the skills required to define and FIND that needle in the haystack...!

An interesting couple of years to come for the world of BI!!!
