There comes a time when organisations realize that they’ve reached full capacity in their Enterprise Data Warehouse. Investing in new storage can be costly, especially if you’re running an In-memory database. But does all your data really need to be held In-memory?
When looking at actual data and reporting usage, you’ll find that some data is infrequently used. An example is cost and profitability analysis, where many organisations keep between 5-10 years of history. Typical standard reporting only uses 2-3 years of history, so two thirds of the data is only used in special use cases for deep analysis. Moving this history out of memory and onto disk storage would immediately reduce the cost of ownership of running an Enterprise Data Warehouse.
Nowadays modern databases offer ‘multi temperature data management’ solutions, allowing you to choose whether data is held In-memory or on disk. Implementing multi-temperature data management comes at a cost: The solution has to be configured, monitored and maintained; some solutions require additional hardware and/or software licenses and users pay the price of not having the same performance for all their reports. Whether or not multi-temperature data management will actually reduce the TCO of running an Enterprise Data Warehouse will depend on factors such as size, the volume of historical data versus current data and the specific requirements to report on historical data.
In this post I describe the four steps to go through when implementing a multi temperature strategy, using SAP Business Warehouse on HANA (BWoH) as the enterprise data warehouse platform.
Step 1: Housekeeping and optimisation
This is just a check, really, as you should be doing this already. It doesn’t have anything to do with multi-temperature data management or with SAP HANA – but if you don’t do this it is probably the most cost-effective thing to do and has to be done before anything else. Clean up your PSA and old log tables. The basis jobs should run at least monthly and they will clean up a lot of junk. Especially when you run HANA, these tables sit in the ‘row store’, which is In- memory, and are extremely wasteful.
Optimise the EDW core data layer
I assume you have migrated from a traditional (disk based) database to SAP BWoH. If the only thing you have done so far is the standard migration with a simple cube conversion then you will probably have a lot of redundant data in your system. Prior to BWoH this was required for getting an acceptable performance, but this should not be necessary anymore.
Are you still using cubes which are loaded from a DSO without further transformations? Get rid of them (use the SAP migration tool to identify these).
Do you have complex transformations to calculate values and load them back into a DSO or Cube, using staging objects in between? There is a good chance that this can be simplified using Analytical models in HANA. By doing so, you not only reduce the database footprint, you also cut back on the overnight load schedule, which also helps to reduce the cost of ownership.
Oh, and whilst you’re at it, get rid of Infosets. Although it will not free up any space, the alternative model in HANA will fly compared to your query performance on Infosets.
Below is a story from one of my clients in the Services industry, who has just gone through a redesign of the company’s opportunities reporting:
“One of the challenges our business was facing around opportunities reporting was the long lead time between changing business requirements and implementing changes in reporting. The idea was to move our opportunities reporting from BW 7.3 to BWoH and to use a 3rd party tool for flexible reporting in addition to SAP Business Explorer web reporting. BWoH was already in use for ERP reporting and the database was reaching capacity. By optimizing the architecture for BWoH we managed to reduce the number of cubes required for opportunities reporting from nine to three and reduce the time required for loading by one hour. This saving of 25% (2 GB) of database space was achieved with minimal effort. The extra design and development work were easily absorbed in the wider project. But the real benefit for the business is the simplification of the solution, making the system more agile and reducing the time to implement future changes. We are now keen to investigate potential for further simplification across ERP and CRM reporting, as it seems that with relatively little effort real savings can be achieved in reducing the database size, shortening the batch window and making the solution as a whole simpler.”
Once you have the basics rights, you can get on with looking at some of the exciting features BWoH has to offer to reduce the in-memory footprint. Before we look at the technical details, let’s have a look at the functional aspects of your data to see what does need be kept in-memory and which data could be stored on cheaper storage.
Step 2: Functional assessment of the data temperature
This is the tricky bit. It is about really understanding which data is not often used and which is still regularly used. If you get this wrong, you end up with disappointed users, having to wait for cold data to appear when running reports as part of their daily or weekly tasks. It is critical to involve business representatives in this process, as well as people with a deep understanding of the data model.
In the past it was common practise to physically separate data for different years in different DSO’s or Cubes. Most BW systems had different cubes for each year, or for example data older than three years in one cube and recent data in another cube. With BWoH it is no longer necessary to do this as SAP HANA is perfectly capable of reporting very fast over very large datasets, so current data and historical data can sit together in one DSO. If the historical data is truly cold then it is possible to move this to the cold store in automated processes based on date selection, without having to create different objects every year.
You have to be sure about what you classify as cold. A cold store has limited throughput compared to in-memory data so it would not cope well with very frequent requests for data. If the ‘cold’ data is still required for regular load processes (lookups) or for regular reporting, the data shouldn’t be marked as cold. It might look like stating the obvious, but in reality it is more difficult than you might expect on the surface of it, especially if you don’t have good visibility into report and data usage already.
An example where one of our customers had misjudged the temperature of data is pricing conditions: These were still updated long after they believed they were cold. In this case the business process had to be changed in order to push more data to the cold store.
Cold data in BWoH is managed using the Near Line Storage concept, which is explained further down in this article.
Warm data is best described as data which is not frequently used for load processes but when it is requested it has to perform really well. BWoH has two mechanisms for managing warm data:
- Active/Inactive data concept - typically used for data staging objects, not for reporting objects
- Dynamic tiering - typically used for very large datasets which are too expensive to run in-memory
Both concepts are explained in more detail further down.
By default, this is everything else, so anything which is not explicitly marked as cold or warm.
Step 3: Decide strategy
To decide which technologies for multi-temperatured data management are the most (cost) effective you have to know the following:
- The size of your database and growth forecast
- The proportion of data in your database which could be held in cold store and the proportion which could be held in warm store (now and in the future)
- The impact of the various multi-temperatured data management technologies on in-memory sizing requirements (and the reduction of costs as a result)
- The costs associated with implementing and running multi-temperatured data management in BWoH.
To help you with the last two points I have summarized the three different technologies, explain how they work, what the impact is on hardware configuration and software licenses and what typical use cases are.
Near-Line Storage (NLS)
The impact on sizing can relatively easily be calculated but it is harder to predict the impact on performance when using any of the multi-temperatured data management technologies. For Active/Inactive data and Dynamic Tiering I can only go by the numbers SAP provides: The overhead in execution time is respectively 5-10% (Active / Inactive) and 2-5% (Dynamic Tiering). For NLS I can draw on some results from some of our clients which have implemented this. What has become clear is that the performance derogation is much dependent on the complexity. For queries without complex OLAP
calculations, the speed can be near HANA speed. But as Sybase IQ is not yet optimized for things like restricted key figures, non-cumulative key figures or hierarchies, queries with such elements will be slowed down more. Sybase IQ is column based though, so in most scenarios you can expect a better performance compared with traditional row/disk based databases. My expectation is that Sybase IQ will soon be optimize to support more advanced OLAP operations better.
When you do consider NLS be sure to put cold data in NLS only: NLS is not designed for high throughput so frequent requests for larger datasets will slow down the system considerably.
This post has provided an overview of the options for reducing your SAP HANA in-memory footprint using multi-data management technologies available in SAP BWoH. There is a wealth of information on these subjects on sap.help.com.
For a direct link to the multi temperature management page click here. Michaela Pastor has done a great job by creating a central point on SCN with references to NLS documentation. Click here for the direct link.
The keys to success are to keep your database clean (make sure your housekeeping is up-to-date), knowing your data (volumes, future growth, hot/warm/cold classification) and selecting the technologies which are right for you. Hopefully this post helps you with this.