Data integration SAP HANA style: Smart Data Integration

14 July 2015

Jan van Ansem

Principal consultant SAP Data Warehousing

I was in an SAP HANA training course a few weeks ago where, almost as an aside, the trainer mentioned the following: “With SP09 the HANA platform comes with a comprehensive set of data integration, data quality and data streaming functions”. Well… I nearly fell off my chair!

I hadn’t seen this coming. Now, a few weeks later, I realise that I am not the only one who completely missed this latest innovation. My first reaction to the news was, rather selfishly, “Oh no, I will have to get yet another certification”. Soon after, I realised that this was a pretty small inconvenience for me compared to what it might mean for organisations running SAP Data Services. Well, to put some minds at ease, there is no need to move away from SAP Data Services, as I will explain in this blog post.

What is SAP HANA Smart Data Integration, Quality and Streaming?

On the SAP HANA platform, you now have access to three new services:

  • Smart Data Integration (SDI)
  • Smart Data Quality (SDQ)
  • Smart Data Streaming (SDS)

For those of you who know SAP Data Services, the first two should sound familiar. They are the twin brothers of Data Integration and Data Quality in SAP Data Services. Almost all data integration and data quality functions you find in Data Services can now also be used from HANA SDI/SDQ. The few functions which are not available yet will no doubt follow soon. SDS is the new kid on the block: it provides the ability to work with streaming events. This is not quite my area; ‘tickertape’ and ‘snapshots’ are what come to mind. If you want to know more about it, then Jeff Wootton’s blog and demo on SCN are a good starting point.

What does the future of SAP Data Services look like?

I am still a fan of SAP Data Services: the way you can connect to practically anything, the easy way of defining complex transformations, the great performance and the limitless scalability. At first sight, based on demos only and without having hands-on experience, it looks like HANA SDI/SDQ has retained much of the good stuff that characterised the SAP Data Services application. And to be honest, not everything was great about Data Services, so there are some improvements as well. You no longer have to run a heavy client application, the system architecture is hugely simplified, and it will hopefully put an end to the latency problem which SAP Data Services suffered from when the development application was physically far away from the repositories. It is also optimised for real-time replication, which was never Data Services’ strongest point anyway.

But most of these problems could have been resolved by redesigning the developer application and some architectural changes. Surely after seventeen(!) years of running pretty much the same platform (under a wide variety of names) that is not too much to ask for?

A move to the SAP HANA platform could be driven by the need for better performance, but in my experience the SAP Data Services engine was rarely the cause of poor performance. End-to-end performance is mainly a matter of how fast you can suck the data out of a source system and how quickly you can pump it back into another system. The only exception is perhaps when you have complex address de-duplication and cleansing processes, which rely a bit more on pure processing power.
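To make that point a little more concrete, here is a rough back-of-the-envelope sketch in Python. The stage timings are made-up numbers chosen purely for illustration, not measurements from any real system, and the function name is hypothetical. It simply assumes the extract, transform and load stages run one after another.

```python
def end_to_end_minutes(extract, transform, load):
    """Total run time when the three ETL stages run sequentially."""
    return extract + transform + load


# Hypothetical timings in minutes, chosen only for illustration.
baseline = end_to_end_minutes(extract=40, transform=5, load=35)

# Make the transformation engine ten times faster: 5 -> 0.5 minutes.
faster_engine = end_to_end_minutes(extract=40, transform=0.5, load=35)

# Halve the extract and load times instead.
faster_io = end_to_end_minutes(extract=20, transform=5, load=17.5)

print(f"baseline:      {baseline} min")
print(f"faster engine: {faster_engine} min")
print(f"faster I/O:    {faster_io} min")
```

With these assumed numbers, a tenfold faster engine takes less than 6% off the total run time, while halving the extract and load times nearly halves it. A job dominated by heavy cleansing would have a much larger transform share, which is where raw processing power starts to matter.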

What I had expected was that SAP Data Services would eventually run on SAP HANA, so the expensive transformations would run faster. But then, when looking at this development from the perspective of a new customer, the integration of ETL (Extract, Transform, Load) services into the HANA platform makes complete sense. You can use the same interface for ETL as you use for, say, database modelling and analytical modelling. There is no further hardware to add to the landscape, so implementation and maintenance are hugely simplified. It fits in with SAP’s ambition to make things simpler.

Coming back to the question “What does the future of SAP Data Services look like?”: SAP is very keen to point out that SAP Data Services is its flagship Enterprise Information Management (EIM) product. SAP’s EIM customer base is estimated at around 12,000 organisations. SAP Data Services is firmly in the desirable part of Gartner’s Magic Quadrant for Data Integration tools, with customers describing the EIM technologies as increasingly relevant for supporting their EIM goals. With this in mind, it is clear that SAP will continue to support and invest in SAP Data Services. My personal view, though, is that very soon all functionality delivered by SAP Data Services will also be provided by SAP HANA SDI/SDQ, and from then on all innovation will take place in SAP HANA SDI/SDQ.

If you are new to ‘ETL’ and you are running some business applications on the SAP HANA platform then there is no doubt in my mind that you will be better off investing in HANA SDI/SDQ instead of SAP Data Services.

We already use SAP Data Services. Now what?

Nothing yet. As mentioned above, SAP Data Services is still a best of breed ETL tool and the roadmap shows mainstream support until the end of 2018 and priority one support until the end of 2020. I am slightly concerned that we will not find any major updates soon as all key product developers seem to have moved from SAP Data Services to HANA SDI/SDQ. We will probably be on 4.2 a bit longer, with regular service packs for bug fixes and minor changes.

SAP has not yet delivered a tool to migrate Data Services jobs to HANA SDI/SDQ. I don’t know if such a tool is on the roadmap: there is no publicly available SDI/SDQ roadmap, and the HANA roadmap (Jan ’15) only mentions ‘Enhanced Smart Data Integration/Quality/Streaming capabilities’ for future developments.

My advice for now is to just keep a close eye on new developments in HANA SDI/SDQ. When the product has matured some more, you can start thinking about when and how you are going to migrate your SAP Data Services processes to ‘Smart’. Hopefully SAP will soon come up with a dedicated training course and certification programme for this. Currently it is only covered as part of the 2-day training course SAP HANA data provisioning, and given everything else that course covers, you will only get a glimpse of SDI/SDQ.

Where to look for more information

HANA SDI/SDQ has managed to stay under the radar, but there is some great content out there. It is a bit of a struggle to find though, so hopefully SAP and SCN will soon find a better place to put SDI/SDQ related content.

The ‘home’ for SAP Data Services on SCN is under the banner of ‘Enterprise Information Management’. (Products > Analytics > Enterprise Information Management > Data Integration and Data Quality > Data Services & Data Quality). In my opinion, this should be the natural home for SDI/SDQ. Unfortunately, you will not find anything there.

Instead, you will have to go to the slightly obscure ‘Developer Centre’ where there is currently a less than welcoming message saying ‘we are moved’.

Save yourself some trouble and use this link to Werner Daehn’s post (who else?) for all the links to anything to do with SDI/SDQ.  

Another excellent source of information is the SAP HANA Academy on YouTube. The playlist for ‘Enterprise Information Management SPS09’ contains no fewer than 31 videos, ranging from simple transformations and datasource administration to Twitter replication (how cool is that!?).

I am going to watch some more of those videos now and hopefully very soon I will be able to get my hands on the kit. If I find anything interesting to share, I will let you know.
