It was some time before SAP TechEd in Bangalore that I learnt about the SAP HANA Developer Program. A cloud-based service which allows developers to get hands-on with a SAP HANA appliance. I was sceptical at first: how could SAP roll out a large-scale service based on $200k appliances, allowing multiple people to develop on it and for it to work?
So SAP opened the Developer Program up to a small number of - around 50 - individuals, with the purpose of seeing how the scheme worked, providing feedback and helping to get it to a state where it could be rolled out to a mass-audience. It is the brainchild of Vishal Sikka and my first reaction: totally awesome.
How will the program work?
I think the details are yet to be ironed out a little but basically: you sign up as a developer and get some level of free access (90 days is the number I heard), followed by the possibility of a subscription-model for those who find it useful for a longer period of time. I haven't heard anything about the costs but my advice was to make the barrier-to-entry low: the value of good developers in the ecosystem is very high.
How does the service work, at a high level?
Well for now it is a closed beta but it will be opened up soon, and it works as follows.
- You subscribe to the service on a web form and get an activation code.
- You get a confirmation email that contains a link, HANA hostname, user and password. You have 5 days of usage (120 hours) over 90 days in the trial I have. I suppose this may vary in future.
- You login to a cloud server via a Java applet that acts like a Microsoft Remote Desktop and it gives you access to a server which has HANA Studio, Analysis for Excel and Internet Explorer to access the SAP BusinessObjects BI4 suite for data.
- You upload your data to the cloud, build tables in HANA, load your data into the tables, build views and report on top of them.
OK so I subscribed, how do I get it to work?
So assuming you got yourself onto the Remote Desktop, I will walk you through how to get data onto the system.
Step 1: get your data onto HANA
In this step we learn how to get data into the cloud and onto the HANA disk store. It won't be ready for access yet!
- You need to put your data into CSV files and I'll assume you know how to do that already. If not, you may find the rest of this too technical!
- ZIP your CSV files up to save space and upload them into a dropbox site. If you work for SAP you may have access to SAPMATS, otherwise you can use dropbox.com or some similar service.
- Login to your Cloudshare appliance and download the files. You will only have 3-4GB available so may have to do this a bunch of times if you have a lot of data. I got about 10MBit/sec downloads.
- Unzip the data directly onto the HANA file share: \\hanasrv01\filer or similar. I found I only got 2-3MBit/sec uploads to HANA but I could run multiple extracts and get good aggregate performance.
- N.B. Cloudshare times out after some period of inactivity so you have to keep using the system. This is a bit of a pain and I hear SAP are working on a simplified mechanism.
Step 2: get your data into HANA
So now you have your data on the HANA appliance filesystem but it isn't ready for reporting yet. We have some work to do.
- Watch Juergen Schmerder's YouTube video here. He will get you working with SAP HANA Studio and get you logged into your appliance. This is an important first step as your Cloudshare environment will not yet be configured for you. You can access links to tutorials and further information here.
- You now need to create some tables in your area. You can either open a SQL window and type your own SQL - something like this:
create column table "S000XXXXXXX"."CUSTOMER"(
"CUSTOMER_NAME" VARCHAR (32) DEFAULT '',
"ADDRESS1" VARCHAR (32) DEFAULT '',
"ADDRESS2" VARCHAR (32) DEFAULT '',
"ADDRESS2" VARCHAR (32) DEFAULT '',
- Or alternatively you can use the graphical tool. This obviously needs to match the format of the CSV files that you plan on importing.
- Now you need to create a file called customer.ctl and put it in \\hanasvr01\filer (or similar, watch for your HANA server name). It contains the following lines:
- into table CUSTOMER
- infile /filer/customer.csv
- badfile /filer/customer.bad
- Remember that HANA runs on Linux and these names are case-sensitive. Watch out because the Cloudshare appliance will hide the CSV ending and it may be (for example) customer.CSV! The .bad file is for when you made mistakes and your file format doesn't match your table definition. HANA will give you some clues on how to fix it.
- Now you need to run a LOAD statement into a SQL window. The following statement will work:
- LOAD FROM '/filer/customer.ctl' THREADS 10 BATCH 50000
- This is set to run with 10 threads and committing every 50,000 rows. You can play with these but this should be roughly optimal for a large file. In my testing, a 16GB CSV with 172m rows took 6m30 to load and used 6GB of HANA memory.
- Note that once the table has loaded, it will then need to push the delta store into the main memory store. This is a feature of HANA that optimises both writes and reads, but the delta store is much slower for read performance. You can force it using the following command:
- o update CUSTOMER merge delta index;
Step 3: Test your new table!
I'm going to stop after this and write the rest in another article: creating views, and reporting on them. But you can already test HANA performance. Suppose you have this data in an existing data warehouse (I did), you can execute the same SQL. So you could do:
SELECT COUNT(*) FROM CUSTOMER WHERE SECTOR_ID=5;
In my example, it improved query reporting speed from 9m30 to 300ms - more than 1500x faster.