I’ve been meaning to update this FAQ for nearly two years, since it is the primary reference listed on Wikipedia, but somehow never found the time. When I heard that Steve Lucas wanted to collaborate, I decided it was time for a rewrite and update!
Part 1 – HANA Overview
Part 2 – HANA Technology
Part 1 – HANA Overview
SAP HANA is an in-memory database and application platform which, for many operations, is 10-1000x faster than a traditional database such as Oracle running on the same hardware. This allows simpler design and operations, as well as real-time business applications. Customers can finally begin to reduce IT complexity by removing the need for multiple separate Application Servers, Operational Data Stores, Datamarts and complex BI tool implementations.
SAP HANA is a “reinvention” of the database, based on 30 years of technology improvements, research and development. It makes it possible to build applications that cannot be built on a traditional RDBMS, and to renew existing applications like the SAP Business Suite.
SAP co-founder and Chairman Hasso Plattner believed that if a database could be built with zero response time, business applications would be written fundamentally differently, and IT landscapes could be simplified. Researchers at the Hasso Plattner Institute in Potsdam theorized that, with modern computers and software design, this would very nearly be possible.
SAP makes business applications, and since it was clear that none of the incumbent software vendors such as Oracle would write such a database and application platform, SAP needed to build its own. This would also be the springboard for a complete renewal and simplification of SAP’s applications, to carry them through the next 20 years.
SAP HANA is more than a database. When SAP set out to build HANA, it realized that the next generation of business applications would require a much more integrated approach than in the past.
SAP HANA contains, out of the box, the building blocks for entire enterprise applications. HANA can take care of requirements that other application platforms serve with many separate layers, including transactional databases, reporting databases, integration layers, search, predictive and web. All of this works out of the box, with a single installation.
SAP built SAP HANA drawing on research from the Hasso Plattner Institute in Potsdam, the IP acquired with the p*Time database, the TREX search engine, the BWA in-memory appliance and the MaxDB relational database. It has been extended with intellectual property from the Business Objects and Sybase acquisitions, from products such as Sybase IQ and Business Objects Data Federator.
Whilst HANA has a legacy and some code from other products, the bulk of the database and platform has been written from the ground up.
SAP HANA is different by design. It stores all data in memory, in columnar format and compressed. Because HANA is so fast, sums, indexes, materialized views and aggregates are not required, which can reduce the database footprint by as much as 95%. Everything is calculated on demand, on the fly, in main memory. This makes it possible for companies to run OLTP and analytics applications on the same instance at the same time, and to run any type of real-time, ad hoc query or analysis.
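To make "columnar, compressed, calculated on demand" concrete, here is a minimal Python sketch of dictionary encoding, a common column-store compression technique, with a grouped sum computed at query time rather than stored as a pre-built aggregate. It assumes a simple sorted-dictionary scheme; the function names are hypothetical, and this is a toy illustration, not HANA's actual implementation.

```python
from collections import defaultdict


def dictionary_encode(values):
    """Encode a column as a sorted dictionary plus one small integer ID per row."""
    dictionary = sorted(set(values))
    position = {value: i for i, value in enumerate(dictionary)}
    value_ids = [position[v] for v in values]
    return dictionary, value_ids


def sum_by_group(group_dict, group_ids, measures):
    """Aggregate a measure column by an encoded column, on demand (nothing pre-stored)."""
    totals = defaultdict(float)
    for gid, amount in zip(group_ids, measures):
        totals[gid] += amount
    # Decode the integer IDs back to readable values only for the result.
    return {group_dict[gid]: total for gid, total in totals.items()}


regions = ["EMEA", "APJ", "EMEA", "AMER", "EMEA", "APJ"]
revenue = [100.0, 40.0, 60.0, 80.0, 20.0, 10.0]

region_dict, region_ids = dictionary_encode(regions)
print(region_dict)  # ['AMER', 'APJ', 'EMEA']
print(region_ids)   # [2, 1, 2, 0, 2, 1]
print(sum_by_group(region_dict, region_ids, revenue))
# {'EMEA': 180.0, 'APJ': 50.0, 'AMER': 80.0}
```

Repeated strings are stored once in the dictionary, and each row keeps only a small integer, which is where much of the compression comes from; the totals exist only for the duration of the query.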
On top of this, SAP solved the traditional weaknesses of columnar databases, such as concurrency (HANA uses multi-version concurrency control, MVCC) and row-level insert and update performance (HANA uses various mechanisms, such as a delta store).
If that wasn’t enough, SAP added a bunch of engines inside HANA to provide virtual OLAP functionality, data virtualization, text analysis, search, geospatial, graph (available soon) and web. It supports open standards like REST, JSON, ODBO, MDX, ODBC and JDBC. That is as much functionality as a whole Oracle or IBM software stack, in one database.
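The delta-store idea can be sketched in a few lines of Python: writes are cheap appends to a small write-optimized area, reads see both areas, and a periodic merge rebuilds the read-optimized main store. This is a hypothetical toy that only illustrates the concept; it does not model MVCC, and HANA's real delta merge and encoding are far more involved.

```python
class ColumnWithDelta:
    """Toy column: a read-optimized main store plus a write-optimized delta store."""

    def __init__(self, values=()):
        self.main = sorted(values)  # read-optimized: kept sorted, rebuilt only on merge
        self.delta = []             # write-optimized: cheap appends

    def insert(self, value):
        # New rows go to the delta, so the main store is not rewritten per insert.
        self.delta.append(value)

    def scan(self):
        # Readers see main + delta, so new rows are visible before any merge.
        return self.main + self.delta

    def merge(self):
        # Periodically fold the delta into a freshly rebuilt main store.
        self.main = sorted(self.main + self.delta)
        self.delta = []


col = ColumnWithDelta([30, 10, 20])
col.insert(25)
col.insert(5)
print(col.scan())  # [10, 20, 30, 25, 5]
col.merge()
print(col.scan())  # [5, 10, 20, 25, 30]
print(col.delta)   # []
```

The design choice is to pay the cost of reorganizing compressed, sorted data in occasional batches instead of on every single-row write.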
The first HANA deployments were all analytical use cases like Datamarts and Data Warehouses, because the benefits are there right out of the box. EDWs like SAP BW run like lightning with a simple database swap.
With a transactional application like Finance or Supply Chain, most things run a little better after a simple database swap (SAP claims 50% faster for its own core finance). The real benefits come when logic from the applications is optimized and pushed down to the database level, from simplification of the apps (SAP is building a simplified version of its Business Suite), or from ancillary benefits like real-time operational reporting, real-time supply chain management or real-time offer management.
Best of all, unlike other database systems on the market, HANA supports all applications on the same instance of data at the same time. No more copying, transforming and re-organizing data all over the enterprise to meet the needs of different applications. HANA serves the needs of all applications with one “system of record” instance.
SAP has provided a Use Case Repository that catalogues the various use cases for HANA.
SAP CEO Bill McDermott said “HANA is attached to everything we have”.
Almost all the major SAP Applications now run on the SAP HANA platform. This includes the SAP Business Suite (ERP, CRM, PLM, SCM) and the SAP BW Data Warehouse.
The BI Suite, including BusinessObjects Enterprise, Data Services and SAP Lumira, is designed to run on the HANA platform.
There is also a set of Applications Powered by SAP HANA, including SAP Accelerated Trade Promotion Planning, SAP Collection Insight, SAP Convergent Pricing Simulation, SAP Customer Engagement Intelligence, SAP Demand Signal Management, SAP Assurance and Compliance Software, SAP Liquidity Risk Management, SAP Operational Process Intelligence, and SAP Tax Declaration Framework for Brazil.
In addition, SAP runs much of its cloud portfolio on HANA, including the HANA Cloud Platform and SAP Business ByDesign. The Ariba and SuccessFactors apps are in the process of migrating to it.
We’ve built business cases for HANA deployments of all sizes, and whilst they vary, there are a few common themes:
- TCO Reduction. In many cases HANA has a lower TCO. It reduces hardware renewal costs, frees up valuable enterprise storage and mainframes and requires much less maintenance
- Complexity to simplicity. HANA simplifies landscapes by using the same copy of data for multiple applications. Our implementations have shown that adding applications to an existing HANA dataset is fast and easy, delivering business benefits quickly
- Differentiation. HANA’s performance, advanced analytics (Predictive, Geospatial, Text analytics) and simplicity often mean a business process can be redesigned to differentiate the business from its competitors. Customer scenarios like loyalty management, personalized recommendations and anything where speed or advanced analytics capabilities are differentiating are all candidates
- Risk Mitigation. Many customers know that in-memory technologies are changing the world, and so want to put an application like SAP BW on HANA or LOB Datamarts in place as a first step, so they can react quickly to future business demands.
SAP HANA was designed to be a truly modern database platform, and as a result it is all of these things. A modern database should be both a database and a platform, and should be available on-premise or in the cloud.
SAP has a large installed base of on-premise ERP customers, and the HANA platform supports their needs, especially the need for an enterprise-class database. Many of those customers are looking for an on-premise database to replace their traditional RDBMS.
Because an in-memory database has demanding hardware requirements, SAP elected to sell SAP HANA as an appliance, pre-packaged by the major hardware vendors.
However, the future of business is moving into the cloud, and SAP HANA is available as Platform as a Service (PaaS) and Infrastructure as a Service (IaaS) through HANA Cloud Platform, as Managed Cloud as a Service (McaaS) through the secured HANA Enterprise Cloud, and via 3rd party cloud vendors. Customers can also choose a hybrid deployment model that combines on-premise and cloud.
SAP HANA was designed to be a replacement for Oracle or IBM databases, either for net new installations or for existing customers. In most cases it is possible to move off those databases easily and gain reporting performance benefits out of the box. The software can then be adapted to include functions that were not possible in the past.
All three of the major RDBMS vendors have released in-memory add-ins to their databases in the last year. All of them work by keeping an additional copy of the data in an in-memory cache (or, in IBM’s case, in columnar tables), and all of them provide improved performance for custom data-marts. But make no mistake: caching data has been around for a long time, while an in-memory database platform that runs transactions and analytics together in the same instance is a new innovation.
Traditional database caching solutions are similar to the GM and Ford response to hybrid cars: take their existing technology and bolt new technology onto it. SAP HANA is more akin to Tesla, which rebuilt the car from the ground up based on a new paradigm.
As a result, from a business application perspective, HANA’s capabilities are around 3 years ahead of what the others offer.
SAP tried to keep licensing simple with HANA.
HANA is available in the Cloud as Infrastructure as a Service (IaaS), Platform as a Service (PaaS) and as an application platform (AaaS), and it is possible to buy all those options now, on a monthly basis, from the SAP Website.
For on-premise customers, HANA is licensed in one of two major ways:
The first is as a proportion of your Software Application Value (SAV), just as you can license other databases from SAP. This can cover your whole estate, or a specific product like BPC.
The second is by the unit, where one unit is 64GB of RAM. There are a few editions of HANA that bundle other software and allow more, or less, restrictive usage, depending on your needs. The pricing is tiered by the number of units you buy, and accretive.
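A short Python sketch of the unit arithmetic may help with budgeting. It assumes that licensed memory is rounded up to whole 64GB units, and it reads "tiered and accretive" as each tier's per-unit price applying only to the units that fall inside that tier, similar to tax brackets. Both readings are our assumptions, and the tier sizes and prices below are entirely hypothetical, not SAP list prices.

```python
import math

UNIT_GB = 64  # one HANA licensing unit, per the text

# HYPOTHETICAL tiers: (number of units in this band, price per unit). Not real SAP prices.
HYPOTHETICAL_TIERS = [
    (4, 100.0),        # units 1-4
    (12, 80.0),        # units 5-16
    (math.inf, 60.0),  # units 17 and above
]


def units_needed(ram_gb):
    """Round licensed memory up to whole 64GB units (assumed rounding rule)."""
    return math.ceil(ram_gb / UNIT_GB)


def tiered_price(units, tiers=HYPOTHETICAL_TIERS):
    """Price band by band: each band's rate applies only to the units inside it."""
    total, remaining = 0.0, units
    for band_size, unit_price in tiers:
        in_band = min(remaining, band_size)
        total += in_band * unit_price
        remaining -= in_band
        if remaining <= 0:
            break
    return total


units = units_needed(1024)  # a 1TB system
print(units)                # 16
print(tiered_price(units))  # 4*100 + 12*80 = 1360.0
```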
In all cases, HANA licensing includes a lot of functionality that you would pay extra for in other databases. For example, Dev, Test, HA and DR licensing is always included. And if you buy HANA Enterprise, you have access to all functionality at no additional cost, including Predictive Libraries, Spatial, Graph, OLAP, Integration and Web. HANA contains a huge amount of functionality that would require 20-30 different SKUs from Oracle.
For those customers who need the base functionality of HANA but not the bells and whistles, there is now a HANA Base Edition, to which you can add other functionality as required, at a lower price point.
Part 2 – HANA Technology
With current hardware, SAP HANA can scale up to 6TB in a single system, and scale out to 112TB or more in a cluster. There is no hard technical limit to the size of a HANA cluster; larger configurations are tested and certified at customer sites.
We are currently working with SAP on 24TB single systems, which we expect to see this year.
At Bluefin Solutions, we regularly work with 2-10TB of memory in a single HANA DB, and this is where we find most business cases make sense. Remember that a 10TB HANA appliance can store a vast amount of data (the equivalent of 50-100TB in a traditional RDBMS, thanks to HANA’s data compression); this could represent all the credit card transactions for a top 10 bank for 10 years or more.
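The compression claim is easy to turn around for sizing. This small Python sketch works out the ratio implied by the 10TB versus 50-100TB figures above and applies it to another footprint; the function names are hypothetical, and real compression ratios vary considerably with the data.

```python
def implied_compression_ratio(source_tb, hana_tb):
    """Ratio between data size in a traditional RDBMS and its HANA footprint."""
    return source_tb / hana_tb


def source_capacity_tb(hana_tb, compression_ratio):
    """Traditional-RDBMS data volume that fits in a HANA footprint at a given ratio."""
    return hana_tb * compression_ratio


# The 10TB -> 50-100TB figure implies roughly 5x to 10x compression.
print(implied_compression_ratio(50, 10))   # 5.0
print(implied_compression_ratio(100, 10))  # 10.0
# At the same ratios, a 2TB system would hold the equivalent of 10-20TB.
print(source_capacity_tb(2, 5), source_capacity_tb(2, 10))  # 10 20
```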
In addition, we find that customers look to be more intelligent about how they tier data with an in-memory appliance. Once the HANA database grows past 2TB, it makes a lot of sense to use a cold store like Sybase IQ for slow-changing data.
SAP HANA stores data for processing primarily in columnar format. But unlike other columnar databases, HANA’s column store was designed from the beginning to be efficient for all database operations (reads, writes, updates). In practice, 99% of the database tables in SAP ERP are columnar tables, including transactional and master data tables.
HANA can also store data in row format, but this is primarily used for configuration information and queues, the only scenarios for which the column store is specifically not suited. With HANA, data is stored once, in its most granular form, and aggregated on request. There is no hybrid row/column storage of the same data and no duplication or replication of data between row and column stores: each table lives in one store, and business data lives in the column store only.
Every column in SAP HANA is stored as an index, and therefore HANA has no need for separate primary indexes. Secondary indexes with multiple columns are possible and used for OLTP scenarios like the Business Suite. HANA will also self-generate helper indexes to ensure that multi-column joins are efficient.
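One way to picture "every column is an index" is an encoded column that also records, for each distinct value, the rows where it occurs, so a point lookup does not scan every row. The Python below is a hypothetical inverted-index sketch built on dictionary encoding; it illustrates the principle only, and HANA's internal structures differ.

```python
def build_column_index(values):
    """Dictionary-encode a column and map each value ID to its row positions."""
    dictionary = sorted(set(values))
    value_id = {v: i for i, v in enumerate(dictionary)}
    postings = {i: [] for i in range(len(dictionary))}
    for row, v in enumerate(values):
        postings[value_id[v]].append(row)
    return dictionary, value_id, postings


def lookup(value, value_id, postings):
    """Return the row positions holding `value`, without scanning every row."""
    vid = value_id.get(value)
    return [] if vid is None else postings[vid]


customers = ["ACME", "Globex", "ACME", "Initech", "Globex"]
_, ids, postings = build_column_index(customers)
print(lookup("Globex", ids, postings))    # [1, 4]
print(lookup("Umbrella", ids, postings))  # []
```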
It is almost never necessary to aggregate data in HANA in advance, because HANA calculates so quickly. HANA processes 3bn scans/sec/core and 20m aggregations/sec/core, which means 360bn scans/sec and 2.4bn aggregations/sec on a typical 120-core appliance. As a result, it is much more efficient to calculate the information you require on demand.
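Multiplying the quoted per-core rates by 120 cores gives the appliance-level figures. The last line adds an illustrative estimate of how long aggregating one billion rows would take at that rate, assuming perfect scaling across all cores, which is our simplifying assumption rather than a measured result.

```python
CORES = 120
SCANS_PER_SEC_PER_CORE = 3_000_000_000
AGGREGATIONS_PER_SEC_PER_CORE = 20_000_000

total_scans = SCANS_PER_SEC_PER_CORE * CORES
total_aggs = AGGREGATIONS_PER_SEC_PER_CORE * CORES
print(f"{total_scans / 1e9:.0f}bn scans/sec")        # 360bn scans/sec
print(f"{total_aggs / 1e9:.1f}bn aggregations/sec")  # 2.4bn aggregations/sec

# Assumes perfect parallel scaling: time to aggregate 1bn rows.
print(f"{1e9 / total_aggs:.2f} seconds")             # 0.42 seconds
```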
HANA can be used for Big Data, although it is best suited to high-value data, because it keeps data mostly in memory. When Big Data is low value (e.g. web logs), HANA is very well suited as the store for the high-value aggregated information and the applications built on it. This could be an organization’s hot data, e.g. 4 months of financial information for quarterly reporting. Other sources could store additional data: for example, SAP IQ could store 13 months of financial data for annual reporting (warm data), and Hadoop could store more than 10 years of financial data for seasonal and long-term trend analysis (cool data). Large volumes of data in both IQ and Hadoop can be analyzed in combination with data in HANA, so it is possible, for example, to process data in HANA into full-text, Google-style indexes without storing all the detail in HANA.
From its inception, HANA was intended to be a mission-critical database.
SAP HANA always stores a copy of data on disk for persistence, so if the power goes out, it will load data back into memory when power is restored (generally on demand, but this is configurable). It also stores logs, so a very low Recovery Point Objective (RPO) is possible.
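The hot/warm/cold split described above can be expressed as a simple routing rule. This Python sketch uses the text's example horizons (about 4 months in HANA, about 13 months in SAP IQ, older data in Hadoop); the function name and exact month boundaries are hypothetical, and in practice tiering policies are set per business need.

```python
from datetime import date


def storage_tier(record_date, today):
    """Route a record to hot, warm or cold storage by its age in whole months.

    Hypothetical rule based on the example in the text:
    under 4 months -> HANA, under 13 months -> SAP IQ, older -> Hadoop.
    """
    age_months = (today.year - record_date.year) * 12 + (today.month - record_date.month)
    if age_months < 4:
        return "hot: HANA"
    if age_months < 13:
        return "warm: SAP IQ"
    return "cold: Hadoop"


today = date(2014, 10, 1)
print(storage_tier(date(2014, 8, 15), today))  # hot: HANA
print(storage_tier(date(2014, 1, 31), today))  # warm: SAP IQ
print(storage_tier(date(2010, 6, 1), today))   # cold: Hadoop
```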
HANA also has built-in capabilities to replicate data to standby systems: within a cluster you can have High Availability (HA), and in any configuration you can add a second system for Disaster Recovery (DR) and fault tolerance, for business continuity. Disaster Recovery can be configured at the storage level (depending on vendor) and also at the database level, which is called system replication.
It’s worth noting that most customers implement either HA or DR for HANA. It is exceptionally easy to set up (DR takes just a few clicks), and most customers that invest in HANA find business continuity is important to them.
SAP HANA also has interfaces for 3rd party backup tools like TSM or NetBackup, and for monitoring. Solution Manager and SAP Landscape Virtualization Management are supported if you’re an SAP shop.
SAP HANA was designed to be “timeless software”, meaning that updates should cause no disruption: it is possible to update from any revision of HANA to any other, with very few restrictions.
Every 6 months there is a major release of HANA, called a Service Pack. Service Pack 8, or SPS08, was released in June 2014. SPS09 is expected in November 2014. These contain new features and major updates, and SAP HANA continues to be developed.
Each SPS gets a number of updates, or revisions, and these contain fixes and performance improvements only, as you would expect in enterprise software. Typically these are released every 2-6 weeks, based on demand. As HANA matures, we have seen fewer revisions per SPS.
In addition, there are maintenance releases of SAP HANA for the previous SPS for an additional 6 months, to allow customers to apply critical fixes whilst planning an update to the latest SPS. The maintenance releases contain only critical bug fixes.
SAP HANA is a fully ACID-compliant database, designed for a low Recovery Point Objective (RPO). At frequent intervals, HANA writes savepoints to disk, each containing a snapshot of what is in memory. Between savepoints, HANA writes a log of each database change to fast flash storage.
If the power goes out, HANA loads the last savepoint and then plays the logs back, to ensure consistency.
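The savepoint-plus-log recovery sequence can be modelled with a small Python class: log each change before applying it, snapshot periodically, and on restart load the snapshot and replay the log in order. This is a hypothetical toy of the general technique; in particular, truncating the log at each savepoint is a simplification of ours, and HANA's persistence layer is far more sophisticated.

```python
class ToyPersistence:
    """Toy model of savepoints plus a redo log (illustration only)."""

    def __init__(self):
        self.memory = {}     # in-memory state
        self.savepoint = {}  # last snapshot written to "disk"
        self.log = []        # changes made since the last savepoint

    def write(self, key, value):
        self.log.append((key, value))  # log the change first, for durability
        self.memory[key] = value

    def take_savepoint(self):
        self.savepoint = dict(self.memory)
        self.log.clear()  # in this toy, earlier changes are covered by the snapshot

    def power_failure(self):
        self.memory = {}  # RAM is lost; savepoint and log survive

    def recover(self):
        self.memory = dict(self.savepoint)  # 1. load the last savepoint
        for key, value in self.log:         # 2. replay logged changes in order
            self.memory[key] = value


db = ToyPersistence()
db.write("acct_1", 100)
db.take_savepoint()
db.write("acct_1", 150)
db.write("acct_2", 75)
db.power_failure()
db.recover()
print(db.memory)  # {'acct_1': 150, 'acct_2': 75}
```

Without the log, recovery would stop at the savepoint and lose the last two writes; the log is what keeps the RPO low.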
HANA appliances must be certified, and come either as pre-built appliances from your vendor of choice or as a custom build using your own storage and networks, known as “Tailored Datacenter Integration” (TDI).
SAP maintains a list of certified hardware platforms which currently includes Cisco, Dell, Fujitsu, Hitachi, HP, Huawei, IBM (Lenovo), NEC and SGI, and is being extended all the time. Note that this list only contains the new “Ivy Bridge” appliances and not the older “Westmere” appliances.
The exact hardware and storage configuration varies by vendor. Some use servers and others use blades; some use a SAN storage network, whilst IBM uses local storage with the GPFS distributed file system. In our experience, all these variants work very well.
In addition, you can buy HANA in the cloud from Amazon, SAP and various outsourcing partners like T-Systems or EMC. In this case, you can either pay a monthly subscription fee that includes the license, or use an existing Enterprise license under a “Bring Your Own License” model.
For Intel x86, both SUSE Linux and Red Hat Linux are now supported options. Both have an SAP-specific installer that configures Linux correctly for SAP HANA out of the box.
For IBM POWER, the SUSE Linux operating system will be supported. At this time it does not look like SAP will support AIX.
What development software does SAP HANA use?
SAP HANA has two primary development environments. The main desktop software is called HANA Studio, which is based on Eclipse. HANA Studio provides administration and development in a single interface, which is extremely effective. It is possible to create entire developments in HANA Studio, which provides application lifecycle management and development capabilities for all HANA artifacts, from the data model through stored procedures to web application code.
There is also a web editor and administration panel based on Eclipse Orion, which continues to be developed and is a useful addition. We expect these two tools to converge in the future, to allow choice for cloud developers in particular.
Lifecycle management is handled entirely by a web application running on the XS Application Server.
SAP HANA has a wide range of interfaces. SAP’s own BI Suite, Lumira, Design Studio and Analysis for Office all have native HANA connectors. Likewise, many third-party applications, such as Tableau, Qlik and MicroStrategy, have HANA connectors.
SAP HANA has open standards support for ODBC, JDBC, ODBO and MDX as well as a raw SQL client, hdbsql. In addition, there are Python libraries for HANA.
In addition, ETL software like Data Services and Informatica is supported, as well as System Landscape Transformation (SLT) and Sybase Replication Server (SRS) for real-time replication.
The majority of the SAP HANA software stack was written in C++. In fact, when you compile SAP HANA objects, they do in turn become C++ code, which is one of the reasons why HANA is so fast. The Predictive Analysis Library and Business Function Libraries are also written in a HANA-specific variant of C++ called L-Language, which provides memory protection.
Certain optimizations have been made using C and machine code, which is common for many databases. In addition, much of the tooling for HANA was written in Python, for ease of writing and adaptation.
HANA always stores data on disk and loads parts of database tables into RAM on demand. When RAM is exhausted, HANA unloads the least recently used parts of database tables.
In addition, the Smart Data Access data virtualization layer allows you to access data in other databases, such as Sybase IQ or even Oracle, transparently, as if it were any other data in HANA. This improves the TCO of HANA and simplifies your IT landscape by reducing the amount of data copied, transformed and moved around the enterprise.
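Least-recently-used unloading can be sketched with Python's `collections.OrderedDict`: each access moves a column to the "recent" end, and when a new column does not fit, the oldest entries are evicted first. This toy measures memory in abstract units, the class and column names are hypothetical, and it is not HANA's actual memory manager.

```python
from collections import OrderedDict


class ToyColumnCache:
    """Toy LRU cache of table columns, measured in abstract memory units."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.loaded = OrderedDict()  # column name -> size, least recently used first
        self.used = 0

    def access(self, column, size):
        if column in self.loaded:
            self.loaded.move_to_end(column)  # mark as most recently used
            return
        # Unload least recently used columns until the new one fits.
        while self.used + size > self.capacity and self.loaded:
            _, evicted_size = self.loaded.popitem(last=False)
            self.used -= evicted_size
        self.loaded[column] = size  # "load from disk" on demand
        self.used += size


cache = ToyColumnCache(capacity=10)
cache.access("SALES.AMOUNT", 4)
cache.access("SALES.REGION", 3)
cache.access("SALES.AMOUNT", 4)  # touch: AMOUNT becomes most recent
cache.access("SALES.DATE", 5)    # needs room: REGION is evicted
print(list(cache.loaded))        # ['SALES.AMOUNT', 'SALES.DATE']
```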
In a future release of SAP HANA, we expect to see a transparent disk store, where warm, lower value data can be stored at a lower TCO. This is called dynamic tiering.
It’s worth noting that HANA and Hadoop are great friends – you can store documents and web logs in Hadoop and then store aggregated information in HANA for super-fast analysis. Need to add a new measure? Run a batch job in Hadoop from HANA to populate it.
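The Hadoop-plus-HANA pattern boils down to reducing voluminous raw detail into compact summary rows. The plain Python below stands in for what would be a batch job in Hadoop; it uses no Hadoop or HANA API, and the log format and field layout are hypothetical.

```python
from collections import Counter

# Raw, low-value detail (the kind of data the text suggests keeping in Hadoop).
raw_web_logs = [
    "2014-10-01 /products/hana 200",
    "2014-10-01 /products/hana 200",
    "2014-10-01 /pricing 404",
    "2014-10-02 /products/hana 200",
]


def aggregate_page_views(lines):
    """Reduce raw log lines to (day, page) -> view count summary rows."""
    counts = Counter()
    for line in lines:
        day, page, status = line.split()
        if status == "200":  # keep successful requests only
            counts[(day, page)] += 1
    return counts


summary = aggregate_page_views(raw_web_logs)
print(sorted(summary.items()))
# [(('2014-10-01', '/products/hana'), 2), (('2014-10-02', '/products/hana'), 1)]
```

Only the small `summary` rows would be loaded into HANA for fast analysis; adding a new measure means rerunning the batch job over the raw logs.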
The HANA SPS08 release was all about enterprise readiness and stability and there were relatively few new features. In SPS09 we see this changing once more and it looks like there will be lots of new functionality that customers will find useful. These are the themes we expect:
- Support for more hardware platforms (IBM Power, maybe Intel E5) with fewer restrictions on components, plus larger hardware platforms (16- and 32-socket), and multi-tenancy.
- A built-in disk-based store that supports dynamic data tiering for warm data, to reduce TCO.
- The start of integration of event processing and ETL, along with code push-down and HANA Studio integration.
- Increased support for Hadoop and HDFS access.
- Improvements to system replication, backup and system copies for Enterprise scenarios.
I have taken the time to curate a page on SAP’s Community Network (SCN), “SAP HANA – a guide to Documentation and Education”, which contains numerous links to other resources. If you’d like to know more about SAP HANA, this is a great place to start. SAPHANA.com is also a wonderful resource to find out more about HANA. You can start using HANA today with the free SAP HANA developer edition.
Some of this information came from meetings and interviews with the key HANA friends at SAP – Hasso Plattner, Vishal Sikka, Franz Färber, Mike Eacrett, Steffen Sigg and many others.
Special thanks to Steve Lucas for his efforts collaborating on this FAQ with me. Steve lives and breathes HANA and having his input into this FAQ is awesome! Also, thank you to Mike Prosceno and Amit Sinha for their editorial assistance. All the good stuff in this piece is theirs, and the mistakes are all mine.
As an end-note, the questions in this FAQ were compiled from two primary locations - articles and comments on existing HANA sources, and conversations with customers. If you think there are questions missing - please go ahead and ask them in the comments!