In this post I’ll be answering questions about SAP HANA Vora. As is customary, I’ve consumed the publicly available information about Vora, plus insights from my Vora team members who have been working on the product since July, and condensed them into this FAQ.
What is SAP HANA Vora?
To quote from the official SAP Press Release:
“SAP HANA Vora is a new in-memory query engine which leverages and extends the Apache Spark execution framework to provide enriched interactive analytics on Hadoop. As companies take part in their digital transformation journey, they face complex hurdles in dealing with distributed Big Data everywhere, compounded by the lack of business process awareness across enterprise apps, analytics, Big Data and IoT sources.”
In short, SAP HANA Vora extends Hadoop with in-memory computing, enterprise functionality and data science, helping customers simplify their IT landscape and bring context to data lakes. It also extends the HANA platform with the ability to store Big Data and has data temperature management to move data from HANA to Vora.
What does Vora stand for?
Aiaz Kazi tells the story of how HANA got its name in his latest blog post, which might be of interest to many of you! Vora is a backronym, and a friendly, snappy name. The internal project name was “Velocity”, and the name Velociraptor was heard a few times. But Vora is just short for Vora.
Anecdotally, SAP Transaction Code VORA is “Sales Document Archiving Control”, which is a possible use case for the data temperature capabilities of SAP HANA Vora.
Thanks to Mike Eacrett for pointing out that Vora is a combining form for “ones that eat”, for example Carnivora. Vora eats all your big data, perhaps?
What are the key use cases for Vora?
Vora is a bridge between SAP HANA and Hadoop, and as such there are many interesting use cases.
One straightforward use case is around data tiering for existing SAP customers. It will be possible to use commodity Hadoop clusters to store colder data for SAP ERP, like sales documents, pricing and billing conditions. These require occasional analysis and are read-only. We expect SAP to adjust its data tiering strategy for SAP S/4HANA to include Vora.
But as has been the case for SAP HANA, whilst those straightforward use cases are good, SAP HANA Vora is capable of much more exciting and differentiating things.
We are working with customers on such use cases as analyzing huge amounts of manufacturing test data in order to make better real-time packaging decisions, and analyzing fitness data from connected devices in real-time.
Where Vora is particularly interesting is when you need to “bring compute to the data” – running complex algorithms against in-memory data.
In addition, a lot of customers will do a straight rip-and-replace of Teradata with HANA and Vora, because it is substantially more cost-effective and an order of magnitude faster.
Do you need SAP HANA to run Vora?
Absolutely not. Vora is a standalone piece of software. There are use cases where SAP HANA won’t be installed alongside Vora, and SAP HANA and Vora run most efficiently on different types of hardware.
That said, it’s certainly true that there are a number of integration scenarios for HANA and Vora – tiered data for SAP S/4HANA, and very large in-memory databases, and SAP will build deep
What is the availability of Vora?
My team has been working on Vora since the beta launch in July, and Vora 1.0 is planned to be released to customers on September 18th, with general availability coming later in the year. Other pieces of functionality, like the automatic data tiering components, will also follow later in the year.
What are the requirements to run Vora?
Vora has some core dependencies. For Vora 1.0 they are HDFS 2.6, ZooKeeper 3.4.6, Spark 1.4, ProtoBuf 2.6.0, gcc 4.7, Apache Ambari, and YARN or Spark Standalone cluster management. That’s all quite technical, but in short, Vora will run on just about anything. My team has it running at various customers, on Cloudera, Hortonworks and Amazon EMR Hadoop distributions, and on SUSE, Red Hat and Ubuntu Linux. It will also run on Windows and Mac. One of the things that I like about Vora is that it goes back to SAP’s roots of being platform-independent.
This flexibility is important, because many customers have already made a choice of Linux and Hadoop distributions. Vora runs on all of them.
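If you want to sanity-check a cluster against the Vora 1.0 dependency list quoted above, a small script can do the comparison. This is only a rough sketch: the `installed` dictionary is a made-up example, the function names are my own, and treating the quoted versions as minimums is an assumption rather than anything SAP has stated. Components without a version number in the list (Ambari, YARN) are left out.

```python
# Versions quoted in this post for Vora 1.0. Treating them as minimum
# versions is an assumption made for this sketch, not an official rule.
VORA_1_0_DEPENDENCIES = {
    "hdfs": "2.6",
    "zookeeper": "3.4.6",
    "spark": "1.4",
    "protobuf": "2.6.0",
    "gcc": "4.7",
}


def parse_version(text, width=3):
    """Turn '3.4.6' into (3, 4, 6), padding '2.6' to (2, 6, 0)."""
    parts = [int(part) for part in text.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def find_problems(installed):
    """Return a list of problems; an empty list means nothing was flagged."""
    problems = []
    for name, required in VORA_1_0_DEPENDENCIES.items():
        found = installed.get(name)
        if found is None:
            problems.append(f"{name}: not found (need {required})")
        elif parse_version(found) < parse_version(required):
            problems.append(f"{name}: found {found}, need {required} or newer")
    return problems


# Hypothetical cluster inventory, for illustration only.
installed = {"hdfs": "2.7.1", "zookeeper": "3.4.5", "spark": "1.4.1", "gcc": "4.8"}

for problem in find_problems(installed):
    print(problem)
```

In this made-up inventory, ZooKeeper is one patch level too old and ProtoBuf is missing, so those are the two lines that get printed.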
Does Vora require Hadoop?
Vora was designed to run on any distributed file system, and doesn’t necessarily require Hadoop. We will see in the future what that means, but the important point is that if another distributed file system becomes popular, Vora can adapt. If you look at what happened in the RDBMS market over the last 30 years, SAP’s ability to adapt R/3 to different databases was key to its longevity.
What platform does Vora run on?
Vora runs both in the Cloud and On-premise. SAP’s platform strategy is “Cloud-first”, so it will be available in the SAP HANA Cloud Platform (HCP) and Amazon Clouds. Both will provide single-click provisioning of Vora systems with no fuss, a little later on this year.
You can of course run Vora On-premise, on any hardware platform, and as noted above, almost any Linux and Hadoop distribution. The only thing to note is that some Hadoop platforms are designed as cold data lakes, and Vora is performance-centric, so it needs more spindles-per-core and more memory to run at its best.
What size does Vora come in?
Vora will run on anything that meets the requirements. For our simple development systems we have used Amazon 8GB or 16GB systems, and at one customer we are currently deploying 2000 cores, 20TB of DRAM and 1PB of disk for a HANA/Vora cluster, thanks to Lenovo System X and EMC Isilon.
In short, Vora comes in all sizes from XS to XXL!
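To put the large deployment mentioned above into perspective, it helps to work out the memory per core and the disk-to-memory ratio. The sketch below is just that arithmetic; the `ClusterSize` class is a hypothetical helper, and decimal units (1PB = 1000TB, 1TB = 1000GB) are an assumption.

```python
from dataclasses import dataclass

# Decimal units are an assumption made for this sketch.
GB_PER_TB = 1000
TB_PER_PB = 1000


@dataclass
class ClusterSize:
    """Hypothetical helper holding the headline numbers of a cluster."""
    cores: int
    dram_tb: float
    disk_tb: float

    def dram_gb_per_core(self):
        return self.dram_tb * GB_PER_TB / self.cores

    def disk_to_dram_ratio(self):
        return self.disk_tb / self.dram_tb


# The figures quoted in this post: 2000 cores, 20TB of DRAM, 1PB of disk.
large = ClusterSize(cores=2000, dram_tb=20, disk_tb=1 * TB_PER_PB)

print(large.dram_gb_per_core())    # 10.0 GB of DRAM per core
print(large.disk_to_dram_ratio())  # 50.0 TB of disk per TB of DRAM
```

So that cluster works out to roughly 10GB of DRAM per core, with about 50 times as much disk as memory.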
What are the specific features of Vora vs Apache Spark?
Vora is an extension to the Hadoop platform and includes the following features in its first version:
- Accelerated In-Memory processing
- Compiled Queries
- Support for Scala, Python and Java
- HANA and Hadoop mash-ups
- Support for HDFS, Parquet and ORC
- NUMA awareness
The following features are planned in the near future:
- Support for C, C++ and R
- Currency conversion
- Dynamic data tiering
Is Vora based on SAP HANA?
No, Vora is a completely new code base, but the engineering team is the same group as the HANA engineering team, so many concepts and ideas have been borrowed from SAP HANA, as you can see by the feature list.
I’m pretty excited about SAP HANA Vora and we already have several customer projects undergoing beta evaluations. The initial reaction is very positive, because it aligns with their existing data platform strategies of HANA and Hadoop, and having a SAP-endorsed Hadoop platform is a big thing in itself.
Add to this a dramatically lowered average cost per TB of data, the data science capabilities and improved packaging and usability using SAP’s tooling for provisioning and user experience, and we think Vora is going to be a hit.
I have no doubt that I’ve missed important questions here, so please tell me in the comments below what I’ve forgotten to include, and I’ll update this FAQ in the coming days and weeks!
Huge thanks to all the Bluefin Solutions team that worked with us on the Vora launch: Chris Kernaghan, Brenton O’Callaghan, Cal Loudon, Simon Ferres, John Bartley, Lloyd Palfrey, Mark Chapman, Oli Rogers, Nathan Oyler, Rob Walmsley, Srinivas Totapally and Stuart Bell. You will see our demos in the coming months!
Thanks as well to the SAP folks who worked with us. Special thanks to Austin Swope, who has been a life saver in making things happen. Also to Quentin Clark, Steve Lucas, Ken Tsai, Pam Barrowcliffe, Balaji Krishna, Linda O’Connor, Mike Eacrett, Daniel Culp, Stacey Fish, Andrea Kaufmann, Mike Prosceno and no doubt many others I forgot!