I have been recently involved in a project using the new Oracle 12c Database In-Memory product. It has been fascinating and I thought I'd share some learnings in a FAQ.
What is Oracle 12c Database In-Memory?
The Database In-Memory option is an additional option for the Oracle RDBMS as of version 184.108.40.206. It basically enables a secondary cache for existing database structures, which has a columnar storage format.
This cache is enabled on a tablespace, table, partition or column basis and when the table is accessed, the Oracle optimizer makes a decision of whether to use the row table or the columnar cache.
How much faster are tables that are cached?
This seems to vary depending on the type of query. Very simple queries like COUNT(COLUMN) are blazingly fast. Simple GROUP BY queries on a single table with simple filter predicates are also good.
However the Database In-Memory option is very much geared towards effective, efficient simple queries. This is demonstrated because when you increase query complexity, performance decreases so that by the time you get to a typical star schema OLAP query, the performance can be worse than before the in-Memory option was enabled.
The In-Memory cache is not sorted and therefore for aggregations that access a random or large volume of data, response time is good. For queries where data needs to be sorted prior to aggregation, the sort happens in the SGA and performance decreases. It is not possible to add additional indexes with specific sort orders.
How much additional memory is required?
This depends on the tables that you look to cache, but a rule of thumb is that around 50% of your database will be database tables (not indexes) and around 50% of that might be useful to put in a columnar cache.
The cache has some compression (Dictionary Encoding, Run Length Encoding, Bit Packing) and we found 1.5-10x compression. Large fact tables with lower entropy get lower compression rates, typically 1.5x. In any case, an in-memory cache of around 20% your database size is practical.
In addition, temporary RAM is required to persist interim results. For very simple range queries we found we needed a SGA of 5x the in-memory cache and for more complex queries we found we needed the SGA to be 10x the in-memory cache, for single-user operations.
There are several different types of compression that can be used, from query-optimized to storage-optimized. The storage-optimized compression types require decompression before evaluation of join predicates, which means most users will stick with query-optimized compression.
What is the cost of the In-Memory option?
This will cost you $25k per CPU at list price. In most cases, Oracle use a 2:1 core:CPU ratio so a typical 4 socket, 15 core system with 60 total cores would require 30 licenses, or $500k. That's in addition to the Oracle Enterprise, Partition, RAC and other licenses you will require like Dev, Test, HA, DR.
Note that in addition, it may be necessary to buy more hardware to support the In-Memory option, and this will incur additional Oracle Enterprise licenses and other options which are used.
Does the Database In-Memory support SAP applications?
SAP support for Oracle databases and features is determined by a joint Oracle/SAP team, and typically lags Oracle’s release schedule substantially. Oracle 11g was released in August 2007, and was supported by SAP in March 2010 (shortly after Oracle 11gR2). This is likely due to the complexity of SAP applications running on Oracle databases. As of December 2014, Oracle 12c still isn’t supported for SAP applications.
Since Oracle 12c isn’t supported, the Database In-Memory option isn’t supported. It seems likely based on past timelines that 12c support will come in mid-late 2015. This is tough for customers, because Oracle 11g is only supported until January 2016, so they will have to move quickly to upgrade.
It seems likely that support for the In-Memory option will follow in 2016.
How will the Database In-Memory option work as a sidecar for SAP?
The first scenarios for the Database In-Memory option will be as a sidecar to accelerate ERP reports. Long-running queries will benefit out of the box for certain scenarios.
SAP will presumably integrate the In-Memory option into the Data Dictionary layer so it is possible to enable a core set of tables by functional area (e.g. Finance/Controlling). The licensing implications of this are unclear, since there is no support for this yet, but it seems likely that there will be a charge for all customers, plus the cost of hardware, consistent with other database options as per SAP Note 740897.
Does the Database In-Memory work well for custom apps?
Yes, this is the primary use case for the Database In-Memory. Custom apps, especially custom data marts, should work quite well with the In-Memory option.
There is the question of cost of licenses and hardware and ROI, but from a technical perspective, it will be possible to cache tables on which it is necessary to run queries, and get good performance.
Does the In-Memory option run on existing hardware?
Yes, in theory the in-Memory option will run on your existing Oracle hardware. However, in practice you will need to augment your RAM by a significant amount to get the benefit.
In our experience, a 512GB database with an existing SGA of 128GB would require 512GB+ of RAM to take proper advantage of the in-memory option. And since you pay per CPU, it makes sense to put the maximum RAM possible into the system.
Does the Database In-Memory option run on Engineered Hardware
Yes, and some features of the In-Memory option, particularly the RAC data replication features, will only run on Engineered Hardware. Presumably this is because Engineered Hardware is guaranteed to have good network connectivity between nodes.
Does the Database In-Memory option scale?
Yes, it appears to scale in much the same way as Oracle RAC. Tables can be partitioned between multiple nodes, and caches are then built on those individual nodes. In theory, it is possible to scale the database to very large installations; I haven’t done this and at a practical level significant discounts would be required for it to be cost effective.
What happens if you run out of RAM?
Interim results for database queries like sorts and joins, appear to be stored in row format in the SGA, so even with a single user and a SGA 10x the size of the columnar store, it is easily possible to exhaust the SGA.
When the database has exhausted the SGA, it will start to use the TEMPDB, rapidly leading to I/O waits. Once TEMPDB is exhausted, the query will crash, so you need to take care in this situation as overall system performance will suffer in the meantime.
We suspect that the SGA will have to be refactored in later versions of the database, because analytic queries can cause very poor overall system performance.
What is required to use the In-Memory option?
Oracle described the usage of the In-Memory option as “flip a switch and go”. Here are the steps that we found were required to enable the option:
- Analyse the application to understand which tables will benefit for in-memory processing
- For each table:
- Identify which columns and partitions require acceleration
- Identify the priority of these
- Identify the best compression type to use
- Analyze how much memory will be required in total
- Perform sizing for this use case, and purchase hardware/licensing
- Enable the In-Memory option for each table, partition, column with the correct priority and compression
- Analyze the performance increase, and tune the application iteratively.
There are currently no automated tools available to help with this, which makes the installation and management of the In-Memory option quite a manual process.
What is the stability like?
In previous projects with IBM DB2 BLU, we found that their new column store caused severe stability problems. The column store would crash with certain queries, and this would take the whole database server process down including the row store.
This wasn't the case with Oracle - crashes weren't observed. However at times the in-memory option seemed to cease operations and we then reinstalled Oracle to get it to reactivate. Sometimes a restart of the server or the database would fix it. No doubt this will be resolved in a subsequent release.
For instance, we found that AVERAGE is usually one order of magnitude slower than SUM and COUNT combined. And in one case, COUNT was two orders of magnitude slower than AVERAGE. More investigation is required to understand why this should be.
What happens in the event of memory errors?
Intel has ChipKill technology built into their E7 processors, which allows the detection and marking of bad DIMMs so they can be disabled. We hope that Oracle will introduce ChipKill awareness into a subsequent release as it is extremely helpful to have a means of marking bad memory segments or DIMMs.
When we had a problem with a DIMM, we experienced severe performance problems. Changing the DIMM out solved the problem.
Does the In-Memory option support parallel processing?
By default, queries do not execute in parallel, as they do with the regular Oracle RDBMS. It is necessary either to manually enable parallel processing, or to use a database hint to run the query in parallel.
In this case, the database will spin up parallel worker processes to run pieces of the query, up to the number of parallel servers that are configured. If those are already in use, then subsequent queries will have to wait for a thread to become available and often run in a single thread.
As a result a busy database may have unpredictable response times and may not share the number of parallel threads with the amount of processes and queries running at that time.
If a thread isn't available, queries may hang for seconds or minutes before they can start processing. This requires very careful tuning to get the Degree of Parallelism (DOP) correct for the data available.
In addition, there is a significant cost to spinning out parallel processes, and the Degree of Parallelism is not linear in respect to the number of threads in use.
One thing we found very interesting was that a query running with 80 parallel threads on a 40-core system ran faster than a query with 40 parallel threads. This can only be because it was taking advantage of Hyperthreading, which is only of benefit when the database is CPU-bound. But since an optimal in-memory database design is IO-bound between the CPU and RAM with current Intel hardware designs, Oracle has some optimization to do.
How does the column cache deal with data changes?
An initial version of the column store is built when the data is first accessed. This is comprised of multiple In-Memory Compression Units or IMCUs, which contain unsorted data in the same order as the row store. The IMCUs are dictionary encoded and use various compression strategies.
Because of the nature of column stores, they are effectively read-only. An insert, update, or delete requires a reorder of the attribute vector and a resort of the dictionary.
The column store is only a cache and so any inserts, updates, or deletes are persisted to the row store first, like in the regular Oracle RDBMS. These changes are also then inserted into a Transaction Journal, which is a secondary store. If there is an update or delete, then the relevant column values in the main IMCUs are then invalidated.
This Transaction Journal store is much slower and larger than the main IMCUs because it is uncompressed, and query performance will degrade as it rises in size. So a periodic merge of the Transaction Journal and IMCUs occurs, to recompress the data. In addition, when there are idle database cycles, there is a trickle load process which reduces the size of the Transaction Journal and inserts them into the IMCUs.
In theory there is a delay between row and column store data, but in practice the delay seems minimal, even with concurrent inserts and queries.
Are there limitations of the column store format?
Yes: Index Organized Tables, Clustered Tables, LONGS and Out of Line LOBS are not supported. The most serious of these is clustered tables, because applications like SAP ERP use clustered tables in Oracle for tables like BSEG (Accounting Document Segment) and this severely limits the usability of the In-Memory option for SAP ERP.
In addition, use of Active Data Guard standby instances is not supported.