Storage is cheap, right? Browsing through my local hardware reseller's web pages, I can see that I can buy disks at some $100 per Tb. The problem is that enterprise storage isn't cheap, especially when you outgrow what you have. Also remember that a 1Tb production system is also a 1Tb dev, qa, training etc. system (after a few system copies) and so 5Tb of production, 5Tb of backups, etc. Data propogation causes huge increases in total storage requirements.
On our demo landscape, we have some 5Tb of enterprise class storage for our SAP systems, totalling about 29 virtual servers. We've run out of space and are considering a substantial capital investment to go up to the next tier of enterprise storage - 3 or 4 times what we paid for the existing storage.
What's more we live in a world where a global recession has taken a big hit, and where issues around sustainability means that buying ever bigger and better hardware isn't palatable. Virtualizing our SAP landscape saved us huge amounts of money, but the consolidation effect means that when you run out of space, the effect is much more costly.
All this means, that we had to think in terms of saving space first, and capital procurement second. Also, most SAP systems are I/O bound these days anyhow.
I remember a story from the early days of the search engine Altavista, which was of course funded by DEC and ran on DEC AlphaServers. The story goes that they were running out of CPU and had ordered some more AlphaServers. Mike Burrows (who wrote the indexer) rewrote his code over the weekend and traded 30% CPU for 30% storage. When the AlphaServers arrived on Monday morning, they weren't required.
I've been pondering what the benefits of database compress on MSSQL 2008 are, and when I met with some platform experts from Microsoft last week and heard how they were getting on, I was really intriqued.
If you are interested in this as a whole, then Juergen Thomas' blog is really worth reading. While you're there, read his white paper too.
DB2 has had row compression for some time and part of their value proposition over MSSQL and Oracle is 30% reduced database size. MSSQL introduced vardecimal compression in SQL 2005 but it its ability to compress was limited to certain objects (principally BW cubes benefitted the most).
With SQL 2008 this all changes. I'll talk you through the different types of compression available in Microsoft SQL Server 2008 and what benefits they brought our BW production system.
To start with, our productive environment was 92,372Mb. It's pretty well maintained and we try to keep the size down with archiving, data deletion and good maintenance and it was some 120Gb+ a year ago.
Note!!! If you are going to do any of this, then read SAP Note 991014 - Row Compression/Vardecimal on SQL Server. You do any of this at your own risk. Obviously don't do this in a productive system before proving it in a non-productive environment.
1) Row level compression
The word "Compression" is a misnomer and this doesn't do any compression. This mechanism only works on tables with numbers in them (decimal columns). MSSQL then stores decimal data not as fixed size fields, but just using the amount of space it requires. Specifically, the value "0" only requires 4-bits of space.
Note that on new installations of SAP, this is already the default option. You will see this in the analyse report below, if that is the case.
To do row compression make sure you download the latest version of the transports attached to Note 991014, then run ABAP report RSDD_MSSQL_CUBEANALYZE. Select the 100,000 largest tables and "Use Row Compression", and click "Start Checks". You can refresh the screen till its done - it should take 5 minutes, so go and get a cup of tea. When "CheckJobActiv" says no, it will show you a list of tables and their sizes. Click in the middle and select Ctrl-A to select all rows, then click "Compress Selected Table(s)". You may as well do all the tables, although the biggest benefit comes from the top 100.
This will take several hours on a small system and maybe much longer on a bigger system. It will also generate a lot of transaction logs, so you may want to change your transaction log style to BULK LOAD to reduce these. In our system I just ran it at a quiet time. On large productive systems you may need to run it in smaller chunks, during your support intervals.
So we expect row level compression to work well on some tables and not on others, and this is exactly what happens. There are 2 types of table that really benefit, the RS* tables (which contain information about logs) and BW cube/data store tables (which are largely numeric). On these tables we got 40-60% compression, e.g. 1068Mb down to 524Mb, 52%, for RSBATCHSTACK.
Tables like PSA tables and dictionary tables don't fare so well, for example 6037Mb to 5575Mb, 8%, for D010TAB
In total this reduced our database size from 92372Mb to 77208Mb, or a saving of 16%, without a net reduction in performance.
The neat thing about row compression is that it doesn't affect performance and this was born out in our productive environment - queries and data loads are no slower.
2) Page compression
Page compression is a whole different thing, and it takes common keys in tables e.g. Company Code/Client, and replaces them with tokens. Thus this will work on all tables. However it holds a CPU overhead of 20-30%. I happen to know that CPU on our VMware farm runs at 30% (but we are I/O bound), so a bigger reduction in database size for the tradeoff of some extra CPU cycles, is worth it for us. You have to decide this based on your situation. Clearly if you do this and you are already CPU bound, you will be in a worse sitation.
You have to enable Expert Mode by going to RSDD_MSSQL_CUBEANALYZE, then going to Settings -> Export Mode on/off. This will give you a new button "Page Compression". Compress the tables in the same way.
Note - tables which contain transient data, like qRFC tables, shouldn't be page compressed. Here's a few to get you started: VBDATA, VBHDR, VBMOD, ARFCSTATE, ARFCSDATA, TRFCQDATA, TRFCQIN, TRFCQINS, TRFCQOUT, TRFCQSTATE, ARFCRDATA, ARFCRSTATE - but you may have more in your system. Enabling Page Compression won't do any harm but will waste CPU cycles and not save any space on those tables.
It's clear immediately that the impact is massive with Page Compression. Some DSO tables are shrunk from 2884Mb to 269Mb (91%). Even the hard to compress tables like D010TAB are shrunk from 6037Mb to 4162Mb (31%).
In total we reduce database size from 93272Mb to 51061Mb (45%). Our largest dataloads still run in the same time that they did before, so the reduction in I/O has matched the increase in CPU cycles, as expected.
3) Index compression
What I noticed at this point was that the data was taking up a lot less space than the indexes. For example, on table D010TAB, it used to be 2480Mb for the data and 6037Mb total. Now it's 662Mb for data and 4162Mb overall. This means that whilst the data is compressed, indexes don't get any smaller and are a much larger part of the database size than before.
MSSQL 2008 supports index compression as well but it is not yet supported by SAP. The reasons for this are slightly unclear but it can apparently cause dumps during bulk operations like support package stacks, BW Change Runs etc. Expect a patch for this soon.
In the spirit of this, I decided to play with Index Compression in our Development environment which is a recent copy of Production. Go back to RSDD_MSSQL_CUBEANALYZE, click the Index Compression button and do another compression run. This warns you that this is unsupported...
The impact of this is once again massive. D010TAB goes down from 6072Mb to 1915Mb total (68% compression). RSBATCHSTACK down from 1009Mb to 55Mb (95% compression). Note the numbers are a bit different, I haven't done this in Prod!
Total database size is 27,647Mb down from the same 92,372Mb, which is an amazing 70% compression!
4) UCS2 compression (New in Microsoft SQL server 2010)
I've not done this yet but MSSQL 2010 has compression for Unicode data. This promises to halve the size of certain types of Unicode data sets. I'll update this blog when I get my hands on the copy of the RTM version of MSSQL 2010.
5) Umm...How do I get my disk space back?
So I now have set of systems that I've compressed from 90Gb to 50Gb. Unfortunately, the data files haven't shrunk and I have lots of free space, for much more than the growth I need. My filesystem is 160Gb, Database files are 90Gb and Data size is 50Gb! Remember, the purpose of this exercise was to liberate real disk space!
There are two options to do this, the first is to do an export/import and the second, to do a DBCC SHRINKFILE.
Purists don't like DBCC SHRINKFILE because it can cause fragmentation and confuse RAID controller read-ahead policies. Microsoft tell us (see the MSSQL blog) that the impact of this is much less than in days gone by, because they have improved their algorithms, and so have the RAID controller people and therefore recommend DBCC SHRINKFILE.
I'm quite interested because my database was SQL 2000, upgraded to 2005 and then 2008 over a 5 year period. This was one of the first BW 7.0 installations in the world and it's been through the wars. Soon it will be upgraded to 2010.
For me the question is whether an export/import will clean the DB up and reduce fragmentation in a meaningful way, thereby improving performance. So for now, I'm going to do the DBCC SHRINKFILE and cut the size of my DB files. At some later stage I'll do an export/import and we'll see if that reduces space further, and increases performance.
Microsoft SQL Server 2008 brings real benefits to the size of SAP databases without any material performance impact in our environment. In most SAP environments, which are I/O bound, this brings a win/win situation to SAP customers running Microsoft platforms. Get on and do it! I'm about to send a change request to our Basis team to do this to our entire SAP landscape and save 1.5Tb of disk space.