Mainstream Business Applications and In-Memory Databases
Databases serving business applications are heading towards memory-centric design and implementation
By: Ranko Mosic
Jan. 2, 2013 05:00 AM
Contemporary large servers are routinely configured with 2 TB of RAM. It is thus possible to fit an entire average-size OLTP database in memory directly accessible by the CPU. There is a long history of academic research on how best to utilize relatively abundant computer memory. This research is becoming increasingly relevant as databases serving business applications head towards memory-centric design and implementation.
If you simply place Oracle RDBMS files on a solid-state disk, or configure the buffer cache (SGA) large enough to contain the whole database, Oracle will not magically become an IMDB, nor will it perform much faster. To properly utilize memory, IMDBs require purposely architected, configured, balanced and optimized hardware (CPU, RAM, flash, buses, cluster interconnect), as well as RDBMS software written with RAM as its center point. The IMDB's main premise is that data primarily resides in RAM and is persisted to disk only for protection and auxiliary functions. Data structures, methods and processes typically associated with an RDBMS (index types, join methods, data layout, internal data flows and processing) need to be (re)designed with this RAM-centric axiom in mind.
There are a few fully functional IMDB products available on the market today. SAP Hana is one of the most recent additions to the IMDB class of products.
SAP Hana is a hybrid row/columnar data store. It is designed under the assumption that most operations (whether OLTP or OLAP) read, rather than write, the data. Hana aspires to serve both OLTP and OLAP workloads equally well, since it considers them similar in terms of workload characteristics.
Hana's row-oriented tables are always in memory, while column tables are loaded into memory on demand. Column store data is dictionary encoded and compressed, which makes inserts and updates more expensive (in terms of system resources) than they are in the row store. To handle relatively demanding writes to the column store, Hana's database memory is logically split into a main store and a delta store. The main store is optimized for reads and efficient memory consumption. Writes to columnar data are handled via the delta store, which uses only basic compression and is optimized for writes; a Cache Sensitive B+ tree (CSB+) is used for faster searches on the delta. Delta and main store are merged automatically or manually. The need for a delta store effectively halves the size of the database Hana can handle, since the delta store must be sized to match the main store.
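To make the main/delta mechanics concrete, here is a minimal Python sketch of a dictionary-encoded column with a write-optimized delta that is periodically merged into a read-optimized main store. The class and method names are illustrative assumptions, not Hana's actual internals (which also use a CSB+ tree on the delta rather than a plain list).

```python
class Column:
    """Toy dictionary-encoded column: read-optimized main + write-optimized delta."""

    def __init__(self):
        self.dictionary = []   # sorted unique values (shared by the main store)
        self.main = []         # small integer codes pointing into the dictionary
        self.delta = []        # raw, lightly-compressed recent writes

    def insert(self, value):
        # Writes go to the delta only, so the compressed main store stays immutable.
        self.delta.append(value)

    def merge(self):
        # The "delta merge": rebuild dictionary and codes from main + delta.
        values = [self.dictionary[c] for c in self.main] + self.delta
        self.dictionary = sorted(set(values))
        index = {v: i for i, v in enumerate(self.dictionary)}
        self.main = [index[v] for v in values]
        self.delta = []

    def scan(self, value):
        # A read must consult both stores until the next merge.
        hits = sum(1 for c in self.main if self.dictionary[c] == value)
        return hits + self.delta.count(value)

col = Column()
for city in ["Berlin", "Paris", "Berlin", "Tokyo"]:
    col.insert(city)
col.merge()
col.insert("Berlin")        # lands in the delta, not the main store
print(col.scan("Berlin"))   # -> 3 (two from main, one from delta)
```

The sketch shows why the delta exists: appending to an uncompressed list is cheap, while updating the dictionary-encoded main store in place would force recoding, so writes are batched and folded in during the merge.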
While SAP Hana's column store performance is in some cases spectacular (aggregations over a small number of columns), reconstructing wide rows is fairly slow, because each row must be reassembled from values scattered across many separate columns.
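A small Python sketch of why this trade-off arises, assuming a toy table stored column-wise (the table and column names are invented for illustration):

```python
# Toy columnar table: each column is a separate contiguous list,
# which is cache-friendly for scans over a single attribute.
columns = {
    "id":     [1, 2, 3],
    "amount": [10.0, 20.5, 5.0],
    "region": ["EU", "US", "EU"],
}

# Aggregation touches ONE column -- the access pattern a column store excels at.
total = sum(columns["amount"])   # -> 35.5

# Reconstructing a full row must gather one value from EVERY column,
# a separate memory region per attribute -- the pattern that makes
# wide-row reads comparatively slow in a column store.
def row(i):
    return {name: col[i] for name, col in columns.items()}

print(row(1))   # -> {'id': 2, 'amount': 20.5, 'region': 'US'}
```

In a row store the situation is reversed: `row(i)` is a single contiguous read, while the aggregation has to skip over every other attribute of every row.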
It is possible to build clusters with dozens of nodes (SAP recently tested a 100-node, 100 TB RAM cluster), where data is distributed across the nodes.
In Hana's clustered, distributed database, not all data is directly accessible by a single CPU (only local RAM is directly addressable), which has negative performance implications.
There are other memory-centric databases available on the market today - IBM solidDB, Oracle TimesTen, VoltDB. We expect that major vendors like Oracle will further modify their mainstream RDBMS products to adjust to the increased role of RAM and other types of memory in modern hardware.