In-Memory Highlights
- Faster databases improve business responsiveness and productivity
- In-memory databases help support near real-time processes
- Choices include
- TP monitor v relational database
- SQL v NoSQL programming
- Column v row data storage
- Scale up, scale out or massively parallel appliance hardware
- Workloads are typically OLTP, OLAP and application-tied
- SAP HANA, IBM DB2 BLU, Oracle 12c and Microsoft SQL Server 2014 all have in-memory aspects
I’m thankful to IBM Switzerland for inviting me to speak at its in-memory conference this week, which has got me thinking about the subject: how the various database types can be defined, and why in-memory developments are such a big thing.
The Business Imperative – Near Real-Time Processes
As a dBase user in the 1980s it used to take several hours to run the market-share programme I had developed. Then along came FoxBASE – a compatible in-memory version that sped things up tenfold.
Because of the large amount of data they contain, enterprise databases (unlike applications, which always run in memory) constantly shuttle information back and forth between memory and storage. With improvements in technology and lower prices you can speed things up by shifting more of the data into memory, or by adopting programmes with different structures.
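To make that concrete, here is a minimal sketch using Python’s built-in sqlite3 module, which can host the same database either on disk or entirely in memory. The table, schema and row count are invented for the demonstration, and actual speed-ups will vary with hardware and operating-system caching:

```python
import os
import sqlite3
import tempfile
import time

def load_and_query(conn, label):
    """Load half a million rows, then time a full-table aggregation."""
    start = time.perf_counter()
    conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, amount REAL)")
    conn.executemany(
        "INSERT INTO sales (amount) VALUES (?)",
        ((float(i % 100),) for i in range(500_000)),
    )
    conn.commit()  # the on-disk database must flush its pages to storage here
    loaded = time.perf_counter()
    total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
    print(f"{label}: load {loaded - start:.3f}s, "
          f"query {time.perf_counter() - loaded:.3f}s (sum={total:.0f})")

# Identical schema and data; only the storage location differs.
disk_path = os.path.join(tempfile.mkdtemp(), "demo.db")
load_and_query(sqlite3.connect(disk_path), "on disk")
load_and_query(sqlite3.connect(":memory:"), "in memory")
```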
The imperative is to run your processes faster: if you can react and make business decisions in real time (or thereabouts) you can be a stronger competitor and, of course, more productive.
Today there are hundreds of databases for enterprises to choose from, each with its own pros, cons and best-fit applications – whether OLTP, OLAP or another database-tied workload.
The introduction of SAP’s HANA database is driving a strong debate around in-memory databases. We’re going to be focusing on the subject in detail over the coming months, and wanted to start by outlining the various types of database available today.
TP Monitors v Relational Databases
Once upon a time mainframes and TP monitors ruled the world; then relational databases threatened to take over. In the event both sides won: IBM’s CICS and IMS maintained their positions in banks and financial institutions, while Oracle established itself in other large enterprises and Microsoft in SMBs. IBM’s DB2 allowed it to address mainframe, Unix, Linux and Windows platforms alike.
IBM’s Edgar Codd invented the relational model in 1970. Relational database management systems are made up of a number of tables, with each row identified by a primary key and tables connected to one another through keys. They are programmed with Structured Query Language (SQL), which provides both data definition and data manipulation languages.
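As a minimal sketch of those ideas – again using Python’s built-in sqlite3 module, with an invented customer/order schema – tables are linked by a key and both defined and queried in SQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data definition: two tables, each row identified by a primary key,
# connected to one another through the customer_id key.
conn.executescript("""
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        amount      REAL
    );
""")

# Data manipulation: insert rows, then join the tables on the key.
conn.execute("INSERT INTO customers VALUES (1, 'Acme Ltd')")
conn.execute("INSERT INTO orders VALUES (1, 1, 250.0), (2, 1, 99.5)")
query = """
    SELECT c.name, SUM(o.amount)
    FROM customers c JOIN orders o ON o.customer_id = c.id
    GROUP BY c.name
"""
for row in conn.execute(query):
    print(row)   # ('Acme Ltd', 349.5)
```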
Example products include Actian Ingres, HP Vertica, IBM DB2 and Informix, Microsoft Access and SQL Server, Oracle 12c and MySQL, and SAP Sybase.
Users also have to consider the differences between centralised databases (where storage is connected to a single CPU) and distributed ones, which run on loosely coupled systems with no physical components in common.
A criticism of relational databases is that they can be slow by the nature of their design; of TP monitors, that they are expensive.
SQL v NoSQL Databases
NoSQL databases were designed to overcome the constraints of traditional RDBMS programming, speeding throughput and lowering latency as a result. Example products include Apache CouchDB, MongoDB and Oracle’s (imaginatively named) ‘NoSQL Database’. Types of NoSQL database include column, document, key-value (see our write-up of Riak) and graph.
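To show what the key-value style looks like in practice, here is a minimal sketch in Python – a toy store, not any particular product’s API – where values are read and written by key alone, with no schema, joins or query planner involved:

```python
import json

class KeyValueStore:
    """A toy in-memory key-value store: get/put/delete by key only."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # Values are opaque to the store; serialising them as JSON is
        # roughly how document stores treat their payloads.
        self._data[key] = json.dumps(value)

    def get(self, key):
        raw = self._data.get(key)
        return json.loads(raw) if raw is not None else None

    def delete(self, key):
        self._data.pop(key, None)

store = KeyValueStore()
store.put("user:42", {"name": "Ada", "plan": "pro"})
print(store.get("user:42"))   # {'name': 'Ada', 'plan': 'pro'}
```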
A criticism of these types of database is that they are difficult to programme and require technical wizards for each project.
Columns And Rows
Another way of improving performance is to store data in columns (rather than in the conventional rows of an RDBMS), allowing fields with identical values to be compressed into a single entry, with pointers to all the rows that contain it.
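Here is a minimal sketch of that idea in Python – a single dictionary-encoded column over invented data:

```python
# Row storage repeats the value in every record...
rows = [
    {"id": 1, "country": "CH"},
    {"id": 2, "country": "CH"},
    {"id": 3, "country": "DE"},
    {"id": 4, "country": "CH"},
]

# ...while column storage keeps each distinct value once,
# with pointers to the row positions that contain it.
column = {}
for position, row in enumerate(rows):
    column.setdefault(row["country"], []).append(position)

print(column)   # {'CH': [0, 1, 3], 'DE': [2]}

# A scan such as "WHERE country = 'CH'" now reads one compressed
# entry instead of every row.
print(column["CH"])   # [0, 1, 3]
```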
Example column databases include IBM DB2 BLU, SAP HANA, SAP Sybase IQ and HP Vertica. HBase, the database used in Hadoop, is also a columnar database.
Columnar databases can be criticised for changing the way data is laid out on disk and for their complexity: the data is usually held in both row and column versions, making updates problematic in some cases.
Massively Parallel OLAP Appliances
For Online Analytical Processing (OLAP) and data-warehousing workloads a number of vendors have combined processing, storage and database in appliances. These systems tend to use massively parallel processing and feature Field Programmable Gate Arrays (FPGAs) to speed processing time. Examples include Netezza (now IBM PureData for Analytics and DB2 Analytics Accelerator), Teradata, Oracle Exadata, Exalogic and Exalytics, and EMC Greenplum.
The advantages are that they hide many of the complexities of designing and implementing complex databases from the customer, allowing them to be owned and run by non-IT departments. However these systems can be very expensive in comparison with other ways of building data warehouses, especially if the quantity of data to be processed is over-estimated.
Databases For OLTP, OLAP And Applications
Databases can be used for many different functions, so it is useful to split them into a number of usage cases. In particular:
- OLTP – tending to handle structured data, often on centralised, ‘scale-up’ hardware
- OLAP – tending to handle unstructured data, either on massively parallel or clustered hardware
- Applications – databases are often integrated with applications (Oracle and SAP Business Suites for instance) and sometimes difficult to differentiate from them
The selection of the most appropriate database type for the size and type of workload is difficult and – as you can see – there are many choices.
Some Conclusions – In Search Of Balanced Systems
In-memory database development is dependent on a number of factors – the cost of DRAM and the maximum amount of memory you can put in a server, as well as the development of the software itself. However it’s not just a matter of stuffing your server full of memory, any more than it makes sense to put an F1 engine in a Mini. SAP has created a massive focus on in-memory databases with HANA, although it’s still early days – we’re in the first generation of deployments, and SAP only certifies appliances based on Intel’s E7 chips. Nevertheless HANA’s introduction has put database choice firmly back on the agenda for many enterprises.
The three largest database suppliers have all made their own in-memory announcements (IBM DB2 BLU, Oracle 12c, Microsoft SQL Server 2014) and will have to bite their lips as the newcomer gets attention for features they’ve included for years. The stakes are high – all of them supply databases used alongside SAP applications, of course, and IBM also sells its own HANA appliances.
There’s a lot of profit and revenue to be made from developing database software, fitting it to balanced systems and making it appropriate for specific workloads. Our prediction: the in-memory database debate will grow beyond its current focus on HANA, have strong effects on the flash, disk and tape markets, and be far more relevant to customers than the current ‘Big Data’ hype. In-memory techniques will spread to virtualisation as well – one vendor is proposing in-memory VDI, for instance. It’s going to get interesting…