The Suitability of Algebraix Data’s Technology to Cloud Computing
Robin Bloor, Ph.D.
© Copyright 2011, The Bloor Group. All rights reserved. Neither this publication nor any part of it may be reproduced or transmitted or stored in any form or by any means, without either the prior written permission of the copyright holder or the issue of a license by the copyright holder. The Bloor Group is the sole copyright holder of this publication.
❏ 22214 Oban Drive ❏ Spicewood TX 78669 ❏ Tel: 512-524-3689 ❏
Email contact: firstname.lastname@example.org www.TheVirtualCircle.com www.BloorGroup.com
WHAT IS A CLOUD DATABASE?
This white paper was commissioned by Algebraix Data. The goal of the paper is to provide a definition of what a cloud database is and, in the light of that definition, to examine the suitability of Algebraix Data’s technology to fulfill the role of a cloud database. Here is a brief summary of the contents of this paper:

• We define a cloud DBMS (CDBMS) to be a distributed database that can deliver a query service across multiple distributed database nodes located in multiple data centers, including cloud data centers. Querying distributed data sources is precisely the problem that businesses will encounter as cloud computing grows in popularity. Such a database also needs to deliver high availability and cater for disaster recovery. In our view, a CDBMS only needs to provide a query service. SOA already delivers connectivity and integration for transactional systems, so we see no need for a CDBMS to cater for transactional traffic, only for query traffic. A CDBMS needs to scale across large computer grids, but it also needs to be able to span multiple data centers and, as far as possible, cater for slow network connections.

• We review traditional databases, focusing primarily on relational databases and column store databases, and conclude that such databases, as currently engineered, could not fulfill the role of a CDBMS. They have centralized architectures, and such architectures would encounter a scalability limit at some point, both within and between data centers. We conclude that a distributed peer-to-peer architecture is needed to satisfy the CDBMS characteristics that we have defined.

• We move on to examine the Hadoop/MapReduce environment and its suitability as a CDBMS. Because of its distributed architecture, it has much better scalability for many workloads than relational or column store databases. However, it was not built for mixed workloads, for complex data structures, or even for multitasking. In its current form it emphasizes “fault tolerance.” It succeeds as a database for very large volumes of data, but it does not have the characteristics of a CDBMS.

• Finally, we examine Algebraix Data’s technology as implemented in its database product, A2DB. Our conclusion is that it has an architecture which is suitable for deployment as a CDBMS. Our view is as follows: A2DB’s unique capability to reuse the intermediate results of queries that it has previously executed contributes to its delivering high performance at a single node. The same performance characteristics can be employed to speed up queries that join information between a local node and remote nodes, whether in the same data center or in a remote data center. Algebraix Data’s technology is capable of global optimization, balancing the performance requirements of both global and local queries. Additionally, the technology can deliver high-availability, fault-tolerant operation.
• We are aware that Algebraix Data has not deployed and tested its database A2DB in the role of a CDBMS. Hence our conclusion is not that it qualifies as a CDBMS, but that it has an architecture that would enable it to be tested in this role.
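The idea of reusing intermediate query results to speed up a join between a local node and a remote node can be illustrated, in very general terms, with a small sketch. The Python below is a hypothetical illustration, not A2DB’s implementation, algorithm, or API: it assumes a simple in-memory cache keyed by a canonical description of a sub-query, and a simulated remote node whose call counter stands in for slow cross-data-center round trips. All names (`IntermediateResultCache`, `RemoteNode`, `orders_for_region`) and the sample data are invented for illustration.

```python
from typing import Callable, Dict, FrozenSet, Hashable, List, Tuple

Order = Tuple[int, str, int]   # (order_id, customer_id, amount): hypothetical schema
Customer = Tuple[str, str]     # (customer_id, region): hypothetical schema


class IntermediateResultCache:
    """Hypothetical store of sub-query results, keyed by a canonical sub-query description."""

    def __init__(self) -> None:
        self._store: Dict[Hashable, FrozenSet[Customer]] = {}
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, key: Hashable,
                       compute: Callable[[], FrozenSet[Customer]]) -> FrozenSet[Customer]:
        if key in self._store:
            self.hits += 1        # reuse a result computed by an earlier query
            return self._store[key]
        self.misses += 1
        result = compute()        # evaluate once, then keep it for later queries
        self._store[key] = result
        return result


class RemoteNode:
    """Simulated node in another data center; `calls` counts round trips to it."""

    def __init__(self, customers: List[Customer]) -> None:
        self._customers = customers
        self.calls = 0

    def select_customers(self, region: str) -> FrozenSet[Customer]:
        self.calls += 1
        return frozenset(c for c in self._customers if c[1] == region)


def orders_for_region(region: str, min_amount: int, orders: List[Order],
                      remote: RemoteNode, cache: IntermediateResultCache) -> List[Order]:
    """Join local orders with the remote customers of one region."""
    key = ("select", "customers", "region", region)  # canonical form of the remote sub-query
    customers = cache.get_or_compute(key, lambda: remote.select_customers(region))
    customer_ids = {cid for cid, _ in customers}
    return sorted(o for o in orders if o[1] in customer_ids and o[2] >= min_amount)


LOCAL_ORDERS: List[Order] = [(1, "c1", 250), (2, "c2", 80), (3, "c1", 40)]
REMOTE_CUSTOMERS: List[Customer] = [("c1", "EU"), ("c2", "US"), ("c3", "EU")]

cache = IntermediateResultCache()
remote = RemoteNode(REMOTE_CUSTOMERS)

first = orders_for_region("EU", 100, LOCAL_ORDERS, remote, cache)  # cache miss: calls the remote node
second = orders_for_region("EU", 0, LOCAL_ORDERS, remote, cache)   # cache hit: reuses the sub-result

print(first)         # [(1, 'c1', 250)]
print(second)        # [(1, 'c1', 250), (3, 'c1', 40)]
print(remote.calls)  # 1
```

Here the second query asks a different question, but it shares the filtered remote sub-result with the first, so it is answered without another remote round trip. A real system would also need to invalidate cached results when the remote data changes, which this sketch ignores.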
WHAT IS A CLOUD DATABASE?
The Cloud Database - In Concept
Cloud computing is a major driving trend for IT. Over 36 percent of US companies...