This article dives into the architecture of databases to understand why it has worked well for enterprises, and compares the resiliency of databases with the value proposition of permissioned chains. It closes with a look ahead at how ledger databases, if architected correctly, could be poised to ride the next wave of innovation. Jon Choi argued that we should retire the word blockchain and use the term “open networks”. The conclusion of Jon’s article is that open crypto networks “are networks that share resources and responsibilities with their participants”. Establishing the term would allow that community to evolve and grow independently of other communities. I would argue that the enterprise community should likewise retire the word blockchain and adopt the term “ledger databases”.
There is a cost to developing resources, designing the framework, establishing governance, and proving out the concept. For example, IBM and Maersk (a shipping company) established a consortium to let shipping companies automate customs clearance, trade finance, and document transfers. Within a few months of launching, Maersk had trouble onboarding other shipping companies because they felt threatened by Maersk operating the consortium.
This is just one example of an enterprise project that has lost steam within the past twelve months. In a recent CoinDesk article, Jerry Cuomo from IBM stated, “If you start off small and centralized, the challenge will be getting the next big trust anchors on board. On the other hand, with the decentralized approach, you can have multiple competitors and their lawyers all asking questions and it’s going to take some time. You’ve got to pick your poison.”
Databases have been around since the ’70s and can manage millions of transactions per second. Thousands of developers, architects, and managers have built database infrastructures and made a living from them for decades. Why would they switch and learn something new when most solutions work well today? Blockchains promise to decentralize trust, preserve privacy, and store tamper-resistant data while creating new profitable workflows. Unfortunately, that promise, combined with media hype, has propelled blockchain as a must-use technology. The question is whether blockchain is ready for mass adoption. Gartner has reported that the technology is still at the peak of inflated expectations, or what we call hype. (Remember VR twenty years ago?)
I would compare the euphoria around blockchain to the story of Samuel Brannan from the Gold Rush era, but with a twist. Samuel Brannan heard rumors of gold deposits in Coloma in 1848. After confirming the rumors, he went into town and famously yelled, “Gold! Gold! Gold!” In the next few months, three-quarters of the male population headed for the mines. Samuel, in the meantime, made an incredible amount of wealth by selling mining equipment to those miners. The miners who bought into the hype at least received equipment that let them mine gold and sell it. Enterprises are in a similar situation, but the equipment isn’t ready to use yet.
In the next four sections, I will review the architecture and use cases for Relational Databases, Non-Relational Databases, Permissioned Chains, and Ledger Databases. Readers who are already familiar with the technical details can skip down to the Technical Summary and Conclusion.
There are four ways to store data within a Database Management System: Hierarchical, Network, Relational, and Object-Oriented. The most widely used in enterprises is Relational. The Relational Model was proposed by E. F. Codd in 1970 and laid the groundwork for structuring data using relations. In this model, all data is stored in tables that have rows and columns. Each row is known as a tuple and each column as an attribute. A key is a column (or set of columns) that uniquely identifies a row, and tables are related to one another by referring to each other’s keys.
Relational Database Management Systems (RDBMS) work well when the data you are storing is structured in a way that fits a table with rows, columns, and field names. For example, a table should hold a specific type of information such as name, phone number, and other labeled fields. To design and work with a relational database, database architects use a standard language called Structured Query Language, or SQL. SQL allows architects and programmers to query, insert, update, and delete data. Most RDBMS products use SQL as their standard language, including MySQL, Oracle, MS SQL Server, and Azure SQL Database.
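As a minimal sketch of the ideas above, the following uses Python's built-in `sqlite3` module to create two related tables and run the basic SQL operations. The table names, column names, and sample values (`customers`, `accounts`, "Ada") are made up for illustration and are not taken from any system discussed in this article.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two tables linked by a key."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
    conn.execute(
        """CREATE TABLE customers (
               customer_id INTEGER PRIMARY KEY,  -- key: uniquely identifies a row
               name        TEXT NOT NULL,
               phone       TEXT
           )"""
    )
    conn.execute(
        """CREATE TABLE accounts (
               account_id  INTEGER PRIMARY KEY,
               customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
               kind        TEXT NOT NULL,
               balance     INTEGER NOT NULL
           )"""
    )
    # INSERT: add rows (tuples) to the tables
    conn.execute("INSERT INTO customers VALUES (1, 'Ada', '555-0100')")
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?, ?, ?)",
        [(10, 1, "savings", 1500), (11, 1, "checking", 200)],
    )
    # UPDATE: change an existing row
    conn.execute("UPDATE customers SET phone = '555-0199' WHERE customer_id = 1")
    conn.commit()
    return conn


def accounts_for(conn, name):
    """SELECT with a JOIN: follow the key from accounts back to customers."""
    return conn.execute(
        """SELECT a.kind, a.balance
             FROM accounts a
             JOIN customers c ON c.customer_id = a.customer_id
            WHERE c.name = ?
            ORDER BY a.kind""",
        (name,),
    ).fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    print(accounts_for(db, "Ada"))
```

Because `accounts.customer_id` references `customers.customer_id`, the database itself rejects an account that points at a customer who does not exist, which is the kind of structural guarantee the relational model is built around.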
**Transactions & Consensus**
A transaction is a logical unit of work that groups SQL statements into a single workflow. The outcome of all the statements in a transaction is either committed (made permanent) or rolled back (undone). For example, consider a bank employee who transfers $500 from a savings account to a checking account.
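The $500 transfer can be sketched as a single transaction. This is an illustrative example using Python's built-in `sqlite3` module; the `accounts` table, the starting balances, and the `transfer` helper are assumptions made for the sketch, not part of any real banking system.

```python
import sqlite3


def open_bank():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("savings", 800), ("checking", 100)]
    )
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one transaction; return True if committed."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            (remaining,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if remaining < 0:
                raise ValueError("insufficient funds")  # triggers the rollback
        return True
    except ValueError:
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


if __name__ == "__main__":
    bank = open_bank()
    print(transfer(bank, "savings", "checking", 500), balances(bank))
    print(transfer(bank, "savings", "checking", 500), balances(bank))
```

The second transfer would overdraw the savings account, so both of its updates are undone together. Neither account ever shows a half-finished transfer.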
Now let’s say that you have several nodes throughout the world that would need to commit the above transaction into a database. How can you ensure that all of the nodes are communicating and committing the final transaction? More importantly, how do nodes navigate through failures? How does a database guarantee the validity of a transaction?
The answer is the XA standard, the ACID properties, and the two-phase commit (2PC) protocol.
The Open Group proposed the XA standard, which coordinates transactions that span multiple nodes and keeps them valid by upholding the ACID properties. ACID stands for Atomicity, Consistency, Isolation, and Durability, a set of properties that guarantee a transaction either completes validly or has no effect at all.
Atomicity: all or nothing
Consistency: only correct results are committed
Isolation: the intermediate steps of a transaction are hidden from other transactions
Durability: committed results are guaranteed to persist
The host provides a transaction manager that is responsible for creating, managing, and enforcing transactions according to ACID. For the nodes to communicate and agree on processing a transaction, they use the 2PC protocol, an atomic commitment protocol (a specialized form of consensus) that lets the nodes either all commit the transaction or all abort, for example because of a node failure. Typically, nodes commit within a short time period that can range from milliseconds to minutes.
Within the 2PC protocol, the coordinator is the master node that keeps all nodes in sync when committing a task to the transaction log. The 2PC protocol assumes trust between the nodes, which makes it easier to commit.
A transaction gets committed as follows:
Step 1 (prepare): The coordinator asks each server participating in the transaction whether it is ready to commit. Each participant records the pending work in its log and votes yes or no. This tells the coordinator who the participants are and whether all of them can proceed.
Step 2 (commit): If every participant voted yes, the coordinator sends each participant an instruction to commit. Each participant commits, writes the result to its log, and confirms with the coordinator. If any participant fails or votes no, the coordinator instead sends a message to all participants to roll back the transaction. After rolling back, the participants send a final message to the coordinator confirming that the rollback is complete.
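The two steps can be sketched as a small in-memory simulation. This is a simplified illustration, not a production implementation: the `Participant` class, its `can_commit` flag, and the `two_phase_commit` function are hypothetical names. Real systems also have to handle timeouts, coordinator crashes, and recovery from the log.

```python
class Participant:
    """A node taking part in a distributed transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # assumption: stands in for a local success/failure
        self.state = "idle"
        self.log = []  # local transaction log

    def prepare(self, txn):
        """Step 1: vote on whether this node is able to commit."""
        vote = self.can_commit
        self.state = "prepared" if vote else "aborted"
        self.log.append((self.state, txn))
        return vote

    def commit(self, txn):
        """Step 2 (success path): make the change permanent."""
        self.state = "committed"
        self.log.append(("committed", txn))

    def rollback(self, txn):
        """Step 2 (failure path): undo any pending work."""
        self.state = "aborted"
        self.log.append(("rolled_back", txn))


def two_phase_commit(txn, participants):
    """Coordinator logic: commit only if every participant votes yes."""
    votes = [p.prepare(txn) for p in participants]  # Step 1: collect votes
    if all(votes):
        for p in participants:  # Step 2: everyone commits
            p.commit(txn)
        return "committed"
    for p in participants:  # Step 2: everyone rolls back
        p.rollback(txn)
    return "aborted"


if __name__ == "__main__":
    healthy = [Participant("ny"), Participant("london")]
    print(two_phase_commit("transfer-500", healthy))

    mixed = [Participant("ny"), Participant("tokyo", can_commit=False)]
    print(two_phase_commit("transfer-500", mixed))
```

A single "no" vote is enough to abort the transaction on every node, which is how 2PC preserves atomicity across machines.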
Non-relational databases work differently: the database management system does not need a predefined schema to store data. As the image below shows, data in an RDBMS has to be structured in a table format with known attributes so that the database system can understand and link it. With a non-relational database you can store data as a single document. This type of database is well suited to storing large volumes of unstructured data, often alongside tools like Hadoop.
Diagram of how data is stored. Relational vs Non-Relational
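To make the contrast concrete, here is a small sketch in plain Python. The relational shape is modeled as lists of fixed-width rows and the non-relational shape as a nested document. All names and values (`customers_rows`, `phones_rows`, "Ada") are invented for illustration.

```python
import json

# Relational shape: every row in a table has the same columns, and the
# phone numbers live in a second table linked by customer_id.
customers_rows = [(1, "Ada")]
phones_rows = [(1, "mobile", "555-0100"), (1, "work", "555-0101")]


def rows_to_document(customer_id, customers, phones):
    """Fold the related rows for one customer into a single nested document."""
    name = next(n for cid, n in customers if cid == customer_id)
    return {
        "_id": customer_id,
        "name": name,
        "phones": [
            {"type": kind, "number": number}
            for cid, kind, number in phones
            if cid == customer_id
        ],
    }


# Document shape: a schemaless store keyed by id. Documents in the same
# collection do not need to share the same fields.
collection = {
    1: rows_to_document(1, customers_rows, phones_rows),
    2: {"_id": 2, "name": "Grace", "tags": ["vip"], "notes": "prefers email"},
}

if __name__ == "__main__":
    print(json.dumps(collection[1], indent=2))
```

The second document carries fields (`tags`, `notes`) that the first one lacks, which a relational table would only allow after a schema change.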
Non-relational databases were developed to counter relational databases’ lack of scalability in storage, processing, and analytics. For these features to work well, the ACID properties have to be relaxed. Their improvements over RDBMS include simpler design, lower cost to deploy, easier horizontal scaling, finer control over the availability of data, and speed. Non-relational databases are mostly used within web applications or big data projects because of the amount of unstructured data they store. They suit enterprises that are looking for scalability but do not need strict consistency of that data. As a result, most non-relational databases do not adhere to the ACID model and largely support web applications.
**Modularity**
There are a number of NoSQL databases, such as Cosmos DB, Apache Cassandra, Hadoop Distributed File System, LevelDB, Couchbase, and Datomic, that offer or could offer immutable, append-only data stores as an enterprise offering. MongoDB 4.0, on the other hand, offers a distributed database with a consensus protocol and has baked in support for ACID-compliant transactions. This means the integrity of the data is protected, although still not as robustly as in a relational database. MongoDB 4.0 was only released in mid-2018, so it remains to be seen how enterprises adopt it.
The difference between a permissioned blockchain and a permissionless blockchain is the access control layer. The access control layer allows enterprises to control decentralization, anonymity, and governance. Many implementations of permissioned blockchains are available today, such as Corda, Hyperledger Fabric, and Quorum, that could offer security, immutability, and trust among network participants. Based on its GitHub activity, I will focus on Hyperledger Fabric.
**Hyperledger Fabric**
A typical deployment of a Hyperledger Fabric system consists of nodes, a smart contract (chaincode) that executes business logic, and a ledger that maintains the transaction log and the world state. The world state is a non-relational database that tracks the latest values written to the blockchain, so large organizations can query information without having to search the transaction log. The blockchain itself is an immutable, append-only ledger that logs all of the transactions block by block. Unlike the world state, the blockchain cannot update, edit, change, or roll back transactions. Because the world state is implemented as a database, it offers enterprises a familiar means of querying, aggregating, or traversing large amounts of data. Hyperledger Fabric is compatible with both CouchDB and LevelDB, which are NoSQL databases.
The ledger consists of the blockchain and a traditional database
Hyperledger Fabric’s blockchain looks and feels like a traditional blockchain: it contains a sequence of blocks of transactions that are linked for continuity. Each block header includes a hash of the block’s transactions as well as the hash of the previous block’s header. This keeps the blocks tamper-resistant and secure, so if one node misbehaves, the other nodes still hold the same valid copy. The data structure for the blockchain is stored as a document and not under a database schema.
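The relationship between the append-only chain and the world state can be sketched in a few lines of Python. This is a simplified model and not Fabric's actual data format or API. The `Ledger` class, the `key`/`value` transaction shape, and the all-zero genesis hash are assumptions made for the illustration.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumption: placeholder "previous hash" for the first block


def sha256_hex(obj):
    """Hash a JSON-serializable object deterministically."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()


class Ledger:
    """An append-only chain of blocks plus a queryable world state."""

    def __init__(self):
        self.chain = []        # immutable history: blocks are only ever appended
        self.world_state = {}  # latest value per key, updated as blocks arrive

    def append_block(self, transactions):
        prev_hash = sha256_hex(self.chain[-1]["header"]) if self.chain else GENESIS
        header = {"prev_hash": prev_hash, "data_hash": sha256_hex(transactions)}
        self.chain.append({"header": header, "transactions": transactions})
        for tx in transactions:  # the world state tracks only the newest values
            self.world_state[tx["key"]] = tx["value"]

    def is_valid(self):
        """Recompute every hash and check that each block links to the one before."""
        prev_hash = GENESIS
        for block in self.chain:
            header = block["header"]
            if header["prev_hash"] != prev_hash:
                return False
            if header["data_hash"] != sha256_hex(block["transactions"]):
                return False
            prev_hash = sha256_hex(header)
        return True


if __name__ == "__main__":
    ledger = Ledger()
    ledger.append_block([{"key": "car1", "value": "alice"}])
    ledger.append_block([{"key": "car1", "value": "bob"}])
    print(ledger.world_state, ledger.is_valid())
```

Querying `world_state` answers "who owns car1 now?" without replaying the chain, while `is_valid` shows why editing an old transaction is detectable: the recomputed data hash no longer matches the block header.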
**Transactions & Consensus**
Permissioned blockchains do not have to use compute-intensive mining to reach consensus because the nodes are known entities. They can use deterministic consensus algorithms such as Paxos and Raft (crash fault tolerant) or PBFT (Byzantine fault tolerant). A Hyperledger Fabric smart contract is called chaincode, a program written in Go, Node.js, or Java. Chaincode runs separately from the endorsing peer process, which improves transaction throughput and offers granular privacy controls.
The network nodes are operated by those invited into the private network and consist of client, peer, and orderer nodes. The orderer nodes run a pluggable consensus algorithm, which allows an organization to choose the algorithm that works best for it. This modularity can accommodate both Byzantine fault-tolerant and crash fault-tolerant consensus algorithms.
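The practical difference between the two families shows up in how many nodes are needed. As a rough worked example using the standard bounds, a crash fault-tolerant protocol such as Raft or Paxos needs at least 2f + 1 nodes to survive f crashed nodes, while a Byzantine fault-tolerant protocol such as PBFT needs at least 3f + 1 nodes to survive f malicious ones. The function names below are invented for the sketch.

```python
def max_crash_faults(n):
    """Crashed nodes a majority-quorum protocol can tolerate (n >= 2f + 1)."""
    return (n - 1) // 2


def max_byzantine_faults(n):
    """Arbitrarily misbehaving nodes a PBFT-style protocol can tolerate (n >= 3f + 1)."""
    return (n - 1) // 3


if __name__ == "__main__":
    for nodes in (3, 4, 5, 7):
        print(nodes, max_crash_faults(nodes), max_byzantine_faults(nodes))
```

With the same number of orderer nodes, a consortium that distrusts its members tolerates fewer faulty nodes than one that only worries about outages, which is one reason the choice of consensus algorithm is left to the organization.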
The consensus in Hyperledger Fabric is ultimately done in 4 phases:
Microsoft and Amazon offer fully managed blockchain solutions to enterprises that are looking to deploy Ethereum or Hyperledger Fabric networks. These managed systems eliminate the need to manually set up hardware and software and to handle ongoing security. Amazon’s Quantum Ledger Database (QLDB) was recently announced and takes a unique approach by combining the familiarity of a database with blockchain-like features such as immutability.
**Quantum Ledger Database**
QLDB is a non-relational database that has abstracted the important features of a blockchain. Most enterprises will argue that rearchitecting a relational database for auditability and immutability is inefficient and complex. QLDB’s value proposition for enterprises is to take away the complexities of managing blockchain networks and to minimize infrastructure costs, finality times, and data propagation. Finally, most enterprises already use either AWS or Azure, so a product must offer an easy way to integrate with current infrastructure.
So what is QLDB exactly? “QLDB is a fully managed ledger database that provides a transparent, immutable, and cryptographically verifiable transaction log owned by a central trusted authority. QLDB tracks each and every application data change and maintains a complete and verifiable history of changes over time.”
The architecture of QLDB is an append-only journal that stores all data in sequential order and cannot be changed. It also lets you see all historical changes made within the database, like a transaction log. The historical changes are cryptographically secured using SHA-256. With SQL support, enterprises can rely on their current SQL developers to query and manage data. Since it is a non-relational database, it can store large volumes of semi-structured data using a document-oriented data model. Finally, from a consistency perspective, QLDB implements the ACID properties, so it keeps transactions valid and secure. One feature QLDB does not offer is shared consensus: there is no consensus algorithm or validation process between nodes.
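The idea of an append-only, hash-chained journal with queryable history can be illustrated with a toy model. To be clear, this is not QLDB's API or its internal design. The `Journal` class and its methods are hypothetical, and a simple SHA-256 hash chain stands in for whatever verification structure the real service uses.

```python
import hashlib
import json

EMPTY = "0" * 64  # assumption: digest of an empty journal


def _entry_hash(body):
    """SHA-256 over a canonical JSON encoding of the entry body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class Journal:
    """Append-only log of document revisions, owned by a single authority."""

    def __init__(self):
        self._entries = []

    def append(self, doc_id, data):
        """Record a new revision; earlier revisions are never modified."""
        prev = self._entries[-1]["hash"] if self._entries else EMPTY
        body = {"seq": len(self._entries), "doc_id": doc_id, "data": data, "prev": prev}
        self._entries.append({**body, "hash": _entry_hash(body)})

    def current(self, doc_id):
        """Latest revision of a document, or None if it was never written."""
        for entry in reversed(self._entries):
            if entry["doc_id"] == doc_id:
                return entry["data"]
        return None

    def history(self, doc_id):
        """Every revision of a document, oldest first."""
        return [e["data"] for e in self._entries if e["doc_id"] == doc_id]

    def digest(self):
        """A single hash that summarizes the entire journal so far."""
        return self._entries[-1]["hash"] if self._entries else EMPTY

    def verify(self):
        """Recompute the chain; any edit to a past entry breaks it."""
        prev = EMPTY
        for entry in self._entries:
            body = {k: entry[k] for k in ("seq", "doc_id", "data", "prev")}
            if entry["prev"] != prev or _entry_hash(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True


if __name__ == "__main__":
    journal = Journal()
    journal.append("VIN-1", {"owner": "alice"})
    journal.append("VIN-1", {"owner": "bob"})
    print(journal.current("VIN-1"), journal.history("VIN-1"), journal.verify())
```

There is no voting between nodes here: one party owns the journal, and others can only check its digest. That is the trade the article describes, keeping verifiability while giving up decentralized consensus.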
Relational databases dominated enterprises because they are easy to use, robust, performant, and flexible. They scale well on a single server, but scaling across multiple servers is harder. Relational databases have been widely adopted within enterprises because they adhere to the ACID model, which favors data integrity over scalability. They are dependable and enforce transaction execution incredibly well. Like any model, there is a trade-off: a distributed system that insists on consistency has to sacrifice some availability of the data (see CAP theorem). Relational databases fall short when it comes to scaling, especially to millions of reads and writes per second.
The CAP theorem suggests that, when a network partition occurs, you can only choose between availability and consistency
Non-relational databases are mostly used within web applications or big data projects because of the amount of unstructured data they store. They suit enterprises that are looking for scalability but do not need strict consistency (see CAP theorem) of that data. As a result, most non-relational databases do not adhere to the ACID model and largely support web applications. Non-relational databases support only certain types of transactions and are not as granular as relational databases. The recently announced MongoDB 4.0 offers a distributed database with a consensus protocol that is also ACID compliant and serverless.
A permissioned blockchain is a controlled environment that offers the ability to coordinate trust among participants. It has an access control layer that designates the network participants, the governance, and the operating model. Permissioned chains can be good for audits, for creating a trusted governance model with competitors, and for executing business logic. Permissioned blockchains also integrate with NoSQL databases, so enterprises can query data with familiar database tools. Enterprises have to navigate the complexities of cost, understanding consensus, privacy, bureaucracy, and establishing trust with potential competitors. They could be embroiled in a multi-year project that involves armies of lawyers, significant capital, and a lack of trust.
Quantum Ledger Database (QLDB) is a NoSQL database that provides an immutable, transparent, and cryptographically verifiable transaction log owned by a central authority. QLDB has taken away features such as consensus algorithms, blockchain networks, and the sophistication of deploying networks on multiple nodes. Enterprises that are attracted to blockchain but do not want to deal with the governance can deploy and manage a database easily. It also supports SQL, which allows architects, database administrators, and most enterprises to integrate it easily within their own infrastructure. QLDB lacks decentralization, so a central authority is needed and it cannot invite multiple parties to create consortium-like networks.
Databases started with the premise that enterprises needed the ability to store large amounts of structured data. This gave way to relational databases that enabled enterprises to better understand the data and create insights. With the rise of the internet, non-relational databases allowed enterprises to collect large amounts of unstructured data and try to predict insights with tools like machine learning and predictive analytics. Permissioned blockchains let enterprises that do not trust each other work together, but the data has to be portable. The enterprise ecosystem has to mature and understand that newer workflows require exchanging company data and trust. Given the complexities of permissioned blockchains, the market does not demand this type of workflow today.
Gartner conducted a poll in March 2018 with 300 Enterprises
With the invention of QLDB and MongoDB 4.0, I believe we are on the right path to offering enterprises a way to take advantage of ledger-like technology without the complexities. I am not denouncing enterprise blockchain but merely suggesting a new path forward. The entire technology community has labeled blockchain as a technology that could revolutionize public key infrastructures, enterprises, consumer businesses, and other areas. Unfortunately, by standing behind the word blockchain, we have stifled growth in many areas and pitted communities against each other.
In the next few years, I could see a future where enterprises deploy shared ledger databases that can support open networks such as Ethereum, Bitcoin, and others. Further, I could see zk-SNARKs and confidential/shielded transactions making it easier for enterprises to operate on open networks. Fundamentally, we are still incredibly early in the space, and I believe that we have to wait for the ecosystem to mature.
_Special thanks to Mohamad Fouda, Soona Amhaz, Mohamad El Seidy_
Link to Token Daily Article
Links and Resources: Microsoft Azure, Blockchain at Berkeley, IEEE, HBR Big Data, Cisco, Y Combinator, Nexthink, Blockgeeks, Decentralize, Devteam, Techopedia, Oracle, Guide to ACID, DBMS Guide, University of Montana, Research Gate, Databases and Blockchain, BlockchainHub, Enterprise Storage Forum, Blockchains Don’t Suck, Stack Overflow, Quorum, Apache Cassandra, Altoros, Amazon, Research Gate, Chain Code Tutorial, Hyperledger Consensus, Hyperledger Whitepaper, Kafka, CoinDesk: Lost Faith in Private Blockchain, CoinDesk: IBM & Maersk, PBS Samuel Brannan, CoinDesk: Amazon Plays Own Game, Jamserra RDBMS vs NRDBMS