We continue to see the common debate of SQL vs. NoSQL and other database comparisons all over social media and platforms like Hackernoon. In most cases, it’s not that one database is better than the other, it’s that one is a better fit for a specific use case due to numerous factors.
Last year, our CTO Kyle Bernhardy, led an awesome talk titled A Deep Dive Into Database Architectures. You can watch this talk at the link, but since this is such a prominent discussion topic we thought it might be helpful to summarize. This article will provide an overview on database architectures, including use cases and pros & cons for each of them.
Let’s start with general considerations when selecting a database. It’s important to understand things such as data type / structure, data volume, consistency, write & read frequency, hosting, cost, security, and integration constraints. The more you know about these factors, the easier it will be to pick the right database for your project.
You may already know that there are generally 3 database hosting options:
On-Premises
Cloud Hosted
Database-as-a-Service (DBaaS)
Now for the part you’ve been waiting for - database architectures.
We’ll start with the most commonly used. Relational (SQL) Databases such as Oracle, MySQL, PostgreSQL, Microsoft SQL Server, and SQLite, organize data into tables with columns, each with a specified name and datatype. Additionally:
On the plus side, relational databases use mature technology that is widely understood and well-documented, SQL standards are well-defined, defined constraints enforce data integrity, they avoid data duplication and are highly secure and ACID-compliant. However on the negative side, SQL databases cannot handle unstructured or semi-structured data, their tables don’t necessarily map to objects, they require complicated ETL (Extract, Transform, Load) and maintenance, have row locking, and pricing for some products (Oracle, SAP) are out of reach for developers and some organizations.
Note: While some RDBMS systems can now handle JSON, they are not purpose built to do so.
There are several awesome use cases for relational databases; situations where data integrity is absolutely paramount (financial applications, defense and security, private health information), highly structured data, and automation of internal processes.
Relational databases are the most common database in production today, but they were not designed for the scale and agility of modern applications. About 10 years ago the NoSQL movement caught on to address these concerns and changed the database landscape forever. Note:
While databases are typically categorized as SQL or NoSQL, there are many intricacies to NoSQL databases. Let’s get into it.
Key-value stores are often used as the underlying storage for higher level databases. For example, MongoDB uses a key value store called WiredTiger as their default storage engine. Key-Value Stores, such as Redis, DynamoDB, and Cosmos DB, are:
For pros, key value stores provide fast, low-complexity access to data, are flexible, and can scale quickly and cheaply. However, they have extremely limited functionality, cannot handle complex structures or query or search by anything other than key, do not scale well as data models grow, and they require more programming overhead for complex implementations.
Example use cases for key value stores would be embedded systems, URL shorteners, configuration data, application variables and flags for web applications, state information, and data represented by a dictionary or hash.
Document stores, such as MongoDB, DynamoDB, Couchbase, and Firebase, are similar to key-value stores, but the value is a document.
The pros of document stores include flexibility and scalability, schemaless, fast writes, ideal for semi-structured and unstructured data, and developers do not need to know data structure ahead of time / it can change overtime without downtime. The cons are that they are not ACID compliant, limited to querying within a document, relationships/cross references are not enforced, slow searching, cannot join documents/collections in a single query, lack of database enforcement requires developer discipline and vigilance for application level enforcement, and they typically result in data duplication.
Great use cases for document stores are unstructured or semi-structured data, content management, rapid prototyping, and collecting of high traffic data.
Graph databases, such as Neo4j, OrientDB, and TitanDB, are ideal for when relationships or connections are top priority.
On the plus side, graph databases have advanced features for relationship querying, traversing, and tracking, are optimized for querying related data, and they avoid row locking. As for the negatives, graph databases have a large ramp up time for developers, high overhead for simple use cases, lack of standardization, poor performance of aggregate queries, and devs typically need to learn a custom query language.
Graph databases are great for analysis of heterogeneous data points, fraud prevention, advanced enterprise operations, social networking, payment systems, and GeoSpatial routing/visualization.
We’re almost there! Next up is time series databases, such as InfluxDB, Kdb+, and Prometheus, which are:
On the positive side, time series databases are designed for dealing with linear data over time, can handle high ingestion rates, have built-in features specifically for dealing with time-based data, a schema optimized for time-series arrays, and batch delete features. As far as negatives, time series databases only deal with time-series data, do not support full SQL, their read speed suffers compared to writes, they have no transaction capability and are append-only (not optimized for updates).
Great use cases for time series databases are managing infrastructure, IoT sensor collection, and log monitoring and alerting.
Last but not least is search engines, such as Elasticsearch, Splunk and Apache Solr, which are:
Search engine pros include their focus on optimized searching, highly scalable and schemaless, and they have advanced search options like full text search, suggestions, and complex search operations. The cons are that they are expensive, have low durability and poor security, have no transaction support, are not efficient for writing and retrieving data outside of searching, and are difficult to manage.
Search engines are great when search results are top priority, logging, product catalogs, and blogs.
It should be noted that many real life implementations choose “polyglot persistence,” which is the concept of using different data storage technologies to handle different data storage needs within a given software application. Many database technologies implement this within a single software, referred to as multi-model or data lake technology. This can be ideal for specific use cases, but poses inherent risk such as instability, data inconsistency and corruption, expensive / resource intensive, and data replication on disk and memory.
A few examples of multi-model technologies are:
It probably makes sense to discuss where HarperDB fits into the mix.
If for no other reason than to highlight that these database “categories” are not all black and white, and some databases take more of a “hybrid” approach by subscribing to several methodologies. Instead of falling into a single bucket, HarperDB can be considered as a structured object store with SQL capabilities. It features:
We built HarperDB from the ground up to expand and blend the best capabilities of SQL, NewSQL, and NoSQL products because we felt there were certain use cases that could be better served with another solution. We believe that providing developers with the ability to choose the right tool for the job empowers developers and spurs innovation. Some examples where we feel that HarperDB is a better fit include cases where you need both NoSQL & SQL, rapid application development, hybrid cloud, integration, edge computing, distributed computing, and real-time operational analytics. Essentially, the SQL vs. NoSQL debate becomes irrelevant with HarperDB because you no longer have to choose!
Hopefully this database architecture overview is helpful in finding the right database for your use case. Please reply below with questions or comments - we would love to discuss!
Also published at https://dev.to/harperdb/database-architectures-use-cases-explained-5711