When building your application, one of the first questions you will probably ask yourself is: what database should I use to store my users' data? If you look up this question on the Internet, you will find not only that there are hundreds of different database management systems, but also that there are different categories of databases, such as SQL and NoSQL databases, key/value databases, row and column databases.
In this article, you will learn what columnar databases are, when to use them, and their advantages and disadvantages.
Columnar (or column-oriented) databases store tables of data by column rather than by row. They're often used in online analytical processing (OLAP) environments.
For example, here's a simple table containing some information about usernames, their country, and their account status.
| ID | Name | Active | Country |
|----|--------|--------|-----------|
| 1 | Andrii | 1 | UK |
| 2 | Tom | 0 | US |
| 3 | Jack | 1 | Canada |
| 4 | Scott | 1 | Australia |
For columnar databases, the data would be stored as:
1234;AndriiTomJackScott;1011;UKUSCanadaAustralia;
While traditional row-oriented databases present such tables as:
1Andrii1UK;2Tom0US;3Jack1Canada;4Scott1Australia
Storing data differently drastically improves the performance of read operations. For example, performing aggregation functions such as averaging or summing will be many times faster on columnar databases.
At the same time, a row-oriented database would perform better for write operations. Especially when inserting rows one at a time. They are perfect for transactional processing (OLTP).
For example, consider a scenario where you want to create an analytical sales report. If you are using a row-oriented database, all the rows (chunks of data) have to be brought into memory and processed. Even if you select a particular column, the DBMS will first have to access a row such as 1Andrii1UK
and then extract the required column from it.
A column-oriented database would present data in chunks of columns, like AndriiTomJackScott
, allowing much faster processing and read operations.
Row-oriented databases prioritize transactional integrity and data normalization, while columnar databases focus on fast-read operations and data aggregation.
Columnar databases are perfect for reading and processing billions of data points. Here are some of the more common use cases:
Server logs are usually a pain to analyze, especially when there is a lot of information being logged from different processes. Columnar databases can store logs and later aggregate information such as the most common errors or their patterns.
Columnar databases are perfect for processing analytical data and building aggregated reports, and this is probably their most common use case. Columnar OLAPs are also commonly used in IoT applications and monitoring systems, which often deal with time-series data.
Many column-oriented databases share similarities, such as distributed processing, aggregation capabilities, and so on. However, many of them are designed for different purposes.
Clickhouse is one of the most popular open-source real-time OLAP databases. Its speed and aggregation capabilities make it ideal for analytical purposes and it's used by organizations such as CERN, Akamai, and Bytedance.
DuckDB is an open-source in-process columnar DBMS. It also features fast reads and extensive aggregation functions, and has been described as the "SQLite of columnar databases." It’s perfect for applications dealing with logs or parsing CSV files. It’s also great for embedded programming, as it’s relatively small in size and fast.
QuestDB is also an open-source columnar database with a focus on efficient processing and aggregation of time series data.
There are dozens of different types of database management systems, and each one is tailored to specific use cases. Choosing the right database for your application is a good long-term solution that can save you the headache of migrating later.
For example, if you're building analytical or log-processing applications, it's a good idea to opt for column-oriented databases, even if you don't store that much data initially.