paint-brush
What Are Columnar Databases and When Do You Need to Use Them?by@andriiromasiun
221 reads

What Are Columnar Databases and When Do You Need to Use Them?

by Andrii RomasiunAugust 2nd, 2024
Read on Terminal Reader
Read this story w/o Javascript
tldt arrow

Too Long; Didn't Read

Columnar (or column-oriented) databases store tables of data by column rather than by row. They're often used in online analytical processing (OLAP) environments. Columnar databases are perfect for processing analytical data and building aggregated reports. They are also commonly used in IoT applications and monitoring systems, which often deal with time-series data.
featured image - What Are Columnar Databases and When Do You Need to Use Them?
Andrii Romasiun HackerNoon profile picture

When building your application, one of the first questions you will probably ask yourself is: what database should I use to store my users' data? If you look up this question on the Internet, you will find not only that there are hundreds of different database management systems, but also that there are different categories of databases, such as SQL and NoSQL databases, key/value databases, row and column databases.


In this article, you will learn what columnar databases are, when to use them, and their advantages and disadvantages.

What Are Column-Oriented Databases?

Columnar (or column-oriented) databases store tables of data by column rather than by row. They're often used in online analytical processing (OLAP) environments.


For example, here's a simple table containing some information about usernames, their country, and their account status.

| ID |  Name  | Active |  Country  |
|----|--------|--------|-----------|
|  1 | Andrii |      1 | UK        |
|  2 | Tom    |      0 | US        |
|  3 | Jack   |      1 | Canada    |
|  4 | Scott  |      1 | Australia |


For columnar databases, the data would be stored as:

1234;AndriiTomJackScott;1011;UKUSCanadaAustralia;


While traditional row-oriented databases present such tables as:

1Andrii1UK;2Tom0US;3Jack1Canada;4Scott1Australia


Storing data differently drastically improves the performance of read operations. For example, performing aggregation functions such as averaging or summing will be many times faster on columnar databases.


At the same time, a row-oriented database would perform better for write operations. Especially when inserting rows one at a time. They are perfect for transactional processing (OLTP).


For example, consider a scenario where you want to create an analytical sales report. If you are using a row-oriented database, all the rows (chunks of data) have to be brought into memory and processed. Even if you select a particular column, the DBMS will first have to access a row such as 1Andrii1UK and then extract the required column from it.


A column-oriented database would present data in chunks of columns, like AndriiTomJackScott, allowing much faster processing and read operations.


Row-oriented databases prioritize transactional integrity and data normalization, while columnar databases focus on fast-read operations and data aggregation.

Use Cases for Columnar Databases

Columnar databases are perfect for reading and processing billions of data points. Here are some of the more common use cases:

Log analysis

Server logs are usually a pain to analyze, especially when there is a lot of information being logged from different processes. Columnar databases can store logs and later aggregate information such as the most common errors or their patterns.

Business Analysis & Time Series Data

Columnar databases are perfect for processing analytical data and building aggregated reports, and this is probably their most common use case. Columnar OLAPs are also commonly used in IoT applications and monitoring systems, which often deal with time-series data.

Examples of Columnar DBMSs

Many column-oriented databases share similarities, such as distributed processing, aggregation capabilities, and so on. However, many of them are designed for different purposes.

Clickhouse

Clickhouse is one of the most popular open-source real-time OLAP databases. Its speed and aggregation capabilities make it ideal for analytical purposes and it's used by organizations such as CERN, Akamai, and Bytedance.

DuckDB

DuckDB is an open-source in-process columnar DBMS. It also features fast reads and extensive aggregation functions, and has been described as the "SQLite of columnar databases." It’s perfect for applications dealing with logs or parsing CSV files. It’s also great for embedded programming, as it’s relatively small in size and fast.

QuestDB

QuestDB is also an open-source columnar database with a focus on efficient processing and aggregation of time series data.

Final Thoughts

There are dozens of different types of database management systems, and each one is tailored to specific use cases. Choosing the right database for your application is a good long-term solution that can save you the headache of migrating later.


For example, if you're building analytical or log-processing applications, it's a good idea to opt for column-oriented databases, even if you don't store that much data initially.