First Impression and A Little Story When Trying Aerospike in Kurio
aerospike logo from google images search
Last week we developed a new feature in Kurio. This feature was big for me because only two of us from the backend team, plus one from the infra team, were assigned to finish the project. So we needed to find a database that suited our requirements:
- Support for Golang (an official library, or easy to integrate), because our current project already runs on Golang.
- We want a database with fast reads and writes, either RDBMS or NoSQL. That means writing data should not affect read performance.
- The database must be easy to scale.
- The data must be persistent and saved to disk.
If drawn in a picture, we have something like this:
Our System Schema
After figuring out the requirements, we listed a few database options.
Redis
We know Redis is very good and fast because it keeps the data in memory. But we know it's not suitable for our case. Redis persists the data to disk only when something triggers it. For our case, we need the data stored and persisted on disk, plus fast reads.
MongoDB
MongoDB was our second option because we already use MongoDB as the data store in our current system. But we need more performance.
MyRocks
Another option we started to consider was MyRocks. MyRocks, introduced by Facebook, is MySQL with RocksDB as the storage engine. Because Facebook uses it, we thought it would be a better option. But later, after discussing with our infrastructure team, we learned that MyRocks is the same as any other MySQL: it cannot scale out. The only difference is the storage engine, so there's no big difference compared to a normal MySQL in terms of "scaling out".
Aerospike
Later, I found many databases with great performance out there, such as Cassandra, Scylla, etc. I don't remember many of them. Then I found Aerospike. It was like a rising-star database, and there were many benchmark articles about it on the Internet. To be honest, Aerospike was new to me and to the team at Kurio, but after seeing the reviews and the features of Aerospike, which fit our requirements quite well, we decided to try it. So, after discussing with the team and with the infra team, we decided to use Aerospike. Thanks to its scalability, the infra team does not need extra effort to maintain Aerospike when scaling out.
First Impression
These are a few features of Aerospike that amazed the team and also fit our case.
Redis-way
After learning the concept and how data is stored in Aerospike, I learned that Aerospike has a similar concept to Redis. It uses the key:value concept.
Talking about performance in retrieving data, of course, it is the same as Redis. It is key:value anyway.
Secondary Index
Another thing I learned about Aerospike is that it supports secondary indexes. So, even though Aerospike is key:value, it is also possible for us to query using another index that we created.
Asynchronously Persisted
Unlike Redis, Aerospike persists the data to disk asynchronously. Where Redis persists the data on a trigger or action, Aerospike can persist the data to disk automatically, because in Aerospike we can use hybrid data storage: it saves to both memory and disk.
Data Model and Schema
In Aerospike, there are a few data-related terms that we must know first. They are:
- Namespace
- Set
- Record
- Bin
Aerospike Data Schema
Namespaces
Namespaces are the top-level containers. A namespace contains one or more sets, records, bins, and indexes. Compared to an RDBMS, a namespace is similar to a database schema.
Namespace image from Aerospike documentations
Sets
A set is similar to a collection in MongoDB, or a table in an RDBMS. It contains many records and bins.
Set in Aerospike
Records
Records are similar to rows in an RDBMS. One record has one PK (key) and one or many bins. One set/collection may have many records.
Record in Aerospike
Bins
Bin in Aerospike
Bins in Aerospike are similar to columns in an RDBMS. We can add an index to any bin, as any RDBMS does. The difference is that bins are more flexible and dynamic. One record can have a lot of bins, and a single bin can store any data type (Int, String, Byte, etc.).
Example of Bins
All of this is already explained well in the official documentation here: https://www.aerospike.com/docs/architecture/data-model.html. So I will not say much more about these four here.
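To make the hierarchy concrete, here is a minimal Go sketch of how namespace, set, record, and bin nest inside each other. It is only a mental model built from plain maps: the type names (Namespace, Set, Record, Bins), the Put and Get helpers, and the sample values are all made up for illustration, and none of this is the Aerospike client API or how the server stores data.

```go
package main

import "fmt"

// All types below are hypothetical and only mirror the terminology above.

// Bins maps a bin name to a value of any type, which is why two records
// in the same set may carry different bins.
type Bins map[string]interface{}

// Record is one row: a primary key plus its bins.
type Record struct {
	Key  interface{}
	Bins Bins
}

// Set groups records, much like a table or a MongoDB collection.
type Set map[interface{}]Record

// Namespace is the top-level container and holds sets by name.
type Namespace map[string]Set

// Put stores bins under set -> key, creating the set on first use.
func (ns Namespace) Put(set string, key interface{}, bins Bins) {
	if ns[set] == nil {
		ns[set] = Set{}
	}
	ns[set][key] = Record{Key: key, Bins: bins}
}

// Get looks a record up by set name and primary key.
func (ns Namespace) Get(set string, key interface{}) (Record, bool) {
	rec, ok := ns[set][key]
	return rec, ok
}

func main() {
	sample := Namespace{}
	// The two records deliberately carry different bins.
	sample.Put("user", 12, Bins{"name": "Budi", "age": 25})
	sample.Put("user", 14, Bins{"name": "Iman", "email": "iman@example.com"})

	if rec, ok := sample.Get("user", 12); ok {
		fmt.Println(rec.Key, rec.Bins["name"])
	}
}
```

Note that the bins are a map with values of any type, which is the "flexible column" idea from the Bins section.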
Querying and Indexing
So, while developing the feature (which uses Aerospike as the data store), we had to learn how to query in Aerospike.
Luckily, Aerospike has already created client libraries for many programming languages. You can see them in their official GitHub account here: http://github.com/aerospike. To help with debugging and controlling data, they also created aql (Aerospike Query Language). It provides a SQL-like command-line interface for database, UDF (User Defined Function), and index management.
With aql, we can query the Aerospike server like this:
$ aql> SELECT * FROM test.user
$ aql> SELECT * FROM test.user WHERE PK=2
You can read more about aql and its commands here: https://www.aerospike.com/docs/tools/aql
For our case, because we use Golang in our project, we use the official client created by Aerospike: https://github.com/aerospike/aerospike-client-go
Indexing
As we know, Aerospike is a key:value data store. But Aerospike also supports secondary indexes. That means we can also add an index on a value/bin. Then, with that index, we can query by the value. So it's not just getting the data by key; we can also get the data by value, through an indexed bin.
For example, let's say I have a User set that has the bins user_id, name, email. For this example, I will make user_id the PK. So in total, each record will have a minimum of 2 bins.
Example of User
With this record, I can get the record directly by PK. Using aql, it is just this command:
$ aql> SELECT * FROM sample.user WHERE PK=12
Another case: let's say I want to query by email. I want to get the user by the email [email protected]. Using aql, it looks more like this:
# Add Index on email bin
$ aql> CREATE INDEX email_user_idx ON sample.user (email) STRING
# Query by Email
$ aql> SELECT * FROM sample.user WHERE email="[email protected]"
# Will Display the result
|-----|---------------|------------------|
| PK  | name          | email            |
| 14  | Iman Ganteng  | [email protected] |
|----------------------------------------|
You can read more about querying and indexing in the official documentation.
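The difference between reading by PK and querying through an indexed bin can be sketched in plain Go. This is only a conceptual illustration, under the assumption that a secondary index behaves like a second lookup table from a bin value to the primary keys that hold it. It says nothing about how Aerospike implements indexes internally, and the Store type, its methods, and the sample emails are hypothetical.

```go
package main

import "fmt"

// User mirrors the example set: user_id is the PK, name and email are bins.
type User struct {
	ID    int
	Name  string
	Email string
}

// Store is a hypothetical in-memory stand-in: records by primary key,
// plus one secondary index on the email bin.
type Store struct {
	byPK    map[int]User
	byEmail map[string][]int // email value -> primary keys holding it
}

func NewStore() *Store {
	return &Store{byPK: map[int]User{}, byEmail: map[string][]int{}}
}

// Put writes the record and keeps the secondary index in step with it.
func (s *Store) Put(u User) {
	if old, ok := s.byPK[u.ID]; ok {
		s.unindex(old)
	}
	s.byPK[u.ID] = u
	s.byEmail[u.Email] = append(s.byEmail[u.Email], u.ID)
}

// unindex removes a record's PK from the index entry of its old email.
func (s *Store) unindex(u User) {
	ids := s.byEmail[u.Email]
	for i, id := range ids {
		if id == u.ID {
			s.byEmail[u.Email] = append(ids[:i], ids[i+1:]...)
			break
		}
	}
	if len(s.byEmail[u.Email]) == 0 {
		delete(s.byEmail, u.Email)
	}
}

// GetByPK is the plain key:value read.
func (s *Store) GetByPK(id int) (User, bool) {
	u, ok := s.byPK[id]
	return u, ok
}

// QueryByEmail reads through the secondary index instead of the key.
func (s *Store) QueryByEmail(email string) []User {
	var out []User
	for _, id := range s.byEmail[email] {
		out = append(out, s.byPK[id])
	}
	return out
}

func main() {
	s := NewStore()
	s.Put(User{ID: 12, Name: "Budi", Email: "budi@example.com"})
	s.Put(User{ID: 14, Name: "Iman", Email: "iman@example.com"})
	fmt.Println(s.QueryByEmail("iman@example.com"))
}
```

The point of the sketch is the cost on the write side: every Put has to touch the index as well as the record, which is the trade-off any secondary index makes for faster reads by value.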
Deploying To Production
Well, back to our story. If you want to know more about Aerospike, you can read the official documentation on their website.
After finishing all the features and the environment, we tried to deploy it to production.
We deployed it near midnight, between 11.00 PM and 11.59 PM, and left it running until morning to gather data.
But at 06.00 AM, our CPU usage spiked. Unfortunately, we had to roll back the service to the stable version.
Detecting and Fixing Issues
So, after trying to release it to production, we hit a critical issue. When the request rate was high, our CPU usage was abnormally high compared to the old version. To be honest, this new version does much more computation than the previous one. Also, we hadn't implemented an autoscaling mechanism yet. So we assumed our added functions caused the high CPU usage. But when we profiled our application, we found something unexpected: the profile showed that the client library was slow and used quite a lot of CPU.
Profiling Golang using pprof, showing the CPU usage in the Aerospike client library
More usage caused by Syscall.
But later, after looking at the slide presentation by the CTO of Aerospike here: https://www.slideshare.net/brian-aerospike/go-meetup-nov142, and after looking through all the pprof images, we could see that this was caused by network I/O. So, to fix the issue, we implemented an autoscaling mechanism in our system.
Conclusion
So, trying Aerospike was quite challenging because it was new to us. There were just two of us doing this, three with the extra member from the infra team. From my own perspective, Aerospike is worth a try for anyone seeking a data store for requirements like ours: Redis-like but persisted (hybrid: memory and disk), with support for secondary indexes.
Talking about drawbacks, I found one, and it was in the Golang library itself, not in Aerospike. The library returns the data as map[string]interface{}. I wish someone out there would submit a PR to the repository so that the client library can return only bytes when querying results, so we can handle the marshalling ourselves. LOL
Well, maybe there are a few things that I missed, but I hope I wrote it well. By the way, I wrote this based on my own perspective, opinion, and experience from trying Aerospike directly.
If you think this story is worth reading, share it with your circle so your friends can read it too. If you have a question or a different view, or if I wrote something wrong, just put a response below, or you can email me. Thank you.
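To show what handling a map[string]interface{} looks like on the application side, here is a small sketch that converts such a map into a typed struct with checked type assertions. The User struct, the userFromBins helper, and the sample values are hypothetical, and the map literal only stands in for whatever the client returns for one record.

```go
package main

import (
	"errors"
	"fmt"
)

// User is the typed shape we want instead of a loose map (hypothetical).
type User struct {
	Name  string
	Email string
}

// userFromBins converts a map of bins into a User. Each type assertion
// uses the comma-ok form so a missing or mistyped bin becomes an error
// instead of a panic.
func userFromBins(bins map[string]interface{}) (User, error) {
	name, ok := bins["name"].(string)
	if !ok {
		return User{}, errors.New("bin name is missing or not a string")
	}
	email, ok := bins["email"].(string)
	if !ok {
		return User{}, errors.New("bin email is missing or not a string")
	}
	return User{Name: name, Email: email}, nil
}

func main() {
	// This map stands in for the bins of one record.
	bins := map[string]interface{}{"name": "Iman", "email": "iman@example.com"}
	u, err := userFromBins(bins)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Printf("%+v\n", u)
}
```

Writing one helper like this per record type is the boilerplate I was hoping to avoid.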