paint-brush
A Comprehensive Guide for Using DuckDB With Goby@olontsev
6,511 reads
6,511 reads

A Comprehensive Guide for Using DuckDB With Go

by Sergey OlontsevJuly 13th, 2023
Read on Terminal Reader
Read this story w/o Javascript
tldt arrow

Too Long; Didn't Read

DuckDB is an embedded SQL database engine designed for OLAP-style workloads. With its ability to handle high-speed analytics in a memory-centric environment, it has quickly gained popularity among data scientists and analysts. In this blog post, we will explore using DuckDB with Go.

People Mentioned

Mention Thumbnail
featured image - A Comprehensive Guide for Using DuckDB With Go
Sergey Olontsev HackerNoon profile picture


DuckDB is an embedded SQL database engine. It's very similar to well-known SQLite but is designed for OLAP-style workloads. DuckDB supports a variety of data types and SQL features. With its ability to handle high-speed analytics in a memory-centric environment, it has quickly gained popularity among data scientists and analysts. In this blog post, we will explore using DuckDB with Go.


OLAP Databases

Online Analytical Processing (OLAP) databases are specially designed to support complex analytical and ad-hoc queries, data manipulation, and multidimensional analysis. These databases are optimized for read-heavy workloads, unlike Online Transaction Processing (OLTP) databases which are more suited for write-heavy, transactional operations. OLAP databases are the backbone of many Business Intelligence (BI) tools, allowing decision-makers to analyze data from multiple dimensions and perspectives.


OLAP databases are highly efficient at handling operations like data summarization, complex calculations, trend analysis, and data modeling. They enable users to drill down into datasets to view data from different time periods, categories, or other dimensions, offering valuable insights and supporting strategic decisions. DuckDB is designed to support OLAP-style workloads efficiently.


Embedded Databases

Embedded databases are specialized database management systems (DBMS) that run within the same process as the application. Unlike standalone database servers, which run as separate processes and require inter-process communication, embedded databases interact directly with the application, thereby offering significant performance benefits and simplifying application architecture. They are typically lightweight, use less resources, and require minimal setup and administration. Examples of popular embedded databases include SQLite, RocksDB, and DuckDB.


An embedded database is particularly useful in applications that need to manage data but don't require the full functionality of a standalone DBMS. These could range from desktop applications and small web applications to IoT devices and mobile apps. They are also utilized in server applications for tasks such as caching, temporary data manipulation, or when the data to be stored is application-specific and does not need to be shared across different services.


Major Benefits of DuckDB

  1. In-Memory Execution: DuckDB primarily operates in-memory, but also supports out-of-memory execution. This allows it to perform computations very quickly and efficiently.


  2. Full SQL Support: DuckDB supports a wide range of SQL features, which makes it very flexible for various types of data operations.


  3. Transaction Support: DuckDB supports transactions, which is a critical feature for maintaining data integrity and consistency in many applications.


  4. Vectorized Execution: DuckDB uses vectorized query execution, which leads to better CPU utilization and improved performance.


  5. Easy Integration: DuckDB provides APIs for multiple programming languages, including Python, R, C++, Rust, Java, and Go. This makes it easier to integrate DuckDB into existing workflows and systems.


  6. Open Source: DuckDB is open source, which means its source code is freely available for modification or enhancement. This allows for community-driven improvements and adaptability to specific use cases.


Getting Started

Before you can start using DuckDB with Go, you need to install the DuckDB Go driver. You can download it using Go's package manager. Run the following command in your terminal:

go get github.com/marcboeker/go-duckdb


Connecting to DuckDB from Go

With the driver installed, you can now establish a connection to DuckDB from your Go application. Here's a basic example:

package main


import (
  "database/sql"
  "log"

  _ "github.com/marcboeker/go-duckdb"
)


func main() {
  // Empty datasource means, that DB will be solely in-memory, otherwise you could specify a filename here
  db, err := sql.Open("duckdb", "")
  if err != nil {
    log.Fatal("Failed to connect to database:", err)
  }
  defer db.Close()
}

The sql.Open() function is used to connect to DuckDB, and the empty data source name denotes that we are using an in-memory database.


Create Table and Insert Data

Now that the connection has been established, you can perform various database operations. Here's a simple way to create a table:

// Create table
_, err = db.Exec(`
  CREATE TABLE employee (
    id INTEGER,
    name VARCHAR(20),
    start_dt TIMESTAMP,
    is_remote BOOLEAN
  )`)
if err != nil {
  log.Fatal(err)
}


Next, to insert data into the employee table, you could use the Exec function with an INSERT INTO SQL command:

// Insert some data in table
_, err = db.Exec(`
  INSERT INTO employee (id, name, start_dt, is_remote)
  VALUES
    (1, 'John Doe', '2022-01-01 09:00:00', true),
    (2, 'Jane Smith', '2023-03-15 10:00:00', false)`)
if err != nil {
  log.Fatal(err)
}


When dealing with larger sets of data, consider using transactions and prepared statements for efficiency and safety. Remember, always handle errors and close your connections when done.


Query Single or Multiple Rows

To fetch data, you can use the QueryRow() function to select a single row:

// Variables to store query result
var id int
var name string
var startDt time.Time
var isRemote bool


// Query single row
if err = db.QueryRow("SELECT id, name, start_dt, is_remote FROM employee WHERE id = ?", 1).Scan(&id, &name, &startDt, &isRemote); err != nil {
  if err == sql.ErrNoRows {
    log.Println("No rows found.")
  } else {
    log.Fatalf("unable to execute query: %v", err)
  }
} else {
  fmt.Println("Select 1 row result:\nID:", id, "Name:", name, "Start Datetime:", startDt, "Is Remote:", isRemote)
}


Don't forget to handle any errors and properly close your connections and result sets, as shown above.


To select multiple rows, you can use Query() function:

// Query multiple rows
rows, err := db.Query("SELECT id, name, start_dt, is_remote FROM employee")
if err != nil {
  log.Fatal(err)
}
defer rows.Close()


// Print the results
fmt.Println("Results:")
for rows.Next() {
  err = rows.Scan(&id, &name, &startDt, &isRemote)
  if err != nil {
    log.Fatal(err)
  }
  fmt.Println("ID:", id, "Name:", name, "Start Datetime:", startDt, "Is Remote:", isRemote)
}

err = rows.Err()
if err != nil {
  log.Fatal(err)
}


In this code:


  • We're calling the Query() function with an SQL command to select all records from the employee table.

  • We then enter a loop using rows.Next(), which iterates over each row returned by the query.

  • Within the loop, we're using the Scan() function to copy the columns from the current row into the id, name, workFrom, and isRemote variables.

  • We then print these variables using the fmt.Println() function.

  • After the loop, we check for errors during the iteration with rows.Err(). If there's an error, we print it using log.Fatal(err).


Error Handling and Transactions

In the real world, your Go code must be ready to handle errors and work with transactions. The SQL package provides all the necessary tools:


// Error handling and transactions
tx, err := db.Begin()
if err != nil {
  log.Fatal(err)
}
defer tx.Rollback()


_, err = tx.Exec(`
  INSERT INTO employee (id, name, start_dt, is_remote)
  VALUES
    (3000000000, 'id int64 instead of int32', '2022-06-17 11:00:00', true)`)
if err != nil {
  log.Printf("ERROR: %s\n", err.Error()) // Do not fail, just print the error in output
}

err = tx.Commit()
if err != nil {
  log.Fatal(err)
}


This code begins a transaction, tries to execute an insert statement, and then commits the transaction. If an error occurs during execution, it will rollback any changes made in that transaction.

Conclusion

DuckDB's support for Go allows developers to perform robust data analytics operations directly from their Go applications. This integration between a powerful data management system and a versatile, efficient programming language opens the door to more advanced data processing applications. With the fundamental knowledge provided in this article, you can start exploring these possibilities.


The full source code for this example could be found on my GitHub repo: https://github.com/solontsev/go-cookbook/tree/main/duckdb