During the last year, we've been developing a complex, semi-real-time production system. We decided to write it in Go. We had little to no experience with Go, so as you might imagine, it was not trivial.
Fast forward a year: the system is running in production and has become one of the major pillars of ClimaCell's offering.
Being proficient means having enough experience to know the pitfalls of the platform you are using and how to avoid them.
I want to describe three pitfalls we've encountered on our journey with Go, in the hope that it will help you avoid them out of the gate.
Consider the following example:
package main

import (
	"fmt"
	"sync"
)

type A struct {
	id int
}

func main() {
	channel := make(chan A, 5)
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		for a := range channel {
			wg.Add(1)
			go func() {
				defer wg.Done()
				fmt.Println(a.id)
			}()
		}
	}()
	for i := 0; i < 10; i++ {
		channel <- A{id: i}
	}
	close(channel)
	wg.Wait()
}
We have a channel holding struct instances, and we iterate over the channel with the range operator. What do you think the output of this piece of code will be?
6
6
6
6
6
9
9
9
9
9
Weird, isn't it? We expected to see the numbers 0-9 (not ordered, of course).
What we actually see is the result of the mutability of the loop variable:
In every iteration, we get a struct instance to work with. Structs are value types: they are copied into the loop variable on each iteration. The keyword here is copy. To avoid a large memory footprint, instead of creating a new instance of the variable in each iteration, a single instance is created at the beginning of the loop, and in each iteration the data is copied into it.
Closures are the other part of the equation: a closure in Go (as in most languages) holds a reference to the objects it captures rather than copying the data, so the inner goroutine captures a reference to the iterated variable, meaning all the goroutines get a reference to the same instance.
First of all, be aware that this happens. It's not trivial, since it is completely different behavior from other languages (for-each in C#, for-of in JS - in those, the loop variable is immutable or scoped per iteration). Note that Go 1.22 changed this: the loop variable is now created anew on every iteration, so this pitfall mainly affects older Go versions.
To avoid this pitfall, capture the variable within the scope of the loop, thus creating a new instance yourself, and then use it however you would like:
go func() {
	defer wg.Done()
	for a := range channel {
		wg.Add(1)
		go func(item A) {
			defer wg.Done()
			fmt.Println(item.id)
		}(a) // Capture happens here
	}
}()
Here, we use the function call of the inner goroutine to capture a, effectively copying it. It can also be copied explicitly:
for a := range channel {
	wg.Add(1)
	item := a // Capture happens here
	go func() {
		defer wg.Done()
		fmt.Println(item.id)
	}()
}
for-range has an additional manifestation for arrays and slices: it also creates an index loop variable. Note that the index loop variable is mutable as well, so to use it within a goroutine, capture it the same way as you do with the value loop variable.

Go has two assignment operators, = and :=:
var num int
num = 3
name := "yossi"
:= is pretty useful, allowing us to avoid declaring a variable before assigning to it. It's actually common practice in many typed languages today (like var in C#). It's pretty handy and keeps the code cleaner (in my humble opinion).
But as lovely as this is, when combined with a couple of other behaviors in Go, scoping and multiple return values, we can encounter unexpected behavior. Consider the following example:
package main

import (
	"fmt"
)

func main() {
	var data []string
	data, err := getData()
	if err != nil {
		panic("ERROR!")
	}
	for _, item := range data {
		fmt.Println(item)
	}
}

func getData() ([]string, error) {
	// Simulating getting the data from a datasource - let's say a DB.
	return []string{"there", "are", "no", "strings", "on", "me"}, nil
}
In this example we read an array of strings from somewhere and print it:
there
are
no
strings
on
me
Please note the usage of :=:

data, err := getData()

Note that even though data is already declared, we can still use := since err is not - a nice shorthand that makes for much cleaner code.
Now let's modify the code a bit:
func main() {
	var data []string
	killswitch := os.Getenv("KILLSWITCH")
	if killswitch == "" {
		fmt.Println("kill switch is off")
		data, err := getData()
		if err != nil {
			panic("ERROR!")
		}
		fmt.Printf("Data was fetched! %d\n", len(data))
	}
	for _, item := range data {
		fmt.Println(item)
	}
}
What do you think will be the result of this piece of code?
kill switch is off
Data was fetched! 6
Weird, isn't it? Since the kill switch is off, we DO load the data - we even print its length. So why doesn't the code print it like before?
You guessed it - because of :=
!
Scope in Go (as in most modern languages) is defined with {}. Here, the if creates a new scope:

if killswitch == "" {
	...
}

Because we use :=, Go treats both data and err as new variables! That is, data within the if clause is actually a new variable that shadows the outer one and is discarded when the scope closes.
We've encountered such behavior several times on initialization flows - which usually expose some sort of package variable, initialized exactly as described here, with a kill switch to allow us to disable certain behaviors on production. The implementation above will cause an invalid state of the system.
Awareness - did I say it already? :)
In some cases, the Go compiler will issue a warning, or even an error, if the variable declared inside the if clause is not used. For example:
if killswitch == "" {
	fmt.Println("kill switch is off")
	data, err := getData()
	if err != nil {
		panic("ERROR!")
	}
}

// Will issue an error:
// data declared but not used
So pay attention to warnings at compile time.
Nevertheless, sometimes we do actually use the variable within the scope, so no error will be issued.
In any case, the best course of action is to avoid the := shorthand - especially around multiple return values and error handling - and to pay extra attention when you do decide to use it:
func main() {
	var data []string
	var err error // Declaring err so we can use = instead of := below
	killswitch := os.Getenv("KILLSWITCH")
	if killswitch == "" {
		fmt.Println("kill switch is off")
		data, err = getData()
		if err != nil {
			panic("ERROR!")
		}
		fmt.Printf("Data was fetched! %d\n", len(data))
	}
	for _, item := range data {
		fmt.Println(item)
	}
}
This will result in:
kill switch is off
Data was fetched! 6
there
are
no
strings
on
me
Remember, as the code evolves, different developers will modify it. Code that is not in a nested scope today might be moved into one in the future. Keep your eyes open when you modify existing code, especially when moving it to a different scope.
Consider the following example:
package main

import (
	"fmt"
	"sync"
	"time"
)

type A struct {
	id int
}

func main() {
	start := time.Now()
	channel := make(chan A, 100)
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		for a := range channel {
			process(a)
		}
	}()
	for i := 0; i < 100; i++ {
		channel <- A{id: i}
	}
	close(channel)
	wg.Wait()
	elapsed := time.Since(start)
	fmt.Printf("Took %s\n", elapsed)
}

func process(a A) {
	fmt.Printf("Start processing %v\n", a)
	time.Sleep(100 * time.Millisecond)
	fmt.Printf("Finish processing %v\n", a)
}
As before, we have a for-range loop over a channel. Let's say the process function contains an algorithm we need to run and is not very fast. If we process, say, 100,000 items, the code above will run for almost three hours (process takes 100ms in the example). So instead, let's do this:
package main

import (
	"fmt"
	"sync"
	"time"
)

type A struct {
	id int
}

func main() {
	start := time.Now()
	channel := make(chan A, 100)
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		for a := range channel {
			wg.Add(1)
			go func(a A) {
				defer wg.Done()
				process(a)
			}(a)
		}
	}()
	for i := 0; i < 100; i++ {
		channel <- A{id: i}
	}
	close(channel)
	wg.Wait()
	elapsed := time.Since(start)
	fmt.Printf("Took %s\n", elapsed)
}

func process(a A) {
	fmt.Printf("Start processing %v\n", a)
	time.Sleep(100 * time.Millisecond)
	fmt.Printf("Finish processing %v\n", a)
}
Instead of processing the items serially, we dispatch a goroutine for each item in the channel. We want to utilize Go's amazing concurrency handling to help us process the data faster:
items | Without goroutines | With goroutines
---|---|---
100 | 10s | 100ms
Theoretically, this will also work for 100K items, right?
Unfortunately, the answer is "it depends".
To understand why, we need to understand what happens when we dispatch a goroutine. I won't go into depth here, since it is out of scope for this article. In short, the runtime creates an object containing all the data relevant to the goroutine and stores it; when the goroutine finishes, the object is evicted. A goroutine's stack starts at 2KB and can grow up to 1GB (on a 64-bit machine).
By now, you probably see where this is going - the more goroutines we create, the more of these objects we create, so memory consumption increases. Furthermore, goroutines need CPU time to actually execute, so the fewer cores we have, the more of these objects will sit in memory waiting for execution.
In low-resource environments (Lambda functions, K8s pods with restricted limits), where both CPU and memory are constrained, the code sample above will stress the memory (again, depending on how much memory is available for the instance). In our case, on a Cloud Function with 128MB of memory, we were able to process only ~100K items before crashing.
Note that the actual data we need from an application point of view is pretty small - in this case, a single int. Most of the memory consumption is the goroutine itself.
Worker pools!
A worker pool allows us to cap the number of goroutines we run, keeping the memory footprint low. Let's see the same example with a worker pool:
package main

import (
	"fmt"
	"sync"
	"time"
)

type A struct {
	id int
}

func main() {
	start := time.Now()
	workerPoolSize := 100
	channel := make(chan A, 100)
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		for i := 0; i < workerPoolSize; i++ {
			wg.Add(1)
			go func() {
				defer wg.Done()
				for a := range channel {
					process(a)
				}
			}()
		}
	}()
	// Feeding the channel
	for i := 0; i < 100000; i++ {
		channel <- A{id: i}
	}
	close(channel)
	wg.Wait()
	elapsed := time.Since(start)
	fmt.Printf("Took %s\n", elapsed)
}

func process(a A) {
	fmt.Printf("Start processing %v\n", a)
	time.Sleep(100 * time.Millisecond)
	fmt.Printf("Finish processing %v\n", a)
}
We limited the worker pool to 100 workers and created a goroutine for each one:
go func() {
	defer wg.Done()
	for i := 0; i < workerPoolSize; i++ {
		wg.Add(1)
		go func() { // Go routine per worker
			defer wg.Done()
			for a := range channel {
				process(a)
			}
		}()
	}
}()
Think of the channel as a queue, where each worker goroutine is a consumer of that queue. Go channels allow multiple goroutines to receive from the same channel, with each item processed exactly once.
We can now plan our environment, since the memory footprint is predictable and can be estimated:
the size of the worker pool * the expected size of a single goroutine (minimum 2KB)
Execution time will increase: when we limit memory usage, we pay with increased execution time. Why? Previously we dispatched a goroutine per item - effectively creating a consumer per item - which virtually gives us infinite scale and maximal concurrency.
In reality that's not quite true, since the execution of goroutines depends on the availability of the cores running the application. What it does mean is that we'll have to tune the number of workers for the platform we run on, and in high-volume systems it makes sense to do so.
Worker pools give us more control over the execution of our code. They give us predictability, so we can plan and optimize both our code and the platform to scale to high throughput and high volumes of data.
I would recommend using worker pools whenever the application needs to iterate over sets of data - even small ones. With worker pools we are now able to process millions of items on Cloud Functions without coming close to the limits the platform imposes, giving us enough breathing room to scale.
What makes us better professionals is the ability to learn from our mistakes. But learning from someone else's can be equally important.
If you've read this far - thanks!
I hope what we've seen here will help you, dear reader, avoid the mistakes we made on our journey with Go.
Also published at https://blog.house-of-code.com