paint-brush
Mastering Maps in Go: Everything You Need to Knowby@smokfyz
7,405 reads
7,405 reads

Mastering Maps in Go: Everything You Need to Know

by Ivan SharapenkovApril 27th, 2024
Read on Terminal Reader

Too Long; Didn't Read

This article covers the essentials of using maps in Go programming, from basic associative arrays to advanced hash maps with collision handling strategies. It delves into the implementation of maps in Go, how to manage concurrency with sync.Map, and provides practical examples to illustrate map operations and their complexities. The discussion aims to equip you with the knowledge to effectively use maps in various programming scenarios, enhancing both efficiency and performance.
featured image - Mastering Maps in Go: Everything You Need to Know
Ivan Sharapenkov HackerNoon profile picture

A map (also known as an associative array, hash table, hash map, hash set, dict) is an essential data structure for solving different algorithmic problems during day-to-day work, in programming contests, or during job interviews. Therefore, it’s essential to understand which features maps provide to you as a developer: the operations supported by them, the time and space complexity of different operations, and how to use this structure in code.


Using this knowledge and some practical experience, you will be able to detect cases when you need to apply a map to solve your problem. The article you are currently reading provides a comprehensive guide to the usage of maps. We will start with a brief overview of the map data structure in general and then deep dive into the implementation of maps in the Go programming language and discuss strategies for using maps in concurrent code.

Map for Beginners

Associative Array

Let's start by discussing an important concept that lies behind all data structures implemented in various programming languages: the abstract data type. This is a mathematical model used to describe a data type in terms of the operations supported by the data type, the behavior of these operations, and the possible values that can be passed to or received from them. There is also an abstract data type for a map, known as an associative array.


An associative array is an abstract data type that is used to store a collection of key and value pairs in such a way that each key appears only once in the collection, or in other words, there is only one value for each key in the collection. It supports get, set, and delete operations.


Let's imagine that we have an implementation of an associative array that can store key and value pairs, where both key and value are strings, in the Go programming language, and look at what the operations look like.


package main

import "fmt"

// type AssociativeArray implementation somewhere here

func main() {
    // Initialize a new associative array. associativeArray := NewAssociativeArray()
    // Set the value for the key "Alice".
    associativeArray.Set("Alice", "apple")

    // Attempt to retrieve and print the value associated with "Alice".
    valueForAlice, found := associativeArray.Get("Alice")
    if found {
    	fmt.Println(valueForAlice) // Output: "apple"
    } else {
    	fmt.Println("No value found for Alice.")
    }

    // Delete the entry associated with "Alice".
    associativeArray.Delete("Alice")

    // Attempt to retrieve the value for "Alice" after deletion to check error handling.
    valueForAlice, found = associativeArray.Get("Alice")
    if !found {
    	fmt.Println("No value found for Alice. (Deleted)")
    }

    // Attempt to retrieve the value for "Bob" which was never set.
    valueForBob, found := associativeArray.Get("Bob")
    if !found {
    	fmt.Println("No value found for Bob. (Never set)")
    }
}


Setting a key that has been set before is a valid operation, but in this case, a new value will be associated with the old key, and it will be impossible to retrieve the previous value. It could be deleted or shadowed, depending on the implementation of the associative array.


package main

import ( "fmt" )

// type AssociativeArray implementation somewhere here

func main() {
    // Initialize a new associative array. associativeArray := NewAssociativeArray()
    // Set value associated with "Alice" key
    associativeArray.Set("Alice", "apple")

    // Get value associated with "Alice" key
    valueForAlice := associativeArray.Get("Alice")
    fmt.Println(valueForAlice)  // This line will print "apple"

    // Set new value associated with "Alice" key
    associativeArray.Set("Alice", "orange")

    // Get new value associated with "Alice" key
    valueForAlice = associativeArray.Get("Alice")
    fmt.Println(valueForAlice)  // This line will print "orange"
}


That's great, but you may ask how to implement an associative array in real life. Actually, there are no limits to implementation. You can implement it in any way you can imagine; you only need to support the required operations for this abstract data type. However, there are several approaches that are most common: the hash map and the binary search tree.


The differences between them lie in their time and space complexities and, of course, the algorithms for storing and retrieving keys and values. Moving forward, we will concentrate on hash map implementations, but it's worth noting that a binary search tree can also be used to implement an associative array.

Hash Map

We found out that a hash map is a data structure that implements associative array operations: setting a value for a key, retrieving a value by key, and deleting a value associated with a key. But how does a hash map actually work under the hood? Let's find out.


The core idea here is hidden behind the word "hash." A hash map uses a hash function to compute the index of the place in the array where the value will be stored by passing the key to the hash function. Let’s try to implement a simple hash map using the Go programming language to see how the hash function and array operate together to build a hash map.


package main

import "fmt"

func main() {
    // Define an array to hold string data array := make([]string, 10)
    // Hash function that returns an integer based on the key and array length
    hash := func(key string, length int) int {
        calculatedHash := 0
        for _, char := range key {
            calculatedHash += int(char)
        }
        return calculatedHash % length
    }

    // Set operation: store a value associated with a key
    index := hash("Alice", len(array))
    array[index] = "apple"

    // Get operation: retrieve the value associated with a key
    index = hash("Alice", len(array))
    fmt.Println(array[index]) // Should print "apple"

    // Delete operation: remove the value associated with a key
    index = hash("Alice", len(array))
    array[index] = "" // Setting it to empty string, assuming nil is not an option

    // Check if the deletion was successful
    index = hash("Alice", len(array))
    if array[index] == "" {
        fmt.Println("Value deleted successfully.")
    } else {
        fmt.Println("Value still exists:", array[index])
    }
}


That's actually a simplified way of how a real hash map works. You may notice that our implementation might break in some cases. If you didn't notice it, take a little time to think about potential issues before continuing to read.

Collisions

Did you manage to figure out what the problem was in the previous implementation? It's a situation where two different strings can yield the same hash. Exactly, we have only len(array) possible hash values but an infinite number of different keys, and by the pigeonhole principle, we can find two different keys that will generate the same hash, and this means that they will correspond to the same position in the array.


What will we do with it? We need to find some way to handle this situation. Fortunately, smart people have already solved this problem and implemented several well-known solutions, so let's just use them instead of reinventing the wheel.


  • Separate chaining. In this approach, instead of storing values directly in the array, we will store a linked list in the array for each index. When we add a new value to the map, we will first search to see if there is an item with the same key. If there is, then just update the value; otherwise, add a new item to the linked list.


  • Open addressing. Open addressing is another approach to collision resolution. Now, we will store a key and value pair in each array position. If we try to add a new value and encounter a situation where there is already a value in this position, we start using probing.


    Probing can be linear, quadratic, by using another hashing function, or even random. In linear probing, you start to increment the index by one until you find free space in your underlying array. The same process happens when you try to retrieve a value from the array and face a situation where the key in the array position doesn’t correspond to the searching key.


Let's look at the implementation of the separate chaining approach because it's very similar to the approach implemented in Go. But before examining the schema, let's see how our hash map will look.


Hash map with separate chaining


package main

import (
    "fmt"
    "hash/fnv"
)

// Entry represents a key-value pair in the linked list.
type Entry struct {
    Key   string
    Value string
    Next  *Entry
}

// HashMap represents the hash table structure.
type HashMap struct {
    Buckets []*Entry
    Size    int
}

// NewHashMap creates a new hash map with a given size.
func NewHashMap(size int) *HashMap {
    return &HashMap{
        Buckets: make([]*Entry, size),
        Size:    size,
    }
}

// HashFunction computes the bucket index for a given key.
func (h *HashMap) HashFunction(key string) int {
    hasher := fnv.New32()
    hasher.Write([]byte(key))
    return int(hasher.Sum32()) % h.Size
}

// Insert adds a new key-value pair to the hash map.
func (h *HashMap) Set(key, value string) {
    index := h.HashFunction(key)
    entry := &Entry{Key: key, Value: value}
    if h.Buckets[index] == nil {
        h.Buckets[index] = entry
    } else {
        current := h.Buckets[index]
        for current.Next != nil {
            current = current.Next
        }
        current.Next = entry
    }
}

// Search finds the value for a given key in the hash map.
func (h *HashMap) Get(key string) (string, bool) {
    index := h.HashFunction(key)
    current := h.Buckets[index]
    for current != nil {
        if current.Key == key {
            return current.Value, true
        }
        current = current.Next
    }
    return "", false
}

// Delete removes an entry from the hash map based on the key.
func (h *HashMap) Delete(key string) {
    index := h.HashFunction(key)
    if h.Buckets[index] != nil {
        if h.Buckets[index].Key == key {
            h.Buckets[index] = h.Buckets[index].Next
        } else {
            current := h.Buckets[index]
            for current.Next != nil {
                if current.Next.Key == key {
                    current.Next = current.Next.Next
                    break
                }
                current = current.Next
            }
        }
    }
}

func main() {
    hm := NewHashMap(10)
    hm.Set("name", "John")
    hm.Set("age", "30")
    value, exists := hm.Get("name")
    if exists {
        fmt.Println("Found:", value)
    } else {
        fmt.Println("Not found")
    }
    hm.Delete("name")
    value, exists = hm.Get("name")
    if exists {
        fmt.Println("Found:", value)
    } else {
        fmt.Println("Not found")
    }
}

// OUTPUT:
// Found: John
// Not found

Complexity

This could be an advanced topic for people not familiar with the concept of algorithmic complexity. If you recognize yourself in this description, then follow this link and learn about this concept.


Time and space complexity are really important when we reason about algorithms and data structures because they directly affect the performance of our code. Let's try to figure out the complexity of the implementation provided in the previous chapter.


To add a new key-value pair to a hash map, we need to calculate the hash, find the linked list located at the index, and pass through the linked list to find our key. The first thing that may affect time complexity is the key size; we assume that the average key size is small and the hash function's time complexity depends on the key size linearly. In this case, we can assume that the average time complexity for calculating the key hash is O(1).


Next, we need to pass through all items in the linked list. The number of items depends on how many collisions occur, and in the worst case, it will be O(n) when all items correspond to the same hash. So, the total time complexity for the add operation is O(1) + O(n) = O(n) both worst and average.


We can reduce the average time complexity by implementing resizing of the underlying array based on a heuristic called the load factor. It's simply the number of keys in the hash map divided by the number of slots in the underlying array. If this value exceeds a certain threshold, we could resize the underlying array and copy all values to the new array. After that, every linked list will be limited in size on average.


In this case, we can find a load factor value in such a way that the average time complexity will be O(1) for most cases. We won't go into the calculation here, but it's important to understand that.


Thus, the hash map provides the following time and space complexities constraints:


Time complexity: O(1) on average for all operations.

Space complexity: O(n) because we store one linked list entity for each key.

Map in Go

Armed with knowledge about the theory of implementing a hash map, let’s look at what built-in type of maps the Go programming language provides for us. There are several ways to create a map in Go:


package main

func main() {
    // Using the make Function: This is the most common way to create a map.
    // You specify the type of the keys and values.
    m1 := make(map[string]int) // Create a map with string keys and integer values.

    // You can also optionally set the size of the underlying array using a second argument if
    // you know how many items you will store in the map before creation.
    m2 := make(map[string]int, 10)

    // Using Map Literals: This method is similar to array or slice literals and is
    // useful for initializing a map with some values.
    m3 := map[string]int{"one": 1, "two": 2}  // Creates and initializes a map.

    // Nil Map: A map can also be declared without initialization. Such a map
    // is nil and has no keys, nor can it be added to.
    var m4 map[string]int  // m4 is nil and you cannot add keys to it without initializing it first.

    // To add keys to a nil map, it must first be initialized using the make function.
    m4 = make(map[string]int)  // Now m4 is initialized and ready for use.

    // Correcting a previously incorrect comment about m3.
    // m3 is not nil and assigning a new key-value pair will not cause a panic.
    m3["hello"] = 1
}


The Go built-in map type also implements all three operations required by an associative array and an additional feature for iterating over map items:


package main

import "fmt"

func main() {
    m := make(map[string]int)
    
    // Adding or updating an element.
    m["one"] = 1

    // Retrieving an element and checking if a key exists.
    value, ok := m["four"]
    if ok {
        fmt.Println("Value exists in map:", value)
    } else {
        fmt.Println("Value doesn't exist.")
    }

    // Deleting an element.
    delete(m, "one")

    // You can iterate over a map using a for loop along with the range keyword.
    // This gives you access to each key and value in the map.
    // You shouldn't rely on the order of items; even if you run the for loop several times
    // in sequence, you can get a different order of items. It's a property of hash tables
    // in general. Items in the hash table do not have a particular order.
    for key, value := range m {
        fmt.Println(key, value)
    }
}


Any type that is comparable can be a key in Go Maps. Comparable types include boolean, numeric, string, pointer, channel, and interface types, as well as structs or arrays that only contain comparable types. Also, there are virtually no restrictions on the types that can be used as values in a map. They can be anything from simple types like integers and strings to complex types like slices, other maps, or even functions.

Deep Dive Into Go Maps

Now, we will examine the implementation of maps by exploring the map source code. This will help us better understand how maps are implemented internally. The source code for Go's map can be found here.


A map in Go is an implementation of the hash map discussed earlier. In our previous example, we used a linked list for collision resolution. Go uses a different approach; instead of a linked list, there are buckets - other subarrays at each underlying array index. So essentially, the underlying array is a two-dimensional subarray. Each bucket contains up to 8 key-value pairs. If more than 8 keys hash to a bucket, we chain on extra buckets to exit one - overflow bucket.


The low-order bits of the hash are used to select a bucket. Each bucket contains a few high-order bits of each hash to distinguish the entries within a single bucket. When the map grows, we allocate a new array of buckets twice as big. Buckets are incrementally copied from the old bucket array to the new bucket array.


The main structures representing the map are called hmap and bmap. We skip some helper fields in the struct for simplicity. They don't matter for understanding the algorithm's idea:


package main

import (
    "unsafe"
    "internal/abi"
)

// hmap is the structure of a hashmap.
type hmap struct {
    count      int            // Number of live cells; size of the map. Must be first (used by len() builtin).
    buckets    unsafe.Pointer // Pointer to an array of 2^B Buckets; may be nil if count == 0.
    oldbuckets unsafe.Pointer // Previous bucket array of half the size, non-nil only when growing.
    // Add more fields if needed...
}

// bmap is a bucket used in a Go map.
type bmap struct {
    // Top byte of the hash value for each key in this bucket.
    // If tophash[0] < minTopHash, tophash[0] is a bucket evacuation state.
    tophash [abi.MapBucketCount]uint8
    // Followed by bucketCnt keys and then bucketCnt elems.
    // Followed by an overflow pointer.
}


count is the number of items currently stored in the map. buckets is the underlying array where we store new items, and oldbuckets is the array where we stored items before the map was resized. Instead of moving map items at the time of resizing, Go does it gradually.


buckets and oldbuckets are pointers that point to the bmap struct, which stores tophash - an array of the first byte of hash for each key stored in the bucket - followed by MapBucketCount keys and MapBucketCount values, with an overflow pointer at the end.


MapBucketCount is an architecture-specific value, no more than 8 according to the documentation. To better understand how a map looks in memory, here is a picture.


Representation of a map in memory


To find the value associated with a key, we first need to calculate the hash, which we will use to determine the bucket where our key is located. For this, Go uses different hash functions depending on the architecture and the type that needs to be hashed. After calculating the hash, Go uses several last bits to calculate the index where our key could be located.


Go also stores the first byte of the hash in the tophash array, and using the key’s hash, you can easily locate the possible key index in the bucket using tophash.


After that, we also need to compare the searched key with the key stored in the bucket because there can also be collisions between the first byte of the hash. This process is represented in the picture.


Process of finding value using key

Concurrency: Map vs. sync.Map

Concurrent Map

We have considered the internal implementation of a map, but Go is a language designed to develop highly concurrent code. How will we work with a map in cases where two or more goroutines concurrently perform operations on the map? Let’s consider two cases: two goroutines running concurrently read some values from a previously initialized map and when goroutines perform write operations.


In the case of two goroutines running concurrently to read some values from a shared map, we don’t face issues because our internal state of the map isn’t changing during the read operation. Therefore, we can safely read concurrently.


package main

import (
    "fmt"
    "sync"
)

func main() {
    var wg sync.WaitGroup
    m := make(map[string]int)
    m["a"] = 1
    m["b"] = 2

    // Function to read values from the map based on a key.
    readMap := func(key string) {
        value, ok := m[key]
        if ok {
            fmt.Println("Read:", key, value)
        } else {
            fmt.Println("Key not found:", key)
        }
        wg.Done()
    }

    // Adding two tasks to the WaitGroup before starting the goroutines.
    wg.Add(2)
    go readMap("a")
    go readMap("b")

    // Wait for all goroutines to complete.
    wg.Wait()
}


On the other hand, when we perform concurrent writes, the map can change state during the write operation. As we already examined in the previous section, the write operation isn’t atomic. This means that there can be other operations between steps of map modification, and if we try to read or write during another write process, we can face a map in an incorrect state. Go is smart enough to detect if you try to perform concurrent writes to a map and will throw a fatal error: concurrent map writes.


To handle this issue, we need to use a Mutex. A Mutex is a synchronization primitive that prevents the state from being modified or accessed by multiple threads of execution at once. It supports two operations, lock and unlock, which are executed atomically and therefore are safe.


Actually, we can use RWMutex, which allows locking for reading by several readers but only one for writes. This can help us optimize performance in cases where there are a lot of reads. Let’s look at the implementation for a concurrently safe map.


package main

import (
    "fmt"
    "sync"
)

// ConcurrentMap wraps a Go map with a sync.RWMutex to manage concurrent access with optimized read performance.
type ConcurrentMap struct {
    sync.RWMutex
    items map[string]interface{}
}

// NewConcurrentMap creates a new concurrent map.
func NewConcurrentMap() *ConcurrentMap {
    return &ConcurrentMap{
        items: make(map[string]interface{}),
    }
}

// Set adds or updates an element in the map.
func (m *ConcurrentMap) Set(key string, value interface{}) {
    m.Lock()
    defer m.Unlock()
    m.items[key] = value
}

// Get retrieves an element from the map.
func (m *ConcurrentMap) Get(key string) (interface{}, bool) {
    m.RLock()
    defer m.RUnlock()
    value, exists := m.items[key]
    return value, exists
}

// Delete removes an element from the map.
func (m *ConcurrentMap) Delete(key string) {
    m.Lock()
    defer m.Unlock()
    delete(m.items, key)
}

// Items returns a copy of all items in the map for safe iteration.
func (m *ConcurrentMap) Items() map[string]interface{} {
    m.RLock()
    defer m.RUnlock()
    itemsCopy := make(map[string]interface{})
    for key, value := range m.items {
        itemsCopy[key] = value
    }
    return itemsCopy
}

func main() {
    cmap := NewConcurrentMap()
    cmap.Set("name", "John Doe")
    cmap.Set("age", 30)

    if name, ok := cmap.Get("name"); ok {
        fmt.Println("Name:", name)
    }

    if age, ok := cmap.Get("age"); ok {
        fmt.Println("Age:", age)
    }

    items := cmap.Items()
    fmt.Println("All items:", items)
}

sync.Map

There is one more implementation of a concurrently safe map in the standard package: sync.Map. But when should we use sync.Map or a map with a Mutex? It's time to figure it out. According to the documentation, the Map type is optimized for two common use cases:


  1. When the entry for a given key is only ever written once but read many times, as in caches that only grow.


  2. When multiple goroutines read, write, and overwrite entries for disjoint sets of keys. In these two cases, the use of a Map may significantly reduce lock contention compared to a Go map paired with a separate Mutex or RWMutex.


To better understand and develop a feeling for why sync.Map works better in these cases rather than a map with Mutex, we should examine the source code of sync.Map.


type Map struct {
    mu    sync.Mutex
    read  atomic.Pointer[readOnly]
    dirty map[any]*entry
    // some other fields...
}

// readOnly is an immutable struct stored atomically in the Map.read field.
type readOnly struct {
    m        map[any]*entry  // The current read-only map.
    amended  bool            // True if the dirty map contains some keys not in m.
}

// An entry is a slot in the map corresponding to a particular key.
type entry struct {
    p atomic.Pointer[any]
}


The Map maintains two representations of the map data: read and dirty:


  • Read Map (read): This is the primary data store for the Map and is intended for concurrent read access. It is represented by an atomic.Pointer to a readOnly struct, ensuring atomic and lock-free reads of the map's snapshot. When a key is looked up, it first checks the read map. If the key isn't found and the map is marked as amended (indicating that there are new keys in the dirty map not yet in the read map), it falls back to checking the dirty map under a mutex lock (mu).


  • Dirty Map (dirty): This map stores entries under modification and new entries not yet visible in the read map. Access to this map requires holding the mutex lock to ensure exclusive access. On a write operation, if the key is not in the read map or needs to be updated, the operation proceeds on the dirty map. If the dirty map is nil, it is initialized by making a shallow copy of the read map, excluding entries no longer valid.


  • After a certain threshold of "misses" (failed lookups that required falling back to the dirty map), the dirty map is promoted to be the new read map, and a new dirty map is prepared if necessary.

Differences From a Built-In Go Map With Mutex

Now, we can figure out what the difference is between a built-in map with a Mutex and sync.Map:


Memory Overhead and Complexity:

  • sync.Map: Designed to minimize lock contention. For most read operations, no locks are needed because of the atomic read pointer. Writes and deletes that affect only a small portion of the map also minimize lock duration because they only lock the dirty map.


  • Built-In Map with Mutex: Every access (read or write) requires acquiring the mutex, which can become a bottleneck in high-concurrency scenarios.


Lock Contention:

  • sync.Map: Maintains two versions of the map and can be more memory-intensive. The logic to manage these versions adds complexity.


  • Built-In Map with Mutex: Simpler and uses less memory compared to sync.Map because it maintains only one version of the map.


Use Case Specificity:

  • sync.Map: Optimized for use cases where keys are mostly read and not frequently updated or where many operations occur on disjoint sets of keys.


  • Built-In Map with Mutex: General-purpose, straightforward to use but may not perform as well under specific high-concurrency scenarios due to universal lock contention.


In summary, sync.Map is specialized for scenarios that can leverage its concurrency optimizations to reduce lock contention, at the cost of increased complexity and memory usage. In contrast, a standard Go map with a mutex is a simpler and more general solution but can suffer from performance issues in highly concurrent environments.

Wrapping Up

Throughout this article, we have explored the intricacies of using map data structures, particularly focusing on their implementation and usage in the Go programming language. Starting with the basic concept of an associative array, we delved into the mechanics of hash maps, discussing their operational procedures, time and space complexities, and different approaches like separate chaining and open addressing to handle collisions.


In the realm of Go, we investigated how maps are utilized and built into the language, examining the underlying structures such as hmap and bmap. This exploration included practical code examples, demonstrating how to initialize, manipulate, and effectively employ maps in Go applications. Additionally, we highlighted the potential pitfalls of concurrency when using maps and introduced the sync.Map for scenarios requiring concurrent access, which reduces lock contention and optimizes performance.


Understanding these aspects of map data structures not only enhances your coding toolkit but also prepares you to tackle common programming challenges with greater efficiency and confidence. Whether you're developing simple applications or complex systems that require optimal performance in concurrent environments, the knowledge of how and when to use different types of maps is invaluable. As you continue to build your programming expertise, keep experimenting with these structures and their various implementations to fully harness their potential in your projects.