
Golang Memory Allocation


Tags: #go #golang #memory #allocation

1. Memory Allocation Summary.md

Summary

Note: When a function accepts any, Go needs to create an interface value, and the storage of the underlying value might involve a heap allocation.

Full Details

Below is some guidance we’ve found on the patterns which typically cause variables to escape to the heap:

The rule of thumb is: pointers point to data allocated on the heap.
Ergo, reducing the number of pointers in a program reduces the number of heap allocations.
Pointers should primarily be used to reflect ownership semantics and mutability.

Note: an extra bonus of ‘passing by value’ is the increased safety that comes from eliminating nil pointer dereferences.

Copying a small value is often less expensive than the indirection (and possible heap allocation) that comes with using a pointer.

Reducing the number of pointers in a program can yield another helpful result: the garbage collector will skip regions of memory that it can prove will contain no pointers. For example, regions of the heap which back slices of type []byte aren’t scanned at all. This also holds true for arrays of struct types that don’t contain any fields with pointer types.

Not only does reducing pointers result in less work for the garbage collector, it produces more cache-friendly code. Reading memory moves data from main memory into the CPU caches. Caches are finite, so some other piece of data must be evicted to make room. Evicted data may still be relevant to other portions of the program.
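
As a concrete illustration of the rule of thumb above, here’s a minimal sketch (the user type and constructor names are made up for this example) contrasting returning a value with returning a pointer to a local. Building it with go build -gcflags=-m should report an escape for the pointer-returning constructor, though the exact messages vary by Go version and inlining at the call site can sometimes remove the allocation:

package main

// user is a hypothetical struct used purely for illustration; note it
// contains no pointer fields.
type user struct {
	id  int
	age int
}

// newUserValue returns the struct by value: the result is copied into
// the caller's frame, so no heap allocation is needed.
func newUserValue(id, age int) user {
	return user{id: id, age: age}
}

// newUserPointer returns a pointer to a local composite literal: the
// value has to outlive this function, so escape analysis moves it to
// the heap (reported as something like "&user{...} escapes to heap").
func newUserPointer(id, age int) *user {
	return &user{id: id, age: age}
}

func main() {
	u1 := newUserValue(1, 30)
	u2 := newUserPointer(2, 40)
	println(u1.id, u2.id)
}

As a bonus, because user contains no pointer fields, the backing array of a []user is a region of memory the garbage collector never has to scan, whereas a []*user gives it one pointer to chase per element.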

Golang Memory Analysis

To see which optimisations the Go compiler has applied to your code when building or running it, you can use the -gcflags flag:

go run -gcflags="-m" ./main.go

To find out more about this flag we need to understand where it actually lives: if you try to look up the flag and the values it accepts via go help run, you’ll be surprised to discover there’s no information about it there.

To start, let’s consider the go run subcommand. This triggers a build, and that’s where -gcflags is exposed.

go help build

  -gcflags '[pattern=]arg list'                
    arguments to pass on each go tool compile invocation.

This tells us that the flag’s values are passed on to the go tool compile subcommand, so that’s where we should look for the relevant values. Amongst many other flags we’ll find -m, which prints any optimisation decisions the compiler has made (such as when data escapes to the heap):

go tool compile -help

  -m    print optimization decisions 

Optimising Struct Fields

Modern CPU hardware performs reads and writes to memory most efficiently when the data is naturally aligned. The CPU reads data in word-sized chunks rather than byte by byte. A word on a 64-bit system is 8 bytes, while a word on a 32-bit system is 4 bytes. In short, the CPU reads memory at addresses that are multiples of its word size.

The side-effect of this is that the compiler will sometimes add ‘padding’ to the memory reservation to keep fields aligned. Padding is the key to achieving data alignment, but it means that a certain ordering of struct fields can yield extra, unnecessary memory reservation, as demonstrated in the following example program (for visual aids see: https://wagslane.dev/posts/go-struct-ordering/ and https://betterprogramming.pub/how-to-speed-up-your-struct-in-golang-76b846209587):

package main

import (
	"unsafe"
)

type T1 struct {
	a int8
	b int64
	c int16
}

type T2 struct {
	a int8
	c int16
	b int64
}

func main() {
	println(unsafe.Sizeof(T1{})) // 24 (more memory reserved)
	println(unsafe.Sizeof(T2{})) // 16
}

Imagine using a 64-bit system and fetching the b field of T1 if there were no padding (i.e. if b started at offset 1, straight after a). The CPU would take two cycles to access the data instead of one: the first cycle would fetch memory 0 to 7 and the subsequent cycle would fetch the rest. Think of it like a notebook where each page can only store word-size data, in this case 8 bytes. If the b data is scattered across two pages, it takes two page flips to retrieve the complete data.

Another way to verify this is to first work out how many bytes each type should consume and then check the actual offsets.

Here’s an example program to validate this approach:

package main

import (
	"fmt"
	"reflect"
	"runtime"
)

// Should be 4 bytes total but is actually 6!
//
// Foo only needs 1 byte, but Bar (uint16) must start at a
// 2-byte-aligned offset, so the compiler inserts 1 byte of
// padding after Foo. Another byte of trailing padding is added
// after Baz so the struct's size is a multiple of its alignment (2).
type stats struct {
	Foo uint8  // 1 byte  (offset: 0)
	Bar uint16 // 2 bytes (offset: 2, after 1 byte of padding)
	Baz uint8  // 1 byte  (offset: 4, followed by 1 byte of padding)
}

func main() {
	typ := reflect.TypeOf(stats{})
	fmt.Printf("Struct is %d bytes long\n", typ.Size())
	n := typ.NumField()
	for i := 0; i < n; i++ {
		field := typ.Field(i)
		fmt.Printf("%s at offset %v, size=%d, align=%d\n",
			field.Name, field.Offset, field.Type.Size(),
			field.Type.Align())
	}

	allStats := []stats{}
	for i := 0; i < 100000000; i++ {
		allStats = append(allStats, stats{})
	}

	printMemUsage()
}

func printMemUsage() {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Printf("Alloc = %v MiB", bToMb(m.Alloc))
	fmt.Printf("\tTotalAlloc = %v MiB", bToMb(m.TotalAlloc))
	fmt.Printf("\tSys = %v MiB", bToMb(m.Sys))
	fmt.Printf("\tNumGC = %v\n", m.NumGC)
}

func bToMb(b uint64) uint64 {
	return b / 1024 / 1024
}

The output of this program is:

Struct is 6 bytes long
Foo at offset 0, size=1, align=1
Bar at offset 2, size=2, align=2
Baz at offset 4, size=1, align=1

We can see that although the struct should really only take up four bytes, it actually takes up six: one byte of padding is inserted after Foo (so Bar can start on a 2-byte boundary) and one byte of trailing padding is added after Baz (so the struct’s size is a multiple of its largest field alignment).

If we change the struct such that the uint16 isn’t defined between the uint8s and check the offsets again, then we’ll get the following output:

Struct is 4 bytes long
Bar at offset 0, size=2, align=2
Foo at offset 2, size=1, align=1
Baz at offset 3, size=1, align=1

Notice that the offsets are now what we would expect: Bar sits at offset zero, Foo comes straight after it at offset 2, and since only 1 byte is reserved for Foo, Baz follows immediately at offset 3.

Also notice that because of this the total struct size is now smaller.
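
For reference, the reordered struct that produces this output might look like the following (same field names as before, with the uint16 moved to the front so the two uint8 fields pack immediately after it):

type stats struct {
	Bar uint16 // 2 bytes (offset: 0)
	Foo uint8  // 1 byte  (offset: 2)
	Baz uint8  // 1 byte  (offset: 3) -- total: 4 bytes, no padding required
}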

Golang Memory Allocation.go

package main

import "fmt"

func main() {
        x := 42
        fmt.Println(x)
}

Golang Memory Allocation.sh

$ go build -gcflags '-m' ./main.go

./main.go:7: x escapes to heap
./main.go:7: main ... argument does not escape

$ go build -gcflags '-m -m' ./main.go # extra -m's to increase verbosity (alternatively use "-m=N")

./main.go:5: cannot inline main: non-leaf function
./main.go:7: x escapes to heap
./main.go:7:         from ... argument (arg to ...) at ./main.go:7
./main.go:7:         from *(... argument) (indirection) at ./main.go:7
./main.go:7:         from ... argument (passed to call[argument content escapes]) at ./main.go:7
./main.go:7: main ... argument does not escape

# This ^^ shows that x escapes because it is passed to a function argument which itself escapes (I take this to mean that because fmt.Println accepts its arguments as interface{} values, x has to be boxed into an interface whose concrete type is only resolved at runtime, so the compiler can't prove it's safe to keep x on the stack).
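
To make that comparison explicit, here’s a minimal sketch (the square, box and sink names are made up for this example, and any requires Go 1.18+; use interface{} on older versions) contrasting a concrete-typed parameter with an any parameter. Building it with go build -gcflags=-m should report an escape only for the call that has to box its argument into an interface value (exact messages vary between Go versions):

package main

// sink is a hypothetical package-level interface value, used only to
// guarantee the boxed argument outlives the call.
var sink any

// square accepts a concrete int: no interface value is created and
// the argument can stay on the stack.
func square(n int) int {
	return n * n
}

// box accepts any: the caller has to wrap its argument in an
// interface value, and storing it in sink forces that value to
// escape to the heap.
func box(v any) {
	sink = v
}

func main() {
	x := 42
	_ = square(x) // no escape reported
	box(x)        // -gcflags=-m typically reports: x escapes to heap
}

The takeaway mirrors the note at the top of this Gist: it isn’t the function call itself that costs, it’s the conversion to an interface value whose lifetime the compiler can’t bound.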