Golang之Map内存分配与性能

写在前面：

0x01 — 基准测试

当我们使用一个接口作为值( map[int]interface{} ）和使用一个空结构体作为值( map[int]struct{} ）进行分析时，发现它在Golang映射中的内存分配有区别。我们设置两个基准测试来比较这两种地图类型，并发现结果差异很大。对比描述如下：

在代码中，每个映射类型都有一个函数。每个函数基本上创建一个映射，然后重复固定次数的零值赋值。
也就是说，在执行结束时，映射将有固定数量且相同的条目，每个条目都将收到类型的零值（接口为nil，空结构为struct{}{}）。

代码如下：

 package main

func main() {}

func MapValueInterface() {
    m := map[int]interface{}{}
    for i := 1; i <= 100; i++ {
        m[i] = nil
    }
}

func MapValueEmptyStruct() {
    m := map[int]struct{}{}
    for i := 1; i <= 100; i++ {
        m[i] = struct{}{}
    }
}

测试基准函数：

 package main

import "testing"

func Benchmark_Map_Interface(b *testing.B) {
    for i := 0; i < b.N; i++ { // 压力测试需要遍历testing.B.N实现
        MapValueInterface()
    }
}

func Benchmark_Map_EmptyStruct(b *testing.B) {
    for i := 0; i < b.N; i++ {
        MapValueEmptyStruct()
    }
}

以上代码使用testing.B来做压力测试，测试函数必须以Benchmark为起始名称，执行结果：

 goos: darwin
goarch: amd64
pkg: goDemo/test
cpu: Intel(R) Core(TM) i5-1038NG7 CPU @ 2.00GHz
Benchmark_Map_Interface
Benchmark_Map_Interface-8       156073      7546 ns/op
Benchmark_Map_EmptyStruct
Benchmark_Map_EmptyStruct-8     196588      6097 ns/op
PASS

内存信息图：

从结果可以看出，结构体的映射速度更快，每次执行时间大概是6097纳秒，执行时间是不确定的，不过可以确定的是 空数据结构的映射要快于空接口的映射 。

0x02 — 分析

这两个映射的条目都被分配给零值（nil和struct{}{}，即不需要分配的值）。然而，我们看到空结构版本运行得更快，使用的内存更少，但它进行了更多的分配。这是怎么回事？为什么这些如此相似的基准会有如此不同的结果？

先直接上结论：

Golang的maps内部设计针对性能和内存管理进行了高度优化。映射跟踪可以保存指针的键和值。如果bucket中的条目不能容纳指针，映射只需创建溢出bucket，以避免垃圾收集(GC)带来的不必要开销，这将导致更多的分配和更好的性能（如map[int]struct{}）。

在理解上面的结论之前，我们需要了解映射初始化和映射结构。我们将首先讨论这些主题，然后分析一些基准测试，以了解我们在这里试图理解的两种映射类型的性能。

映射初始化

映射通过以下两种方法初始化：

make(map[int]string): 不知道将有多少条目添加到一个map时。

make(map[int]string，hint): 知道要添加多少条目时。hint暗示对映射初始容量的估计。

映射是可变的，无论我们选择哪种初始化方法，它们都将随需求增长而变长。但是，第二种方法至少为暗示预分配内存，对提高性能有很大帮助。

映射结构

Golang中的映射是将其键/值对存储到 bucket 的哈希表。每个bucket是一个数组，最多容纳8个条目。缺省的 bucket 数量是1。一旦、bucket上的条目数达到bucket的平均负载（也就是负载因子），映射就会通过bucket的数量增加一倍而变得更大。每当映射增长时，它都会为新的条目分配内存。在实践中，每当桶的负载达到6.5或更高时，map就会增长(读音: chang)。这个值是硬编码的，选择此值是为了优化内存使用。详细说明，在golang map.go源代码中可以找到。

在背后，map是指向hmap结构的指针，还有map结构，它还保存了一些关于map类型的信息。如果你看了go map的源码会发现map内部结构很复杂。

需要注意的一件重要事情是，映射追踪可以容纳指针的键和值。如果bucket中的条目不能保存任何指针，则该bucket将被标记为不包含指针，映射只会创建溢出bucket（这意味着更多内存分配），不保存指针将不会进行映射扫描，这可以避免GC不必要的开销。

 // mapextra holds fields that are not present on all maps.
type mapextra struct {
   // If both key and elem do not contain pointers and are inline, then we mark bucket
   // type as containing no pointers. This avoids scanning such maps.
   // However, bmap.overflow is a pointer. In order to keep overflow buckets
   // alive, we store pointers to all overflow buckets in hmap.extra.overflow and hmap.extra.oldoverflow.
   // overflow and oldoverflow are only used if key and elem do not contain pointers.
   // overflow contains overflow buckets for hmap.buckets.
   // oldoverflow contains overflow buckets for hmap.oldbuckets.
   // The indirection allows to store a pointer to the slice in hiter.
   overflow    *[]*bmap
   oldoverflow *[]*bmap

   // nextOverflow holds a pointer to a free overflow bucket.
   nextOverflow *bmap
}

// A bucket for a Go map.
type bmap struct {
   // tophash generally contains the top byte of the hash value
   // for each key in this bucket. If tophash[0] < minTopHash,
   // tophash[0] is a bucket evacuation state instead.
   tophash [bucketCnt]uint8
   // Followed by bucketCnt keys and then bucketCnt elems.
   // NOTE: packing all the keys together and then all the elems together makes the
   // code a bit more complicated than alternating key/elem/key/elem/... but it allows
   // us to eliminate padding which would be needed for, e.g., map[int64]int8.
   // Followed by an overflow pointer.
}

基准分析

了解了以上关于映射的原理，再来分析下刚刚的基准测试。

struct{}没有字段，也不能容纳任何指针。因此，存储空结构的buckets将被标记为不包含指针。这将避免不必要的扫描映射，可以提高执行速度。此外，我们还可以预估map[int]Struct{}类型的映射会占用更多的内存分配，因为随着它的增长，它会创建更多的溢出buckets。

与空结构不同，interface{}类型可以包含任何值，包括指针。为了理解这对映射性能的影响，我们需要了解映射bucket追踪包含所有指针的内存（ptrdata）前缀的大小。可以容纳指针的映射类型不会分配额外的溢出桶。但是，这些类型需要GC进行扫描。

在映射内部实现中，我们可以看到如何使用ptrdata字段来决定是否要创建更多的溢出桶(map.go，第266行）。

 func (h *hmap) newoverflow(t *maptype, b *bmap) *bmap {
   var ovf *bmap
   if h.extra != nil && h.extra.nextOverflow != nil {
      // We have preallocated overflow buckets available.
      // See makeBucketArray for more details.
      ovf = h.extra.nextOverflow
      if ovf.overflow(t) == nil {
         // We're not at the end of the preallocated overflow buckets. Bump the pointer.
         h.extra.nextOverflow = (*bmap)(add(unsafe.Pointer(ovf), uintptr(t.bucketsize)))
      } else {
         // This is the last preallocated overflow bucket.
         // Reset the overflow pointer on this bucket,
         // which was set to a non-nil sentinel value.
         ovf.setoverflow(t, nil)
         h.extra.nextOverflow = nil
      }
   } else {
      ovf = (*bmap)(newobject(t.bucket))
   }
   h.incrnoverflow()
   if t.bucket.ptrdata == 0 { //这里判断是否需要创建新的溢出buckets
      h.createOverflow()
      *h.extra.overflow = append(*h.extra.overflow, ovf)
   }
   b.setoverflow(t, ovf)
   return ovf
}

0x03 — 结论

当我们比较Interface和empty struct时，随着条目数量的增加，测试在速度上的差异越来越大，接口基准比空结构基准慢得多。同时，当映射的初始容量(hint)大于条目数时，映射不需要太多的分配，因为所有条目都有一个预先分配的空间。

导致堆栈溢出问题的不同结果与性能和内存管理的映射优化有关。map[int]interface{}类型的映射速度较慢，因为当GC扫描到可以容纳指针的存储buckets时，它们的性能会下降。map[int]struct{}类型的映射使用的内存更少，因为没有为空结构分配空间。尽管nil是interface{}的零值，但interface{}类型还是需要一些空间来存储任何值。最后，空结构基准分配内存更高，因为map[int]struct{}类型需要更多的溢出buckets（可进行性能优化），因为它的buckets不包含任何指针。