Go map 底层，以及扩容

总结笔记《Go语言精进之路》

底层

map底层数据结构，也就是运行时的map的样子。

从语法，到运行时

src/runtime/map.go

源码里面key value对应起的名字是key elem

语法方法到runtime方法

操作	转换前	转换后
初始化	m := make(map[keyType]valType, capacityhint)	m := runtime.makemap(maptype, capacityhint, m )
取值	v := m[“key”]	v := runtime.mapaccess1(maptype, m, “key”)
取值	v, ok := m[“key”]	v, ok := runtime.mapaccess2(maptype, m, “key”)
赋值	m[“key”] = “value”	v := runtime.mapassign(maptype, m, “key”)
删元素	delete(m, “key”)	runtime.mapdelete(maptype, m, “key”)

map根据key获取value的效率，比不上索引寻址——切片和数组获取元素的效率。

注意，上面每个转换后的运行时对应的函数调用的参数里，第一个参数都是maptype。
（这个类型里面存了关于map的key,elem大小、类型信息。是一个抽象复用设计）
maptype的存在让Go中所有map类型共享一套运行时map操作函数。
/src/runtime/type.go
type maptype struct {
typ         _type
key       *_type
elem     *_type
bucket *_type          // internal type representing a hash bucket

// function for hashing keys (ptr to key, seed) -> hash
hasher         func(unsafe.Pointer, uintptr) uintptr
keysize        uint8          // size of key slot
elemsize      uint8          // size of elem slot
bucketsize   uint16        // size of bucket
flags            uint32
}

map，到runtime.hmap

语法层面的map类型，会翻译为runtime.hmap。
一个map变量，是一个runtime.hmap类型的实例。
hmap，是map类型的header（所以取名hmap嘛，h就是header的首字母），意思就是它是map类型的描述符。
运行时的map类型实现（map的底层数据结构）---hmap
对应的Go源码定义：


// A header for a Go map.
type hmap struct {
	// Note: the format of the hmap is also encoded in cmd/compile/internal/reflectdata/reflect.go.
	// Make sure this stays in sync with the compiler's definition.
	count     int // # live cells == size of map.  Must be first (used by len() builtin)
	flags     uint8
	B         uint8  // log_2 of # of buckets (can hold up to loadFactor * 2^B items)
	noverflow uint16 // approximate number of overflow buckets; see incrnoverflow for details
	hash0     uint32 // hash seed

	buckets    unsafe.Pointer // array of 2^B Buckets. may be nil if count==0.
	oldbuckets unsafe.Pointer // previous bucket array of half the size, non-nil only when growing
	nevacuate  uintptr        // progress counter for evacuation (buckets less than this have been evacuated)

	extra *mapextra // optional fields
}

英文：
overflow 溢出
bucket 桶

关注的几个字段，每个字段的用途：

字段	含义
count	就是len()获取map的实际大小的值
flags	表示map的几种状态（了解）（维护的时候发现修改啥的源码没做并发保护，也知道了为啥说map非线程安全了）
B	2的B次方就是buckets数组的长度
noverflow	overflow bucket的大致num 数目（“溢出桶” 这个角色出场的原因，就是某个桶bucket[i]装满了，但是hash计算完的key，说，还得存这个桶下面，那怎么办？就弄个溢出桶，链表链着）
buckets	指向bucket数组的指针（buckets数组：map存key value的地方）
oldbuckets	当发生扩容的时候，该字段指向原bucket数组，等全迁移完数据，就不用指了，原bucket数组就可以释放了，等着GC了
nevacuate	nevacuate的bucket已经完成了数据的排空和迁移

一句话，hmap结构体一个成员是buckets 数组，数组长度是2的B次方。
数组每个元素bucket里面有：长度为8的tophash数组，长度为8的key数组，长度为8的value数组，还有万一8个槽装不下还得装避免尴尬的overflow bucket 链表。

map的数据结构这么设计，怎么存数据？

下面挨个说这些设计完了，map到底怎么存数据？以及设计好在哪？有必要吗？

怎么存key value

来了一个key, value，要存。
map就拿到这个key，进行一个哈希函数计算 hashcode = Hash(key)，得到哈希码hashcode。
这个hashcode有16位。
低8位确定存哪个桶bucket[i]（所以每个桶都是存的低8位相同hashcode的key元素们），
高8位确定存bucket[i]的哪个位置。
hashcode的作用

tophash的设计意义

tophash的存在意义，就是，找key位置更快。
因为map可以允许key的类型好多种（只要可以等于不等于比较），直接用key比对来找的话，那么万一，比如，key是很长的字符串，那么比较就效率肯定就低了。

所以看上图就能看到，一个bucket桶的设计不是说，就key存储和value存储两部分就完了，为了效率加了tophash部分，是空间换时间的思想设计。

overflow的设计意义

hashcode=Hash(key) 计算的结果低8位确定唯一的bucket[i]。
overflow bucket设计，就是为了缓解当存在很多key，计算后的hashcode的低8位相同的个数大于了一个bucket所能存放的数目8个时，且这个map还没达到扩容条件时，做的一种存储设计。
同一个bucket的多个overflow bucket 以链表的形式连接。

key和value分开存的意义

key-value存储方案对比

英文
padding 填充

是为了节省内存空间。
所以不惜增加了算法复杂度！啊不是，是仅仅增加了一些复杂度。[doge]

因为需要内存对齐，所以如果kv一组存储，按例子map[int8]int64，就会发现，每个key都得padding 7 bytes。相比图中左边Go采取的方案，右边的多出了8*7bytes=56bytes的内存浪费。百分比一算就发现浪费太高。

一个bucket 桶，在Go源码中的代码是：
是bmap这个结构体（b:bucket)

// A bucket for a Go map.
type bmap struct {
	// tophash generally contains the top byte of the hash value
	// for each key in this bucket. If tophash[0] < minTopHash,
	// tophash[0] is a bucket evacuation state instead.
	tophash [bucketCnt]uint8
	// Followed by bucketCnt keys and then bucketCnt elems.
	// NOTE: packing all the keys together and then all the elems together makes the
	// code a bit more complicated than alternating key/elem/key/elem/... but it allows
	// us to eliminate padding which would be needed for, e.g., map[int64]int8.
	// Followed by an overflow pointer.
}

自动扩容

map会自动扩容，扩容的内存分配由运行时负责。只要内存够，就可以一直往map里添新数据。
扩容也就是扩充buckets的数量。并重新在bucket间均衡分配数据。

触发扩容有两种：
一个是count太多了！
一个是overflow bucket 数量太多了！

1、count太多，多到 count > LoadFactor * 2^B 这种程度！

count > LoadFactor * 2^B 时，会扩容。

const(
	...
	// Maximum average load of a bucket that triggers growth is 6.5.

	// Represent as loadFactorNum/loadFactorDen, to allow integer math.
	loadFactorNum = 13
	loadFactorDen = 2
	...
)

LoadFactor负载因子目前设置为6.5，就是上面定义的常量值loadFactorNum除以loadFactorDen计算出来的。

然后2^B是buckets数组的长度。

扩容方式是，创建一个原来buckets规模的2倍大的buckets，然后oldbuckets指向原buckets数组，只有当assgin和delete操作（即插入和删除）时，进行排空和迁移，直到完全迁移完，删除oldbuckets指向，原buckets数组释放。
map扩容示意图

2、overflow bucket 数量太多

noverflow >= uint16(1)<<(B&15)

扩容方式是，创建一个和原来buckets一样规模的buckets，也是只当assgin和delete操作（即插入和删除）时，进行排空和迁移。

map自动扩容源代码

func mapassign(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
	...
	// If we hit the max load factor or we have too many overflow buckets,
	// and we're not already in the middle of growing, start growing. 我们还没开始生长，就生长吧。[doge]。
	if !h.growing() && (overLoadFactor(h.count+1, h.B) || tooManyOverflowBuckets(h.noverflow, h.B)) {
		hashGrow(t, h)
		goto again // Growing the table invalidates everything, so try again
	}
	...
}

map容量

map 容量

和数组不同，map 可以根据新增的 key-value 动态的伸缩，因此它不存在固定长度或者最大限制，但是也可以选择标明 map 的初始容量 capacity，格式如下：

make(map[keytype]valuetype, cap)

例如：

m := make(map[string]float, 20)

当 map 增长到容量上限的时候，如果再增加新的 key-value，map 的大小会自动加 1，所以出于性能的考虑，对于大的 map 或者会快速扩张的 map，即使只是大概知道容量，也最好先标明。