golang map源码剖析 - Golang教程网

具体源码在 runtime/map.go下面，有1300+行代码

golang 的map底层使用哈希表实现的，源码中对应是一个结构体，具体如下

// A header for a Go map.
type hmap struct {
	// Note: the format of the hmap is also encoded in cmd/compile/internal/gc/reflect.go.
	// Make sure this stays in sync with the compiler's definition.
	count     int // # live cells == size of map.  Must be first (used by len() builtin)  map中元素的个数
	flags     uint8  //标识当前map是在读或者写的状态，每次操作会更新这个值
	B         uint8  // log_2 of # of buckets (can hold up to loadFactor * 2^B items)  桶的个数相关参数，2^B 是桶的个数
	noverflow uint16 // approximate number of overflow buckets; see incrnoverflow for details  溢出桶的个数
	hash0     uint32 // hash seed   hash的种子，用方法生成的随机值，每个map都不一样，所以同一个key在不同map位置也不一样
	buckets    unsafe.Pointer // array of 2^B Buckets. may be nil if count==0.   桶数组的地址指针，指向桶连续存储地址的初始位置
	oldbuckets unsafe.Pointer // previous bucket array of half the size, non-nil only when growing  扩容时旧的桶的数组指针
	nevacuate  uintptr        // progress counter for evacuation (buckets less than this have been evacuated)  扩容搬迁进度

	extra *mapextra // optional fields  记录溢出相关信息
}
复制代码

每个map里面有若干桶，hmap里面只记录了桶的地址，找到具体桶内元素需要知道桶的结构，桶在源码中用bmap结构体表示，具体如下

// A bucket for a Go map.
type bmap struct {
    tophash [bucketCnt]uint8 # bucketCnt为8 存储哈希值的高8位
    data    byte[1]  //key value数据:key/key/key/.../value/value/value...
    overflow *bmap   //溢出bucket的地址
}
复制代码

根据哈希函数将key生成一个哈希值，其中低位哈希用来判断桶位置，高位哈希用来确定在桶中哪个cell。低位哈希就是哈希值的低B位，hmap结构体中的B，比如B为5，2^5=32，即该map有32个桶，只需要取哈希值的低5位就可以确定当前key-value落在哪个桶(bucket)中；高位哈希即tophash，是指哈希值的高8bits，根据tophash来确定key在桶中的位置。每个桶可以存储8对key-value，存储结构不是key/value/key/value...，而是key/key..value/value，这样可以避免字节对齐时的padding，节省内存空间。

当不同的key根据哈希得到的tophash和低位hash都一样，发生哈希碰撞，这个时候就体现overflow pointer字段的作用了。桶溢出时，就需要把key-value对存储在overflow bucket（溢出桶），overflow pointer就是指向overflow bucket的指针。如果overflow bucket也溢出了呢？那就再给overflow bucket新建一个overflow bucket，用指针串起来就形成了链式结构

初始化map

// makemap implements Go map creation for make(map[k]v, hint).
// If the compiler has determined that the map or the first bucket
// can be created on the stack, h and/or bucket may be non-nil.
// If h != nil, the map can be created directly in h.
// If h.buckets != nil, bucket pointed to can be used as the first bucket.
func makemap(t *maptype, hint int, h *hmap) *hmap {
	mem, overflow := math.MulUintptr(uintptr(hint), t.bucket.size)
	if overflow || mem > maxAlloc {
		hint = 0
	}

	// initialize Hmap
	if h == nil {
		h = new(hmap)
	}
	h.hash0 = fastrand()//赋随机值

	// Find the size parameter B which will hold the requested # of elements.
	// For hint < 0 overLoadFactor returns false since hint < bucketCnt.
	B := uint8(0)
	for overLoadFactor(hint, B) {
		B++
	}
	h.B = B//个数赋值

	// allocate initial hash table
	// if B == 0, the buckets field is allocated lazily later (in mapassign)
	// If hint is large zeroing this memory could take a while.
	if h.B != 0 {
		var nextOverflow *bmap
		h.buckets, nextOverflow = makeBucketArray(t, h.B, nil)
		if nextOverflow != nil {
			h.extra = new(mapextra)
			h.extra.nextOverflow = nextOverflow
		}
	}

	return h
}
复制代码

查找map中的元素

// mapaccess1 returns a pointer to h[key].  Never returns nil, instead
// it will return a reference to the zero object for the elem type if
// the key is not in the map.
// NOTE: The returned pointer may keep the whole map live, so don't
// hold onto it for very long.
func mapaccess1(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
	if raceenabled && h != nil {
		callerpc := getcallerpc()
		pc := funcPC(mapaccess1)
		racereadpc(unsafe.Pointer(h), callerpc, pc)
		raceReadObjectPC(t.key, key, callerpc, pc)
	}
	if msanenabled && h != nil {
		msanread(key, t.key.size)
	}
	if h == nil || h.count == 0 {
		if t.hashMightPanic() {
			t.hasher(key, 0) // see issue 23734
		}
		return unsafe.Pointer(&zeroVal[0])
	}
	if h.flags&hashWriting != 0 {//若干目前标记是写，报panic不能并发读写
		throw("concurrent map read and map write")
	}
	hash := t.hasher(key, uintptr(h.hash0))//计算key的哈希值
	m := bucketMask(h.B)//计算桶链首地址
	b := (*bmap)(add(h.buckets, (hash&m)*uintptr(t.bucketsize)))
	if c := h.oldbuckets; c != nil {
		if !h.sameSizeGrow() {
			// There used to be half as many buckets; mask down one more power of two.
			m >>= 1
		}
		oldb := (*bmap)(add(c, (hash&m)*uintptr(t.bucketsize)))
		if !evacuated(oldb) {
			b = oldb
		}
	}
	top := tophash(hash)// 计算tophash值
bucketloop:
	for ; b != nil; b = b.overflow(t) {//遍历每一个桶
		for i := uintptr(0); i < bucketCnt; i++ {//根据高8位找在桶内的元素
			if b.tophash[i] != top {//tophash有8个元素，存储的key高8位信息
				if b.tophash[i] == emptyRest {
					break bucketloop
				}
				continue
			}
			k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
			if t.indirectkey() {
				k = *((*unsafe.Pointer)(k))
			}
			if t.key.equal(key, k) {//最后的判断key相等
				e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))
				if t.indirectelem() {
					e = *((*unsafe.Pointer)(e))
				}
				return e//找到直接返回
			}
		}
	}
	return unsafe.Pointer(&zeroVal[0])//没找到，返回对应类型的零值
}

复制代码

map的查找有两种用法，底层方法都是用的上面的mapaccess1，在mapaccess1返回参数的基础上在加一些判断，返回参数是一个，如下

m := make(map[string]int)

print(m["1"]) -- 结果为: 0 (返回int对应的默认值)
复制代码

对应的源码，很简单，直接调用mapaccess1然后返回

func mapaccess1_fat(t *maptype, h *hmap, key, zero unsafe.Pointer) unsafe.Pointer {
	e := mapaccess1(t, h, key)
	if e == unsafe.Pointer(&zeroVal[0]) {
		return zero
	}
	return e
}
复制代码

还有一种返回参数是两个，第二个参数标识map是否存在这个key，不存在，第一个参数就是默认零值

m := make(map[string]int)

val, ok := m["1"]
print(val, ok)  -- 结果为：0, false
复制代码

对应的源码，相比第一种就是判断如果是默认零值就多返回一个参数false，否则返回true

func mapaccess2_fat(t *maptype, h *hmap, key, zero unsafe.Pointer) (unsafe.Pointer, bool) {
	e := mapaccess1(t, h, key)
	if e == unsafe.Pointer(&zeroVal[0]) {
		return zero, false
	}
	return e, true
}
复制代码

上面两种用法的源码mapaccess1_fat，和mapaccess2_fat是我猜测的，看源码感觉应该是这样的，没有验证过

map还有一种用发，使用for range遍历map元素，这里有一个考点是map乱序，1.map底层散列表在扩容场景下会重新散列，遍历顺序有变化 2.go设计者为了不让用户不依赖，每次遍历的桶的起点是不一样的

map 扩容策略

负载因子 = 键数量/bucket数量
哈希因子过小，说明空间利用率低
哈希因子过大，说明冲突严重，存取效率低

map的扩容机制从几个方面来讲：

1.扩容时机

map插入新key的时候，满足条件，就会触发扩容

2.扩容条件

1.超过负载

map元素个数》 6.5 * 桶个数

平均一个桶有6.5个元素了，说明空间利用率很大

2.溢出桶太多

负载因子比较小的时候，溢出桶比较多，造成原因有元素新增的时候新建了很多溢出桶，后来又删除了很多，就会造成元素很少，负载因子小，但是溢出桶很多，元素分布比较稀疏，这个时候应该合并一些溢出桶

3.扩容机制

根据不同的扩容条件，有不同的扩容机制

1.超过负载

双倍扩容机制，创建新的buckets数组，大小是原来的两倍，把老的搬到新的桶里面

2.溢出桶比较多

等量扩容机制，桶的数量维持不变，把松散的键值重新排列一次，使得同一个桶里面的元素排列更紧密，减少溢出桶，节省空间

4.扩容方式

采用渐进式迁移，一次性全部迁移可能会造成比较大的时延，所以每次增删一个key的时候会去做搬迁

对于go1.6以上版本，如果出现【并发map读写】程序会直接以fatal error崩溃，即使同routine内有recover()也不能恢复

参考 www.bilibili.com/read/cv5588… blog.csdn.net/fengshenyun…