A Memory Locality Experiment in Golang

1. Test code

package main

// Loop writes to every element of nums, visiting indices in strides of step.
func Loop(nums []int, step int) {
	l := len(nums)
	for i := 0; i < step; i++ {
		for j := i; j < l; j += step {
			nums[j] = 4
		}
	}
}

func main() {
	mySlice := make([]int, 10)
	Loop(mySlice, 3)
}

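The code above demonstrates memory locality. With step set to 3, the first pass of the outer loop visits indices 0, 3, 6, 9, 12, ... of nums, the second pass visits 1, 4, 7, 10, 13, ..., and the third pass visits 2, 5, 8, 11, 14, .... After these three outer passes, every element of nums has been visited exactly once. The program thus models the locality of array access: the smaller the step, the more often consecutive accesses touch adjacent memory and the better the locality; the larger the step, the worse the locality.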
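
To make the visiting order concrete, here is a minimal sketch that records the indices instead of writing to them. AccessOrder is a hypothetical helper introduced only for illustration; it mirrors the two loops in Loop() and is not part of the experiment's code.

```go
package main

import "fmt"

// AccessOrder returns the indices in the order Loop would visit them
// for a slice of length n and the given step.
// Hypothetical helper for illustration only.
func AccessOrder(n, step int) []int {
	order := make([]int, 0, n)
	for i := 0; i < step; i++ {
		for j := i; j < n; j += step {
			order = append(order, j)
		}
	}
	return order
}

func main() {
	fmt.Println(AccessOrder(10, 1)) // [0 1 2 3 4 5 6 7 8 9]
	fmt.Println(AccessOrder(10, 3)) // [0 3 6 9 1 4 7 2 5 8]
}
```

With step 1 each access lands right next to the previous one; with step 3 each access skips two elements and the slice is swept three times.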

2. Benchmark test

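Next, use Golang's Benchmark facility to measure Loop() with different values of step and see how much the elapsed time differs between them. First create the file loop_test.go and implement CreateSource(), a function that builds a slice and initializes its elements, as follows: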


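// CreateSource builds a slice of n ints holding the values 0..n-1.
func CreateSource(n int) []int {
	// Length 0, capacity n: append fills the slice without reallocating.
	nums := make([]int, 0, n)
	for i := 0; i < n; i++ {
		nums = append(nums, i)
	}
	return nums
}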
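
CreateSource() depends on the difference between a slice's length and its capacity. The sketch below contrasts the two forms of make; withLength and withCapacity are hypothetical names used only here.

```go
package main

import "fmt"

// withLength allocates n zeroed elements and then appends n more,
// so the result has 2n elements.
func withLength(n int) []int {
	nums := make([]int, n)
	for i := 0; i < n; i++ {
		nums = append(nums, i)
	}
	return nums
}

// withCapacity reserves room for n elements but starts empty,
// so the result has exactly n elements.
func withCapacity(n int) []int {
	nums := make([]int, 0, n)
	for i := 0; i < n; i++ {
		nums = append(nums, i)
	}
	return nums
}

func main() {
	fmt.Println(len(withLength(3)), withLength(3))     // 6 [0 0 0 0 1 2]
	fmt.Println(len(withCapacity(3)), withCapacity(3)) // 3 [0 1 2]
}
```

For a benchmark that is meant to touch exactly 10000 elements, the second form is the one that yields a slice of the intended length.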

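Then implement a Benchmark that builds a slice of length 10000. Note that b.ResetTimer() must be called after the slice is created, so that the time spent in CreateSource() is excluded from the measurement. The code for a step of 1 is as follows: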


func BenchmarkLoopStep1(b *testing.B) {
	// Build the source data, length 10000
	src := CreateSource(10000)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		Loop(src, 1)
	}
}

3. Complete code:

loop.go

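package main

// CreateSource builds a slice of n ints holding the values 0..n-1.
func CreateSource(n int) []int {
	nums := make([]int, 0, n)
	for i := 0; i < n; i++ {
		nums = append(nums, i)
	}
	return nums
}

// Loop writes to every element of nums, visiting indices in strides of step.
func Loop(nums []int, step int) {
	l := len(nums)
	for i := 0; i < step; i++ {
		for j := i; j < l; j += step {
			nums[j] = 4 // access memory and write a value
		}
	}
}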

loop_test.go

package main

import "testing"

func BenchmarkLoopStep1(b *testing.B) {
	// Build the source data, length 10000
	src := CreateSource(10000)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		Loop(src, 1)
	}
}

func BenchmarkLoopStep2(b *testing.B) {
	// Build the source data, length 10000
	src := CreateSource(10000)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		Loop(src, 2)
	}
}

func BenchmarkLoopStep3(b *testing.B) {
	// Build the source data, length 10000
	src := CreateSource(10000)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		Loop(src, 3)
	}
}

func BenchmarkLoopStep4(b *testing.B) {
	// Build the source data, length 10000
	src := CreateSource(10000)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		Loop(src, 4)
	}
}

func BenchmarkLoopStep5(b *testing.B) {
	// Build the source data, length 10000
	src := CreateSource(10000)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		Loop(src, 5)
	}
}

func BenchmarkLoopStep6(b *testing.B) {
	// Build the source data, length 10000
	src := CreateSource(10000)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		Loop(src, 6)
	}
}

func BenchmarkLoopStep12(b *testing.B) {
	// Build the source data, length 10000
	src := CreateSource(10000)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		Loop(src, 12)
	}
}

func BenchmarkLoopStep16(b *testing.B) {
	// Build the source data, length 10000
	src := CreateSource(10000)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		Loop(src, 16)
	}
}

4. Analysis of the output

Run the command:

go test -bench=. -count=3

The output is as follows:

goos: darwin
goarch: arm64
pkg: v1
BenchmarkLoopStep1-8              405445              2890 ns/op
BenchmarkLoopStep1-8              413742              2881 ns/op
BenchmarkLoopStep1-8              411201              2884 ns/op
BenchmarkLoopStep2-8              412641              2902 ns/op
BenchmarkLoopStep2-8              412040              2902 ns/op
BenchmarkLoopStep2-8              412099              2903 ns/op
BenchmarkLoopStep3-8              409592              2930 ns/op
BenchmarkLoopStep3-8              404161              2947 ns/op
BenchmarkLoopStep3-8              407128              2922 ns/op
BenchmarkLoopStep4-8              407964              2931 ns/op
BenchmarkLoopStep4-8              407895              2932 ns/op
BenchmarkLoopStep4-8              408778              2928 ns/op
BenchmarkLoopStep5-8              403932              2952 ns/op
BenchmarkLoopStep5-8              405253              2950 ns/op
BenchmarkLoopStep5-8              404827              2951 ns/op
BenchmarkLoopStep6-8              400930              2963 ns/op
BenchmarkLoopStep6-8              403382              2963 ns/op
BenchmarkLoopStep6-8              396916              2965 ns/op
BenchmarkLoopStep12-8             387514              3056 ns/op
BenchmarkLoopStep12-8             391561              3056 ns/op
BenchmarkLoopStep12-8             389544              3055 ns/op
BenchmarkLoopStep16-8             383607              3112 ns/op
BenchmarkLoopStep16-8             377530              3115 ns/op
BenchmarkLoopStep16-8             380583              3121 ns/op
PASS
ok      v1      32.574s

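First, an explanation of the fields in the output above:
- BenchmarkLoopStep1-8: the -8 suffix means GOMAXPROCS was 8 for the run
- '405445' is the number of iterations executed (b.N)
- '2890' is the average time per iteration, in nanoseconds (ns/op)
These results show that the better the memory locality (the smaller the step), the better the performance: the time per operation rises steadily from about 2890 ns at step 1 to about 3115 ns at step 16, roughly 8% slower.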
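
The relative slowdown can also be computed by the program itself. The following is a minimal sketch, not the setup used for the numbers above: it calls testing.Benchmark from an ordinary main program instead of go test, and the choice of steps 1, 4 and 16 is an assumption made to keep the run short. Its timings will differ from machine to machine.

```go
package main

import (
	"fmt"
	"testing"
)

// Loop writes to every element of nums, visiting indices in strides of step.
func Loop(nums []int, step int) {
	l := len(nums)
	for i := 0; i < step; i++ {
		for j := i; j < l; j += step {
			nums[j] = 4
		}
	}
}

func main() {
	steps := []int{1, 4, 16} // assumed subset of the steps, for illustration
	src := make([]int, 10000)

	var base float64 // ns/op of the first step, used as the reference
	for _, step := range steps {
		step := step
		r := testing.Benchmark(func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				Loop(src, step)
			}
		})
		ns := float64(r.T.Nanoseconds()) / float64(r.N)
		if step == steps[0] {
			base = ns
		}
		fmt.Printf("step=%-2d %8.0f ns/op  %.3fx of step %d\n", step, ns, ns/base, steps[0])
	}
}
```

Run it with go run; each step takes about a second because testing.Benchmark uses the default benchmark time.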

5. Further thinking

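In Golang's GPM scheduler model, why is a child G created by a G placed preferentially in the current local G queue, rather than in the local queue of a P bound to another M? Why does GPM adopt a locality-aware scheduling design?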

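First, recall the GPM architecture.
A child G created by a G is placed in the current local G queue first in order to maximize memory locality.
GPM adopts a locality-aware scheduling design in order to maximize efficiency. (Try reasoning about what would happen if it were not designed this way.)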
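
The "local queue first" rule can be sketched with a toy model. This is a deliberately simplified illustration and not the Go runtime's implementation: the types G, P and Sched, the capacity localCap, and the methods Spawn and Next are all hypothetical, and details such as work stealing are left out.

```go
package main

import "fmt"

// G is a toy stand-in for a goroutine; it is not the runtime's g struct.
type G struct{ ID int }

// P is a toy processor with a bounded local run queue.
type P struct {
	ID    int
	local []*G
}

// Sched holds the shared global queue.
type Sched struct {
	global []*G
}

const localCap = 4 // assumed capacity, chosen only for this sketch

// Spawn places a newly created G on the creating P's local queue,
// falling back to the global queue only when the local queue is full.
func (s *Sched) Spawn(p *P, g *G) {
	if len(p.local) < localCap {
		p.local = append(p.local, g)
		return
	}
	s.global = append(s.global, g)
}

// Next returns the next G for p: local queue first, then the global queue.
func (s *Sched) Next(p *P) *G {
	if len(p.local) > 0 {
		g := p.local[0]
		p.local = p.local[1:]
		return g
	}
	if len(s.global) > 0 {
		g := s.global[0]
		s.global = s.global[1:]
		return g
	}
	return nil
}

func main() {
	s := &Sched{}
	p0, p1 := &P{ID: 0}, &P{ID: 1}
	for id := 1; id <= 5; id++ {
		s.Spawn(p0, &G{ID: id}) // all five are created while running on p0
	}
	fmt.Println(len(p0.local), len(s.global)) // 4 1
	fmt.Println(s.Next(p0).ID)                // 1
	fmt.Println(s.Next(p1).ID)                // 5
}
```

In this model a child G usually runs on the same P as its parent, where the data they share is more likely to still be in that core's cache; only the overflow moves elsewhere.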