golang 性能分析

1. 性能分析

golang 自身提供的工具中包括性能分析工具 - pprof。

golang 内建提供了多种性能收集器，它们负责收集这些性能数据：

Goroutine: stack traces of all current Goroutines。Go Routine 分析。
CPU: stack traces of CPU returned by runtime。CPU 分析。按照一定的频率采集所监听的应用程序 CPU（含寄存器）的使用情况，可确定应用程序在主动消耗 CPU 周期时花费时间的位置
Heap: a sampling of memory allocations of live objects。内存分析，在应用程序进行堆分配时记录堆栈跟踪，用于监视当前和历史内存使用情况，以及检查内存泄漏
Allocation: a sampling of all past memory allocations。内存分析，对象分配统计以及内存泄漏等。
Thread: stack traces that led to the creation of new OS threads。OS 线程创建情况。
Block: stack traces that led to blocking on synchronization primitives。阻塞分析。记录 go routine 阻塞情况，等待以及同步情况，timer/channel 通讯等各项细节。
Mutex: stack traces of holders of contended mutexes。互斥锁分析。包括各种竞争情况。

2. pprof工具分析

（1.）通过在main函数中添加性能监控

package main

import 	(
  _ "net/http/pprof" // 引入 pprof 模块
  _ "github.com/mkevac/debugcharts"  // 可选，图形化插件
)

func main(){
    // ...
    // 内存分析
	go func() {
		http.ListenAndServe("0.0.0.0:8090", nil) // 监控端口
	}()
    // ...
}

"net/http/pprof" 包将会自动地添加一组 api 端点，该包的 init 函数会做这些事：

func init() {
	http.HandleFunc("/debug/pprof/", Index)
	http.HandleFunc("/debug/pprof/cmdline", Cmdline)
	http.HandleFunc("/debug/pprof/profile", Profile)
	http.HandleFunc("/debug/pprof/symbol", Symbol)
	http.HandleFunc("/debug/pprof/trace", Trace)
}

2.1 web界面查看

直接通过浏览器访问http://localhost:8090/debug/pprof, 就会看到如下界面：

参数说明：

cpu（CPU Profiling）: $HOST/debug/pprof/profile，默认进行 30s 的 CPU Profiling，得到一个分析用的 profile 文件
block（Block Profiling）：$HOST/debug/pprof/block，查看导致阻塞同步的堆栈跟踪
goroutine：$HOST/debug/pprof/goroutine，查看当前所有运行的 goroutines 堆栈跟踪
heap（Memory Profiling）: $HOST/debug/pprof/heap，查看活动对象的内存分配情况
mutex（Mutex Profiling）：$HOST/debug/pprof/mutex，查看导致互斥锁的竞争持有者的堆栈跟踪
threadcreate：$HOST/debug/pprof/threadcreate，查看创建新OS线程的堆栈跟踪

github.com/mkevac/debugchartshttp://localhost:8090/ui

2.2 命令行交互

（1.）CPU分析

go tool pprof http://localhost:6060/debug/pprof/profile\?seconds\=60

Fetching profile over HTTP from http://localhost:6060/debug/pprof/profile?seconds=60
Saved profile in /Users/eddycjy/pprof/pprof.samples.cpu.007.pb.gz
Type: cpu
Duration: 1mins, Total samples = 26.55s (44.15%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof)

top10list func_namepdfwebtreetraceshelp

(pprof) top10
Showing nodes accounting for 25.92s, 97.63% of 26.55s total
Dropped 85 nodes (cum <= 0.13s)
Showing top 10 nodes out of 21
      flat  flat%   sum%        cum   cum%
    23.28s 87.68% 87.68%     23.29s 87.72%  syscall.Syscall
     0.77s  2.90% 90.58%      0.77s  2.90%  runtime.memmove
     0.58s  2.18% 92.77%      0.58s  2.18%  runtime.freedefer
     0.53s  2.00% 94.76%      1.42s  5.35%  runtime.scanobject
     0.36s  1.36% 96.12%      0.39s  1.47%  runtime.heapBitsForObject
     0.35s  1.32% 97.44%      0.45s  1.69%  runtime.greyobject
     0.02s 0.075% 97.51%     24.96s 94.01%  main.main.func1
     0.01s 0.038% 97.55%     23.91s 90.06%  os.(*File).Write
     0.01s 0.038% 97.59%      0.19s  0.72%  runtime.mallocgc
     0.01s 0.038% 97.63%     23.30s 87.76%  syscall.Write

参数说明：

flat：给定函数上运行耗时
flat%：同上的 CPU 运行耗时总比例
sum%：给定函数累积使用 CPU 总比例
cum：当前函数加上它之上的调用运行总耗时
cum%：同上的 CPU 运行耗时总比例

（2.）内存分析

go tool pprof http://localhost:8080/debug/pprof/heap

// 停掉后，过一分钟后再次执行上述命令，使其生成两个文件，进行比较
go tool pprof -base pprof.demo2.alloc_objects.alloc_space.inuse_objects.inuse_space.001.pb.gz pprof.demo2.alloc_objects.alloc_space.inuse_objects.inuse_space.002.pb.gz

// 使用top命令查看占用较多的，再使用list查看具体占用情况

内存分析有几个参数：

-alloc_objects 申请对象的内存占用情况
-inuse_objects 在使用中的对象内存占用情况
-inuse_space 使用中的内存空间

使用pprof的heap能够获取程序运行时的内存信息，在程序平稳运行的情况下，每个一段时间使用heap获取内存的profile，然后使用base能够对比两个profile文件的差别，需要不断的比较两次的内存申请和占用情况，关注哪些对象的申请不断的增长，哪些对象在程序结束后，仍然在使用中，没有释放内存。

要点：

内存profiling记录的是堆内存分配的情况，以及调用栈信息，并不是进程完整的内存情况
栈内存的分配是在调用栈结束后会被释放的内存，所以并不在内存profile中。
内存profiling是基于抽样的，默认是每1000次堆内存分配，执行1次profile记录。
因为内存profiling是基于抽样和它跟踪的是已分配的内存，而不是使用中的内存，（比如有些内存已经分配，看似使用，但实际以及不使用的内存，比如内存泄露的那部分），所以不能使用内存profiling衡量程序总体的内存使用情况。
在进行堆分配时，内存分析会记录调用栈跟踪，栈分配被认为是无成本的，并且在内存 profile 中不被追踪。
使用内存profiling不能够发现内存泄露

2.3 火焰图

（1.）安装go-torch

# 下载并复制flamegraph.pl到/usr/local/bin下
git clone https://github.com/brendangregg/FlameGraph.git
cp FlameGraph/flamegraph.pl /usr/local/bin

# 安装go-torch
go get -v github.com/uber/go-torch

(2.) 使用go-torch查看

// 使用
go-torch -u http://10.11.209.102:9911 -t 30
go-torch -u http://10.0.2.15:6060 --suffix=/debug/pprof/block -p > torch.svg
go-torch cpu.prof  // top  list

3. 使用goland自带的基准测试分析

goland2020版本，提供了基准测试，可以进行内存、CPU的分析：

如果golang版本较低，可能没有自带的运行按钮，可以使用以下命令执行：

go test -cpuprofile cpu.prof -memprofile mem.prof -bench .

// 对生成的文件，使用go tool进行分析
go tool pprof cpu.pprof
top10 -cum // 查看top10占用情况
list xxx // 查看具体某个函数的内存

go tool pprof -http=":8080" cpu.pprof  // 使用web界面进行分析

4.自行写脚本查看内存占用情况

4.1 使用shell命令来实时查看内存占用情况

pidstat  -r -p $pid 1

vim prog_mem.sh
#!/bin/bash
prog_name="your_programe_name"
prog_mem=$(pidstat  -r -u -h -C $prog_name |awk 'NR==4{print $12}')
time=$(date "+%Y-%m-%d %H:%M:%S")
echo $time"    memory(Byte):"$prog_mem >>~/record/prog_mem.log

(2.) 使用crontab建立定时任务，每分钟记录1次.

crontab -e
*/1 * * * * /record/prog_mem.sh

(3.) 将输出的日志信息prog_mem.log进行分析，可以使用可视化工具进行查看。

4.2 使用golang程序开启内存、CPU采集

go tool pprof -http=:6060 cpu.prof

func enableCpuProfile(cpuProfilePath string) (closer func()) {
	closer = func() {}
	if cpuProfilePath != "" {
		f, err := os.Create(cpuProfilePath)
		if err != nil {
			log.Fatal("could not create cpu profile: %v", err)
		}
		err = pprof.StartCPUProfile(f)
		if err != nil {
			log.Fatal("error: %v", err)
		}
		closer = pprof.StopCPUProfile
	}
	runtime.SetBlockProfileRate(20)
	return
}

func enableMemProfile(memProfilePath string) (closer func()) {
	closer = func() {}
	if memProfilePath != "" {
		closer = func() {
			f, err := os.Create(memProfilePath)
			if err != nil {
				log.Fatal("could not create memory profile: ", err)
			}
			defer f.Close()
			runtime.GC() // get up-to-date statistics
			if err := pprof.WriteHeapProfile(f); err != nil {
				log.Fatal("could not write memory profile: ", err)
			}
		}
	}
	return
}

也可以简化程序，在main中加入这两行代码，在程序退出时，会生成pprof文件。

	# 这里profile引用包：github.com/pkg/profile
	defer profile.Start(profile.CPUProfile).Stop()  // CPU
	defer profile.Start(profile.MemProfile).Stop()  // 内存
	defer profile.Start(profile.GoroutineProfile).Stop() // goroutine
	defer profile.Start(profile.BlockProfile).Stop()  // 阻塞
	defer profile.Start(profile.MutexProfile, profile.MemProfile).Stop()  // 合并写法

5.常见的内存泄露

5.1 goroutine泄露

每个goroutine占用2KB内存，泄露1百万goroutine至少泄露2KB * 1000000 = 2GB内存。goroutine执行过程中还存在一些变量，如果这些变量指向堆内存中的内存，GC会认为这些内存仍在使用，不会对其进行回收，这些内存谁都无法使用，造成了内存泄露。

所以goroutine泄露有2种方式造成内存泄露：

goroutine本身的栈所占用的空间造成内存泄露。
goroutine中的变量所占用的堆内存导致堆内存泄露，这一部分是能通过heap profile体现出来的。

判断依据：在节点正常运行的情况下，隔一段时间获取goroutine的数量，如果后面获取的那次，某些goroutine比前一次多，如果多获取几次，是持续增长的，就极有可能是goroutine泄露。

定位：

http://ip:port/debug/pprof/goroutine?debug=1  //查看当前阻塞在此goroutine的数量
http://ip:port/debug/pprof/goroutine?debug=2  // 可以查看goroutine阻塞的时长

总结：

goroutine泄露常见的场景是channel阻塞，无法继续向下执行，导致此goroutine关联的内存都无法释放，进一步造成内存泄露。

channel的读或者写：

无缓冲channel的阻塞通常是写操作因为没有读而阻塞
有缓冲的channel因为缓冲区满了，写操作阻塞
期待从channel读数据，结果没有goroutine写
select操作，select里也是channel操作，如果所有case上的操作阻塞，goroutine也无法继续执行。

5.2 time.After 内存泄露

# 错误示例：
package main

import (
    "time"
    )

func main()  {
    ch := make(chan int, 10)

    go func() {
        var i = 1
        for {
            i++
            ch <- i
        }
    }()

    for {
        select {
        case x := <- ch:
            println(x)
        case <- time.After(3 * time.Minute):
            println(time.Now().Unix())
        }
    }
}

在for循环每次select的时候，都会实例化一个一个新的定时器。该定时器在3分钟后，才会被激活，但是激活后已经跟select无引用关系，被gc给清理掉。time.After定时器在定时任务到达之前，会一直存在于时间堆中，不会释放资源，直到定时任务时间到达后才会释放资源。

在go代码中，在for循环里不要使用select + time.After的组合，可以使用time.NewTimer替代：

 1 package main
 2 
 3 import (
 4     "time"
 5     )
 6 
 7 func main()  {
 8     ch := make(chan int, 10)
 9 
10     go func() {
11         for {
12             ch <- 100
13         }
14     }()
15 
16     idleDuration := 3 * time.Minute
17     idleDelay := time.NewTimer(idleDuration)
18     defer idleDelay.Stop()
19 
20     for {
21         idleDelay.Reset(idleDuration)
22 
23         select {
24             case x := <- ch:
25                 println(x)
26             case <-idleDelay.C:
27                 return
28             }
29     }
30 }

6.开发建议：

var pool = sync.Pool{New: func() interface{} { return make([]byte, 4096) }}

func fn() {
	buf := pool.Get().([]byte) // takes from pool or calls New
	// do work
	pool.Put(buf) // returns buf to the pool
}