0. 环境准备

下载示例Project :

go get -d github.com/wolfogre/go-pprof-practice 

关键代码:

package main

import (
	_ "net/http/pprof" // 会自动注册 handler 到 http server,方便通过 http 接口获取程序运行采样报告
)

func main() {
	runtime.GOMAXPROCS(1) // 限制 CPU 使用数,避免过载
	runtime.SetMutexProfileFraction(1) // 开启对锁调用的跟踪
	runtime.SetBlockProfileRate(1) // 开启对阻塞操作的跟踪

	go func() {
		// 启动一个 http server,注意 pprof 相关的 handler 已经自动注册过了
		if err := http.ListenAndServe(":6060", nil); err != nil {
			log.Fatal(err)
		}
		os.Exit(0)
	}()
}
http://localhost:6060/debug/pprof/

看到如下内容:

 /debug/pprof/

Types of profiles available:
Count	Profile
27	allocs
3	block
0	cmdline
45	goroutine
27	heap
1	mutex
0	profile
6	threadcreate
0	trace
full goroutine stack dump
Profile Descriptions:

allocs: A sampling of all past memory allocations
block: Stack traces that led to blocking on synchronization primitives
cmdline: The command line invocation of the current program
goroutine: Stack traces of all current goroutines
heap: A sampling of memory allocations of live objects. You can specify the gc GET parameter to run GC before taking the heap sample.
mutex: Stack traces of holders of contended mutexes
profile: CPU profile. You can specify the duration in the seconds GET parameter. After you get the profile file, use the go tool pprof command to investigate the profile.
threadcreate: Stack traces that led to the creation of new OS threads
trace: A trace of execution of the current program. You can specify the duration in the seconds GET parameter. After you get the trace file, use the go tool trace command to investigate the trace.

下面正式开始排查

1.排查内存占用过高

C:\Users\xxx>go tool pprof http://localhost:6060/debug/pprof/heap
Fetching profile over HTTP from http://localhost:6060/debug/pprof/heap
Saved profile in C:\Users\xxx\pprof\pprof.alloc_objects.alloc_space.inuse_objects.inuse_space.003.pb.gz
Type: inuse_space
Time: Sep 16, 2020 at 4:10pm (CST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 1GB, 100% of 1GB total
      flat  flat%   sum%        cum   cum%
       1GB   100%   100%        1GB   100%  github.com/wolfogre/go-pprof-practice/animal/muridae/mouse.(*Mouse).Steal
         0     0%   100%        1GB   100%  github.com/wolfogre/go-pprof-practice/animal/muridae/mouse.(*Mouse).Live
         0     0%   100%        1GB   100%  main.main
         0     0%   100%        1GB   100%  runtime.main
(pprof)

输入命令之后,进入交互终端,可以通过help查看支持的命令

(pprof) help
  Commands:
    callgrind        Outputs a graph in callgrind format
    comments         Output all profile comments
    disasm           Output assembly listings annotated with samples
    dot              Outputs a graph in DOT format
    eog              Visualize graph through eog
    evince           Visualize graph through evince
    gif              Outputs a graph image in GIF format
    gv               Visualize graph through gv
    kcachegrind      Visualize report in KCachegrind
    list             Output annotated source for functions matching regexp
    pdf              Outputs a graph in PDF format
    peek             Output callers/callees of functions matching regexp
    png              Outputs a graph image in PNG format
    proto            Outputs the profile in compressed protobuf format
    ps               Outputs a graph in PS format
    raw              Outputs a text representation of the raw profile
    svg              Outputs a graph in SVG format
    tags             Outputs all tags in the profile
    text             Outputs top entries in text form
    top              Outputs top entries in text form
    topproto         Outputs top entries in compressed protobuf format
    traces           Outputs all profile samples in text form
    tree             Outputs a text rendering of call graph
    web              Visualize graph through web browser
    weblist          Display annotated source in a web browser
    o/options        List options and their current values
    quit/exit/^D     Exit pprof

  Options:
    call_tree        Create a context-sensitive call tree
    compact_labels   Show minimal headers
    divide_by        Ratio to divide all samples before visualization
    drop_negative    Ignore negative differences
    edgefraction     Hide edges below <f>*total
    focus            Restricts to samples going through a node matching regexp
    hide             Skips nodes matching regexp
    ignore           Skips paths going through any nodes matching regexp
    mean             Average sample value over first value (count)
    nodecount        Max number of nodes to show
    nodefraction     Hide nodes below <f>*total
    noinlines        Ignore inlines.
    normalize        Scales profile based on the base profile.
    output           Output filename for file-based outputs
    prune_from       Drops any functions below the matched frame.
    relative_percentages Show percentages relative to focused subgraph
    sample_index     Sample value to report (0-based index or name)
    show             Only show nodes matching regexp
    show_from        Drops functions above the highest matched frame.
    source_path      Search path for source files
    tagfocus         Restricts to samples with tags in range or matched by regexp
    taghide          Skip tags matching this regexp
    tagignore        Discard samples with tags in range or matched by regexp
    tagshow          Only consider tags matching this regexp
    trim             Honor nodefraction/edgefraction/nodecount defaults
    trim_path        Path to trim from source paths before search
    unit             Measurement units to display

  Option groups (only set one per group):
    cumulative
      cum              Sort entries based on cumulative weight
      flat             Sort entries based on own weight
    granularity
      addresses        Aggregate at the address level.
      filefunctions    Aggregate at the function level.
      files            Aggregate at the file level.
      functions        Aggregate at the function level.
      lines            Aggregate at the source code line level.
  :   Clear focus/ignore/hide/tagfocus/tagignore

  type "help <cmd|option>" for more information

本文主要使用 top list web 三个命令

(pprof) list Steal
Total: 1GB
ROUTINE ======================== github.com/wolfogre/go-pprof-practice/animal/muridae/mouse.(*Mouse).Steal in E:\goproject\thirdparty\src\github.com\wolfogre\go-pprof-practice\animal\muridae\mouse\mouse.go
       1GB        1GB (flat, cum)   100% of Total
         .          .     45:
         .          .     46:func (m *Mouse) Steal() {
         .          .     47:   log.Println(m.Name(), "steal")
         .          .     48:   max := constant.Gi
         .          .     49:   for len(m.buffer) * constant.Mi < max {
       1GB        1GB     50:           m.buffer = append(m.buffer, [constant.Mi]byte{})
         .          .     51:   }
         .          .     52:}
(pprof)

很明显 m.buffer = append(m.buffer, [constant.Mi]byte{}) 导致了内存占用过高。

启动可视化UI

C:\Users\xxx>go tool pprof -http=":8081" pprof\pprof.alloc_objects.alloc_space.inuse_objects.inuse_space.003.pb.gz

UI支持多种视图,包括TOP /Graph /Flame Graph/ Peek /Source 等等,非常直观好用


2.排查CPU占用过高

C:\Users\xxx>go tool pprof http://localhost:6060/debug/pprof/profile?seconds=60
Fetching profile over HTTP from http://localhost:6060/debug/pprof/profile?seconds=60
Saved profile in C:\Users\xxx\pprof\pprof.samples.cpu.003.pb.gz
Type: cpu
Time: Sep 16, 2020 at 4:18pm (CST)
Duration: 1.06mins, Total samples = 42.39s (66.77%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 42.37s, 100% of 42.39s total
Dropped 16 nodes (cum <= 0.21s)
      flat  flat%   sum%        cum   cum%
    42.37s   100%   100%     42.37s   100%  github.com/wolfogre/go-pprof-practice/animal/felidae/tiger.(*Tiger).Eat
         0     0%   100%     42.37s   100%  github.com/wolfogre/go-pprof-practice/animal/felidae/tiger.(*Tiger).Live
         0     0%   100%     42.38s   100%  main.main
         0     0%   100%     42.38s   100%  runtime.main
(pprof) list Eat
Total: 42.39s
ROUTINE ======================== github.com/wolfogre/go-pprof-practice/animal/felidae/tiger.(*Tiger).Eat in E:\goproject\thirdparty\src\github.com\wolfogre\go-pprof-practice\animal\felidae\tiger\tiger.go
    42.37s     42.37s (flat, cum)   100% of Total
         .          .     19:}
         .          .     20:
         .          .     21:func (t *Tiger) Eat() {
         .          .     22:   log.Println(t.Name(), "eat")
         .          .     23:   loop := 10000000000
    42.37s     42.37s     24:   for i := 0; i < loop; i++ {
         .          .     25:           // do nothing
         .          .     26:   }
         .          .     27:}
         .          .     28:
         .          .     29:func (t *Tiger) Drink() {
(pprof)

启动可视化UI

C:\Users\xxx>go tool pprof -http=":8081" pprof\pprof.samples.cpu.003.pb.gz

3.排查频繁内存回收

启动时增加参数 GODEBUG=gctrace=1

gc日志显示出程序正在频繁gc

gc 2 @7.993s 0%: 0+3.9+0 ms clock, 0+0.99/0/0+0 ms cpu, 7->7->6 MB, 8 MB goal, 1 P
gc 3 @7.997s 0%: 0+5.9+0 ms clock, 0+0.99/0/0+0 ms cpu, 14->14->12 MB, 15 MB goal, 1 P
gc 4 @8.002s 0%: 0+9.9+0 ms clock, 0+0/0.99/0+0 ms cpu, 28->28->24 MB, 29 MB goal, 1 P
gc 5 @8.013s 0%: 0+16+0 ms clock, 0+0/0/0+0 ms cpu, 56->56->48 MB, 57 MB goal, 1 P
gc 6 @8.032s 0%: 0+39+0 ms clock, 0+0/0/0+0 ms cpu, 112->112->96 MB, 113 MB goal, 1 P
gc 7 @8.076s 0%: 0+71+0 ms clock, 0+0/0/0+0 ms cpu, 224->224->192 MB, 225 MB goal, 1 P
gc 8 @8.154s 0%: 0+141+0 ms clock, 0+0/0.99/0+0 ms cpu, 448->448->384 MB, 449 MB goal, 1 P
gc 9 @8.309s 0%: 0+284+0 ms clock, 0+0/0/0+0 ms cpu, 896->896->768 MB, 897 MB goal, 1 P
gc 10 @8.621s 0%: 0+561+0 ms clock, 0+0/0/0+0 ms cpu, 1792->1792->1536 MB, 1793 MB goal, 1 P

pprof命令

这里我们使用web命令,web命令会生成一个svg格式的图片,与UI界面上显示的Graph一致。

首次使用时可能会出错,因为需要安装图形化工具graphviz。工具下载地址:

C:\Users\xxx>go tool pprof http://localhost:6060/debug/pprof/allocs
Fetching profile over HTTP from http://localhost:6060/debug/pprof/allocs
Saved profile in C:\Users\xxx\pprof\pprof.alloc_objects.alloc_space.inuse_objects.inuse_space.004.pb.gz
Type: alloc_space
Time: Sep 16, 2020 at 4:31pm (CST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof)
(pprof) top
Showing nodes accounting for 2.01GB, 100% of 2.01GB total
      flat  flat%   sum%        cum   cum%
       2GB 99.22% 99.22%        2GB 99.22%  github.com/wolfogre/go-pprof-practice/animal/muridae/mouse.(*Mouse).Steal
    0.02GB  0.78%   100%     0.02GB  0.78%  github.com/wolfogre/go-pprof-practice/animal/canidae/dog.(*Dog).Run
         0     0%   100%     0.02GB  0.78%  github.com/wolfogre/go-pprof-practice/animal/canidae/dog.(*Dog).Live
         0     0%   100%        2GB 99.22%  github.com/wolfogre/go-pprof-practice/animal/muridae/mouse.(*Mouse).Live
         0     0%   100%     2.01GB   100%  main.main
         0     0%   100%     2.01GB   100%  runtime.main
(pprof) list Steal
Total: 2.01GB
ROUTINE ======================== github.com/wolfogre/go-pprof-practice/animal/muridae/mouse.(*Mouse).Steal in E:\goproject\thirdparty\src\github.com\wolfogre\go-pprof-practice\animal\muridae\mouse\mouse.go
       2GB        2GB (flat, cum) 99.22% of Total
         .          .     45:
         .          .     46:func (m *Mouse) Steal() {
         .          .     47:   log.Println(m.Name(), "steal")
         .          .     48:   max := constant.Gi
         .          .     49:   for len(m.buffer) * constant.Mi < max {
       2GB        2GB     50:           m.buffer = append(m.buffer, [constant.Mi]byte{})
         .          .     51:   }
         .          .     52:}
(pprof) list Run
Total: 2.01GB
ROUTINE ======================== github.com/wolfogre/go-pprof-practice/animal/canidae/dog.(*Dog).Run in E:\goproject\thirdparty\src\github.com\wolfogre\go-pprof-practice\animal\canidae\dog\dog.go
      16MB       16MB (flat, cum)  0.78% of Total
         .          .     38:   log.Println(d.Name(), "pee")
         .          .     39:}
         .          .     40:
         .          .     41:func (d *Dog) Run() {
         .          .     42:   log.Println(d.Name(), "run")
      16MB       16MB     43:   _ = make([]byte, 16 * constant.Mi)
         .          .     44:}
         .          .     45:
         .          .     46:func (d *Dog) Howl() {
         .          .     47:   log.Println(d.Name(), "howl")
         .          .     48:}
(pprof) web
(pprof)


4.排查协程泄露/协程失控

同样使pprof top list web 四个步骤

C:\Users\xxx>go tool pprof http://localhost:6060/debug/pprof/goroutine
Fetching profile over HTTP from http://localhost:6060/debug/pprof/goroutine
Saved profile in C:\Users\xxx\pprof\pprof.goroutine.001.pb.gz
Type: goroutine
Time: Sep 16, 2020 at 4:38pm (CST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof)
(pprof) top
Showing nodes accounting for 45, 100% of 45 total
Showing top 10 nodes out of 36
      flat  flat%   sum%        cum   cum%
        43 95.56% 95.56%         43 95.56%  runtime.gopark
         1  2.22% 97.78%          1  2.22%  net/http.(*connReader).backgroundRead
         1  2.22%   100%          1  2.22%  runtime/pprof.writeRuntimeProfile
         0     0%   100%         40 88.89%  github.com/wolfogre/go-pprof-practice/animal/canidae/wolf.(*Wolf).Drink.func1
         0     0%   100%          1  2.22%  github.com/wolfogre/go-pprof-practice/animal/canidae/wolf.(*Wolf).Howl
         0     0%   100%          1  2.22%  github.com/wolfogre/go-pprof-practice/animal/canidae/wolf.(*Wolf).Howl.func1
         0     0%   100%          1  2.22%  github.com/wolfogre/go-pprof-practice/animal/canidae/wolf.(*Wolf).Live
         0     0%   100%          1  2.22%  internal/poll.(*FD).Accept
         0     0%   100%          1  2.22%  internal/poll.(*FD).acceptOne
         0     0%   100%          1  2.22%  internal/poll.(*ioSrv).ExecIO
(pprof) list Drink
Total: 45
ROUTINE ======================== github.com/wolfogre/go-pprof-practice/animal/canidae/wolf.(*Wolf).Drink.func1 in E:\goproject\thirdparty\src\github.com\wolfogre\go-pprof-practice\animal\canidae\wolf\wolf.go
         0         40 (flat, cum) 88.89% of Total
         .          .     29:
         .          .     30:func (w *Wolf) Drink() {
         .          .     31:   log.Println(w.Name(), "drink")
         .          .     32:   for i := 0; i < 10; i++ {
         .          .     33:           go func() {
         .         40     34:                   time.Sleep(30 * time.Second)
         .          .     35:           }()
         .          .     36:   }
         .          .     37:}
         .          .     38:
         .          .     39:func (w *Wolf) Shit() {
(pprof) web
(pprof)


5.排查锁的争用

C:\Users\xxx>go tool pprof http://localhost:6060/debug/pprof/mutex
Fetching profile over HTTP from http://localhost:6060/debug/pprof/mutex
Saved profile in C:\Users\xxx\pprof\pprof.contentions.delay.004.pb.gz
Type: delay
Time: Sep 16, 2020 at 4:52pm (CST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 2.47mins, 100% of 2.47mins total
      flat  flat%   sum%        cum   cum%
  2.47mins   100%   100%   2.47mins   100%  sync.(*Mutex).Unlock
         0     0%   100%   2.47mins   100%  github.com/wolfogre/go-pprof-practice/animal/canidae/wolf.(*Wolf).Howl.func1
(pprof) list How1
Total: 2.47mins
(pprof) list func1
Total: 2.47mins
ROUTINE ======================== github.com/wolfogre/go-pprof-practice/animal/canidae/wolf.(*Wolf).Howl.func1 in E:\goproject\thirdparty\src\github.com\wolfogre\go-pprof-practice\animal\canidae\wolf\wolf.go
         0   2.47mins (flat, cum)   100% of Total
         .          .     53:
         .          .     54:   m := &sync.Mutex{}
         .          .     55:   m.Lock()
         .          .     56:   go func() {
         .          .     57:           time.Sleep(time.Second)
         .   2.47mins     58:           m.Unlock()
         .          .     59:   }()
         .          .     60:   m.Lock()
         .          .     61:}
(pprof)

6. 排查阻塞操作

C:\Users\xxx>go tool pprof http://localhost:6060/debug/pprof/block
Fetching profile over HTTP from http://localhost:6060/debug/pprof/block
Saved profile in C:\Users\xxx\pprof\pprof.contentions.delay.005.pb.gz
Type: delay
Time: Sep 16, 2020 at 4:54pm (CST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 5.37mins, 100% of 5.37mins total
Dropped 4 nodes (cum <= 0.03mins)
      flat  flat%   sum%        cum   cum%
  2.69mins 50.01% 50.01%   2.69mins 50.01%  runtime.chanrecv1
  2.68mins 49.99%   100%   2.68mins 49.99%  sync.(*Mutex).Lock
         0     0%   100%   2.68mins 49.99%  github.com/wolfogre/go-pprof-practice/animal/canidae/wolf.(*Wolf).Howl
         0     0%   100%   2.68mins 49.99%  github.com/wolfogre/go-pprof-practice/animal/canidae/wolf.(*Wolf).Live
         0     0%   100%   2.69mins 50.01%  github.com/wolfogre/go-pprof-practice/animal/felidae/cat.(*Cat).Live
         0     0%   100%   2.69mins 50.01%  github.com/wolfogre/go-pprof-practice/animal/felidae/cat.(*Cat).Pee
         0     0%   100%   5.37mins   100%  main.main
         0     0%   100%   5.37mins   100%  runtime.main
(pprof) list Pee
Total: 5.37mins
ROUTINE ======================== github.com/wolfogre/go-pprof-practice/animal/felidae/cat.(*Cat).Pee in E:\goproject\thirdparty\src\github.com\wolfogre\go-pprof-practice\animal\felidae\cat\cat.go
         0   2.69mins (flat, cum) 50.01% of Total
         .          .     34:}
         .          .     35:
         .          .     36:func (c *Cat) Pee() {
         .          .     37:   log.Println(c.Name(), "pee")
         .          .     38:
         .   2.69mins     39:   <-time.After(time.Second)
         .          .     40:}
         .          .     41:
         .          .     42:func (c *Cat) Climb() {
         .          .     43:   log.Println(c.Name(), "climb")
         .          .     44:}
(pprof) web
(pprof)