Golang Performance Analysis in Plain Words (Part 1): How Profiles Work

0 Goals of this series

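Understand the basic principles behind profiles
Get familiar with pprof, Go's standard performance analysis tool
Quickly analyze and troubleshoot CPU, memory, and goroutine problems in production services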

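For performance analysis, Go takes a sampling approach: the language natively supports sampling a running program, accumulates the collected samples, and provides tools for analyzing the resulting stack information. (By comparison, Java ships tools alongside the JDK, such as jps, jmap, and jstack, which first dump the in-memory data and then analyze the dump.)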

1 How profiles work

A Profile is a collection of stack traces showing the call sequences that led to instances of a particular event, such as allocation. Packages can create and maintain their own profiles; the most common use is for tracking resources that must be explicitly closed, such as files or network connections.

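By default Go initializes six profiles. The content stored by a count-based profile is abstracted as countProfile, which holds stack frame addresses (program counters); these addresses can be resolved into readable stack information by calling runtime.CallersFrames(stk). The rest of this article covers the three most commonly used kinds: memory, CPU, and goroutine.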

// A countProfile is a set of stack traces to be printed as counts
// grouped by stack trace. There are multiple implementations:
// all that matters is that we can find out how many traces there are
// and obtain each trace in turn.
type countProfile interface {
	Len() int
	Stack(i int) []uintptr
}
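To make the role of those []uintptr stacks more concrete, here is a minimal sketch that lists the profiles registered in the running program and resolves a captured stack through runtime.CallersFrames. The helper names profileNames, captureStack, and resolve are made up for this illustration and are not part of the runtime; the exact output depends on the Go version and on where the code is called from.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/pprof"
)

// profileNames returns the names of all profiles currently registered
// (the built-in ones plus any custom profiles created by packages).
func profileNames() []string {
	var names []string
	for _, p := range pprof.Profiles() {
		names = append(names, p.Name())
	}
	return names
}

// captureStack records the program counters of the current call stack,
// the same kind of []uintptr that countProfile.Stack returns.
func captureStack() []uintptr {
	pcs := make([]uintptr, 32)
	// Skip two frames: runtime.Callers itself and captureStack.
	n := runtime.Callers(2, pcs)
	return pcs[:n]
}

// resolve turns program counters into function names.
func resolve(stk []uintptr) []string {
	var names []string
	frames := runtime.CallersFrames(stk)
	for {
		frame, more := frames.Next()
		if frame.Function != "" {
			names = append(names, frame.Function)
		}
		if !more {
			break
		}
	}
	return names
}

func main() {
	fmt.Println("profiles:", profileNames())
	for _, fn := range resolve(captureStack()) {
		fmt.Println(fn)
	}
}
```

The raw addresses are cheap to record at sampling time; symbol lookup is deferred until the profile is printed, which is why the profile stores uintptr values rather than strings.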

Memory sampling

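After a Go program starts, the runtime samples memory allocations at a fixed rate. Each time the amount of allocated memory reaches a threshold (512 KB by default, controlled by runtime.MemProfileRate), the runtime records the size and stack of the current allocation into the profile.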

type MemProfileRecord struct {
 AllocBytes, FreeBytes int64 // number of bytes allocated, freed
 AllocObjects, FreeObjects int64 // number of objects allocated, freed
 Stack0 [32]uintptr // stack trace for this record; ends at first 0 entry
}
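As a rough illustration of how these records can be read back, the sketch below lowers runtime.MemProfileRate to 1 so that every allocation is sampled, allocates some memory, and sums AllocBytes over the records returned by runtime.MemProfile. The function sampledAllocBytes and the sink variable are hypothetical names for this example; in a real service you would normally leave the rate at its default and, if you change it, do so once at program start.

```go
package main

import (
	"fmt"
	"runtime"
)

// sink keeps the allocations reachable so the compiler cannot drop them.
var sink [][]byte

// sampledAllocBytes allocates blocks*size bytes with every allocation
// sampled, then returns the total AllocBytes seen in the memory profile.
func sampledAllocBytes(blocks, size int) int64 {
	runtime.MemProfileRate = 1 // sample every allocation (demo only)
	for i := 0; i < blocks; i++ {
		sink = append(sink, make([]byte, size))
	}
	runtime.GC() // the profile reflects the last completed GC cycle

	// Ask for the record count first, then retry with headroom in case
	// the profile grows between the two calls.
	n, _ := runtime.MemProfile(nil, true)
	for {
		records := make([]runtime.MemProfileRecord, n+50)
		var ok bool
		n, ok = runtime.MemProfile(records, true)
		if !ok {
			continue
		}
		var total int64
		for _, r := range records[:n] {
			total += r.AllocBytes
		}
		return total
	}
}

func main() {
	fmt.Println("sampled alloc bytes:", sampledAllocBytes(100, 4096))
}
```

With the default rate the totals are statistical estimates rather than exact counts, which keeps the overhead low enough for production use.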

CPU sampling

CPU sampling is started by calling StartCPUProfile and stopped by calling StopCPUProfile. The call chain is StartCPUProfile -> runtime.SetCPUProfileRate -> sighandler, and the sampling frequency is 100 Hz.

// The runtime routines allow a variable profiling rate,
// but in practice operating systems cannot trigger signals
// at more than about 500 Hz, and our processing of the
// signal is not cheap (mostly getting the stack trace).
// 100 Hz is a reasonable choice: it is frequent enough to
// produce useful data, rare enough not to bog down the
// system, and a nice round number to make it easy to
// convert sample counts to seconds. Instead of requiring
// each client to specify the frequency, we hard code it.
const hz = 100
// readProfile, provided by the runtime, returns the next chunk of
// binary CPU profiling stack trace data, blocking until data is available.
// If profiling is turned off and all the profile data accumulated while it was
// on has been returned, readProfile returns eof=true.
// The caller must save the returned data and tags before calling readProfile again.
func readProfile() (data []uint64, tags []unsafe.Pointer, eof bool)

Goroutine sampling

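In the GMP model, G is a goroutine, P is a processor (the scheduling context), and M is an OS thread. The sampled data comes from the scheduler side: the stacks of the goroutines it manages. Each P maintains a run queue of the Gs to be executed by the M attached to it. The details of the GMP model are worth looking up separately (keywords: work stealing, checking the global queue once every 61 scheduling ticks).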
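The goroutine profile can be inspected from inside a program through pprof.Lookup("goroutine"). The following sketch, built around a hypothetical helper named snapshot, parks a few goroutines on a channel and then takes a text dump of the profile while they are still alive.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime/pprof"
	"sync"
)

// snapshot starts n goroutines that block on a channel, then returns the
// goroutine count and a text dump of the goroutine profile taken while
// they are parked.
func snapshot(n int) (count int, dump string, err error) {
	stop := make(chan struct{})
	var started, done sync.WaitGroup
	for i := 0; i < n; i++ {
		started.Add(1)
		done.Add(1)
		go func() {
			defer done.Done()
			started.Done()
			<-stop // park until the snapshot has been taken
		}()
	}
	started.Wait()

	p := pprof.Lookup("goroutine")
	count = p.Count()
	var buf bytes.Buffer
	// debug=1 writes a human-readable dump with identical stacks grouped.
	err = p.WriteTo(&buf, 1)

	close(stop)
	done.Wait()
	return count, buf.String(), err
}

func main() {
	count, dump, err := snapshot(5)
	if err != nil {
		fmt.Println("write failed:", err)
		return
	}
	fmt.Println("goroutines:", count)
	fmt.Println(dump)
}
```

Because identical stacks are grouped, a leak of thousands of goroutines blocked at the same call site shows up as a single entry with a large count.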

How are the samples aggregated?

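Suppose that at some moment during execution a sample yields the stack frame sequence ABC. From it we learn that the function running at that moment is C and that C was called from B. After taking many samples and tallying them, we obtain the call information. Take sampling the following code as an example: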

void A() {
    B();
    for (int i = 0; i < 3; i++) C();
}

void B() {
    for (int i = 0; i < 5; i++) C();
}
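Sampling might yield the following stacks:
A AB ABC ABC ABC AC
From these samples we can tally each function's cumulative time and the caller/callee relationships, and from that data build a directed call graph.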
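The tallying step can be sketched in a few lines. This is only an illustration of the idea, not how pprof is implemented: each sample is written as a string of single-letter function names from root to leaf, and the hypothetical function aggregate counts flat samples (the function was running), cumulative samples (the function was somewhere on the stack), and caller-to-callee edges.

```go
package main

import "fmt"

// aggregate tallies stack samples written root-to-leaf, one letter per frame.
// flat counts samples where the function is the leaf, cum counts samples
// where it appears anywhere on the stack, edges counts caller->callee pairs.
func aggregate(samples []string) (flat, cum, edges map[string]int) {
	flat = map[string]int{}
	cum = map[string]int{}
	edges = map[string]int{}
	for _, s := range samples {
		if len(s) == 0 {
			continue
		}
		flat[string(s[len(s)-1])]++
		seen := map[byte]bool{} // count a recursive function once per sample
		for i := 0; i < len(s); i++ {
			if !seen[s[i]] {
				seen[s[i]] = true
				cum[string(s[i])]++
			}
			if i > 0 {
				edges[s[i-1:i+1]]++
			}
		}
	}
	return flat, cum, edges
}

func main() {
	flat, cum, edges := aggregate([]string{"A", "AB", "ABC", "ABC", "ABC", "AC"})
	fmt.Println(flat)  // map[A:1 B:1 C:4]
	fmt.Println(cum)   // map[A:6 B:4 C:4]
	fmt.Println(edges) // map[AB:4 AC:1 BC:3]
}
```

The functions become the nodes of the directed graph, weighted by their flat and cumulative counts, and the edge counts become the weights of the arrows between them.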

After sampling, profiles are stored in the pprof format, and analyzing that data requires Go's analysis tools.

The next part covers Go's commonly used analysis tools, so stay tuned. Follow "大龄码农" to explore the principles behind the technology together!