1 Introducing the problem
Sample code:
package main

func main() {
	done := false
	go func() {
		done = true
	}()
	for !done {
	}
	println("main exit")
}
Result with the newer version (Go 1.15):
[test]$ /data/golang/lzj/go/bin/go version
go version go1.15.5 linux/amd64
[test]$ /data/golang/lzj/go/bin/go build test.go
[test]$ GOMAXPROCS=1 ./test
main exit
[test]$
Result with the older version (Go 1.12):
[test]$ /data/golang/lzj/go/bin/go version
go version go1.12.7 linux/amd64
[test]$ /data/golang/lzj/go/bin/go build test.go
[test]$ GOMAXPROCS=1 ./test
C-c C-c
[test]$
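The comparison shows that the older version ends up in a scheduler deadlock, while the newer version avoids it. With GOMAXPROCS=1, the main goroutine spins in a loop that contains no function calls, so on Go 1.12 it never yields the processor, the goroutine that sets done never runs, and the program has to be interrupted by hand. The Go 1.14 runtime introduced asynchronous preemption of goroutines, which removes this potential scheduler deadlock.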
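The sample above also contains a data race on done, which is separate from the scheduling problem. The following is a minimal sketch, not part of the original program, that keeps the same busy loop but uses an atomic flag; the helper name spinUntilDone is made up for illustration. It assumes Go 1.14 or later, where the loop is expected to be preempted so that the other goroutine can run.

```go
package main

import (
	"runtime"
	"sync/atomic"
)

// spinUntilDone is a hypothetical helper that reproduces the busy-wait from
// the sample program, using an atomic flag so that there is no data race.
func spinUntilDone() bool {
	// Restrict the scheduler to one P, like GOMAXPROCS=1 on the command
	// line, and restore the previous setting on return.
	prev := runtime.GOMAXPROCS(1)
	defer runtime.GOMAXPROCS(prev)

	var done int32
	go func() {
		atomic.StoreInt32(&done, 1)
	}()

	// The loop body contains no function call and no blocking operation,
	// so the only way for the goroutine above to run on the single P is
	// for the runtime to preempt this loop asynchronously.
	for atomic.LoadInt32(&done) == 0 {
	}
	return true
}

func main() {
	if spinUntilDone() {
		println("main exit")
	}
}
```

Setting GODEBUG=asyncpreemptoff=1 in the environment disables asynchronous preemption, so with that setting this loop is expected to spin forever again, as it did on the older version.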
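2 Source code analysis
The newer version introduces the signal sigPreempt to resolve the scheduling deadlock that a loop without function calls can cause.
2.1 Choice of signal and rationale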
// sigPreempt is the signal used for non-cooperative preemption.
//
// There's no good way to choose this signal, but there are some
// heuristics:
//
// 1. It should be a signal that's passed-through by debuggers by
// default. On Linux, this is SIGALRM, SIGURG, SIGCHLD, SIGIO,
// SIGVTALRM, SIGPROF, and SIGWINCH, plus some glibc-internal signals.
//
// 2. It shouldn't be used internally by libc in mixed Go/C binaries
// because libc may assume it's the only thing that can handle these
// signals. For example SIGCANCEL or SIGSETXID.
//
// 3. It should be a signal that can happen spuriously without
// consequences. For example, SIGALRM is a bad choice because the
// signal handler can't tell if it was caused by the real process
// alarm or not (arguably this means the signal is broken, but I
// digress). SIGUSR1 and SIGUSR2 are also bad because those are often
// used in meaningful ways by applications.
//
// 4. We need to deal with platforms without real-time signals (like
// macOS), so those are out.
//
// We use SIGURG because it meets all of these criteria, is extremely
// unlikely to be used by an application for its "real" meaning (both
// because out-of-band data is basically unused and because SIGURG
// doesn't report which socket has the condition, making it pretty
// useless), and even if it is, the application has to be ready for
// spurious SIGURG. SIGIO wouldn't be a bad choice either, but is more
// likely to be used for real.
const sigPreempt = _SIGURG
The runtime reuses SIGURG as its preemption signal.
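One consequence of this choice is that a Go program subscribing to SIGURG has to tolerate deliveries it did not cause. The sketch below, assuming a Unix-like system, subscribes to SIGURG with os/signal, sends one to the current process, and waits for a delivery. The helper name observeSIGURG is made up for illustration, and the signal that arrives may be either the one sent here or one sent by the runtime for preemption.

```go
package main

import (
	"os"
	"os/signal"
	"syscall"
	"time"
)

// observeSIGURG is a hypothetical helper: it subscribes to SIGURG, sends one
// to the current process, and reports whether a SIGURG reached the
// application within one second.
func observeSIGURG() bool {
	ch := make(chan os.Signal, 1)
	signal.Notify(ch, syscall.SIGURG)
	defer signal.Stop(ch)

	if err := syscall.Kill(syscall.Getpid(), syscall.SIGURG); err != nil {
		return false
	}

	select {
	case sig := <-ch:
		// The delivery cannot be attributed to a sender: it may be the
		// signal sent above or a preemption signal from the runtime.
		return sig == syscall.SIGURG
	case <-time.After(time.Second):
		return false
	}
}

func main() {
	println("SIGURG observed:", observeSIGURG())
}
```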
2.2 Sending the signal
The following call chain detects a goroutine that should be preempted and sends the signal:
sysmon
retake
preemptone
preemptM
signalM
Finally, signalM sends the signal to the OS thread that backs the target M.
func signalM(mp *m, sig int) {
	tgkill(getpid(), int(mp.procid), sig)
}
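The detection step at the top of this chain can be illustrated with a toy model. In the runtime, retake compares a per-P scheduling counter with the value it saw on its previous pass and asks for preemption once the same goroutine has been running for about 10ms. The sketch below is a simplification under that assumption; the type pState and the function shouldPreempt are hypothetical names, not runtime identifiers.

```go
package main

// forcePreemptNS models the time slice after which a running goroutine is
// asked to yield (10ms, expressed in nanoseconds).
const forcePreemptNS = 10 * 1000 * 1000

// pState is a hypothetical stand-in for the bookkeeping kept per P.
type pState struct {
	schedTick    uint32 // incremented by the scheduler on every switch
	lastSeenTick uint32 // tick value the monitor saw on its previous pass
	lastSeenWhen int64  // time of that observation, in nanoseconds
}

// shouldPreempt reports whether the goroutine on p has been running for at
// least forcePreemptNS without any scheduling event in between.
func shouldPreempt(p *pState, now int64) bool {
	if p.lastSeenTick != p.schedTick {
		// A switch happened since the last pass: restart the clock.
		p.lastSeenTick = p.schedTick
		p.lastSeenWhen = now
		return false
	}
	return now-p.lastSeenWhen >= forcePreemptNS
}

func main() {
	p := &pState{schedTick: 1}
	println(shouldPreempt(p, 0))          // first observation, clock starts
	println(shouldPreempt(p, 5_000_000))  // 5ms on the same goroutine
	println(shouldPreempt(p, 12_000_000)) // 12ms on the same goroutine
}
```

In this model only the third check asks for preemption, because the tick has not changed for more than 10ms.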
2.3 Receiving the signal
2.3.1 Registering the signal handler
libpreinit
initsig
setsig
sigaction
rt_sigaction
The function sigaction registers the signal handler. The registration parameters are set by setsig, as shown below.
func setsig(i uint32, fn uintptr) {
	var sa sigactiont
	sa.sa_flags = _SA_SIGINFO | _SA_ONSTACK | _SA_RESTORER | _SA_RESTART
	sigfillset(&sa.sa_mask)
	// Although Linux manpage says "sa_restorer element is obsolete and
	// should not be used". x86_64 kernel requires it. Only use it on
	// x86.
	if GOARCH == "386" || GOARCH == "amd64" {
		sa.sa_restorer = funcPC(sigreturn)
	}
	if fn == funcPC(sighandler) {
		if iscgo {
			fn = funcPC(cgoSigtramp)
		} else {
			fn = funcPC(sigtramp)
		}
	}
	sa.sa_handler = fn
	sigaction(i, &sa, nil)
}
The function sigreturn restores the context of the interrupted thread once the handler has finished.
2.3.2 Handling the signal
sigtramp
sigtrampgo
sighandler
doSigPreempt
The function sighandler checks for sigPreempt:
func sighandler(sig uint32, info *siginfo, ctxt unsafe.Pointer, gp *g) {
	_g_ := getg()
	c := &sigctxt{info, ctxt}
	if sig == sigPreempt && debug.asyncpreemptoff == 0 {
		// Might be a preemption signal.
		doSigPreempt(gp, c)
		// Even if this was definitely a preemption signal, it
		// may have been coalesced with another signal, so we
		// still let it through to the application.
	}
}
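The function doSigPreempt only modifies the instruction pointer (RIP on amd64, EIP on 386) in the context passed in by the kernel; it does not perform the actual preemption.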
func doSigPreempt(gp *g, ctxt *sigctxt) {
	// Check if this G wants to be preempted and is safe to
	// preempt.
	if wantAsyncPreempt(gp) {
		if ok, newpc := isAsyncSafePoint(gp, ctxt.sigpc(), ctxt.sigsp(), ctxt.siglr()); ok {
			// Adjust the PC and inject a call to asyncPreempt.
			ctxt.pushCall(funcPC(asyncPreempt), newpc)
		}
	}
}
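The effect of pushCall can be illustrated with a toy model of the saved context. The sketch below is not the runtime's implementation: toyContext, its fields, and the addresses are made up, and it only mimics the behavior on a stack-based architecture such as amd64, where the resume address is pushed onto the stack and the program counter is redirected to the injected function.

```go
package main

// toyContext is a hypothetical stand-in for the register state that the
// kernel saves when it delivers a signal.
type toyContext struct {
	pc    uintptr   // saved program counter
	sp    int       // index into stack; the stack grows toward lower indices
	stack []uintptr // simulated stack memory
}

// pushCall makes it look as if the interrupted code had just called target:
// the resume address is pushed as a return address and pc is redirected.
func (c *toyContext) pushCall(target, resumePC uintptr) {
	c.sp--
	c.stack[c.sp] = resumePC
	c.pc = target
}

// ret models the return instruction at the end of the injected function.
func (c *toyContext) ret() {
	c.pc = c.stack[c.sp]
	c.sp++
}

func main() {
	const loopPC, asyncPreemptPC = 0x1000, 0x2000 // made-up addresses
	c := &toyContext{pc: loopPC, sp: 4, stack: make([]uintptr, 4)}

	c.pushCall(asyncPreemptPC, c.pc)
	println(c.pc == asyncPreemptPC) // the thread would resume in the injected function

	c.ret()
	println(c.pc == loopPC) // and afterwards return to the interrupted code
}
```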
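Once the whole call chain has finished, sigreturn is called to restore the thread's context. Because the instruction pointer now points to asyncPreempt, the thread executes asyncPreempt after the signal has been handled instead of resuming at the interrupted code. When asyncPreempt eventually returns, execution continues at the interrupted code, that is, at newpc.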
asyncPreempt
asyncPreempt2
mcall(gopreempt_m)
goschedImpl
schedule
execute
gogo
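As the chain shows, asyncPreempt ultimately calls gogo to run another goroutine. mcall saves the context of the current goroutine; when that goroutine is scheduled again, the context saved by mcall is restored, and execution eventually resumes at newpc.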
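For comparison, the voluntary counterpart of this path is runtime.Gosched, which in the runtime also switches to the scheduler through mcall and goschedImpl. The sketch below shows the cooperative form of the busy loop, which does not rely on asynchronous preemption; the helper name yieldUntilDone is made up for illustration.

```go
package main

import (
	"runtime"
	"sync/atomic"
)

// yieldUntilDone is a hypothetical helper: it waits for a flag on a single P
// and yields explicitly on every iteration. It returns the number of yields.
func yieldUntilDone() int {
	prev := runtime.GOMAXPROCS(1)
	defer runtime.GOMAXPROCS(prev)

	var done int32
	go func() {
		atomic.StoreInt32(&done, 1)
	}()

	yields := 0
	for atomic.LoadInt32(&done) == 0 {
		// Put the current goroutine back on the run queue and let the
		// scheduler pick another one; execution resumes here later.
		runtime.Gosched()
		yields++
	}
	return yields
}

func main() {
	println("yields before done:", yieldUntilDone())
}
```

The number of yields depends on scheduling and is not fixed; the point is only that the loop gives the other goroutine a chance to run without any signal being involved.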