1 Introducing the problem

Example code

package main
 
func main() {
    done := false
    go func() {
        done = true
    }()
 
    for !done {
    }
    println("main exit")
}

Result with the newer version:

Go version 1.15

[test]$ /data/golang/lzj/go/bin/go version
go version go1.15.5 linux/amd64
[test]$ /data/golang/lzj/go/bin/go build test.go
[test]$ GOMAXPROCS=1 ./test
main exit
[test]$ 

Result with the older version:

Go version 1.12

[test]$ /data/golang/lzj/go/bin/go version
go version go1.12.7 linux/amd64
[test]$ /data/golang/lzj/go/bin/go build test.go
[test]$ GOMAXPROCS=1 ./test
  C-c C-c
[test]$ 

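The comparison shows that the binary built with the older toolchain hangs and has to be interrupted with Ctrl-C: with GOMAXPROCS=1, the busy loop in main never reaches a point where the scheduler can take the processor away, so the goroutine that sets done never gets to run. The newer toolchain avoids the problem. Go 1.14 added asynchronous preemption of goroutines to the runtime, which removes this potential scheduler stall.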
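Before Go 1.14, a goroutine could only be preempted cooperatively, at points such as function calls, which is why a call-free loop never gives up the processor. The following is a minimal sketch of the usual workaround on older toolchains, assuming the loop body may be changed: yield explicitly with runtime.Gosched. The helper name waitForFlag is made up for this illustration, and the flag goes through sync/atomic because the plain bool in the example above is a data race.

```go
package main

import (
	"runtime"
	"sync/atomic"
)

// waitForFlag (hypothetical helper) starts a goroutine that sets a flag and
// then spins until the flag becomes visible. It returns the final flag value.
func waitForFlag() int32 {
	var done int32
	go func() {
		atomic.StoreInt32(&done, 1)
	}()

	for atomic.LoadInt32(&done) == 0 {
		// Explicit scheduling point: lets the other goroutine run even
		// when there is only one P and no asynchronous preemption.
		runtime.Gosched()
	}
	return atomic.LoadInt32(&done)
}

func main() {
	waitForFlag()
	println("main exit")
}
```

With the explicit yield, the loop no longer depends on the runtime being able to interrupt it, so it should terminate on both old and new toolchains.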

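2 Source code analysis

The newer version introduces a signal, sigPreempt, to deal with loops that contain no function calls and could therefore stall the scheduler.

2.1 Signal selection and rationale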

// sigPreempt is the signal used for non-cooperative preemption.
//
// There's no good way to choose this signal, but there are some
// heuristics:
//
// 1. It should be a signal that's passed-through by debuggers by
// default. On Linux, this is SIGALRM, SIGURG, SIGCHLD, SIGIO,
// SIGVTALRM, SIGPROF, and SIGWINCH, plus some glibc-internal signals.
//
// 2. It shouldn't be used internally by libc in mixed Go/C binaries
// because libc may assume it's the only thing that can handle these
// signals. For example SIGCANCEL or SIGSETXID.
//
// 3. It should be a signal that can happen spuriously without
// consequences. For example, SIGALRM is a bad choice because the
// signal handler can't tell if it was caused by the real process
// alarm or not (arguably this means the signal is broken, but I
// digress). SIGUSR1 and SIGUSR2 are also bad because those are often
// used in meaningful ways by applications.
//
// 4. We need to deal with platforms without real-time signals (like
// macOS), so those are out.
//
// We use SIGURG because it meets all of these criteria, is extremely
// unlikely to be used by an application for its "real" meaning (both
// because out-of-band data is basically unused and because SIGURG
// doesn't report which socket has the condition, making it pretty
// useless), and even if it is, the application has to be ready for
// spurious SIGURG. SIGIO wouldn't be a bad choice either, but is more
// likely to be used for real.
const sigPreempt = _SIGURG

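The runtime reuses SIGURG as the preemption signal.

2.2 Sending the signal

The following call chain shows how a long-running goroutine is detected and how the signal is sent: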

 sysmon
    retake
        preemptone
            preemptM
                signalM

Finally, signalM sends the signal to the OS thread that backs the target M.

func signalM(mp *m, sig int) {
    tgkill(getpid(), int(mp.procid), sig)
} 
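To make the thread-directed delivery concrete, here is a rough user-space sketch of the same tgkill pattern, assuming Linux (syscall.Tgkill and syscall.Gettid are only available there). The helper names signalSelf and sigurgDelivered are invented for this example. Note that a SIGURG seen through os/signal may just as well be one of the runtime's own preemption signals, since sighandler lets them through to the application, so the check only shows that some SIGURG arrived.

```go
package main

import (
	"os"
	"os/signal"
	"runtime"
	"syscall"
	"time"
)

// signalSelf (hypothetical helper) sends sig to the OS thread running the
// calling goroutine, the way signalM targets another M's thread, and reports
// whether a matching signal was observed within the timeout.
func signalSelf(sig syscall.Signal, timeout time.Duration) (bool, error) {
	ch := make(chan os.Signal, 1)
	signal.Notify(ch, sig)
	defer signal.Stop(ch)

	// Pin the goroutine so the thread ID stays valid across the call.
	runtime.LockOSThread()
	defer runtime.UnlockOSThread()

	// tgkill(pid, tid, sig): deliver sig to one specific thread.
	if err := syscall.Tgkill(syscall.Getpid(), syscall.Gettid(), sig); err != nil {
		return false, err
	}

	select {
	case <-ch:
		return true, nil
	case <-time.After(timeout):
		return false, nil
	}
}

// sigurgDelivered wraps signalSelf for the preemption signal, SIGURG.
func sigurgDelivered() bool {
	ok, err := signalSelf(syscall.SIGURG, time.Second)
	return ok && err == nil
}

func main() {
	println("SIGURG observed:", sigurgDelivered())
}
```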

2.3 Receiving the signal

2.3.1 Registering the signal handler

libpreinit
    initsig
        setsig
            sigaction
                rt_sigaction

sigaction installs the handler for a signal. The arguments used for the registration are set up by setsig, as shown below.

func setsig(i uint32, fn uintptr) {
    var sa sigactiont
    sa.sa_flags = _SA_SIGINFO | _SA_ONSTACK | _SA_RESTORER | _SA_RESTART
    sigfillset(&sa.sa_mask)
    // Although Linux manpage says "sa_restorer element is obsolete and
    // should not be used". x86_64 kernel requires it. Only use it on
    // x86.
    if GOARCH == "386" || GOARCH == "amd64" {
        sa.sa_restorer = funcPC(sigreturn)
    }
    if fn == funcPC(sighandler) {
        if iscgo {
            fn = funcPC(cgoSigtramp)
        } else {
            fn = funcPC(sigtramp)
        }
    }
    sa.sa_handler = fn
    sigaction(i, &sa, nil)
}

sigreturn is mainly responsible for restoring the interrupted thread's context once the handler has finished.

2.3.2 Handling the signal

sigtramp
    sigtrampgo
        sighandler
            doSigPreempt

sighandler checks whether the incoming signal is sigPreempt:

func sighandler(sig uint32, info *siginfo, ctxt unsafe.Pointer, gp *g) {
    _g_ := getg()
    c := &sigctxt{info, ctxt}
    if sig == sigPreempt && debug.asyncpreemptoff == 0 {
        // Might be a preemption signal.
        doSigPreempt(gp, c)
        // Even if this was definitely a preemption signal, it
        // may have been coalesced with another signal, so we
        // still let it through to the application.
    }
}

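doSigPreempt does not perform the preemption itself. It only rewrites the context passed in by the kernel: pushCall points the saved instruction pointer (RIP on amd64) at asyncPreempt and records newpc as the address to return to.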

func doSigPreempt(gp *g, ctxt *sigctxt) {
    // Check if this G wants to be preempted and is safe to
    // preempt.
    if wantAsyncPreempt(gp) {
        if ok, newpc := isAsyncSafePoint(gp, ctxt.sigpc(), ctxt.sigsp(), ctxt.siglr()); ok {
            // Adjust the PC and inject a call to asyncPreempt.
            ctxt.pushCall(funcPC(asyncPreempt), newpc)
        }
    }
}
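The effect of pushCall on amd64 can be modelled with a toy context: push the resume address onto the stack, then redirect the program counter. This is only a simplified sketch under that assumption, not the runtime's code; fakeContext, newFakeContext, ret and the addresses are all made up for illustration.

```go
package main

const ptrSize = 8 // word size on amd64

// fakeContext is a hypothetical stand-in for the register state that the
// kernel saves before running the signal handler.
type fakeContext struct {
	pc    uint64   // saved instruction pointer
	sp    uint64   // saved stack pointer, as a byte offset into stack
	stack []uint64 // toy stack, one element per word, growing downward
}

func newFakeContext(pc uint64, words int) *fakeContext {
	return &fakeContext{pc: pc, sp: uint64(words) * ptrSize, stack: make([]uint64, words)}
}

// pushCall makes the context look as if the code at resumePC had just
// called targetPC: resumePC becomes the return address on the stack.
func (c *fakeContext) pushCall(targetPC, resumePC uint64) {
	c.sp -= ptrSize
	c.stack[c.sp/ptrSize] = resumePC
	c.pc = targetPC
}

// ret models the return at the end of the injected function.
func (c *fakeContext) ret() {
	c.pc = c.stack[c.sp/ptrSize]
	c.sp += ptrSize
}

func main() {
	const interruptedPC, asyncPreemptPC = 0x401000, 0x455000 // invented addresses

	c := newFakeContext(interruptedPC, 16)
	c.pushCall(asyncPreemptPC, interruptedPC)
	println("runs injected call first:", c.pc == asyncPreemptPC)

	c.ret()
	println("then resumes interrupted code:", c.pc == interruptedPC)
}
```

The model leaves out everything the real handler has to check first (wantAsyncPreempt, isAsyncSafePoint); it only shows why restoring the modified context leads into asyncPreempt and later back to newpc.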

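When the whole handler chain has returned, sigreturn is invoked to restore the thread's context. Because the saved instruction pointer now points at asyncPreempt, the thread executes asyncPreempt after the signal has been handled instead of continuing at the interrupted instruction. Once asyncPreempt returns, execution continues in the interrupted code, at newpc.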

asyncPreempt
    asyncPreempt2
        mcall(gopreempt_m)
            goschedImpl
                schedule
                    execute
                        gogo

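As the chain shows, asyncPreempt ends up calling gogo to run a different goroutine. mcall saves the context of the current goroutine; when that goroutine is scheduled again, the context saved by mcall is restored, and execution eventually resumes at newpc.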
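The whole mechanism can be observed from user code with a small experiment, sketched here under the assumption of Go 1.14 or later with asynchronous preemption enabled. The name sleepBesideSpinner is invented for this example. A goroutine spins in a loop that is expected to compile without function calls while the caller sleeps on the only P; the sleep can only end if the runtime takes the processor away from the spinner. On an older toolchain, or with GODEBUG=asyncpreemptoff=1, the call would likely never return.

```go
package main

import (
	"runtime"
	"sync/atomic"
	"time"
)

// sleepBesideSpinner (hypothetical helper) restricts the program to one P,
// starts a busy-looping goroutine, and returns how long a short sleep in
// the calling goroutine actually took.
func sleepBesideSpinner() time.Duration {
	prev := runtime.GOMAXPROCS(1)
	defer runtime.GOMAXPROCS(prev)

	var stop int32
	exited := make(chan struct{})
	go func() {
		// Tight loop: only an atomic load, no explicit yield.
		for atomic.LoadInt32(&stop) == 0 {
		}
		close(exited)
	}()

	start := time.Now()
	time.Sleep(20 * time.Millisecond) // hands the only P to the spinner
	elapsed := time.Since(start)

	// The caller is running again, so the spinner must have been preempted.
	atomic.StoreInt32(&stop, 1)
	<-exited
	return elapsed
}

func main() {
	println("woke up after ms:", sleepBesideSpinner().Milliseconds())
}
```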