Graceful Restart in Golang

JUN 3RD, 2014 | COMMENTS

Update (Apr 2015): Florian von Bock has turned what is described in this article into a nice Go package called endless.

If you have a Golang HTTP service, chances are, you will need to restart it on occasion to upgrade the binary or change some configuration. And if you (like me) have been taking graceful restart for granted because the webserver took care of it, you may find this recipe very handy because with Golang you need to roll your own.

There are actually two problems that need to be solved here. First is the UNIX side of the graceful restart, i.e. the mechanism by which a process can restart itself without closing the listening socket. The second problem is ensuring that all in-progress requests are properly completed or timed-out.

Restarting without closing the socket

  • Fork a new process which inherits the listening socket.
  • The child performs initialization and starts accepting connections on the socket.
  • Immediately after, child sends a signal to the parent causing the parent to stop accepting connecitons and terminate.

Forking a new process

ExtraFiles

Here is what this looks like:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
file := netListener.File() // this returns a Dup()
path := "/path/to/executable"
args := []string{
    "-graceful"}

cmd := exec.Command(path, args...)
cmd.Stdout = os.Stdout
cmd.Stderr = os.Stderr
cmd.ExtraFiles = []*os.File{file}

err := cmd.Start()
if err != nil {
    log.Fatalf("gracefulRestart: Failed to launch, error: %v", err)
}
netListenerpath
netListener.File()FD_CLOEXEC
ExtraFiles
args-graceful

Child initialization

Here is part of the program startup sequence

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
    server := &http.Server{Addr: "0.0.0.0:8888"}

    var gracefulChild bool
    var l net.Listever
    var err error

    flag.BoolVar(&gracefulChild, "graceful", false, "listen on fd open 3 (internal use only)")

    if gracefulChild {
        log.Print("main: Listening to existing file descriptor 3.")
        f := os.NewFile(3, "")
        l, err = net.FileListener(f)
    } else {
        log.Print("main: Listening on a new file descriptor.")
        l, err = net.Listen("tcp", server.Addr)
    }

Signal parent to stop

At this point we’re ready to accept requests, but just before we do that, we need to tell our parent to stop accepting requests and exit, which could be something like this:

1
2
3
4
5
6
7
if gracefulChild {
    parent := syscall.Getppid()
    log.Printf("main: Killing parent pid: %v", parent)
    syscall.Kill(parent, syscall.SIGTERM)
}

server.Serve(l)

In-progress requests completion/timeout

For this we will need to keep track of open connections with a sync.WaitGroup. We will need to increment the wait group on every accepted connection and decrement it on every connection close.

1
var httpWg sync.WaitGroup

At first glance, the Golang standard http package does not provide any hooks to take action on Accept() or Close(), but this is where the interface magic comes to the rescue. (Big thanks and credit to Jeff R. Allen for this post).

net.Listenerstopstopped
1
2
3
4
5
type gracefulListener struct {
    net.Listener
    stop    chan error
    stopped bool
}
gracefulConn
1
2
3
4
5
6
7
8
9
10
11
func (gl *gracefulListener) Accept() (c net.Conn, err error) {
    c, err = gl.Listener.Accept()
    if err != nil {
        return
    }

    c = gracefulConn{Conn: c}

    httpWg.Add(1)
    return
}

We also need a “constructor”:

1
2
3
4
5
6
7
8
9
func newGracefulListener(l net.Listener) (gl *gracefulListener) {
    gl = &gracefulListener{Listener: l, stop: make(chan error)}
    go func() {
        _ = <-gl.stop
        gl.stopped = true
        gl.stop <- gl.Listener.Close()
    }()
    return
}
Accept()gl.Listener.Accept()
Close()nil
1
2
3
4
5
6
7
func (gl *gracefulListener) Close() error {
    if gl.stopped {
        return syscall.EINVAL
    }
    gl.stop <- nil
    return <-gl.stop
}
net.TCPListener
1
2
3
4
5
func (gl *gracefulListener) File() *os.File {
    tl := gl.Listener.(*net.TCPListener)
    fl, _ := tl.File()
    return fl
}
net.ConnClose()
1
2
3
4
5
6
7
8
type gracefulConn struct {
    net.Conn
}

func (w gracefulConn) Close() error {
    httpWg.Done()
    return w.Conn.Close()
}
server.Serve(l)
1
2
netListener = newGracefulListener(l)
server.Serve(netListener)

And there is one more thing. You should avoid hanging connections that the client has no intention of closing (or not this week). It is better to create your server as follows:

1
2
3
4
5
server := &http.Server{
        Addr:           "0.0.0.0:8888",
        ReadTimeout:    10 * time.Second,
        WriteTimeout:   10 * time.Second,
        MaxHeaderBytes: 1 << 16}

Posted by Gregory Trubetskoy Jun 3rd, 2014