The dark side of the runtime

Roberto Clapis

Link to this slide deck: clap.page.link/goroutines

Slides created with golang.org/x/tools/present

What do I do

Why

Scopes

A first spin

Let's run some goroutines

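package main

import (
	"fmt"
	"time"
)

func main() {
	for i := 0; i <= 9; i++ {
		go func() {
			// i is read when the goroutine runs, not when it is created
			fmt.Println(i)
		}()
	}
	// Crude wait so the goroutines get a chance to run before main exits
	time.Sleep(100 * time.Millisecond)
}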

I was expecting numbers to be printed in a shuffled fashion

2
1
3
4
6
9
8
7
5
Wat

First pitfall

Go has both closures and goroutines

Variables captured by a closure are evaluated when the goroutine runs, not when it is created

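package main

import (
	"fmt"
)

func main() {
	freeVar := "Hello "
	// f captures freeVar itself, not a copy of its current value
	f := func(s string) {
		fmt.Println(freeVar + s)
	}
	f("Closures") // prints "Hello Closures"
	freeVar = "Goodbye "
	f("Closures") // prints "Goodbye Closures"
}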

Before Go 1.22, the iteration variable in a for loop is shared among iterations (since Go 1.22 each iteration gets its own variable)

for i := range c {    // ← i declared once
    go func(){
      dostuff(i)      // ← used multiple times
    }()
}
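
The sharing can be observed deterministically, without goroutines or timing. The sketch below is only an illustration: `collect` is a made-up helper, and the variable is declared outside the loop on purpose so that it is shared whatever loop semantics the Go version in use applies.

```go
package main

import "fmt"

// collect builds n closures that all capture the same variable i.
// i is declared outside the loop on purpose, so it is shared no matter
// how the Go version in use scopes loop variables.
func collect(n int) []func() int {
	var i int
	var fs []func() int
	for i = 0; i < n; i++ {
		fs = append(fs, func() int { return i })
	}
	return fs
}

func main() {
	for _, f := range collect(3) {
		// Every closure reads i after the loop has finished: prints 3 each time.
		fmt.Println(f())
	}
}
```

None of the closures remembers the value i had when it was created; they all read the single variable after the loop has left it at 3.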

Why does it always run the same way

Performance reasons:

How to find it

Not so easy to catch

It is hard to write a proper static checker:

for /* declare variable */ {
[...]
go func() { /* use variable */ }()
// Wait for goroutine to end
}

or:

for /* declare variable */ {
[...]
go func() { /* use variable */ }()
// a return, a panic, or any other statement that might break out of the loop
}

golang FAQ about this

Channels

Channels provide a thread-safe way to send messages

This doesn't mean they are immune to the previous issue

func Serve(queue chan *http.Request) {
    for req := range queue {
        go func() {
            respond(req) 
        }()
    }
}

Vulnerability: response to the wrong request
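
One way to avoid answering the wrong request is to hand the value to the goroutine as an argument. This is a minimal sketch under assumptions: `request`, `serve` and `serveIDs` are hypothetical names standing in for `*http.Request` and the `Serve` function above, and a `sync.WaitGroup` replaces sleeping.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// request is a hypothetical stand-in for *http.Request.
type request struct{ id int }

// serve handles every request from queue in its own goroutine and
// returns the ids that were handled.
func serve(queue <-chan *request) []int {
	var (
		mu      sync.Mutex
		handled []int
		wg      sync.WaitGroup
	)
	for req := range queue {
		wg.Add(1)
		// req is passed as an argument, so each goroutine works on the
		// value it was started with, not on a shared loop variable.
		go func(req *request) {
			defer wg.Done()
			mu.Lock()
			handled = append(handled, req.id)
			mu.Unlock()
		}(req)
	}
	wg.Wait()
	return handled
}

// serveIDs sends n requests through serve and returns the handled ids, sorted.
func serveIDs(n int) []int {
	queue := make(chan *request)
	go func() {
		for i := 0; i < n; i++ {
			queue <- &request{id: i}
		}
		close(queue)
	}()
	ids := serve(queue)
	sort.Ints(ids)
	return ids
}

func main() {
	fmt.Println(serveIDs(5)) // prints [0 1 2 3 4]
}
```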

The odd fix

package main

import (
	"fmt"
	"time"
)

func main() {
	for i := 0; i <= 9; i++ {
		i := i // new variable per iteration, shadowing the loop variable
		go func() {
			fmt.Println(i)
		}()
	}
	time.Sleep(500 * time.Millisecond)
}

Taken from Effective Go:

func Serve(queue chan *Request) {
    for req := range queue {
        req := req // Create new instance of req for the goroutine.
[...]

It may seem odd to write

req := req

but it's legal and idiomatic in Go to do this.

The problem

Let's start digging

The MPG model

Checkpoints

Runtime is not preemptive (Go releases before 1.14, which added asynchronous preemption)

Checkpoints for the scheduler are emitted at compile time

When a goroutine's time slice is up, it is marked as preempted and control is handed back to the scheduler as soon as a checkpoint is reached.

This means a goroutine is done only when it says it's done

What

A checkpoint is a piece of code that is silently added during compilation

It is a point in which your code, instead of executing the next line, invokes the runtime

Garbage Collection is (almost) synchronous

The garbage collector is no exception to this rule: when the runtime detects that a collection is necessary, all goroutines are kindly asked to yield execution. (Stop the world)

Guess the output

package main

import (
	"fmt"
	"runtime"
)

func main() {
	var i byte
	go func() {
		for i = 0; i <= 255; i++ {
		}
	}()
	fmt.Println("Dropping mic")
	// Yield execution to force executing other goroutines
	runtime.Gosched()
	runtime.GC()
	fmt.Println("Done")
}

Note

The runtime deadlock detector will not detect this, as it is NOT a deadlock.

Code is still running, and neither the deadlock detector nor the race detector can see the future to check whether the computation will end.

Consequences

If you have an endless computation without message passing, just fix it (duh)

If you just happen to have long computations, beware:
when a garbage collection happens, all other goroutines will stop and wait until the GC restarts the world

Some tricks

Send SIGABRT to the stuck process to make the Go runtime exit with a stack dump:

kill -SIGABRT <pid>

Output:

goroutine 5 [running]:
main.main.func1(0xc420014088)
    [...] main.go:11
[...]
created by main.main
    [...] main.go:10

goroutine 1 [running]:
    goroutine running on other thread; stack unavailable

Fix it

The only way to prevent this kind of behavior is to force a checkpoint to be emitted by the compiler

The language specification does not say when checkpoints are emitted, but some checkpoints that will likely never go away are:

[ Please do not rely on this list as it is incomplete and might change in future versions ]
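
Independently of what the compiler emits, `runtime.Gosched()` is a documented, explicit way to yield the processor. The sketch below is an assumption-laden illustration, not a recommendation from the slides: `spin` and `runSpin` are hypothetical names, and the choice of yielding every 1024 iterations is arbitrary.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

// spin counts iterations until stop is set. The periodic runtime.Gosched
// call is an explicit yield, so other goroutines get a chance to run even
// if the loop body contains no compiler-emitted checkpoint.
func spin(stop *int32, iterations *int64) {
	for atomic.LoadInt32(stop) == 0 {
		if atomic.AddInt64(iterations, 1)%1024 == 0 {
			runtime.Gosched()
		}
	}
}

// runSpin starts spin, waits until it has made some progress, asks it to
// stop and returns the number of iterations it performed.
func runSpin() int64 {
	var stop int32
	var iterations int64
	done := make(chan struct{})
	go func() {
		defer close(done)
		spin(&stop, &iterations)
	}()
	// Yield until the spinner has run for a while.
	for atomic.LoadInt64(&iterations) < 2048 {
		runtime.Gosched()
	}
	atomic.StoreInt32(&stop, 1)
	<-done
	return atomic.LoadInt64(&iterations)
}

func main() {
	fmt.Println("iterations before stop:", runSpin())
}
```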

Termination and manual scheduling

Stop them

Even if we lose all references to the variables that communicate with a goroutine, it won't be garbage collected, so:

how do you kill a goroutine?

Explicit signals

Goroutines have to be explicitly signalled to end and they have to invoke a return statement.

A standard way to do so is to carry around context or done channels and cancel ongoing computation if not needed: this also implicitly inserts a checkpoint.

Parent goroutine:

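ctx, cancel := context.WithCancel(ctx)
go child(ctx) // child is a placeholder for the goroutine shown below
// ... once the result is no longer needed:
cancel()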

Child goroutine:

// select also adds a checkpoint
select {
case <-ctx.Done():
    return
case x := <-stuffToDo:
    doStuff(x)
}
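
Put together, parent and child might look like the following runnable sketch. It rests on assumptions: `worker`, `runWorker` and the `result` channel are names invented for the example, and summing integers stands in for `doStuff`.

```go
package main

import (
	"context"
	"fmt"
)

// worker sums the values received on stuffToDo until ctx is cancelled,
// then reports the total on result and returns.
func worker(ctx context.Context, stuffToDo <-chan int, result chan<- int) {
	total := 0
	for {
		select {
		case <-ctx.Done():
			result <- total
			return // the goroutine ends only because it returns here
		case x := <-stuffToDo:
			total += x
		}
	}
}

// runWorker feeds 1..n to a worker, cancels it and returns its total.
func runWorker(n int) int {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel() // calling cancel more than once is harmless
	// Unbuffered: a send completes only once the worker has received it.
	stuffToDo := make(chan int)
	result := make(chan int)
	go worker(ctx, stuffToDo, result)
	for i := 1; i <= n; i++ {
		stuffToDo <- i
	}
	cancel() // explicit signal: the worker must notice it and return
	return <-result
}

func main() {
	fmt.Println(runWorker(4)) // prints 10
}
```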

Simple as that

Let's take a look in the standard library...

http.TimeoutHandler wraps a user provided http.Handler... but how does it close the goroutine spawned at every request?

This is not PHP

<?php
set_time_limit(2);
for($i=0;;$i++){
}
?>

// Maximum execution time of
// 2 seconds exceeded

Then

One year ago it used to be like this:

// [...] code to create timer
go func() {
        h.handler.ServeHTTP(tw, r)
        // Signal done channel
}()
select {
case <-done:
// Handle HTTP stuff
case <-timeout:
// Write error
}

There was no way to communicate termination. When context was added, it got more complicated.

Code here.

Now (edited to fit slide)

ctx := h.testContext
if ctx == nil {
    var cancelCtx context.CancelFunc
    ctx, cancelCtx = context.WithTimeout(r.Context(), h.dt)
    defer cancelCtx()
}
r = r.WithContext(ctx)
done := make(chan struct{})
go func() {
    h.handler.ServeHTTP(tw, r)
    close(done)
}()
select {
case <-done:
    // handle done
case <-ctx.Done():
    // handle timeout
    return
}

BEWARE: the wrapped handler must now detect context cancellation; this code can neither ensure nor check that it does.
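
A wrapped handler that cooperates could look like this sketch. The names `slowHandler`, `serveWithTimeout` and `timedOutDemo` are hypothetical, the durations are arbitrary, and `httptest` is used only to drive the handler without a real server.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// slowHandler simulates long work but gives up as soon as the request
// context is cancelled. exited is closed when the handler returns.
func slowHandler(work time.Duration, exited chan<- struct{}) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		defer close(exited)
		select {
		case <-time.After(work):
			fmt.Fprintln(w, "finished")
		case <-r.Context().Done():
			// The context was cancelled (here: by TimeoutHandler); stop working.
			return
		}
	})
}

// serveWithTimeout runs slowHandler behind http.TimeoutHandler and reports
// the response status and whether the handler goroutine actually returned.
func serveWithTimeout(work, limit time.Duration) (status int, handlerExited bool) {
	exited := make(chan struct{})
	h := http.TimeoutHandler(slowHandler(work, exited), limit, "timed out")
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/", nil))
	select {
	case <-exited:
		handlerExited = true
	case <-time.After(time.Second):
		// The handler ignored cancellation and is still running.
	}
	return rec.Code, handlerExited
}

// timedOutDemo uses a handler that needs 10s of work with a 20ms limit.
func timedOutDemo() (int, bool) {
	return serveWithTimeout(10*time.Second, 20*time.Millisecond)
}

func main() {
	fmt.Println(timedOutDemo()) // prints 503 true
}
```

If the handler did not select on `r.Context().Done()`, the client would still receive the 503, but the goroutine would keep running for the full ten seconds.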

Walker

filepath.WalkFunc is used by filepath.Walk to navigate the file system.

As the documentation states: "[...] The files are walked in lexical order, which makes the output deterministic but means that for very large directories Walk can be inefficient."

Consequences

Every Go package providing a WalkFunc has to provide a way to cancel walking

Every Go package simulating a drive (e.g. go.rice or goftp) has to be used with a context captured in a closure.

// Obtain cancellable context or just propagate current one
ctx, cancel := context.WithCancel(ctx)

// Context aware WalkFunc
var wf filepath.WalkFunc
wf = func(path string, info os.FileInfo, err error) error {
    // capture context in the closure
}

// Setup timer
t := time.AfterFunc(2 * time.Second, cancel)
defer t.Stop()

// Start a cancellable walk
err := filepath.Walk(dir, wf)
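
The body of such a WalkFunc is left empty above; one possible shape is sketched here. `contextWalkFunc` and `demoWalk` are hypothetical helpers, and the temporary directory with three files exists only to make the example self-contained.

```go
package main

import (
	"context"
	"fmt"
	"os"
	"path/filepath"
)

// contextWalkFunc returns a WalkFunc that captures ctx in its closure and
// aborts the walk once ctx is cancelled. visited counts the entries seen.
func contextWalkFunc(ctx context.Context, visited *int) filepath.WalkFunc {
	return func(path string, info os.FileInfo, err error) error {
		if err != nil {
			return err
		}
		select {
		case <-ctx.Done():
			return ctx.Err() // a non-nil error makes Walk stop
		default:
		}
		*visited++
		return nil
	}
}

// demoWalk walks a temporary directory holding three files, optionally with
// an already cancelled context, and reports how many entries were visited
// and whether the walk ended with context.Canceled.
func demoWalk(cancelFirst bool) (visited int, cancelled bool) {
	dir, err := os.MkdirTemp("", "walk")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)
	for _, name := range []string{"a", "b", "c"} {
		if err := os.WriteFile(filepath.Join(dir, name), nil, 0o600); err != nil {
			panic(err)
		}
	}
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	if cancelFirst {
		cancel()
	}
	err = filepath.Walk(dir, contextWalkFunc(ctx, &visited))
	return visited, err == context.Canceled
}

func main() {
	fmt.Println(demoWalk(false)) // prints 4 false (the root plus three files)
	fmt.Println(demoWalk(true))  // prints 0 true
}
```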

Wrapping up

Takeaways

Always mind the scope of your variables, and stay away from goroutines that close over outer variables as much as possible

Make sure your code has checkpoints: long computations might have a performance impact

The standard library is not magical: it cannot and should not stop a goroutine that is running

Check for cancellation wherever possible. Standard/external libraries might be relying on you to check for cancellation (e.g. the http package)

If it is not possible to check for cancellation, add your own context/done channel

Questions and, hopefully, answers

Slides: https://clap.page.link/goroutines

Twitter: Roberto (@empijei) Clapis

Useful links

Talks:

Tools:

Credits

Gophers by

Tools used:

Mentoring

Thank you

Roberto Clapis
