Thursday, June 9, 2011

Goroutines

Goroutines allow you to execute tasks in parallel - there are many connotations to the word ‘parallel’ in computing, so take that with a pinch of salt. Parallel could, for example, mean the same program running on multiple hardware chips, on multiple machines, on multiple threads within the same machine, or multiplexed on the same thread. The Go documentation says "Goroutines are multiplexed onto multiple OS threads so if one should block, such as while waiting for I/O, others continue to run". Effectively, goroutines hide many of the internal machine complexities involved in achieving parallelism. This also means that the language designers can change how goroutines are scheduled on the machine to take advantage of hardware and CPU improvements.
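
To get a feel for the multiplexing described above, here is a minimal sketch (my own illustration, not taken from the Go documentation). It starts a thousand goroutines that all block on a channel and then asks the runtime how many goroutines exist at that moment. The helper name countBlocked is hypothetical; runtime.NumGoroutine and runtime.GOMAXPROCS are standard library calls, and passing 0 to GOMAXPROCS only reports the current setting. The go keyword in front of a function call is what starts a goroutine.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// countBlocked (hypothetical helper) starts n goroutines that all block on a
// channel and returns how many goroutines existed while they were blocked.
func countBlocked(n int) int {
	release := make(chan struct{})
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			<-release // block until the channel is closed
		}()
	}
	alive := runtime.NumGoroutine() // includes the calling goroutine
	close(release)                  // let every blocked goroutine continue
	wg.Wait()                       // wait until all of them have returned
	return alive
}

func main() {
	// GOMAXPROCS(0) reports the current setting without changing it.
	fmt.Println("CPUs that may execute Go code at once:", runtime.GOMAXPROCS(0))
	fmt.Println("goroutines alive while blocked:", countBlocked(1000))
}
```

On a typical machine the first number is small (the number of CPU cores) while the second is above a thousand, which suggests that the runtime is not dedicating one OS thread to each goroutine.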

How is this different from the programs we normally write? In most programs that we use, the processing is sequential. Given how slow human mechanical operations are compared with the speed of a computer, this serial processing usually goes unnoticed. For example, when you type on your keyboard, your typing is much slower than the speed at which the computer can process it - so it is ok that the program is sequential/serial. In other systems this might not be acceptable - say, a web server supporting thousands or millions of users. These servers typically have high computing power, and serving each user's request one after another would be inefficient.

Let’s take some simple examples that will help us illustrate and learn parallel computing with goroutines. Imagine the simulation of an athletic meet. There are many events occurring at the same time - high jump, long jump, the 100 metres sprint, etc. We don’t want to wait for the end of one event simulation before starting the next event.



Goroutines in Go are similar to normal functions, except that you put the keyword go in front of the actual function call. If my_function() were a function, then to execute it as a goroutine we would write go my_function() - that would make it run concurrently with the rest of the program. Let’s do a simple example to see that, even if it does not usefully employ parallel processing.

Partial code
func add2Numbers(a, b int) {
    fmt.Println(a + b)
}

func main() {
    add2Numbers(1, 2) // a normal function call

    // the same function started as a goroutine; note that main does not wait
    // for it, so the program may exit before this call prints anything
    go add2Numbers(1, 2)
}
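
Since the snippet above is only partial, here is one way it could be turned into a complete program in which the goroutine is guaranteed to finish before main returns. This is a sketch under my own assumptions: runAndWait is a hypothetical helper name, and it uses a channel (covered in more detail elsewhere) purely as a "done" signal.

```go
package main

import "fmt"

func add2Numbers(a, b int) {
	fmt.Println(a + b)
}

// runAndWait (hypothetical helper) runs f as a goroutine and blocks until f
// has returned.
func runAndWait(f func()) {
	done := make(chan struct{})
	go func() {
		defer close(done) // signal completion once f returns
		f()
	}()
	<-done // wait here for the signal
}

func main() {
	add2Numbers(1, 2)                        // a normal function call
	runAndWait(func() { add2Numbers(1, 2) }) // the same call as a goroutine
}
```

Waiting immediately after starting a single goroutine defeats the purpose of running it concurrently, of course; the point here is only to show that main has to stay alive for the goroutine to do its work.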

Now let’s implement the simulation of the athletic event. We will have one function that claims that it is executing an event, and we will simulate the time consumed by that event by performing a Sleep within it. Sleeping within an athletic event? Well ok, not the best choice I know, but just run along with me on this one, alright? In the first version of the code, we shall run it as a normal function call so that we can compare it against the version where we run it in parallel as goroutines.

Full code
package main

import (
    "fmt"
    "time"
)

func simulateEvent(name string, timeInSecs int64) {
    // sleep for a while to simulate time consumed by event
    fmt.Println("Started", name, ": Should take", timeInSecs, "seconds.")
    time.Sleep(time.Duration(timeInSecs) * time.Second)
    fmt.Println("Finished", name)
}

func main() {
    simulateEvent("100m sprint", 10) //start 100m sprint, it should take 10 seconds
    simulateEvent("Long jump", 6) //start long jump, it should take 6 seconds
    simulateEvent("High jump", 3) //start high jump, it should take 3 seconds
}

Started 100m sprint : Should take 10 seconds.
Finished 100m sprint
Started Long jump : Should take 6 seconds.
Finished Long jump
Started High jump : Should take 3 seconds.
Finished High jump

As you can see in the output, each event is sequential. The first one had to finish before the next one started. Now let us redo it with goroutines - notice that it is as simple as adding the keyword go at the beginning of each function call.

Full code
package main

import (
    "fmt"
    "time"
)

func simulateEvent(name string, timeInSecs int64) {
    // sleep for a while to simulate time consumed by event
    fmt.Println("Started", name, ": Should take", timeInSecs, "seconds.")
    time.Sleep(time.Duration(timeInSecs) * time.Second)
    fmt.Println("Finished", name)
}

func main() {
    go simulateEvent("100m sprint", 10) //start 100m sprint, it should take 10 seconds
    go simulateEvent("Long jump", 6) //start long jump, it should take 6 seconds
    go simulateEvent("High jump", 3) //start high jump, it should take 3 seconds

    //so that the program doesn't exit here, we make the program wait here for a while
    time.Sleep(12 * time.Second)
}

Started Long jump : Should take 6 seconds.
Started 100m sprint : Should take 10 seconds.
Started High jump : Should take 3 seconds.
Finished High jump
Finished Long jump
Finished 100m sprint

As you can see from the output, and comparing it to the one where we ran it sequentially, the processing is now parallel. The long jump event didn’t wait for the 100m sprint to finish, nor did the high jump event wait for any of the others.

One additional point to note is that goroutines cannot outlive the main function of the program - in the above program, if we did not add the time.Sleep call at the end of main to wait for 12 seconds, the program would have dropped off the end, thus ending all the goroutines as well. In programs like web services, the main program typically stays alive, waiting for users to connect.
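
Sleeping for a guessed amount of time works for a demo, but it either wastes time or cuts goroutines short. A common alternative is sync.WaitGroup from the standard library, which lets main wait exactly until every goroutine has finished. The sketch below is my own variation on the program above: the event type and the runAll helper are hypothetical names, and the durations are in milliseconds so that it runs quickly.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// event is a hypothetical type: a name plus a duration in milliseconds.
type event struct {
	name   string
	millis int
}

func simulateEvent(e event) {
	fmt.Println("Started", e.name)
	time.Sleep(time.Duration(e.millis) * time.Millisecond)
	fmt.Println("Finished", e.name)
}

// runAll (hypothetical helper) starts every event as a goroutine, waits for
// all of them to finish, and returns the total time taken.
func runAll(events []event) time.Duration {
	start := time.Now()
	var wg sync.WaitGroup
	for _, e := range events {
		wg.Add(1) // one more goroutine to wait for
		go func(e event) {
			defer wg.Done() // mark this goroutine as finished
			simulateEvent(e)
		}(e)
	}
	wg.Wait() // returns once every goroutine has called Done
	return time.Since(start)
}

func main() {
	elapsed := runAll([]event{
		{"100m sprint", 100},
		{"Long jump", 60},
		{"High jump", 30},
	})
	fmt.Println("All events finished after", elapsed)
}
```

If the events really do overlap, the reported time should be close to the longest event (about 100 milliseconds) rather than the sum of all three.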

There is one odd result in the above output. We invoked the 100m sprint first, but the printout starts with the long jump. I ran it repeatedly and got the same result. I don’t exactly know why it is so. It might be that the goroutines were started in the order they were invoked and that the 100m sprint's printing simply took a wee bit longer than the long jump's. In any case, the important lesson is not to write code that depends on when a goroutine will actually be started; Go does not promise any particular order. Instead, goroutines should be independent units of execution, communicating through Go channels.

I have reproduced the output as it was on my Ubuntu 10.4 running as a virtual machine within a Windows 7 OS.
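
To illustrate communicating through channels instead of relying on start order, here is a small sketch in which each event reports its own completion over a channel and the caller simply records what arrives. The event type and the finishOrder helper are hypothetical names of my own, and the durations are in milliseconds to keep the run short.

```go
package main

import (
	"fmt"
	"time"
)

// event is a hypothetical type: a name plus a duration in milliseconds.
type event struct {
	name   string
	millis int
}

// finishOrder (hypothetical helper) runs every event as a goroutine and
// collects the names in the order in which the events report completion.
func finishOrder(events []event) []string {
	finished := make(chan string)
	for _, e := range events {
		go func(e event) {
			time.Sleep(time.Duration(e.millis) * time.Millisecond)
			finished <- e.name // report completion over the channel
		}(e)
	}
	order := make([]string, 0, len(events))
	for range events {
		order = append(order, <-finished) // receive one report per event
	}
	return order
}

func main() {
	order := finishOrder([]event{
		{"100m sprint", 100},
		{"Long jump", 60},
		{"High jump", 30},
	})
	fmt.Println("Order of finishing:", order)
}
```

The caller never needs to know which goroutine started first. It also needs no Sleep at the end, because receiving from the channel makes it wait until each event has reported in.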

A note of caution on parallel computing. There are many complexities to consider - like what happens when different processes work on the same data which might end up corrupting it; how do we reliably communicate with separate parallel processes; how is sending and receiving data in proper order managed? There aren’t easy solutions to some of these and Go does not guarantee that it will provide solutions to everything - but it does to a few and for the others it provides good guidance on principles to be adopted. In other sections we shall look at some of the issues and their solutions.
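
As a small taste of the shared-data problem, consider many goroutines updating one counter. The sketch below guards the counter with sync.Mutex from the standard library so that only one goroutine changes it at a time. The medalTally type and the awardMedals helper are hypothetical names made up for this illustration, and a mutex is just one of several possible approaches.

```go
package main

import (
	"fmt"
	"sync"
)

// medalTally is a hypothetical shared counter guarded by a mutex.
type medalTally struct {
	mu    sync.Mutex
	count int
}

// add increments the counter while holding the lock.
func (t *medalTally) add() {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.count++
}

// awardMedals (hypothetical helper) starts n goroutines that each add one
// medal to the same tally, waits for all of them, and returns the total.
func awardMedals(n int) int {
	var tally medalTally
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			tally.add()
		}()
	}
	wg.Wait()
	return tally.count
}

func main() {
	fmt.Println(awardMedals(1000)) // prints 1000
}
```

Without the lock, two goroutines could read the same old value and both write back the same new one, so some increments could be lost.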

6 comments:

  1. With Go ver. 1, the line
    time.Sleep(timeInSecs * 1e9 )
    should be
    time.Sleep(time.Duration(timeInSecs * 1e9))

    1. Even better is:
      time.Sleep(time.Duration(timeInSecs) * time.Second)

    2. I second this. The method was made to be used with the time units (time.Second, time.Millisecond, etc.). Using simple constants can be unclear.

  2. The add2Numbers code won't work (only the first add2Numbers call will be made) because func main () ends and all scheduled goroutines are killed. You should put a time.Sleep(1) statement after the go add2Numbers statement.

  3. Why is there no parallel execution on GAE?

  4. The order of printing depends on when the goroutine 'wakes up' and is given access to the CPU. When high jump wakes at the 3 second mark, it is given access to the CPU and prints the message. Long jump wakes up after 6, prints its message. Finally sprint wakes up after 10, and prints its message.

