Process Coordination


#include <atomic>
#include <iostream>
#include <string>
#include <thread>
#include <vector>

int main() {
    std::string msg = "Hello, World!";
    std::atomic_int index{-1};

    std::vector<std::thread> threads;
    for (int i = 0; i < static_cast<int>(msg.length()); i++) {
        threads.push_back(std::thread([ii = i, c = msg[i], &index]() {
            while ((ii - 1) != index.load()) {
                // busy wait until the previous character has been printed
            }
            std::cout << c;
            index.store(ii);  // signal the next thread in line
        }));
    }

    for (auto &t : threads) {
        t.join();
    }
    std::cout << std::endl;
    return 0;
}


package main

import (
	"fmt"
	"sync"
)

func main() {
	broadcastRegister := make(chan chan int, 1)
	broadcastIn := make(chan int, 5)
	go channelBroadcaster(broadcastRegister, broadcastIn)

	var wg sync.WaitGroup

	for i, c := range "Hello, World!" {
		wg.Add(1)
		workerIn := make(chan int, 20)
		broadcastRegister <- workerIn
		go coordinatedPrint(i, string(c), workerIn, broadcastIn, &wg)
	}

	broadcastIn <- -1 // kick off the chain: the worker holding index 0 goes first
	wg.Wait()
	fmt.Println()
}

// channelBroadcaster fans every index received on in out to all
// registered worker channels.
func channelBroadcaster(register chan chan int, in chan int) {
	outs := make([]chan int, 0, 10)
	for {
		select {
		case out := <-register:
			outs = append(outs, out)
		case index := <-in:
			for _, out := range outs {
				out <- index
			}
		}
	}
}

// coordinatedPrint waits until the previously printed index is one less
// than its own, prints its character, and announces its own index.
func coordinatedPrint(index int, char string, in chan int,
	out chan int, wg *sync.WaitGroup) {
	defer wg.Done()
	for {
		prevI := <-in
		if (index - 1) == prevI {
			fmt.Print(char)
			out <- index
			return
		}
	}
}

What This Code Does

Given the string "Hello, World!", each program must create as many concurrent units (threads in C++, goroutines in Go) as there are characters. The units must then coordinate among themselves (without the help of the main thread) to print the characters in order. Each concurrent unit may receive, at a minimum, the character it is responsible for as well as the index of that character.

What's the Same

Both implementations use a similar approach (for the sake of comparison) to solve the problem. Each concurrency unit waits until the most recently printed index is one less than the index it holds; that is to say, until it is next in line to print its character.

Also, both solutions wait in the main function for all concurrency units to finish their processing. The Go code does this with a sync.WaitGroup; the C++ code calls the join() method on each thread.

What's Different

A LOT. Let's break it down.

The first thing to understand, before discussing code, is that the C++ and Go communities hold different philosophies around concurrency. One of the creators of Go is credited with the following quote:

Don't communicate by sharing memory; share memory by communicating. — Rob Pike

To expand on this quote: in Go, it is more idiomatic to share state through the use of channels. What is sent over a channel should not be shared between goroutines; that is to say, data sent over a channel is copied so that two goroutines do not point to the same location in memory. This share-nothing approach is a common way to avoid data races when dealing with concurrent programming.
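A minimal sketch of this copy semantics (the point type and sendAndMutate function are illustrative names, not part of the example above):

```go
package main

import "fmt"

type point struct{ x, y int }

// sendAndMutate sends a value over a channel, mutates the original
// afterwards, and returns what the receiver sees.
func sendAndMutate() int {
	ch := make(chan point, 1)
	p := point{x: 1, y: 2}
	ch <- p     // the value of p is copied into the channel buffer
	p.x = 99    // mutating the sender's copy after the send...
	got := <-ch // ...does not affect what the receiver gets
	return got.x
}

func main() {
	fmt.Println(sendAndMutate()) // prints 1, not 99
}
```

Note that this holds for value types; sending a pointer or a slice over a channel would still share the underlying memory, which is why the share-nothing idiom favors sending values.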

While these ideas are sound, they are not always typical in C++. Often, an application is written in C++ because performance is very important. When this is the case, and concurrent computation is involved, C++ code will often share memory across threads to avoid the cost of copying data.

Breaking Down Go

Because the Go example aims to coordinate through message passing, we make use of channels. Channels can be thought of as synchronized queues. However, channels do not, by default, support advanced queueing patterns such as fan-out. Since every goroutine that is printing needs to communicate with all other goroutines, we have to implement a simple broadcaster (channelBroadcaster) which fans our messages out for us.
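The fan-out idea itself can be reduced to a few lines. The sketch below (with hypothetical names, and simplified to a fixed subscriber list rather than a register channel) shows the core of what the broadcaster does:

```go
package main

import "fmt"

// broadcast relays every value received on in to each subscriber.
// Unlike the article's channelBroadcaster, the subscriber list here
// is fixed up front instead of growing via a register channel.
func broadcast(in chan int, subscribers []chan int) {
	for v := range in {
		for _, sub := range subscribers {
			sub <- v // fan the same value out to every subscriber
		}
	}
}

func main() {
	in := make(chan int)
	a := make(chan int, 1)
	b := make(chan int, 1)
	go broadcast(in, []chan int{a, b})

	in <- 7
	fmt.Println(<-a, <-b) // both subscribers see the same value
}
```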

A WaitGroup allows the main goroutine to wait for all of the other goroutines to exit. In this case we are sharing memory, by passing a pointer to the WaitGroup into each goroutine. However, even though Go strives to avoid sharing memory, WaitGroup comes from the Go stdlib's sync package and is considered safe (as well as idiomatic).
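In isolation, the WaitGroup pattern looks like this (runWorkers is a hypothetical name used only for this sketch):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runWorkers launches n goroutines, waits for all of them with a
// WaitGroup, and returns how many actually ran.
func runWorkers(n int) int64 {
	var wg sync.WaitGroup
	var ran int64
	for i := 0; i < n; i++ {
		wg.Add(1) // register before launching, never inside the goroutine
		go func() {
			defer wg.Done() // signal completion even on an early return
			atomic.AddInt64(&ran, 1)
		}()
	}
	wg.Wait() // blocks until every goroutine has called Done
	return ran
}

func main() {
	fmt.Println(runWorkers(13)) // prints 13
}
```

Calling Add before launching the goroutine matters: if Add ran inside the goroutine, Wait could return before the counter was ever incremented.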

Breaking Down C++

The C++ solution shares the current print index across threads by using a std::atomic_int. Each thread is launched via a lambda that captures its own index, its character, and a reference to the shared atomic index (references are similar to pointers in that they allow sharing of memory). Each thread busy-waits as long as the current value of the atomic integer is not one less than its own index (index.load() retrieves the current value of the int). After printing, the thread stores its own index to release the next thread in line.

Once the main thread has launched all of the threads, it simply waits for all of them to exit by calling join() on the thread handles.

Some Conclusions

While sharing memory does result in less code for this particular (and contrived) example, the Go solution is arguably more efficient in that at no point are the goroutines spinning while waiting for a message. The underlying Go scheduler is able to put a goroutine to sleep and wake it up when a message is available on the channel. How does that work? Reading from a channel is a blocking operation, one which the Go scheduler is able to handle efficiently rather than wasting CPU resources on waiting.
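A small sketch of that blocking behavior (hypothetical names): the receiving goroutine is parked by the scheduler rather than spinning, and resumes only once the send happens.

```go
package main

import (
	"fmt"
	"time"
)

// receiveBlocking parks on the channel receive; the goroutine consumes
// no CPU while waiting and is woken by the scheduler when a value arrives.
func receiveBlocking(ch chan string) string {
	return <-ch // blocks here -- no polling loop required
}

func main() {
	ch := make(chan string)
	go func() {
		time.Sleep(10 * time.Millisecond) // simulate work happening elsewhere
		ch <- "done"
	}()
	fmt.Println(receiveBlocking(ch)) // prints "done" after the send
}
```

Contrast this with the C++ busy-wait loop, which burns a core re-reading the atomic until its condition becomes true.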
