Map Reduce


#include <atomic>
#include <iostream>
#include <random>
#include <string>
#include <thread>
#include <vector>

using namespace std;
using vec_iter = std::vector<int>::iterator;

int main() {
    random_device rd;
    mt19937 gen(rd());
    uniform_int_distribution<> dis(0, 100);

    auto nums = vector<int>(4096);
    for (auto i = 0; i < nums.size(); i++) {
        nums[i] = dis(gen);
    }

    atomic<int> totalSum{0};
    vector<thread> threads;

    // divide into 1024 blocks
    int from = 0;
    int to = 1024;
    while (to <= nums.size()) {
        threads.push_back(thread(
                [ &totalSum,
                    start = nums.begin() + from,
                    end   = nums.begin() + to   ]() -> void {
                int sum = 0;
                for (vec_iter cur = start; cur != end; cur++) {
                    sum += *cur;
                }
                totalSum += sum;
            }));
        from += 1024;
        to += 1024;
    }

    for (auto &t: threads) {
        t.join();
    }

    cout << "Total Sum: " << totalSum.load() << "\n";
    return 0;
}


package main

import (
	"fmt"
	"math/rand"
)

func main() {
	nums := make([]int, 4096)
	for i := range nums {
		nums[i] = rand.Intn(100)
	}

	results := make(chan int, 4)

	// divide into 1024 blocks
	from := 0
	to := 1024
	jobs := 0
	for to <= len(nums) {
		go func(block []int) {
			results <- sum(block)
		}(nums[from:to])
		from += 1024
		to += 1024
		jobs += 1
	}

	// collect all of the results
	totalSum := 0
	for jobs > 0 {
		totalSum += <-results
		jobs -= 1
	}

	fmt.Printf("Total Sum: %d\n", totalSum)
}

func sum(nums []int) int {
	sum := 0
	for _, n := range nums {
		sum += n
	}
	return sum
}

What This Code Does

This code splits a task (summing numbers in a sequence) amongst multiple units of concurrency (threads/goroutines) and then collects the results using idiomatic approaches.

What's the Same

Both approaches use a random number generator to populate the initial input data. Both also use a slice or view when handing pieces of the input data to the concurrent sub-routines; that is, both implementations avoid explicit copies when passing data.

Lastly, both implementations wait for the concurrent sub-routines to finish before displaying the final result. This is done in Go by reading from a channel N times, where N represents the number of goroutines created. C++ does this by calling join() on the thread object, which waits for the thread to exit before continuing.

What's Different

The first noticeable difference, although not the main point of this comparison, is the random number generation. See the later section Digging Deeper: Random Number Generation if that interests you most. The main semantic differences lie in how we collect and combine (reduce) the results from the concurrent operations.

Again, it is idiomatic in Go to pass results via channels. As such, our "reduction" of results happens within the main goroutine (summing all individual results). C++, on the other hand, uses an atomic<int>, which allows each thread to participate in the "reduction" by atomically adding its result to the totalSum object. Atomics in C++ rely on the compiler to implement them efficiently for the target platform you're compiling to. That is, some CPUs support atomic operations at the instruction level. For CPUs that do not, the compiler may resort to other methods, such as locks, to implement atomic operations.

The second difference is how we represent "slices" of arrays; that is, a view of an array that doesn't involve performing copies. Go uses the type []T to represent slices, and slices can themselves be "sliced" using the [M:N] notation. I would consider this "first class" support for the concept of slices. C++, on the other hand, achieves this through the use of iterators.

Digging Deeper: C++ Slices (AKA Iterators)

An iterator in C++ is a cursor object that points to a particular position in a data-structure. In our case, it is pointing to a position in a std::vector. In many respects the cursor acts like a pointer. We can advance the position by incrementing, such as cur++. We get the value by dereferencing *cur. We ensure that we do not advance the iterator beyond the end of the data-structure by comparing it to a stop position, end. However, since we are working with a fixed-size vector, this is easy for us to ensure. If we were iterating over a dynamically sized vector, our code would look like:

for (auto curr = my_vec.begin(); curr != my_vec.end(); curr++) {
  // process items
}
However, if we were going to iterate the entire array and perform the same operation for each loop, we could also write this code as:

for (auto &val: my_vec) {
  // process items
}
While the two abstractions look similar, the second method is called a "ranged for loop" and was introduced in C++11. This form is the most convenient, the most limiting, and the most often seen when operating over all elements of a collection. The first approach, using iterators, is the most flexible option for when a simple for loop is not all that you need. With the iterator you could:

  • Pass the iterator to a function as a slice (like our example above)
  • Increment the iterator only under certain conditions, possibly re-processing an item multiple times within a loop
  • Process every other element within a loop (by incrementing the cursor by 2: cur += 2)
  • Process a collection backwards using the reverse iterators (rbegin() and rend())

Digging Deeper: Random Number Generation

The Go implementation offers a random-number generator similar to many high-level languages. However, C++, in true C++ fashion, provides lower-level abstractions and gives you more control over how your random numbers are generated.

random_device

Generates a random numeric value from hardware entropy. The exact method of how this is generated is compiler dependent (e.g. Clang uses /dev/urandom on Linux, while GCC uses Intel's rdrand instruction). It is also possible that your platform doesn't support a way to generate random numbers from hardware entropy, in which case this type would not be constructible. A value from a random_device is often used as a seed value for any of the stdlib random number generators.

mt19937

A random number generator based on the Mersenne Twister algorithm. The C++ standard library provides a decently large selection of random number generators to use, and depending on your application you might care deeply about which one is used. The list of random-number algorithms is:

  • minstd_rand0
  • minstd_rand
  • mt19937
  • mt19937_64
  • ranlux24_base
  • ranlux48_base
  • ranlux24
  • ranlux48
  • knuth_b
  • default_random_engine

For more information on the differences, see the reference page for the <random> header.


uniform_int_distribution

Allows us to specify a range in which all numbers are equally likely to be returned. While not responsible for generating random numbers itself, this lets us request a range of values different from the raw range the generator produces, without losing the "randomness" of the values.

Alternatively, other distributions allow us to shift the probabilities of getting certain numbers, such as an exponential distribution, which biases the generated values toward one end of the range.

At the time of writing, there are 20 distributions defined in the <random> header. You can see the full list of standard-library distributions in the <random> reference documentation.
