Prefix

Here’s a little side note from Chapter 6, “Multithreaded Patterns,” of the book Multithreaded JavaScript.

This chapter introduces several multithreaded patterns:

  1. Thread Pool
  2. Mutex
  3. Ring Buffers
  4. Actor Model

Thread Pool

  • The thread pool is a very popular pattern that is used in most multithreaded applications in some form or another.
  • A thread pool is a collection of homogeneous worker threads that are each capable of carrying out CPU-intensive tasks that the application may depend on.
  • The libuv library that Node.js depends on provides a thread pool, defaulting to four threads, for performing low-level I/O operations.
  • This pattern may feel familiar from distributed systems, where a load balancer dispatches requests across a pool of machines.
  • Thread pools are discussed in two parts: pool size and dispatch strategies.

Pool Size

  • Typically, the size of a thread pool won’t need to dynamically change throughout the lifetime of an application.
  • With most operating systems there is not a direct correlation between a thread and a CPU core.
  • Having too many threads compared to the number of CPU cores can cause a loss of performance.
  • The constant context switching will actually make an application slower.
  • Keep in mind that the pool’s workers aren’t the only threads in the process: there is also the main thread, plus threads the runtime itself creates (for example, threads used by libuv and by garbage collection in Node.js).

Checking Available Cores

// browser
const cores = navigator.hardwareConcurrency;

// Node.js
const cores = require("os").cpus().length;

Deciding the Pool Size

  • Don’t forget the main thread, so the total number of threads is n + 1 (n workers plus the main thread).
  • Deciding how many threads by purpose:
    • Cryptocurrency miner that does 99.9% of the work in each thread and almost no I/O and no work in the main thread. Using the number of available cores as the size of the thread pool might be OK.
    • Video streaming and transcoding service that performs heavy CPU and heavy I/O. You may want to use the number of available cores minus two.
  • A reasonable starting point might be to use the number of available cores minus one, then tweak as necessary.

Dispatch Strategies

  • A naive approach might be to collect tasks until the number of tasks ready to be performed equals the number of worker threads, pass them all in at once, and continue only once they all complete.
  • However, each task isn’t guaranteed to take the same amount of time to complete.

Here’s a list of the most common strategies:

Round Robin

  • Each task is given to the next worker in the pool, wrapping around to the beginning once the end has been hit.
  • The benefit of this is that each thread gets the exact same number of tasks to perform.
  • The drawback is that the actual work can still be distributed unfairly, since tasks take different amounts of time to complete.
  • The HAProxy reverse proxy refers to this as roundrobin.
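The wrapping behavior described above can be sketched with a modulo-incremented index. The dispatcher class and names here are illustrative, not from the book; in a real pool the workers would be worker_threads Worker instances receiving postMessage() calls:

```javascript
// Illustrative round-robin dispatcher: each task goes to the next
// worker, wrapping back to the first once the end has been hit.
class RoundRobinDispatcher {
  constructor(workers) {
    this.workers = workers; // functions standing in for worker threads
    this.next = 0;
  }

  dispatch(task) {
    const worker = this.workers[this.next];
    this.next = (this.next + 1) % this.workers.length; // wrap around
    worker(task);
  }
}
```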

Random

  • Each task is assigned to a random worker in the pool.
  • The drawback is that the distribution of work can end up unfair purely by chance.

Least Busy

  • When a new task comes along it is given to the least busy worker.
  • When two workers are tied for the least amount of work, one can be chosen randomly.
  • HAProxy refers to this as leastconn.
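The least-busy pick can be sketched by tracking an in-flight task count per worker; the `inFlight` field and function name here are assumptions for illustration:

```javascript
// Illustrative least-busy selection: choose the worker with the
// fewest in-flight tasks; ties are broken randomly.
function pickLeastBusy(workers) {
  const least = Math.min(...workers.map((w) => w.inFlight));
  const candidates = workers.filter((w) => w.inFlight === least);
  return candidates[Math.floor(Math.random() * candidates.length)];
}
```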

Mutex: A Basic Lock

  • Mutex is short for mutually exclusive lock.
  • A mechanism for controlling access to some shared data.
  • It ensures that only one task may use that resource at any given time.
  • A task acquires the lock in order to run code that accesses the shared data, and then releases the lock once it’s done.
  • The code between the acquisition and the release is called the critical section.
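A minimal sketch of the acquire/release cycle, using a spinlock over a SharedArrayBuffer with Atomics.compareExchange. The class and constant names are illustrative and this is not the book’s implementation; a real worker thread would use Atomics.wait()/Atomics.notify() instead of spinning:

```javascript
// Illustrative mutex over a shared Int32Array.
const UNLOCKED = 0;
const LOCKED = 1;

class Mutex {
  constructor(sharedInt32Array, index = 0) {
    this.arr = sharedInt32Array;
    this.index = index;
  }

  acquire() {
    // Spin until we atomically swap UNLOCKED -> LOCKED.
    while (
      Atomics.compareExchange(this.arr, this.index, UNLOCKED, LOCKED) !==
      UNLOCKED
    ) {
      // In a real worker thread, Atomics.wait() would block here.
    }
  }

  release() {
    Atomics.store(this.arr, this.index, UNLOCKED);
    // With Atomics.wait(), a release would also Atomics.notify() a waiter.
  }
}

const shared = new Int32Array(new SharedArrayBuffer(4));
const mutex = new Mutex(shared);
mutex.acquire();
// ... critical section: only one thread may execute this at a time ...
mutex.release();
```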

Streaming Data with Ring Buffers

  • A ring buffer is an implementation of a first-in-first-out (FIFO) queue, implemented using a pair of indices into an array of data in memory.
  • The array is treated as if one end is connected to the other, creating a ring of data. This means that if these indices are incremented past the end of the array, they’ll go back to the beginning.
  • An analog in the physical world is the restaurant order wheel, commonly found in North American diners.

Basic Elements

  • head index: The head index refers to the next position to add data into the queue.
  • tail index: The tail index refers to the next position to read data out of the queue from.
  • buffer capacity (length): The maximum number of elements the buffer can hold.

How It Works

Refer to the chart from the book:

ring buffer

  • When data is written into the buffer, the head index moves to the next position.
  • When data is read from the buffer, the tail index moves to the next position.
  • When the head or tail index reaches the last position of the buffer, its next move wraps it back to the first position.
  • Since it’s a RING buffer, there is no fixed start or end point; the initial positions of the head and tail indices do not matter.
  • The tail index is always behind or at the same position as the head index.
  • When the buffer is FULL, there are two common strategies:
    • Overwrite the oldest: Overwrite the oldest data in the buffer. This treats newer data as more important.
    • Prevent writing: Throw an error, refusing to write the new data into the buffer.
  • Either way, reads must always return the oldest remaining data in the buffer, in order.
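The head/tail mechanics above can be sketched as a small class. This is an illustrative version (not the book’s implementation) that uses the “prevent writing” strategy when full:

```javascript
// Illustrative ring buffer: FIFO queue over a fixed-size array
// with wrapping head (write) and tail (read) indices.
class RingBuffer {
  constructor(capacity) {
    this.buffer = new Array(capacity);
    this.capacity = capacity;
    this.head = 0; // next position to write
    this.tail = 0; // next position to read
    this.size = 0; // number of stored elements
  }

  write(value) {
    if (this.size === this.capacity) {
      throw new Error("buffer is full"); // "prevent writing" strategy
    }
    this.buffer[this.head] = value;
    this.head = (this.head + 1) % this.capacity; // wrap around
    this.size++;
  }

  read() {
    if (this.size === 0) return undefined;
    const value = this.buffer[this.tail];
    this.tail = (this.tail + 1) % this.capacity; // wrap around
    this.size--;
    return value; // always the oldest remaining element
  }
}
```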

FIFO, LIFO, Dynamic Buffer Size

Refer to Wikipedia:

  • The useful property of a circular buffer is that it does not need to have its elements shuffled around when one is consumed.
  • The circular buffer is well-suited as a FIFO (first in, first out) buffer.
  • The non-circular buffer is well suited as a LIFO (last in, first out) buffer.
  • The idea of a stack in JavaScript matches the concept of LIFO.
  • Circular buffering makes a good implementation strategy for a queue that has fixed maximum size.
  • For arbitrarily expanding queues, a linked list approach may be preferred instead.

Actor Model

  • The actor model is a programming pattern for performing concurrent computation.
  • An actor is a primitive container that allows for executing code.
  • An actor is a first-class citizen in the Erlang programming language, but it can certainly be emulated using JavaScript.
  • An actor is capable of running logic, creating more actors, sending messages to other actors, and receiving messages.
  • No two actors are able to write to the same piece of shared memory, but each is free to mutate its own memory.
  • An actor is like a function in a functional language, accepting inputs and avoiding access to global state.
  • Actors are single-threaded.
  • A system that uses actors should be resilient to delays and out-of-order delivery, especially since actors can be spread across a network.
  • Individual actors can also have the concept of an address. For example, tcp://127.0.0.1:1234/3 might refer to the third actor running in a program on the local computer listening on port 1234.
  • With the actor pattern, you shouldn’t think of the joined actors as external APIs. Instead, think of them as an extension of the program itself.
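A toy single-threaded actor can be sketched with a private state and a message queue. The names are illustrative; real actors would run in separate workers (or machines) and communicate via message passing such as postMessage():

```javascript
// Illustrative actor: owns its state, processes messages one at a time,
// and never exposes its memory to other actors.
class Actor {
  constructor(handler) {
    this.handler = handler; // (state, message) => newState
    this.state = undefined;
    this.queue = [];
    this.processing = false;
  }

  send(message) {
    this.queue.push(message);
    if (!this.processing) this.drain();
  }

  drain() {
    this.processing = true;
    while (this.queue.length > 0) {
      const message = this.queue.shift();
      this.state = this.handler(this.state, message);
    }
    this.processing = false;
  }
}

// Example: a counter actor whose state only it can mutate.
const counter = new Actor((state = 0, msg) =>
  msg === "inc" ? state + 1 : state
);
counter.send("inc");
counter.send("inc");
```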

Refer to the chart from the book:

actor model
