Prefix
Here’s a little side note from Chapter 6 - Multithreaded Patterns in this book: Multithreaded JavaScript.
In this chapter, introducing some multi-threaded patterns:
- Thread Pool
- Mutex
- Ring Buffers
- Actor Model
Thread Pool
- The thread pool is a very popular pattern that is used in most multithreaded applications in some form or another.
- A thread pool is a collection of homogeneous worker threads that are each capable of carrying out CPU-intensive tasks that the application may depend on.
libuv
library that Node.js depends on provides a thread pool, defaulting to four threads, for performing low-level I/O operations.- This pattern might feel similar to distributed systems.
- Discuss thread into two parts: pool size and dispatch strategies.
Pool Size
- Typically, the size of a thread pool won’t need to dynamically change throughout the lifetime of an application.
- With most operating systems there is not a direct correlation between a thread and a CPU core.
- Having too many threads compared to the number of CPU cores can cause a loss of performance.
- The constant context switching will actually make an application slower.
- Thread pool contains: worker thread, main thread, garbage collection thread (if using
libuv
)
Checking Available Cores
// browser
cores = navigator.hardwareConcurrency;
// Node.js
cores = require("os").cpus().length;
Deciding the size of threads
- Don’t forget the main thread, so total threads are
n + 1
- Deciding how many threads by purpose:
- Cryptocurrency miner that does 99.9% of the work in each thread and almost no I/O and no work in the main thread. Using the number of available cores as the size of the thread pool might be OK.
- Video streaming and transcoding service that performs heavy CPU and heavy I/O. You may want to use the number of available cores minus two.
- Reasonable starting point might be to use the number of available cores minus one and then tweak when necessary.
Dispatch Strategies
- A naive approach might be to just collect tasks to be done, then pass them in once the number of tasks ready to be performed meets the number of worker threads and continue once they all complete.
- However, each task isn’t guaranteed to take the same amount of time to complete.
Here’s a list of the most common strategies:
Round Robin
- Each task is given to the next worker in the pool, wrapping around to the beginning once the end has been hit.
- The benefit of this is that each thread gets the exact same number of tasks to perform.
- Unfair distribution of work.
- The HAProxy reverse proxy refers to this as
roundrobin
.
Random
- Each task is assigned to a random worker in the pool.
- Possibly unfair distribution of work.
Least Busy
- When a new task comes along it is given to the least busy worker.
- When two workers have a tie for the least amount of work, then one can be chosen randomly.
- HAProxy refers to this as
leastconn
.
Mutex: A Basic Lock
- Mutex means mutually exclusive lock.
- A mechanism for controlling access to some shared data.
- It ensures that only one task may use that resource at any given time.
- A task acquires the lock in order to run code that accesses the shared data, and then releases the lock once it’s done.
- The code between the acquisition and the release is called the critical section.
Streaming Data with Ring Buffers
- A ring buffer is an implementation of a first-in-first-out (FIFO) queue, implemented using a pair of indices into an array of data in memory.
- The array is treated as if one end is connected to the other, creating a ring of data. This means that if these indices are incremented past the end of the array, they’ll go back to the beginning.
- An analog in the physical world is the restaurant order wheel, commonly found in North American diners.
Basic Elements
head
index: Thehead
index refers to the next position to add data into the queue.tail
index: Thetail
index refers to the next position to read data out of the queue from.- buffer capacity (length): The capacity of the buffer.
How It Works
Refer to the chart from the book:
- When the data is written into buffer,
head
index will move to the next position. - When the data is read from the buffer,
tail
index will move to the next position. - When head or
tail
index at the last position of buffer, next will move the the first position of buffer. - Since it’s a RING buffer, there’s no start and end point. Ths start position of
head
andtail
index does not matter. tail
index is always located behind or at the same position withhead
index.- When the buffer is FULL, there’s two strategies for this situation:
- Overwrite the oldest: Overwrite the oldest data in the buffer. It means that newer data is more important.
- Prevent from writing: Throw an error, banning the new data from writing into the buffer.
- It is ALWAYS necessary to get the oldest data in the buffer correctly.
FIF, LIFO, Dynamic Buffer Size
Refer to wikis:
- The useful property of a circular buffer is that it does not need to have its elements shuffled around when one is consumed.
- The circular buffer is well-suited as a FIFO (first in, first out) buffer.
- The non-circular buffer is well suited as a LIFO (last in, first out) buffer.
- The idea of stake in JavaScript meets the concept of LIFO.
- Circular buffering makes a good implementation strategy for a queue that has fixed maximum size.
- For arbitrarily expanding queues, a linked list approach may be preferred instead.
Actor Model
- The actor model is a programming pattern for performing concurrent computation.
- An actor is a primitive container that allows for executing code.
- An actor is a first-class citizen in the Erlang programming language, but it can certainly be emulated using JavaScript.
- An actor is capable of running logic, creating more actors, sending messages to other actors, and receiving messages.
- No two actors are able to write to the same piece of shared memory, they are free to mutate their own memory.
- An actor is like a function in a functional language, accepting inputs and avoiding access to global state.
- Actors are single-threaded.
- A system that uses actors should be resilient to delays and out-of-order delivery, especially since actors can be spread across a network.
- Individual actors can also have the concept of an address. For example,
tcp://127.0.0.1:1234/3
might refer to the third actor running in a program on the local computer listening on port 1234. - With the actor pattern, you shouldn’t think of the joined actors as external APIs. Instead, think of them as an extension of the program itself.
Refer to the chart from the book: