Garbage collection is a process of automatically releasing memory when it is no longer in use.
In lower-level languages such as C, it is often necessary to allocate and deallocate memory manually. When creating a dynamically resizable array, for example. JavaScript handles tasks like these in the background. Therefore, it needs to have a way to free up memory automatically when it’s no longer needed.
Freeing up memory on the stack (where most primitives are stored) is a simple process. But heap memory (which stores objects) needs a more sophisticated process. Garbage collection is the mechanism that releases memory from the heap when it is no longer in use.
Two common strategies for garbage collection are reference counting and mark and sweep.
Reference counting
Objects have references that point to them. For example, let bob = new Person;
will create a Person
object and assign a reference to it to the bob
variable. There could be — and often are — other variables that reference this Person
object as well.
In a reference counting scheme, each object has a reference counter, which keeps a count of active references to itself. When a reference gets created, the object increments its reference counter. When a reference’s lifetime ends, the object decrements its reference counter.
If bob
is inside a function, its lifetime will end when the function is finished executing. When this happens, the Person
object will decrement its reference count by one. When there are no more references to it, its reference count will be zero.
A garbage collector will periodically look through all of the objects in an application and check their reference counters. If an object’s counter is zero, the garbage collector will destroy it, freeing the memory allocated for it so that the memory is available for reuse.
Let’s look at an example.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 |
class Person { constructor(name) { this.name = name; } } function makePeople() { let bob = new Person('Bob'); let john = new Person('John'); let david = new Person('David'); return [bob, john, david]; } function listPeople() { let people = makePeople(); for (person of people) { console.log(person.name); } } listPeople(); |
In this example, when makePeople
is called, lines 9, 10 and 11 each allocate memory for a Person
object and assign a reference to the object to a variable. The variables go out of scope, but the references are returned to the caller, listPeople
. Once listPeople
completes, the people
array variable goes out of scope, and there are no more references to the Person
object. There is also no longer a reference to the Array
object assigned to people
. Even if the program continues beyond the call to listPeople
on line 22, the garbage collector will still release the memory for these objects, since there are no longer any references to any of them.
Reference counting and circular references
The issue with reference counting is that it fails to deallocate circular references. Suppose I do this:
1 2 3 4 5 6 7 8 9 10 11 12 |
function test() { const x = {}; const y = {}; x.a = y; y.a = x; return; } test(); |
Although x
and y
fall out of scope when the test
function finishes executing, each one has a reference to the other. So, the reference counters never make it to zero. If we relied entirely on reference counters, the memory would never become eligible for garbage collection, and we would have a memory leak.
Fortunately, we don’t rely entirely on reference counters.
Mark and Sweep
The mark and sweep algorithm is designed to deal with circular references. It starts with a set of root objects, objects that are known to be accessible. Typically, these are objects that are directly available from the call stack (i.e. local variables and function parameters of currently invoked functions) and global variables.
In the mark stage, the garbage collector traverses the entire set of root objects, marking as in use all of the objects that each root object points to. It marks all objects that those objects point as well, and so on, until it marks every object that the root set can reach.
Once all of the memory is traversed and in-use objects are marked, the sweep stage frees the memory of all unmarked objects.
Suppose the test
function above were executing when a mark-and-sweep operation commenced. The call stack would have a reference to test
. Since test
can access x
and y
, they would be marked as in use and not garbage collected. But once the test
function finished executing, it would drop from the call stack and the references to x
and y
would go out of scope. Then, when the next mark-and-sweep operation commenced, the mark operation would not mark the two variables as in use and the sweep operation would deallocate their memory.
Improvements to mark and sweep
Although this is the basic process, it is a naive implementation of the mark-and-sweep algorithm. If it were done this way in practice, performance would be very poor, because all interaction with memory would have to be suspended while the operation was going on. Because of this, most garbage collectors (including JavaScript) implement some version of tri-color marking. This is a version of marking and sweeping that does not require freezing of all of the memory at once.
For more on tri-color marking, and other related topics, see this Wikipedia Article.