How is data stored in V8 JS engine memory?
Before we dive in and open the hood of the V8 JS engine, here is a little refresher on a few key concepts.
- An object is a collection of zero or more properties. The properties have attributes that determine how the property can be used.
- A property is a container that holds other objects, primitive values, or functions.
- A primitive value is a member of one of the following built-in types: Undefined, Null, Boolean, Number, String, Symbol, and BigInt.
- An object is an instance of the built-in type Object.
- A function is a callable object. A function that is associated with an object via a property is called a method.
From that, we can differentiate two data types: primitive types and objects.
How does the V8 JS Engine handle data types?
However, the ECMAScript specification doesn't say how these data types should be represented in memory. A data type tells the compiler or interpreter how the programmer intends to use the data. Some common data types include:
- floating-point numbers
- alphanumeric strings
For example, in C, the int type takes four bytes of memory and a char one byte of memory (on an x86 gcc compiler). It is a critical piece of information to compile efficient code.
Note: To visualise how the different types are stored in memory, I compiled a debug version of V8 on my local machine and wrote some tests (I followed the documentation here and here).
V8 internally creates hidden classes for objects at runtime, storing meta information about the object (number of properties, reference to the object's prototype, etc).
Hidden classes are based on the assumption that objects with the same structure (the same named properties in the same order) share the same hidden class. That way, objects with the same hidden class can use the same optimised generated code.
Let's look at an example:
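The original listing is not reproduced here. A minimal sketch consistent with the numbered steps that follow could look like this; the property values, and the properties added to obj2 after 'b', are assumptions made for illustration.

```javascript
// Reconstruction for illustration only; the values are arbitrary.
const obj1 = {};   // (1) empty object: hidden class C0
obj1.a = 1;        // (2) transition C0 -> C1
obj1.b = 2;        // (3) transition C1 -> C2
obj1.c = 3;        // (4) transition C2 -> C3

const obj2 = {};   // (5) reuses hidden class C0
obj2.b = 2;        // (6) transition C0 -> C4, a different branch
obj2.a = 1;        // assumed continuation of "and so on"
obj2.c = 3;        // assumed continuation of "and so on"

// Same property names, different insertion order.
console.log(Object.keys(obj1)); // [ 'a', 'b', 'c' ]
console.log(Object.keys(obj2)); // [ 'b', 'a', 'c' ]
```

The hidden classes themselves are not observable from plain JavaScript; only the insertion order that drives them is.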
(1) V8 creates a hidden class C0 for 'obj1' defining an empty object. We will later see what information this hidden class holds.
(2) V8 creates a hidden class C1 based on C0. C1 describes the location in memory where the property 'a' can be found. C0 is updated with a "class transition" which states that if a property "a" is added to an empty object, the hidden class should switch from C0 to C1. The hidden class of `obj1` is now C1.
(3) V8 creates a hidden class C2 based on C1 the same way as before. C1 is updated with a "class transition" which states that if a property "b" is added to an object whose hidden class is C1, then the hidden class should be switched to C2. The hidden class of `obj1` is now C2.
(4) Same as (3)
Now we have 4 hidden classes, linked as such: C0 → C1 (adding 'a') → C2 (adding 'b') → C3 (adding 'c').
If we do the same with obj2:
(5) V8 can use the hidden class C0 to define 'obj2' as an empty object.
(6) V8 creates a hidden class C4 based on C0. C4 describes the location in the memory where the property 'b' can be found. C0 is updated with a "class transition" which states that if a property "b" is added to an empty object, the hidden class should switch from C0 to C4.
The hidden class of `obj2` is now C4.
(7) and so on
We will eventually get the following hidden classes and their transitions.
`obj1` and `obj2` do not have the same hidden class because their properties are not added in the same order. Adding the same properties in a different order creates different hidden classes, which precludes some of the optimisations V8 could otherwise provide.
Note: it's much better to initialise dynamic properties in the same order so that hidden classes can be reused.
The series of transitions that lead to a hidden class is called a 'Transition tree' and is stored by V8.
Now let's dig even deeper and find out how V8 actually represents objects in memory.
Core data representation types/efficiently representing values
On 32-bit architectures, the V8 engine passes around 32-bit numbers to represent all values, for improved efficiency. To be able to use the same 32 bits to represent both primitives and objects, V8 uses a technique called tagging. This technique is based on the observation that, on many architectures, allocated data must be aligned on a 4-byte boundary, so the least significant bit of any address is always zero. Tagging uses this bottom bit to differentiate the two types of data:
- SMI (small integer): the bottom bit is 0
- HeapObject pointer: the bottom bit is 1
Thanks to this technique, the same code path can handle both objects and integers.
The SMI is a 31-bit signed integer (range: -2^30 to 2^30 - 1, so the maximum value is 0x3FFFFFFF).
If you want to pass around a numeric value that is bigger than 31 signed bits, it doesn't fit in a SMI and V8 has to create a box: the number is turned into a double, an object is created and the double is put inside of it.
Note: Because of the computation time required to create the box and access its value, it is preferable to use 31-bit signed integers for critical calculations. Optimisations exist in V8 to handle types other than signed integers correctly, but there are cases where this process can cause memory allocation (which degrades performance).
A HeapObject is a pointer that points to memory in the managed heap. It's a superclass for everything allocated in the heap. Because the last bit is set to 1, before using the pointer, the bit needs to be cleared.
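The tagging scheme for 32-bit words can be simulated in JavaScript. This is only an illustrative sketch: V8 does this in C++ on raw machine words, and the function names below are hypothetical, not part of any V8 API.

```javascript
// Illustrative simulation only: V8 does this in C++ on raw machine words.
const SMI_MIN = -(2 ** 30);
const SMI_MAX = 2 ** 30 - 1;

// True if n fits in a 31-bit signed integer (and is not -0).
function fitsInSmi(n) {
  return Number.isInteger(n) && !Object.is(n, -0) && n >= SMI_MIN && n <= SMI_MAX;
}

// SMI: shift the value left by one so the bottom bit is 0.
function tagSmi(n) {
  if (!fitsInSmi(n)) throw new RangeError(n + ' does not fit in an SMI');
  return n << 1;
}

// HeapObject: set the bottom bit of an aligned address to 1.
function tagPointer(address) {
  return (address | 1) >>> 0;
}

function isSmi(word) { return (word & 1) === 0; }
function untagSmi(word) { return word >> 1; }             // arithmetic shift keeps the sign
function untagPointer(word) { return (word & ~1) >>> 0; } // clear the tag bit before use

console.log(isSmi(tagSmi(42)), untagSmi(tagSmi(-7)));       // true -7
console.log(isSmi(tagPointer(0x1000)));                     // false
console.log(untagPointer(tagPointer(0x1000)).toString(16)); // 1000
console.log(fitsInSmi(2 ** 30), fitsInSmi(1.2));            // false false
```

A single bit test (isSmi) is enough to decide which of the two code paths applies, which is the point of the technique.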
The size of a pointer depends on many factors including the CPU architecture, compiler and Operating System. Usually the size is equal to the word size of the OS. So, for a 32-bit OS, the pointer size will be 4 bytes (even if the processor is 64-bit) whereas the pointer size will be 8 bytes for a 64-bit OS.
On 64-bit architectures, the V8 engine passes around 64-bit numbers. It's a bit different, but the tagging technique is similar: the bottom bit still distinguishes an SMI from a HeapObject pointer.
As most of our machines are now 64-bit, we will stick to 64-bit numbers.
An object is a collection of properties: key-value pairs.
When an object 'obj' is created, V8 creates a new JS Object and allocates memory for it. The value of 'obj' is the pointer to this JS Object.
A JS Object is composed of:
- Map: a pointer to the hidden class the object belongs to.
- Properties: a pointer to an object containing named properties. Properties added after initialization of the object are added to the Properties store.
- Elements: a pointer to an object containing numbered properties.
- In-Object Properties/Fast properties: pointers to named properties defined at object initialization. The number of in-object properties depends on the object.
From that observation, we can see that V8 will allocate a memory size of (8 + 8 + 8 + 8*N) bytes for this object.
Let's check this on Chrome by following the steps below:
- Open up DevTools on Chrome
- Run the following code on the Console
- Take a 'Heap Snapshot'
- Search for 'Person' from the Memory Tab
You should see something like this:
We can see the object 'Person' has been created and contains one In-Object Property 'firstName'. Its shallow size is 96 bytes (Shallow size vs retained size). From this and the aforementioned formula for calculating the size of an object, we can tell that V8 has allocated enough space for nine In-Object properties.
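The arithmetic behind that conclusion can be written out as a small helper. It is a sketch that follows the size formula used in this article and assumes 8-byte slots (64-bit pointers, no pointer compression); the function names are hypothetical.

```javascript
const POINTER_SIZE = 8; // bytes, assuming 64-bit pointers without pointer compression
const HEADER_SLOTS = 3; // Map, Properties and Elements pointers

// Shallow size of a JS Object with the given number of in-object property slots.
function shallowSize(inObjectSlots) {
  return POINTER_SIZE * (HEADER_SLOTS + inObjectSlots);
}

// Inverse: how many in-object slots a given shallow size leaves room for.
function inObjectSlots(shallowSizeBytes) {
  return shallowSizeBytes / POINTER_SIZE - HEADER_SLOTS;
}

console.log(inObjectSlots(96)); // 9
console.log(shallowSize(9));    // 96
```

Sizes reported by a recent Chrome may differ from this model, since the slot size depends on the V8 build.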
How many in-object properties does V8 reserve for an object? How much memory is thus allocated?
We don't want to reallocate objects every time a new property is added, nor do we want to allocate a big chunk of memory for tiny objects. To determine the appropriate size for objects, V8 uses something called "in-object slack tracking".
The idea is that, for a given constructor, V8 initially allocates a generous amount of memory, enough for storing its properties as in-object properties (up to a maximum that we will see later). After allocating a certain number of objects from the same constructor, V8 takes a look at the transition tree of the objects and checks the maximum size of the objects. New objects will be allocated with exactly enough memory to store the maximum number of properties.
Let's do a test on Chrome:
Run the following code on the Chrome console and take a 'Heap Snapshot'
We can see that 10 instances of 'Person' have been created and their shallow size is only 32 bytes (instead of 96 bytes like before): 8 bytes for the pointer to the hidden class, 8 bytes for the pointer to the "Properties" store, 8 bytes for the pointer to the "Elements" object, and 8 bytes for the In-Object property 'firstName'.
But what happens if a new property is added after in-object slack tracking is complete? In that case, the new property will be added to the "Properties" store or the "Elements" store. The stores for "Properties" and "Elements" can always be reallocated with a larger size as new properties are added.
Add the following code to the previous example:
The shallow size of the object is still 32 bytes and the property 'lastName' has been added to the "Properties" store.
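The listings for these experiments are not reproduced above. A sketch that matches the description could look like the following; it assumes a constructor that sets only firstName and an array named people, and the string values and the choice of people[0] are hypothetical.

```javascript
// Assumed shape of the constructor used in the experiments.
function Person(firstName) {
  this.firstName = firstName;
}

// Create 10 instances from the same constructor.
const people = [];
for (let i = 0; i < 10; i++) {
  people.push(new Person('Person ' + i));
}

// Added after construction, outside the layout decided for the constructor.
people[0].lastName = 'Doe';

console.log(people.length);          // 10
console.log(Object.keys(people[0])); // [ 'firstName', 'lastName' ]
console.log(Object.keys(people[1])); // [ 'firstName' ]
```

Running this in the Chrome console and taking a heap snapshot afterwards should allow the same inspection; the exact sizes reported depend on the V8 version.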
Every object has a hidden class, which contains the memory offset for each property. When a property is added, deleted or changed dynamically, the object moves to a different hidden class, which is created if it does not exist yet. The new hidden class keeps the information on the existing properties and adds the memory offset of the new property.
A hidden class knows which hidden class to switch to when a property is changed because it keeps the transition information. If an object gets a new property, the transition information of the object's hidden class is checked to find the corresponding hidden class. If no recorded transition matches the property change, a new hidden class is created.
If we look back at our example above on hidden classes:
Hidden class C0:
- doesn't contain property offset values as it refers to an empty object
- contains the transition information that if the property 'a' is added to the object, the hidden class should be changed to C1
Hidden class C1:
- contains memory offset value of the 'a' property
- contains the transition information that if the property 'b' is added to the object, the hidden class should be changed to C2
Hidden class C2:
- contains memory offset value of the 'b' property
A hidden class is a Map object. V8 engine allocates a size of 80 bytes for each Map object.
A Map is a key data structure in v8, containing information such as:
- the dynamic type of the object
- the size of the object in bytes
- the properties of the object and where they are stored
- the type of the array elements, e.g. unboxed doubles or tagged pointers
- the prototype of the object, if any
Despite its name, a Map is not a hash map: it is a heap object with a fixed layout that holds the information listed above. All heap objects have a Map that describes their structure.
A hidden class is basically a table of descriptors, with one entry for each property. It contains other information as well, like the size of the object and pointers to constructors and prototypes. The transition information is stored in a special descriptor.
In our previous example, we would have:
Hidden class C0:
- object size: <size>
- prototype: <prototype>
- "a": TRANSITION to C1 at offset A
Hidden class C1:
- object size: <size>
- prototype: <prototype>
- "a": FIELD at offset A
- "b": TRANSITION to C2 at offset B
Hidden class C2:
- object size: <size>
- prototype: <prototype>
- "a": FIELD at offset A
- "b": FIELD at offset B
- "c": TRANSITION to C3 at offset C
In a nutshell, a hidden class is composed of:
- object size: size of the object
- prototype: a pointer to the object's prototype
- descriptors: a table describing the properties, with one entry for each property.
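To make the interplay between descriptors and transitions concrete, here is a toy model in JavaScript. It is a simplified sketch and not V8's actual data structures: the class names HiddenClass and ToyObject are hypothetical, and object size and prototype are left out.

```javascript
// Toy model of hidden classes; not V8's real implementation.
class HiddenClass {
  constructor(offsets = new Map(), backPointer = null) {
    this.offsets = offsets;         // property name -> slot index (the "descriptors")
    this.transitions = new Map();   // property name -> next HiddenClass
    this.backPointer = backPointer; // previous class in the transition tree
  }

  // Follow an existing transition, or create and record a new one.
  transition(name) {
    let next = this.transitions.get(name);
    if (!next) {
      const offsets = new Map(this.offsets);
      offsets.set(name, offsets.size);
      next = new HiddenClass(offsets, this);
      this.transitions.set(name, next);
    }
    return next;
  }
}

const C0 = new HiddenClass(); // shared root: the empty object

class ToyObject {
  constructor() {
    this.map = C0;   // pointer to the hidden class
    this.slots = []; // property values, addressed by offset
  }
  set(name, value) {
    if (!this.map.offsets.has(name)) this.map = this.map.transition(name);
    this.slots[this.map.offsets.get(name)] = value;
  }
  get(name) {
    const offset = this.map.offsets.get(name);
    return offset === undefined ? undefined : this.slots[offset];
  }
}

const o1 = new ToyObject(); o1.set('a', 1); o1.set('b', 2);
const o2 = new ToyObject(); o2.set('a', 3); o2.set('b', 4);
const o3 = new ToyObject(); o3.set('b', 5); o3.set('a', 6);

console.log(o1.map === o2.map); // true: same properties, same order
console.log(o1.map === o3.map); // false: same properties, different order
```

In this model, o1 and o2 walk the same branch of the transition tree and end up sharing one hidden class, while o3 creates a separate branch from C0.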
From previous code (adding a property 'lastName' to people), we should have the following transition tree:
DevTools help us see that by displaying the transition descriptor separately, as well as another element called back_pointer that points to the previous hidden class in the transition tree.
Indeed, we can see in DevTools that the hidden class of instance 0 of constructor Person() has a back_pointer that points to the hidden class of instance 1 of constructor Person(), and the hidden class of instance 1 of constructor Person() has a transition that points to the hidden class of instance 0 of constructor Person().
The way the object is stored in memory can be different from what is shown by DevTools. For example, if we follow the explanations given by Google, the hidden class C0 should have only one property in its descriptors object (firstName).
We can see it better when using the V8 engine I compiled on my machine. Let's take the following example:
In V8, let's display several of the objects stored in people. We can notice their different properties and the transition tree they belong to.
Depending on the key of the property, we can differentiate two types of properties:
- numbered (or indexed) properties
- named properties
Elements: numbered properties
If the property key is a non-negative integer (0, 1, 2, etc), the property will be stored in the "Elements" object. These properties are called elements.
V8 stores them separately from non-numeric properties for optimisation purposes.
V8 keeps track of what kind of elements are contained in the array to be able to optimise any operations specifically for this type of element. Let's take an array that contains only small integers: its elements kind is PACKED_SMI_ELEMENTS.
When adding a floating-point number to the same array, V8 changes its elements kind to PACKED_DOUBLE_ELEMENTS.
When adding a string literal to the same array, V8 changes its elements kind again, to PACKED_ELEMENTS.
So far, we have three distinct kinds of elements:
- Small integers (SMI)
- Doubles, for floating-point numbers and integers that cannot be represented as a SMI
- Regular elements, for values that cannot be represented as SMI or Doubles
Note: the conversion of elements can only go in one direction: from specific (ex: PACKED_SMI_ELEMENTS) to more general (ex: PACKED_ELEMENTS). The inverse is not possible!
We can differentiate between packed (or dense) arrays and sparse arrays (with holes in them). When you create holes in an array, the elements kind is converted to the 'HOLEY' variant.
Note: the elements kind conversion can go from PACKED to its HOLEY counterpart. The inverse is not possible!
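The transitions described above can be followed on a small array. This is a sketch with arbitrary values; the elements kinds in the comments come from the description above and cannot be observed from plain JavaScript. In a shell started with the --allow-natives-syntax flag (d8 or Node.js), %DebugPrint(array) should show the current kind.

```javascript
const array = [1, 2, 3]; // small integers only: PACKED_SMI_ELEMENTS
array.push(4.56);        // a floating-point number: PACKED_DOUBLE_ELEMENTS
array.push('x');         // a string: PACKED_ELEMENTS
array[9] = 'y';          // indices 5 to 8 become holes: HOLEY_ELEMENTS

console.log(array.length); // 10
console.log(5 in array);   // false, index 5 is a hole
console.log(9 in array);   // true
```

Removing the string or filling the holes afterwards does not bring the array back to a more specific kind.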
V8 currently distinguishes between 21 different elements kinds.
Note: Many performance tips are given on arrays here! It's worth a read.
Note: Adding array-indexed properties does not create new HiddenClasses (while adding named properties does).
Fast or slow elements
We can distinguish two representations of the Elements store: contiguous (fast) and dictionary-based (slow).
In fast representation, the Elements store is an array of values arranged contiguously in memory where the property index maps to the offset of the item in the array. That means that even empty slots in the array occupy space in memory.
This simple representation is wasteful for very large sparse (holey) arrays where only a few entries are occupied. In that case, V8 uses a dictionary-based representation to save memory at the cost of slightly slower access.
const sparseArray = [];
sparseArray[9999] = 'foo';
Allocating a full array with 10k entries would be wasteful here. Instead, V8 creates a dictionary where key-value-descriptor triplets are stored. The key in this case would be '9999', the value 'foo', and the default descriptor is used.
Note: Array functions perform considerably slower on objects with slow elements!
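From the JavaScript side, such a sparse array reports the full length even though a single entry exists. A short sketch that shows what is observable (the internal dictionary itself is not):

```javascript
const sparseArray = [];
sparseArray[9999] = 'foo';

console.log(sparseArray.length);       // 10000
console.log(Object.keys(sparseArray)); // [ '9999' ]
console.log(0 in sparseArray);         // false, index 0 is a hole
```

Only the key '9999' exists, which is why storing one dictionary entry is cheaper than reserving 10,000 contiguous slots.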
If the property key is not a non-negative integer, the property will be stored as an In-Object Property or in the "Properties" object.
In-Object Properties are, as seen before, directly stored in the JS Object structure. They are the fastest properties available in V8 as they are accessible without any indirection. The number of in-object properties is predetermined by the initial size of the object. If more properties get added than there is space in the object, they are stored in the Properties store.
The Properties store is an object that can be either a Fixed Array or a Dictionary.
When the number of properties is low, the Properties store is defined as an Array by V8.
The properties are simply accessed by index in the properties store. To get from the name of the property to the actual position in the properties store, we have to consult the descriptor array on the hidden class.
These are called "Fast properties."
However, if many properties get added and deleted from an object, it can result in significant time and memory overhead to maintain the descriptor array and hidden classes.
Hence, V8 also supports so-called slow properties. An object with slow properties has a self-contained dictionary as a Properties store. All the properties meta information is no longer stored in the descriptors table in the hidden class but directly in the properties dictionary. Hence, properties can be added and removed without updating the hidden class.
Since inline caches don’t work with dictionary properties, the latter are typically slower than fast properties.
Now that we have seen how objects are stored in the V8 memory, let's talk about how primitives are stored.
As seen before, we can distinguish two types of numbers: the ones that can be represented by an SMI, and the others.
Let's take a variable 'a' and assign the number 1 to it (a=1), and see what is displayed in the V8 environment:
We can notice the variable 'a' is directly stored in the memory as a SMI.
Now let's take a variable 'b' and assign the floating-point number 1.2 to it (b=1.2):
We can notice now that the variable 'b' is a pointer that points to a Map with the type *_NUMBER_TYPE.
A string variable points to a Map with the type *_STRING_TYPE.
A boolean variable points to a Map with the type ODDBALL_TYPE.
A symbol variable points to a Symbol structure.
An undefined variable points to a Map with type ODDBALL_TYPE.
A null variable points to a map with type ODDBALL_TYPE.
I won't go further into the data structures of primitives. You can find more information online, such as how primitives can be used like objects (with methods), here.
Now that we have seen how data is represented in the V8 engine memory, let's see where data is stored.
Where is data stored?
Whatever the language, the memory life cycle is:
- allocate memory needed
- use the allocated memory
- free the allocated memory
In C, for example, memory management is done by the developer, with library functions such as malloc() and free() to allocate and release memory dynamically.
Memory spaces in V8