JSVM internals

Alicia Boya García edited this page Dec 28, 2020 · 3 revisions

Pending tasks:

  • Explain finalization of JSObjects.
  • Explain subtleties of canonicalization and concurrent GC (a native object may briefly have two or more handles assigned, even if only one is actually used).
  • Custom allocator backed by the JVM heap. (Check if still needed)
  • Java -> JS mapping.

Data types

Every datum going inside or outside of the JS VM is wrapped in JSValue, a variant type the user can coerce into several types.

Primitive types, such as numbers, strings, booleans, null and undefined, are stored by value in JSValue.value. Primitive values are always copied when they pass from JS to Java or from Java to JS.

Objects and functions work differently. Both are instances of JSObject, a mapping class that holds a reference to a JS object. JSObject has, for instance, the get() and set() methods, which let you manipulate the properties of the referenced JS object.

Since JSObjects are references, their properties are not copied when the object is passed to Java. Changes in the value of the properties in JS will be reflected in successive calls to JSObject methods.

JSObject instances inhibit garbage collection of their linked JS objects while they are alive in Java. Finalizers are used in order to release the references in JavaScript after their Java-mapped counterparts are destroyed in a GC cycle.

Threading

JS as a language is designed to be single threaded, and it's no surprise that almost all JS engines are single threaded. The honorable exception is Rhino: since it compiles to JVM bytecode, it's actually capable of running the same JS program instance in several JVM threads simultaneously!

Duktape is no exception to the rule. Furthermore, since it's a stack-based virtual machine, attempting to run two JS functions of the same program simultaneously would fail very badly.

On the other hand, Duktape is reentrant, which means you can have several instances of Duktape VMs, each one running a different program, or copies of the same program, and each of these can run in parallel without any problem, as they do not share memory.

The threading model of JSVM capitalizes on this feature, and it's actually very simple: each JSVM instance holds a Duktape VM. At any point in time, methods of a JSVM instance may be called by many threads, but at most one of them will be doing work while the others wait for the lock. Several threads can do work on different JSVM instances simultaneously.

JSVM has a lock property, which is used as the synchronization object all across the API. The lock is always acquired from Java, using synchronized blocks; no synchronization is performed in C++. Every public method must acquire the lock before performing any work. All JNI code can assume that no other threads are using that JSVM instance at the same time.

JSObject instances must also acquire the lock of their associated JSVM.
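
The locking rule above can be illustrated with a minimal Java sketch. SketchVm and SketchObject are hypothetical stand-ins for JSVM and JSObject, not the real classes; the counters stand in for state owned by the non-thread-safe engine. The only point is that both classes synchronize on the same per-VM lock object.

```java
// Hypothetical stand-ins for JSVM and JSObject; only the locking pattern matters.
class SketchVm {
    // Single synchronization object shared by the VM and all of its objects.
    final Object lock = new Object();

    // Stands in for state owned by the non-thread-safe engine.
    private int evalCount = 0;

    int evaluate(String code) {
        synchronized (lock) {
            // Native code would run here, knowing no other thread is inside this VM.
            evalCount++;
            return evalCount;
        }
    }

    int evalCount() {
        synchronized (lock) {
            return evalCount;
        }
    }
}

class SketchObject {
    private final SketchVm vm;
    private int value; // stands in for a property stored in the engine heap

    SketchObject(SketchVm vm) {
        this.vm = vm;
    }

    void increment() {
        synchronized (vm.lock) { // lock of the owning VM, not of this object
            value++;
        }
    }

    int get() {
        synchronized (vm.lock) {
            return value;
        }
    }
}
```

Because the lock belongs to the VM rather than to each object, a call on any object of a VM excludes every other call into that same VM, while objects of different VMs never contend with each other.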

JSVMPriv

All C++ implementation details of a JSVM instance are stored together in a structure named JSVMPriv. These include the Duktape context (duk_context*), and tables of handles.

JSVMPriv is constructed when JSVM is initialized and lives until the JSVM instance is finalized. A pointer to JSVMPriv is stored in a private Java field, JSVM.hPriv.

Prior to running any JS code, a JSVMCallContext object must be brought into scope. This allows access to the JSVMPriv* given a JSVM, while also storing the JNIEnv* and JSVM on the stack. Functions deeper in the stack (e.g. a C++ callback invoked from running JS code) can get access to these by calling JSVMCallContext::current().

It's important to remember that both the JNIEnv* and all JNI local references (including those to the JSVM object) are valid only during the native call in which they are obtained and must not be stored beyond it. To keep references for longer, JNI global references must be used.
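
The real JSVMCallContext is C++ and its internals are not shown on this page. As a rough analogy only, the same "scoped current context" pattern can be sketched in Java with try-with-resources. Every name below is hypothetical, and the per-thread stack is an assumption of the sketch, not a description of the C++ code.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Java analogy of a scoped "current call context"; all names are hypothetical.
final class CallContextSketch implements AutoCloseable {
    // One stack per thread, so nested calls each see the innermost context.
    private static final ThreadLocal<Deque<CallContextSketch>> STACK =
            ThreadLocal.withInitial(ArrayDeque::new);

    final String vmName; // stands in for the JSVM reference and its private data

    CallContextSketch(String vmName) {
        this.vmName = vmName;
        STACK.get().push(this); // entering scope
    }

    // Callable from any depth of the call stack while a context is in scope.
    static CallContextSketch current() {
        CallContextSketch top = STACK.get().peek();
        if (top == null) {
            throw new IllegalStateException("no call context in scope");
        }
        return top;
    }

    @Override
    public void close() {
        STACK.get().pop(); // leaving scope
    }
}

final class CallbackSketch {
    // A callback deep in the stack that was not handed the context explicitly.
    static String whoIsRunning() {
        return CallContextSketch.current().vmName;
    }
}
```

A caller would wrap the work in `try (CallContextSketch ctx = new CallContextSketch("vm-a")) { ... }`; once the block exits, the context is popped and nothing that was only valid during the call remains reachable through current().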

Mapping of objects

JSVM uses numeric handles internally to manage JS objects from Java and vice versa. This approach is necessary, since Duktape does not expose handles of its own or object pointers through its API, but it's also an approach that's safe and relatively easy to understand.

Let's see how this approach can be implemented in general before explaining how it is implemented in the case of JSVM.

Simple object mapping

Suppose you have two different programming environments. Maybe each environment hosts a different programming language, as is the case in JSVM, or maybe they live in different processes, or both. No matter the case, if we want to use objects of one environment from the other, we can use this technique.

There are two parties involved:

  • The native environment is the one where the objects live. All their data is stored in its memory and it contains all the methods required to perform operations on the object.
  • The client environment wants to use objects that belong to the native environment. In order to do so, it needs a handle, a numeric identifier the native environment assigns to each shared object. The client environment wraps this handle in a stub object. Stub object methods perform operations on the native object by communicating with the native environment.

Please note these roles apply to a single direction of sharing: the native environment by definition shares an object with a client environment by means of a handle. When you use JS objects from Java, Duktape is the native environment and Java is the client environment. On the other hand, when you use Java objects from JS, Java is the native environment and Duktape is the client environment. Both sides manage handles independently of each other.

It's important to consider the relations between these entities in this technique:

  • A StubObject has exactly one handle, usually stored as a private attribute.
  • No two StubObject instances share a handle.
  • Each living handle is associated with a NativeObject. This is done by means of a table, objectsByHandle. To get the best performance, searches in that table should be fast. If handles are small, incremental, reusable numbers, as is the case in JSVM, this table can be stored as a plain array of references to NativeObject, indexed by handle. This table also inhibits destruction of shared objects in environments with a GC.
  • In this approach, the same NativeObject may be associated with several handles, which in turn are wrapped in several StubObject instances.
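
The relations listed above can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not JSVM's actual code: the class and method names are hypothetical, both "environments" live in the same process, and a StringBuilder plays the role of the NativeObject.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Native side: table of shared objects indexed by handle. Hypothetical names.
final class ObjectsByHandle<T> {
    private final List<T> slots = new ArrayList<>(); // index == handle
    private final Deque<Integer> freeHandles = new ArrayDeque<>();

    // Shares an object and returns the handle assigned to it.
    int acquire(T nativeObject) {
        if (!freeHandles.isEmpty()) {
            int handle = freeHandles.pop(); // reuse a released slot
            slots.set(handle, nativeObject);
            return handle;
        }
        slots.add(nativeObject);
        return slots.size() - 1;
    }

    // Constant-time lookup; the strong reference in the slot keeps the object alive.
    T get(int handle) {
        T obj = slots.get(handle);
        if (obj == null) {
            throw new IllegalStateException("stale handle " + handle);
        }
        return obj;
    }

    // Drops the strong reference so the native GC may reclaim the object.
    void release(int handle) {
        get(handle); // validates the handle
        slots.set(handle, null);
        freeHandles.push(handle);
    }
}

// Client side: each stub owns exactly one handle.
final class StubObject {
    // Stands in for the communication channel to the native environment.
    private final ObjectsByHandle<StringBuilder> nativeSide;
    private final int handle;

    StubObject(ObjectsByHandle<StringBuilder> nativeSide, int handle) {
        this.nativeSide = nativeSide;
        this.handle = handle;
    }

    int handle() {
        return handle;
    }

    void append(String text) {
        nativeSide.get(handle).append(text);
    }

    String read() {
        return nativeSide.get(handle).toString();
    }
}
```

Sharing the same StringBuilder twice yields two different handles and two stubs, and a change made through one stub is visible through the other, since both resolve to the same native object.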

Garbage collection in simple object mapping

Handles must inhibit destruction of their linked native objects, since otherwise the client environment would hit an error when it later tried to use them.

In this approach, destruction inhibition works thanks to the objectsByHandle table, since no GC would free an object with strong living references somewhere.

On the other hand, when the client environment stops needing a StubObject, its handle must be released so that the native environment can remove the reference to the native object and its memory can be reclaimed by the GC if it has no more references.

In order to release handles automatically, the client language must support destructors or finalizers. Note that in languages with GC systems based on mark-and-sweep, finalizers will not be called immediately when the last reference is lost, but much later, when a GC cycle is scheduled.

Java supports finalizers as part of the language. JavaScript, on the other hand, historically had no standard API for finalizers, but many JS engines, including Duktape, support them when embedded in an application.
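
Automatic handle release on the client side can be sketched in Java as below. Note the swapped-in technique: the page talks about finalizers, while this sketch uses java.lang.ref.Cleaner (Java 9 and later), because Object.finalize() is deprecated. Whether JSVM does the same is not stated here. LiveHandles and ReleasingStub are hypothetical names.

```java
import java.lang.ref.Cleaner;
import java.util.HashSet;
import java.util.Set;

// Stand-in for the native side: it only tracks which handles are alive.
final class LiveHandles {
    private final Set<Integer> live = new HashSet<>();
    private int next = 0;

    synchronized int acquire() {
        int handle = next++;
        live.add(handle);
        return handle;
    }

    // Synchronized because the cleaner calls this from its own thread.
    synchronized void release(int handle) {
        live.remove(handle);
    }

    synchronized boolean isLive(int handle) {
        return live.contains(handle);
    }
}

final class ReleasingStub implements AutoCloseable {
    private static final Cleaner CLEANER = Cleaner.create();

    // Must not reference the stub itself, or the stub could never become unreachable.
    private static final class ReleaseAction implements Runnable {
        private final LiveHandles nativeSide;
        private final int handle;

        ReleaseAction(LiveHandles nativeSide, int handle) {
            this.nativeSide = nativeSide;
            this.handle = handle;
        }

        @Override
        public void run() {
            nativeSide.release(handle);
        }
    }

    private final int handle;
    private final Cleaner.Cleanable cleanable;

    ReleasingStub(LiveHandles nativeSide) {
        this.handle = nativeSide.acquire();
        // Runs ReleaseAction some time after this stub becomes unreachable.
        this.cleanable = CLEANER.register(this, new ReleaseAction(nativeSide, handle));
    }

    int handle() {
        return handle;
    }

    // Optional eager release; the action runs at most once overall.
    @Override
    public void close() {
        cleanable.clean();
    }
}
```

If close() is never called, the handle is still released, but only some time after a GC cycle finds the stub unreachable, which matches the delay described above.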

Object mapping with full canonicalization

Simple object mapping, as explained in the previous section, can be modified to implement canonicalization: if several native function calls return references to the same NativeObject, the client side receives the same handle every time, which is in turn represented by the same StubObject.

Canonicalization reduces the number of living handles and in some cases, when the very same few objects are always returned, can bound the size of objectsByHandle, improving memory usage and reducing GC frequency.

On the other hand, canonicalization requires a bit more memory when already shared objects are rarely returned again, and it introduces significant complexity. Canonicalization also needs the client environment to support weak references, which are not universal. Java supports weak references. JavaScript only gained them with WeakRef in ECMAScript 2021, and so far it has not been implemented in Duktape.

Note: WeakMap and WeakSet from the JS API are not a substitute for WeakRef: they store keys weakly, but they provide no interface to query the currently existing keys in any way. They actually serve a very different purpose.

When modified to support canonicalization, the mapping associations look like this:

  • StubObject.handle on the client environment and the objectsByHandle table on the native environment keep existing as before.
  • NativeObject keeps track of its current handle. In JSVM, this is implemented for Duktape by using a hidden symbol property, which can be set using the C API even if the object is frozen and sealed (see duk_def_prop() and DUK_DEFPROP_FORCE). In other programming environments this kind of extension can be made using a WeakMap, for instance.
  • On the client environment, a new table appears, stubsByHandle, that has the purpose of mapping the same handles to the same StubObject instances. As in the case of objectsByHandle, this can be implemented as an array indexed by handles.

There is a very important detail in stubsByHandle: the values of this array must be weak references, since otherwise we would have a complete cycle: StubObject instances would always remain referenced from stubsByHandle for the duration of the program, never finalizing, never releasing the handles and therefore never freeing the memory used by their linked NativeObject instances.
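
The three associations above can be sketched in Java as follows. All names are hypothetical. An IdentityHashMap stands in for the hidden property that records the current handle of a native object. The sketch leaves out handle release and the interaction with a concurrent GC, which this page lists as a pending topic.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Native side: remembers the handle already assigned to each shared object.
final class CanonicalNativeSide {
    private final List<Object> objectsByHandle = new ArrayList<>();
    // Stands in for the hidden property stored on the native object itself.
    private final Map<Object, Integer> handleOf = new IdentityHashMap<>();

    int share(Object nativeObject) {
        Integer existing = handleOf.get(nativeObject);
        if (existing != null) {
            return existing; // already shared: reuse its handle
        }
        objectsByHandle.add(nativeObject);
        int handle = objectsByHandle.size() - 1;
        handleOf.put(nativeObject, handle);
        return handle;
    }
}

final class CanonicalStub {
    final int handle;

    CanonicalStub(int handle) {
        this.handle = handle;
    }
}

// Client side: maps each handle to at most one live stub.
final class StubsByHandle {
    // Weak references, so this table alone never keeps a stub alive.
    private final List<WeakReference<CanonicalStub>> stubs = new ArrayList<>();

    CanonicalStub stubFor(int handle) {
        while (stubs.size() <= handle) {
            stubs.add(null); // grow the array up to the requested index
        }
        WeakReference<CanonicalStub> ref = stubs.get(handle);
        CanonicalStub stub = (ref == null) ? null : ref.get();
        if (stub == null) {
            stub = new CanonicalStub(handle);
            stubs.set(handle, new WeakReference<>(stub));
        }
        return stub;
    }
}
```

As long as some client code still holds a stub strongly, stubFor() returns that very instance for its handle. Once the stub is only weakly reachable, the GC may clear the reference and the stub can be finalized as usual.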