Today we continue looking at the C++ that IL2CPP generates for our C# code, this time when calling various types of functions and using boxing and unboxing. Just how much performance overhead do these entail? Read on to find out!

Non-Virtual Methods

First let’s get a baseline by calling a non-virtual, non-interface method:

interface MyInterface
{
	void InterfaceMethod();
}
 
class MyClass : MyInterface
{
	public void NonVirtualMethod() {}
	public virtual void VirtualMethod() {}
	public void InterfaceMethod() {}
}
 
static class TestClass
{
	static void CallNonVirtualFunction(MyClass x)
	{
		x.NonVirtualMethod();
	}
}

And let’s see what C++ IL2CPP generates for this:

extern "C"  void TestClass_CallNonVirtualFunction_m2164300416 (RuntimeObject * __this /* static, unused */, MyClass_t3388352440 * ___x0, const RuntimeMethod* method)
{
	{
		MyClass_t3388352440 * L_0 = ___x0;
		NullCheck(L_0);
		MyClass_NonVirtualMethod_m3363069416(L_0, /*hidden argument*/NULL);
		return;
	}
}

There’s not much going on here. There’s a null check as overhead, but this can be removed. Then the method gets called, so let’s look at that.

extern "C"  void MyClass_NonVirtualMethod_m3363069416 (MyClass_t3388352440 * __this, const RuntimeMethod* method)
{
	{
		return;
	}
}

The method itself is, as expected, empty. Other than the usual null check, there’s no overhead at all.
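Here's a minimal C++ model that shows how little work the non-virtual path involves. It's a sketch, not IL2CPP's code. NullCheckModel assumes NullCheck is a compare plus a rarely-taken branch that raises an exception, with a C++ exception type standing in for the managed NullReferenceException. MyClassModel, NonVirtualMethodModel, and CallNonVirtualFunctionModel are hypothetical names. The method increments a counter only so the call can be observed.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for the managed NullReferenceException.
struct NullReferenceModel : std::runtime_error
{
    NullReferenceModel() : std::runtime_error("null reference") {}
};

// Assumed model of NullCheck: one compare and a rarely-taken branch.
inline void NullCheckModel(const void* p)
{
    if (p == nullptr)
    {
        throw NullReferenceModel();
    }
}

struct MyClassModel
{
    int calls = 0;  // only here so the call is observable
};

// Models MyClass_NonVirtualMethod: the target is known at compile time,
// so the C++ compiler is free to inline it.
inline void NonVirtualMethodModel(MyClassModel* self)
{
    self->calls++;
}

// Models TestClass_CallNonVirtualFunction: a null check, then a direct call.
inline void CallNonVirtualFunctionModel(MyClassModel* x)
{
    NullCheckModel(x);
    NonVirtualMethodModel(x);
}
```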

Virtual Methods

Now let’s call a virtual method:

static class TestClass
{
	static void CallVirtualFunction(MyClass x)
	{
		x.VirtualMethod();
	}
}

Here’s the generated C++:

extern "C"  void TestClass_CallVirtualFunction_m3601475380 (RuntimeObject * __this /* static, unused */, MyClass_t3388352440 * ___x0, const RuntimeMethod* method)
{
	{
		MyClass_t3388352440 * L_0 = ___x0;
		NullCheck(L_0);
		VirtActionInvoker0::Invoke(5 /* System.Void MyClass::VirtualMethod() */, L_0);
		return;
	}
}

Instead of calling the method directly or via C++ classes’ support for virtual functions, VirtActionInvoker0::Invoke is invoked. So let’s look at that:

struct VirtActionInvoker0
{
	typedef void (*Action)(void*, const RuntimeMethod*);
 
	static inline void Invoke (Il2CppMethodSlot slot, RuntimeObject* obj)
	{
		const VirtualInvokeData& invokeData = il2cpp_codegen_get_virtual_invoke_data(slot, obj);
		((Action)invokeData.methodPtr)(obj, invokeData.method);
	}
};

This in turn calls il2cpp_codegen_get_virtual_invoke_data:

FORCE_INLINE const VirtualInvokeData& il2cpp_codegen_get_virtual_invoke_data(Il2CppMethodSlot slot, const RuntimeObject* obj)
{
    Assert(slot != 65535 && "il2cpp_codegen_get_virtual_invoke_data got called on a non-virtual method");
    return obj->klass->vtable[slot];
}

The compiler will remove the Assert, so this just has one line. The first part accesses the klass field which is at the very start of all object types:

struct Il2CppObject
{
    Il2CppClass *klass;
    MonitorData *monitor;
};

Then the vtable field, situated at the end of the very large Il2CppClass, is indexed into. Back in Invoke, the methodPtr and method fields of VirtualInvokeData are read. It turns out they’re right next to each other:

struct VirtualInvokeData
{
    Il2CppMethodPointer methodPtr;
#if RUNTIME_MONO
    const MonoMethod* method;
#else
    const MethodInfo* method;
#endif
};

Then there’s the indirect (“virtual”) function call through the function pointer to the method itself, which is also empty:

extern "C"  void MyClass_VirtualMethod_m2874074479 (MyClass_t3388352440 * __this, const RuntimeMethod* method)
{
	{
		return;
	}
}

Beyond the expected indirect function call, most of IL2CPP's overhead comes from this expression: obj->klass->vtable[slot]. The object itself is likely in the CPU cache because the function is operating on it. Its klass->vtable[slot] may not be, unless the virtual method has been called recently. On a cache miss, obj->klass->vtable[slot] has to be read from RAM at a cost of about 100 nanoseconds, roughly the time it takes to calculate 14 square roots on a 2 GHz ARM CPU. So a data cache miss can cost even more than the already-expensive indirect function call it precedes.
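Here's a minimal sketch of the same dispatch path. ObjectModel, ClassModel, and VirtualInvokeDataModel are simplified, hypothetical stand-ins for Il2CppObject, Il2CppClass, and VirtualInvokeData. They keep only the fields that dispatch touches, which makes the two dependent loads (obj->klass, then klass->vtable[slot]) before the indirect call easy to see:

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-ins for the IL2CPP structures above. The real
// Il2CppClass has many more fields; only what dispatch reads is kept.
using MethodPtr = void (*)(void* obj, const void* method);

struct VirtualInvokeDataModel
{
    MethodPtr methodPtr;  // function to call
    const void* method;   // metadata passed as the hidden last argument
};

struct ClassModel
{
    const VirtualInvokeDataModel* vtable;
};

struct ObjectModel
{
    const ClassModel* klass;  // first field of every object, like Il2CppObject
};

// Models VirtActionInvoker0::Invoke: load obj->klass, then load
// klass->vtable[slot] (which depends on the first load), then call indirectly.
inline void VirtInvokeModel(uint16_t slot, ObjectModel* obj)
{
    const VirtualInvokeDataModel& data = obj->klass->vtable[slot];
    data.methodPtr(obj, data.method);
}
```

When two objects have classes with different vtables, the same slot leads to different functions. That is why the C++ compiler can't know the call target ahead of time or inline it.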

Interface Methods

Now let’s try calling an interface method:

static class TestClass
{
	static void CallInterfaceFunction(MyInterface x)
	{
		x.InterfaceMethod();
	}
}

Here’s the C++ that IL2CPP generates:

extern "C"  void TestClass_CallInterfaceFunction_m952904746 (RuntimeObject * __this /* static, unused */, RuntimeObject* ___x0, const RuntimeMethod* method)
{
	static bool s_Il2CppMethodInitialized;
	if (!s_Il2CppMethodInitialized)
	{
		il2cpp_codegen_initialize_method (TestClass_CallInterfaceFunction_m952904746_MetadataUsageId);
		s_Il2CppMethodInitialized = true;
	}
	{
		RuntimeObject* L_0 = ___x0;
		NullCheck(L_0);
		InterfaceActionInvoker0::Invoke(0 /* System.Void MyInterface::InterfaceMethod() */, MyInterface_t138017515_il2cpp_TypeInfo_var, L_0);
		return;
	}
}

Just calling an interface function is apparently enough to trigger method initialization overhead. As we’ve seen before, checking the s_Il2CppMethodInitialized flag on every call may also result in a CPU data cache miss.
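The method initialization prologue is a simple lazy-initialization pattern. This sketch reproduces its structure. InitializeMethodModel and g_initCalls are hypothetical stand-ins for il2cpp_codegen_initialize_method, and the sketch does not model what that function actually does:

```cpp
#include <cassert>

// Hypothetical counter standing in for the work done by
// il2cpp_codegen_initialize_method.
static int g_initCalls = 0;

inline void InitializeMethodModel()
{
    g_initCalls++;
}

// Models the prologue IL2CPP emits: a function-local static flag that is
// checked on every call. After the first call only the load and the branch
// remain, but that load reads a global that may not be in the data cache.
inline void CallInterfaceFunctionModel()
{
    static bool s_initialized;  // zero-initialized, like s_Il2CppMethodInitialized
    if (!s_initialized)
    {
        InitializeMethodModel();
        s_initialized = true;
    }
    // ... the actual interface call would follow here ...
}
```

Only the first call pays for initialization. Every call still loads the flag and branches on it.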

Next, there’s a call to InterfaceActionInvoker0::Invoke, so let’s check it out:

struct InterfaceActionInvoker0
{
	typedef void (*Action)(void*, const RuntimeMethod*);
 
	static inline void Invoke (Il2CppMethodSlot slot, RuntimeClass* declaringInterface, RuntimeObject* obj)
	{
		const VirtualInvokeData& invokeData = il2cpp_codegen_get_interface_invoke_data(slot, obj, declaringInterface);
		((Action)invokeData.methodPtr)(obj, invokeData.method);
	}
};

The second line looks just like the line we saw when calling a virtual method, but the first line calls a different function: il2cpp_codegen_get_interface_invoke_data. Let’s take a look at that:

FORCE_INLINE const VirtualInvokeData& il2cpp_codegen_get_interface_invoke_data(Il2CppMethodSlot slot, const RuntimeObject* obj, const RuntimeClass* declaringInterface)
{
    Assert(slot != 65535 && "il2cpp_codegen_get_interface_invoke_data got called on a non-virtual method");
    return il2cpp::vm::Class::GetInterfaceInvokeDataFromVTable(obj, declaringInterface, slot);
}

This function has the same assert as we saw with virtual methods, but the line that matters now calls a function:

static FORCE_INLINE const VirtualInvokeData& GetInterfaceInvokeDataFromVTable(const Il2CppObject* obj, const Il2CppClass* itf, Il2CppMethodSlot slot)
{
	const Il2CppClass* klass = obj->klass;
	IL2CPP_ASSERT(klass->initialized);
	IL2CPP_ASSERT(slot < itf->method_count);
 
	for (uint16_t i = 0; i < klass->interface_offsets_count; i++)
	{
		if (klass->interfaceOffsets[i].interfaceType == itf)
		{
			int32_t offset = klass->interfaceOffsets[i].offset;
			IL2CPP_ASSERT(offset != -1);
			IL2CPP_ASSERT(offset + slot < klass->vtable_count);
			return klass->vtable[offset + slot];
		}
	}
 
	return GetInterfaceInvokeDataFromVTableSlowPath(obj, itf, slot);
}

There’s a lot going on here, so let’s take it apart one bit at a time. The first line reads klass just like with virtual functions. The compiler should strip out the IL2CPP_ASSERT lines. Next comes a loop that reads klass->interface_offsets_count and klass->interfaceOffsets[i].interfaceType on each iteration. Each entry in interfaceOffsets pairs an interface that the class implements with the vtable offset where that interface’s methods begin. So klass->interface_offsets_count is the number of interfaces the class implements, not the number of methods in the interface. The loop gets more expensive as the class implements more interfaces, especially for interfaces near the end of the list. The method’s position within the interface doesn’t matter, because slot is simply added to the offset. Once the interface is found, klass->vtable is accessed just like with virtual functions.
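Here's a minimal sketch of this fast path that makes the search concrete. ClassModel, InterfaceOffsetPairModel, and InterfaceModel are simplified, hypothetical types. The real code falls back to the slow path when the search fails, and this sketch just throws in that case:

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

using MethodPtr = void (*)(void* obj);

struct InterfaceModel
{
    const char* name;
};

// One entry per interface the class implements: which interface it is,
// and where that interface's methods begin in the class's vtable.
struct InterfaceOffsetPairModel
{
    const InterfaceModel* interfaceType;
    int32_t offset;
};

struct ClassModel
{
    const MethodPtr* vtable;
    const InterfaceOffsetPairModel* interfaceOffsets;
    uint16_t interface_offsets_count;
};

// Models the fast path of GetInterfaceInvokeDataFromVTable: a linear search
// over the implemented interfaces, then a vtable index like a virtual call.
// The cost depends on the interface's position in the list, not on slot.
inline MethodPtr FindInterfaceMethodModel(const ClassModel* klass, const InterfaceModel* itf, uint16_t slot)
{
    for (uint16_t i = 0; i < klass->interface_offsets_count; i++)
    {
        if (klass->interfaceOffsets[i].interfaceType == itf)
        {
            return klass->vtable[klass->interfaceOffsets[i].offset + slot];
        }
    }
    // The real code calls the slow path here; this sketch only throws.
    throw std::runtime_error("interface not implemented");
}
```

A lookup for a method of the second interface has to step past the first entry. The slot only changes the final vtable index.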

If the interface isn’t found in that list, GetInterfaceInvokeDataFromVTableSlowPath gets called. As its name suggests, it’s presumably even slower:

// we don't want this method to get inlined because that makes GetInterfaceInvokeDataFromVTable method itself very large and performance suffers
static IL2CPP_NO_INLINE const VirtualInvokeData& GetInterfaceInvokeDataFromVTableSlowPath(const Il2CppObject* obj, const Il2CppClass* itf, Il2CppMethodSlot slot);

This function is compiled into libil2cpp.a, a binary library, but its source code is available in the Unity installation directory. Here it is:

const VirtualInvokeData& Class::GetInterfaceInvokeDataFromVTableSlowPath(const Il2CppObject* obj, const Il2CppClass* itf, Il2CppMethodSlot slot)
{
    const Il2CppClass* klass = obj->klass;
 
#if NET_4_0
    if (itf->generic_class != NULL)
    {
        const Il2CppTypeDefinition* genericInterface = MetadataCache::GetTypeDefinitionFromIndex(itf->generic_class->typeDefinitionIndex);
        const Il2CppGenericContainer* genericContainer = MetadataCache::GetGenericContainerFromIndex(genericInterface->genericContainerIndex);
 
        for (uint16_t i = 0; i < klass->interface_offsets_count; ++i)
        {
            const Il2CppRuntimeInterfaceOffsetPair* pair = klass->interfaceOffsets + i;
            if (IsGenericClassAssignableFrom(itf, pair->interfaceType, genericContainer))
            {
                IL2CPP_ASSERT(pair->offset + slot < klass->vtable_count);
                return klass->vtable[pair->offset + slot];
            }
        }
    }
#endif
 
    if (klass->is_import_or_windows_runtime)
    {
        Il2CppIUnknown* iunknown = static_cast<const Il2CppComObject*>(obj)->identity;
 
        // It might be null if it's called on a dead (already released) or fake object
        if (iunknown != NULL)
        {
            if (itf->vtable_count > 0)
            {
                IL2CPP_ASSERT(slot < itf->vtable_count);
 
                // Nothing will be referencing these types directly, so we need to initialize them here
                const VirtualInvokeData& invokeData = itf->vtable[slot];
                Init(invokeData.method->declaring_type);
                return invokeData;
            }
 
            // TO DO: add support for covariance/contravariance for projected interfaces like
            // System.Collections.Generic.IEnumerable`1<T>
        }
    }
 
    RaiseExceptionForNotFoundInterface(klass, itf, slot);
    IL2CPP_UNREACHABLE;
}

Finally, let’s look at the actual method to confirm that it’s empty:

extern "C"  void MyClass_InterfaceMethod_m2210917719 (MyClass_t3388352440 * __this, const RuntimeMethod* method)
{
	{
		return;
	}
}

So calling an interface method is quite a bit more expensive than calling a virtual method. It involves method initialization, a loop over the interfaces the class implements, and, if the interface isn’t found there, an even slower “slow path.”

Delegates

Now let’s try calling a delegate using the Action type:

static class TestClass
{
	static void CallDelegate(Action x)
	{
		x();
	}
}

Here’s the C++ output from IL2CPP:

extern "C"  void TestClass_CallDelegate_m1262409583 (RuntimeObject * __this /* static, unused */, Action_t1264377477 * ___x0, const RuntimeMethod* method)
{
	{
		Action_t1264377477 * L_0 = ___x0;
		NullCheck(L_0);
		Action_Invoke_m937035532(L_0, /*hidden argument*/NULL);
		return;
	}
}

There’s no method initialization overhead like when we called an interface method, but let’s see what’s in Action_Invoke_m937035532:

extern "C"  void Action_Invoke_m937035532 (Action_t1264377477 * __this, const RuntimeMethod* method)
{
	if(__this->get_prev_9() != NULL)
	{
		Action_Invoke_m937035532((Action_t1264377477 *)__this->get_prev_9(), method);
	}
	Il2CppMethodPointer targetMethodPointer = __this->get_method_ptr_0();
	RuntimeMethod* targetMethod = (RuntimeMethod*)(__this->get_method_3());
	RuntimeObject* targetThis = __this->get_m_target_2();
	il2cpp_codegen_raise_execution_engine_exception_if_method_is_not_found(targetMethod);
	bool ___methodIsStatic = MethodIsStatic(targetMethod);
	if (___methodIsStatic)
	{
		if (il2cpp_codegen_method_parameter_count(targetMethod) == 0)
		{
			// open
			typedef void (*FunctionPointerType) (RuntimeObject *, const RuntimeMethod*);
			((FunctionPointerType)targetMethodPointer)(NULL, targetMethod);
		}
		else
		{
			// closed
			typedef void (*FunctionPointerType) (RuntimeObject *, void*, const RuntimeMethod*);
			((FunctionPointerType)targetMethodPointer)(NULL, targetThis, targetMethod);
		}
	}
	else
	{
		{
			// closed
			typedef void (*FunctionPointerType) (void*, const RuntimeMethod*);
			((FunctionPointerType)targetMethodPointer)(targetThis, targetMethod);
		}
	}
}

There’s a ton going on here! It’s actually too much to cover in an article, so we’ll summarize the performance costs as three CPU data cache misses plus a handful of branches (if) and function calls. Also, if other delegates are chained to this one through prev, each of them is invoked recursively first. Suffice it to say that invoking a delegate is the most expensive way to call a function, but not dramatically more expensive than calling an interface method.
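The core control flow of Action_Invoke looks roughly like the sketch below. It's a simplified, hypothetical model. DelegateModel stores separate static and instance function pointers rather than a single method_ptr cast to different types, and the open and closed static cases are merged into one branch:

```cpp
#include <cassert>

// Hypothetical, simplified delegate holding the data Action_Invoke reads:
// the previous delegate in a multicast chain, the function to call, the
// target object, and whether the target method is static.
struct DelegateModel
{
    const DelegateModel* prev;       // nullptr for a single-cast delegate
    void (*staticFn)();              // used when isStatic is true
    void (*instanceFn)(void* self);  // used when isStatic is false
    void* target;
    bool isStatic;
};

// Models Action_Invoke: recurse into prev first so the chain runs in order,
// then branch on the kind of target before making the indirect call.
inline void InvokeModel(const DelegateModel* d)
{
    if (d->prev != nullptr)
    {
        InvokeModel(d->prev);
    }
    if (d->isStatic)
    {
        d->staticFn();
    }
    else
    {
        d->instanceFn(d->target);
    }
}
```

The recursion on prev makes a multicast delegate call its targets in order. Each level of that recursion repeats the loads and branches.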

Boxing and Unboxing

Let’s conclude today by looking at boxing and unboxing. Boxing is what happens when a value type (e.g. int) needs to be converted into a reference type (e.g. object) and unboxing is the reverse. For more details, see the first part of this article. Now let’s look at the C# test code:

static class TestClass
{
	static object Boxing(int x)
	{
		return x;
	}
 
	static int Unboxing(object x)
	{
		return (int)x;
	}
}

Here’s the C++ that gets generated by IL2CPP for boxing:

extern "C"  RuntimeObject * TestClass_Boxing_m289043937 (RuntimeObject * __this /* static, unused */, int32_t ___x0, const RuntimeMethod* method)
{
	static bool s_Il2CppMethodInitialized;
	if (!s_Il2CppMethodInitialized)
	{
		il2cpp_codegen_initialize_method (TestClass_Boxing_m289043937_MetadataUsageId);
		s_Il2CppMethodInitialized = true;
	}
	{
		int32_t L_0 = ___x0;
		int32_t L_1 = L_0;
		RuntimeObject * L_2 = Box(Int32_t2950945753_il2cpp_TypeInfo_var, &L_1);
		return L_2;
	}
}

Boxing adds method initialization overhead, typically costing a cache miss just like with interface method calls. Then it calls the global Box function:

inline RuntimeObject* Box(RuntimeClass* type, void* data)
{
    return il2cpp::vm::Object::Box(type, data);
}

Object::Box is also compiled into libil2cpp.a, but its source code is available in the Unity installation directory. We’d expect it to involve a GC allocation for a class that implements all the interfaces for int (e.g. IComparable), and the RuntimeClass for int that’s passed in is presumably used to determine which type of class to box to. That implies quite a bit of work: CPU cache misses, branches, and everything involved in GC allocation. Here’s the function:

Il2CppObject* Object::Box(Il2CppClass *typeInfo, void* val)
{
    Class::Init(typeInfo);
    if (!typeInfo->valuetype)
        return *(Il2CppObject**)val;
 
    if (Class::IsNullable(typeInfo))
    {
        /* From ECMA-335, I.8.2.4 Boxing and unboxing of values:
 
            All value types have an operation called box. Boxing a value of any value type produces its boxed value;
            i.e., a value of the corresponding boxed type containing a bitwise copy of the original value. If the
            value type is a nullable type, defined as an instantiation of the value type System.Nullable<T>, the result
            is a null reference or bitwise copy of its Value property of type T, depending on its HasValue property
            (false and true, respectively).
        */
 
        typeInfo = Class::GetNullableArgument(typeInfo);
        Class::Init(typeInfo);
        bool hasValue = *reinterpret_cast<bool*>(static_cast<uint8_t*>(val) + typeInfo->instance_size - sizeof(Il2CppObject));
 
        if (!hasValue)
            return NULL;
    }
 
    size_t size = Class::GetInstanceSize(typeInfo);
    Il2CppObject* obj = Object::New(typeInfo);
 
    size = size - sizeof(Il2CppObject);
 
    memcpy(((char*)obj) + sizeof(Il2CppObject), val, size);
    return obj;
}
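Set aside Class::Init and the Nullable<T> branch, and Object::Box comes down to an allocation followed by a copy. This sketch models only that part, under some assumptions. HeaderModel is a hypothetical stand-in for the Il2CppObject header. std::malloc stands in for the GC allocation done by Object::New, so the caller has to std::free the result:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical stand-in for the Il2CppObject header (klass + monitor).
struct HeaderModel
{
    const void* klass;
    void* monitor;
};

// Models the core of Object::Box for a non-nullable value type: allocate
// room for the header plus the value, then memcpy the value in right after
// the header. malloc stands in for the GC allocation performed by Object::New.
inline HeaderModel* BoxModel(const void* klass, const void* value, std::size_t valueSize)
{
    void* mem = std::malloc(sizeof(HeaderModel) + valueSize);
    if (mem == nullptr)
    {
        throw std::bad_alloc();
    }
    HeaderModel* obj = new (mem) HeaderModel{klass, nullptr};
    std::memcpy(reinterpret_cast<char*>(obj) + sizeof(HeaderModel), value, valueSize);
    return obj;
}
```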

Finally, let’s look at the IL2CPP output for unboxing:

extern "C"  int32_t TestClass_Unboxing_m2307370051 (RuntimeObject * __this /* static, unused */, RuntimeObject * ___x0, const RuntimeMethod* method)
{
	static bool s_Il2CppMethodInitialized;
	if (!s_Il2CppMethodInitialized)
	{
		il2cpp_codegen_initialize_method (TestClass_Unboxing_m2307370051_MetadataUsageId);
		s_Il2CppMethodInitialized = true;
	}
	{
		RuntimeObject * L_0 = ___x0;
		return ((*(int32_t*)((int32_t*)UnBox(L_0, Int32_t2950945753_il2cpp_TypeInfo_var))));
	}
}

Again we have method initialization overhead due to using unboxing. The bulk of the work is performed by UnBox:

inline void* UnBox(RuntimeObject* obj, RuntimeClass* expectedBoxedClass)
{
    NullCheck(obj);
 
    if (obj->klass->element_class == expectedBoxedClass->element_class)
        return il2cpp::vm::Object::Unbox(obj);
 
    RaiseInvalidCastException(obj, expectedBoxedClass);
    return NULL;
}

Similar to virtual method calls, this function reads obj->klass->element_class and expectedBoxedClass->element_class. Both are probably cache misses, since they’re unlikely to be in a recently-accessed cache line. Object::Unbox is also compiled into libil2cpp.a, but its source code is available in the Unity installation directory. As expected, it’s far cheaper than boxing because it just returns a pointer to the value stored right after the object header:

void* Object::Unbox(Il2CppObject* obj)
{
    void* val = (void*)(((char*)obj) + sizeof(Il2CppObject));
    return val;
}

In the case of unboxing to a type that doesn’t match the value type contained in the boxed class type, an InvalidCastException is thrown. This should be exceedingly rare and the performance, if one is thrown, is likely the least important effect to be worried about, so we’ll skip the analysis.
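Taken together, UnBox and Object::Unbox reduce unboxing to a type comparison plus pointer arithmetic, with no allocation and no copy. In this sketch ClassModel and HeaderModel are hypothetical, and std::runtime_error stands in for the managed exceptions. The sketch also assumes, as in Mono's class layout, that a plain value type's element_class points back to the class itself:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <new>
#include <stdexcept>

struct ClassModel
{
    // Assumption: for a plain value type this points back to the class itself.
    const ClassModel* element_class;
};

// Hypothetical stand-in for the Il2CppObject header.
struct HeaderModel
{
    const ClassModel* klass;
    void* monitor;
};

// Models UnBox plus Object::Unbox: a null check, one comparison of element
// classes, then skipping the object header to reach the stored value.
inline void* UnboxModel(HeaderModel* obj, const ClassModel* expected)
{
    if (obj == nullptr)
    {
        throw std::runtime_error("NullReferenceException");
    }
    if (obj->klass->element_class == expected->element_class)
    {
        return reinterpret_cast<char*>(obj) + sizeof(HeaderModel);
    }
    throw std::runtime_error("InvalidCastException");
}
```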

Conclusion

Calling non-virtual functions is as cheap as can be. Other than a null check which can be removed, there’s no overhead at all from IL2CPP. Strive to use these wherever possible when performance is important.

Virtual functions are quite a bit more expensive. Cache misses and branch mispredictions can really slow these down on top of the intrinsic slowness of indirect function calls. Avoid these where performance is important.

Interface functions are even more expensive than ordinary virtual functions. They have the same cache-miss and indirect-call costs, and they add method initialization overhead and a search through the interfaces the class implements. Avoid these too in performance-critical code.

Invoking delegates is the slowest way to call a function, even more so than calling interface methods. These should really be avoided anytime performance is a concern.

Boxing causes GC allocations, so it should be avoided anyway. The garbage it creates eventually makes the GC run, which can cause a frame hitch, and it fragments managed heap memory. Beyond this, the IL2CPP overhead for boxing and, later on, unboxing slows these processes down even further. These too should be avoided when performance matters.