The developer experience of tackling PHP internals

Inspired by Python’s functools.partial, I wanted to bring partial function application to PHP. Haskell and several other languages already support it, and it is closely related to currying. Partial function application is a powerful concept because it provides a new way to build abstractions.

Recently, I decided to spend some time working with the Zend API. In this blog post, I will document my journey.


Userland (PHP)

The easiest place to start is userland. I wrote the function cufp, short for call_user_func_partial, following the naming of call_user_func:

function cufp( $fn, ...$args ) {
	return function( ...$x ) use ( $fn, $args ) {
		return call_user_func( $fn, ...array_merge( $args, $x ) );
	};
}

Here are a few examples of how we can use it:

function add3( $a, $b, $c ) {
	return $a + $b + $c;
}

$add3_partial = cufp( 'add3', 1, 2 );
var_dump( $add3_partial( 3 ) ); // int(6)
$add3_partial_2 = cufp( 'add3', 2 );
var_dump( $add3_partial_2( 3, 4 ) ); // int(9)
$add3_partial_3 = cufp( 'add3', 3, 4, 5 );
var_dump( $add3_partial_3() ); // int(12)

Neat. OK, now we have a good idea of what the function is like. Let’s try to implement this function in PHP internally, at the C level using the Zend API.

Internals (Zend API)

I already built a few PHP extensions in the past, and have even made small contributions to php-src. To start building an extension, I cloned wpboost (my most recent WIP extension), and started from there.

To get a basic understanding of PHP internals, a good (though unfinished and partly outdated) starting point is phpinternalsbook.com. It explains concepts such as the zval, a tagged union that is one of PHP’s most important structures for representing values. Besides zval, zend_types.h defines other important structures, such as zend_object and zend_resource, along with useful macros such as Z_TYPE_P, IS_STRING_EX, Z_STRVAL, etc.
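To make the tagged-union idea concrete, here is a minimal sketch in plain C. It is not the real zval, which has many more types plus reference counting and flags; toy_val, toy_type and the helper functions are names made up for this illustration. The point is only the pattern: check the tag first, then read the matching union member, which is what Z_TYPE_P followed by an accessor macro does.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for a zval:
 * a type tag plus a union holding the payload for that tag. */
typedef enum { TOY_NULL, TOY_LONG, TOY_DOUBLE, TOY_STRING } toy_type;

typedef struct {
	toy_type type;          /* says which union member is currently valid */
	union {
		long        lval;
		double      dval;
		const char *str;    /* borrowed pointer, not owned by toy_val */
	} value;
} toy_val;

static toy_val toy_long(long n) {
	toy_val v = { .type = TOY_LONG, .value.lval = n };
	return v;
}

static toy_val toy_string(const char *s) {
	toy_val v = { .type = TOY_STRING, .value.str = s };
	return v;
}

/* Write a var_dump-like description into buf. The tag is inspected
 * before any union member is read. Returns snprintf's result. */
static int toy_describe(const toy_val *v, char *buf, size_t len) {
	assert(v != NULL && buf != NULL);
	switch (v->type) {
	case TOY_NULL:
		return snprintf(buf, len, "NULL");
	case TOY_LONG:
		return snprintf(buf, len, "int(%ld)", v->value.lval);
	case TOY_DOUBLE:
		return snprintf(buf, len, "float(%g)", v->value.dval);
	case TOY_STRING:
		return snprintf(buf, len, "string(%zu) \"%s\"",
		                strlen(v->value.str), v->value.str);
	}
	return -1; /* unknown tag */
}
```

Calling toy_describe on toy_long(42) writes int(42) into the buffer; reading value.str from that same value would be a bug, which is exactly why the tag has to be checked first.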

In any case, there is no single place to learn allthethings PHP internals. There are a lot of small facts that I found along the way and will share here. One of the things I disliked the most was the lack of documentation and recorded design decisions, so you kind of have to reverse-engineer the source code, looking at other examples or trying to figure out the structures. That is, I used a lot of php_printf and php_var_dump 🙂

Closures

In the userland code, if you var_dump( $add3_partial ); you will see it returns a Closure. This naturally led me to look at zend_closures.h. I came up with code similar to this:

static ZEND_NAMED_FUNCTION(partial_scope) {
	// ...
	RETURN_TRUE; // as defined in zend_API.h
}
PHP_FUNCTION(cufp) {
	// ...
	zval f;
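	zend_function zf;
	// zero the struct first so fields we don't set below (e.g. fn_flags) aren't garbage
	memset(&zf, 0, sizeof(zf));
	zend_string *fn_name = zend_string_init("closure", sizeof("closure") - 1, 0);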
	zf.internal_function.function_name = fn_name;
	zf.internal_function.type = ZEND_INTERNAL_FUNCTION;
	zf.internal_function.arg_info = NULL;
	zf.internal_function.num_args = 0;
	zf.internal_function.required_num_args = 0;
	zf.internal_function.prototype = NULL;
	zf.internal_function.scope = NULL;
	zf.internal_function.handler = partial_scope;
	zend_create_closure(&f, &zf, NULL, NULL, NULL);
	// ...
	RETURN_ZVAL(&f, 0, 0); // copy = 0: move the closure into return_value without adding a second reference
}

The original idea was to create a closure using zend_create_closure and then return it to userland. With this in place, calling cufp() from userland returned a closure. I could also call cufp()(), and it would return true, as expected from partial_scope.

But how do we bring variables into that closure’s scope? At first glance, it seems like the third argument of zend_create_closure (scope) will do the trick. At this point, I needed to introduce myself to the zend_class_entry structure. Per the PHP Internals Book:

Class entries contain a large amount of information, including the class methods and static properties as well as various handlers, in particular a handler for creating objects from the class.

Essentially, the userland’s class keyword is the internals’ zend_class_entry structure. But, it seemed odd to just create a fake “dummy class” like this on the fly. I also couldn’t find any property in that struct that can be used to pass information to the function.

Another thing I thought I could use is zend_closure_bind_var_ex. But since we use ZEND_INTERNAL_FUNCTION, this won’t work, as the property static_variables_ptr (which is used by bind) is only set for ZEND_USER_FUNCTION. The definition of op_array is also involved here.

op_array is short for “operation array,” and it represents a script or a function in its compiled form. When you write PHP code, it goes through a compilation process that turns your human-readable code into an intermediate representation, which is stored in an op_array. This intermediate representation consists of a series of opcodes (operations), and it is what the PHP interpreter executes.

ChatGPT
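As a rough illustration of that description, here is a toy op array and executor in plain C. Everything in it is hypothetical: the toy_* names, the four opcodes, and the choice of a small stack machine are simplifications for this sketch and do not mirror how the Zend VM is actually structured (PHP’s real opcodes, such as ZEND_ADD and ZEND_RETURN, carry operands and result slots instead).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for this sketch only. */
typedef enum { TOY_OP_PUSH, TOY_OP_ADD, TOY_OP_MUL, TOY_OP_RETURN } toy_opcode;

typedef struct {
	toy_opcode code;
	long       operand;   /* only used by TOY_OP_PUSH */
} toy_op;

/* The "op array": a flat sequence of instructions that a compiler
 * would produce from source code. */
typedef struct {
	const toy_op *ops;
	size_t        count;
} toy_op_array;

#define TOY_STACK_SIZE 16

/* Execute the instructions one by one and return the result. */
static long toy_execute(const toy_op_array *arr) {
	long   stack[TOY_STACK_SIZE];
	size_t top = 0;

	for (size_t i = 0; i < arr->count; i++) {
		const toy_op *op = &arr->ops[i];
		switch (op->code) {
		case TOY_OP_PUSH:
			assert(top < TOY_STACK_SIZE);
			stack[top++] = op->operand;
			break;
		case TOY_OP_ADD:
			assert(top >= 2);
			stack[top - 2] += stack[top - 1];
			top--;
			break;
		case TOY_OP_MUL:
			assert(top >= 2);
			stack[top - 2] *= stack[top - 1];
			top--;
			break;
		case TOY_OP_RETURN:
			assert(top >= 1);
			return stack[top - 1];
		}
	}
	return 0; /* fell off the end without an explicit return */
}
```

An expression such as (1 + 2) * 3 would compile to PUSH 1, PUSH 2, ADD, PUSH 3, MUL, RETURN, and the executor walks that array in order.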

I didn’t like the approach, because moving an internal function to a user function means we’re exposing unnecessary functions to the user. In any case, I decided to give it a try.

But then I recalled that, per zend_closures.h, we can bind a variable, yet how do we retrieve it within the closure? That API seems limiting. Or maybe there is a function we could use that is defined elsewhere?

And, if things aren’t tricky enough at this point, here’s another one: there’s no 1:1 correspondence between .c and .h filenames. For example, zend_set_local_var is declared in zend_API.h but its definition is not in zend_API.c; rather, it is in zend_execute_API.c.

So, the initial idea was to put variables in the scope of the closure, and I couldn’t find an easy way to do that. I read somewhere that it involves playing with call frames and execution contexts, so I decided to not go too deep into that rabbit hole. At this point, I decided to try a different approach and go back to zend_class_entry.

A Closure is just a class instance, after all…

With this approach, the idea is for the userland code $q = cufp(fn($x, $y, $z) => $x + $y + $z, 1); to make $q an instance of PartialFunc (a class we get to define), where PartialFunc contains an __invoke method, making the class’ instance callable. (Think (new X())().)
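Stripped of the Zend machinery, the idea looks roughly like the following plain C sketch: an object that stores the function and the bound arguments, plus an invoke routine that appends the call-time arguments and calls through. All names here (partial_func, partial_create, partial_invoke, sum_all) are hypothetical, and the sketch assumes every argument is a long and that there are at most PARTIAL_MAX_ARGS of them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PARTIAL_MAX_ARGS 8

/* A callable that receives all of its arguments as one array. */
typedef long (*variadic_fn)(const long *args, size_t count);

/* Plays the role of a PartialFunc instance: the "fn" and "params"
 * properties live side by side in one struct. */
typedef struct {
	variadic_fn fn;
	long        bound[PARTIAL_MAX_ARGS];
	size_t      bound_count;
} partial_func;

/* Counterpart of cufp(): remember the function and the first arguments. */
static partial_func partial_create(variadic_fn fn, const long *args, size_t count) {
	partial_func p;
	assert(fn != NULL && count <= PARTIAL_MAX_ARGS);
	p.fn = fn;
	p.bound_count = count;
	if (count > 0) {
		memcpy(p.bound, args, count * sizeof *args);
	}
	return p;
}

/* Counterpart of __invoke(): merge bound and call-time arguments into a
 * temporary array, so the stored arguments stay unchanged between calls. */
static long partial_invoke(const partial_func *p, const long *args, size_t count) {
	long merged[PARTIAL_MAX_ARGS];
	assert(p->bound_count + count <= PARTIAL_MAX_ARGS);
	memcpy(merged, p->bound, p->bound_count * sizeof *merged);
	if (count > 0) {
		memcpy(merged + p->bound_count, args, count * sizeof *args);
	}
	return p->fn(merged, p->bound_count + count);
}

/* Example target function: adds up whatever it is given. */
static long sum_all(const long *args, size_t count) {
	long total = 0;
	for (size_t i = 0; i < count; i++) {
		total += args[i];
	}
	return total;
}
```

Binding 1 and 2 to sum_all and then invoking with 3 gives 6, mirroring cufp( 'add3', 1, 2 )( 3 ). Because the merge happens in a temporary array, invoking the same partial_func twice gives the same answer both times.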

The Simple classes explanation from phpinternalsbook.com provided a good starting point.

// Define a custom closure class entry
zend_class_entry *partialfunc_ce;

PHP_METHOD(PartialFunc, __invoke) /* {{{ */
{
	// 1. this function accepts varargs `...$args_closure`
	// 2. since it's a method within `PartialFunc`,
	// we get access to the previous `$fn` and `...$args`
	// 3. $args = array_merge( $args, $args_closure );
	// 4. call $fn($args) and return the value
}

ZEND_BEGIN_ARG_INFO_EX(arginfo_void, 0, 0, 0)
ZEND_END_ARG_INFO()
const zend_function_entry partialfunc_functions[] = {
	PHP_ME(PartialFunc, __invoke, arginfo_void, ZEND_ACC_PUBLIC)
	PHP_FE_END
};

PHP_MINIT_FUNCTION(functools)
{
	zend_class_entry tmp_ce;
	INIT_CLASS_ENTRY(tmp_ce, "PartialFunc", partialfunc_functions);
	partialfunc_ce = zend_register_internal_class(&tmp_ce);
	zend_declare_property_null(partialfunc_ce, "fn", sizeof("fn") - 1, ZEND_ACC_PUBLIC);
	zend_declare_property_null(partialfunc_ce, "params", sizeof("params") - 1, ZEND_ACC_PUBLIC);
	return SUCCESS;
}

PHP_FUNCTION(cufp)
{
	// 1. a user calls this function with two params: a function
	// `$fn` and args `...$args`
	// 2. further, create an instance of `PartialFunc`, and
	// set the corresponding properties on that instance
	// 3. finally, return the instance
}

That’s a neat outline! Let’s start figuring out the details…

What’s a cufp?

So, as we said, this function accepts a callable variable and a variable number of arguments. Within php-src, we can use the macros Z_PARAM_ZVAL and Z_PARAM_VARIADIC to parse the arguments.

PHP_FUNCTION(cufp)
{
	zval *fn;
	zval *params;
	uint32_t num_params;
	zend_array *array_params;

	ZEND_PARSE_PARAMETERS_START(1, -1)
		Z_PARAM_ZVAL(fn)
		Z_PARAM_VARIADIC('+', params, num_params)
	ZEND_PARSE_PARAMETERS_END();
//...

Further, params is a C array of zvals (we can inspect the individual params with php_var_dump(params), php_var_dump(params + 1), and so on). Let’s convert that to a PHP array:

	array_params = zend_new_array(num_params);
	for (uint32_t i = 0; i < num_params; i++) {
		// the array takes its own reference; the call frame still owns params[i]
		Z_TRY_ADDREF_P(params + i);
		zend_hash_next_index_insert_new(array_params, params + i);
	}

Finally, we want to create an instance of our class, using object_init_ex, and pass the properties to this instance. After some trial and error, I found out that add_property_zval_ex can be used for that (apparently, there’s also zend_std_write_property, but I couldn’t make that work):

	zval partialfunc_obj;
	object_init_ex(&partialfunc_obj, partialfunc_ce);

	zval array_params_zval;
	ZVAL_ARR(&array_params_zval, array_params);

	add_property_zval_ex(&partialfunc_obj, "fn", sizeof("fn") - 1, fn);
	add_property_zval_ex(&partialfunc_obj, "params", sizeof("params") - 1, &array_params_zval);
	// the property now holds its own reference to the array, so release ours
	zval_ptr_dtor(&array_params_zval);

	// copy = 0: move our single reference to the object into return_value
	RETURN_ZVAL(&partialfunc_obj, 0, 0);
}

Note that fn is a function, and in the PHP internals world, these usually get parsed with Z_PARAM_FUNC_EX. However, we needed to use Z_PARAM_ZVAL here so that we could pass it easily to the property within the class instance, using add_property_zval_ex.

That about wraps up the definition of cufp.

What’s a __invoke?

Re-iterating, __invoke is a method within an instance that has access to the instance’s properties. Following the outlined plan, we start:

PHP_METHOD(PartialFunc, __invoke) /* {{{ */
{
	zval *params;
	uint32_t num_params;
	zend_array *array_params;
	zval *this = getThis();

	zend_fcall_info       fci;
	zend_fcall_info_cache fcc;

	ZEND_PARSE_PARAMETERS_START(0, -1)
		Z_PARAM_VARIADIC('+', params, num_params)
	ZEND_PARSE_PARAMETERS_END();

	array_params = zend_new_array(num_params);
	for (int i = 0; i < num_params; i++) {
		zend_hash_next_index_insert_new(array_params, params + i);
	}
	//...

Nothing new here: we initialize a few variables and do the same argument parsing as before. After some searching, I found getThis(), which is cool: it returns the current object instance when called within a PHP_METHOD. The next step is to figure out how to retrieve the instance’s properties.

While there is a function for adding properties add_property_zval_ex, there is no get_property_zval_ex. After digging into the code, I found out that add_property_zval_ex uses Z_OBJ_HANDLER_P(arg, write_property), so maybe we can just use Z_OBJ_HANDLER_P(arg, read_property).

	zend_string *fn_str = zend_string_init("fn", sizeof("fn")-1, 0);
	zend_string *params_str = zend_string_init("params", sizeof("params")-1, 0);

	zval *obj_fn = Z_OBJ_HANDLER_P(this, read_property)(Z_OBJ_P(this), fn_str, BP_VAR_R, NULL, NULL);
	zval *obj_params = Z_OBJ_HANDLER_P(this, read_property)(Z_OBJ_P(this), params_str, BP_VAR_R, NULL, NULL);

	zend_string_release(fn_str);
	zend_string_release(params_str);

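So far so good. Next, we need to merge the params arrays. php_array_merge does that, but it appends to its first argument in place, so we merge into a copy to keep the stored params untouched between calls.

	// merge into a private copy; merging straight into the "params" property
	// would make every call append to the bound arguments
	// (the copy should be released with zval_ptr_dtor once the call has returned)
	zval merged_params;
	ZVAL_ARR(&merged_params, zend_array_dup(Z_ARRVAL_P(obj_params)));
	obj_params = &merged_params;
	php_array_merge(Z_ARRVAL_P(obj_params), array_params);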

Now, recall the Z_PARAM_FUNC_EX discussion: PHP uses this macro to parse callable parameters internally. Here we have a different problem: we want to turn a given zval (obj_fn) into a callable, rather than one of PHP_FUNCTION’s args. Digging into the macro revealed that it uses zend_parse_arg_func to do the conversion.

	zend_parse_arg_func(obj_fn, &fci, &fcc, 0, 0);

Perfect! Now we can call the function using the good old zend_call_function. Basically, fci (the call info) and fcc (the call info cache) describe the callable, and those are what zend_call_function needs:

	zval retval;
	fci.named_params = Z_ARRVAL_P(obj_params);
	fci.retval = &retval;

	if (zend_call_function(&fci, &fcc) == SUCCESS && Z_TYPE(retval) != IS_UNDEF) {
		if (Z_ISREF(retval)) {
			zend_unwrap_reference(&retval);
		}
		ZVAL_COPY_VALUE(return_value, &retval);
	}
}

A working implementation?!

Running the original PHP example code, it seems like the extension defining this behaviour works properly. There are very likely memory leaks, but at this point, I was already tired of polishing it.

Conclusion

With this blog post, I hope I provided a glimpse of what it is like to work with PHP internals.

The php-src codebase is a bit chaotic. There are inconsistent function declarations, little function-level documentation, and hardly any architectural documentation. Some APIs you would expect are missing while their counterparts exist (add_property_zval_ex has no get_property_zval_ex, for example). I believe all this inconsistency is due to historical context: closures, for instance, were added only at a later point in PHP. Had there been an initial design goal/plan, it would have made things much easier for everybody.

So, to summarize, the Zend API is tricky to navigate and takes a lot of patience and research. However, the time spent understanding a programming language’s source code is crucial if you want to get a deeper understanding of the programming language.
