type-safe parametric polymorphism

Among the other new proposals for C23 that WG14 received are two brilliant gems by Alex Gilding

The first is probably what you’d expect if you know the C++ feature and how it started (I think in C++11), namely a possibility to specify constants for any type (if used to declare an object) or to have simple function calls that can be forced to be evaluated at compile time. There are still some rough edges to cut (storage class, linkage) but basically, if Wg14 weren’t so predictably unpredictable, this should be a homerun.

The other one is on a whole different level, eye opening and very, very promissing. Its main idea is to add annotations to APIs to avoid breaking ABIs and code duplication at the same time.

Let’s take a first example what the features that are proposed can do. For the remainder of this post I will be assuming that we have a C23 compiler that implements the features.

void * memcpy(void * restrict s1, const void *restrict s2, size_t n);

This function uses the catch-all type void to indicate that the pointers that it receives as arguments may point to any bytes, as long as for both there are at least n bytes available, the two byte arrays don’t overlap, and as long as the object pointed-to by s1 is mutable.

Now in 99% of the cases the types of the objects that are pointed-to are probably the same and the byte values that are copied from s2 into s1 will combine to a sensible object in the target. Problems occur if the pointed to objects have different size or if they even have different type. Even if float has 32 bit the following code is erroneous, because it overwrites an object with floating point type by a representation of an integer type, and the resulting representation might not be valid.

uint32_t a = 4288912;
float x;
memcpy(&x, &a, sizeof(x)); // semantic error, don't do this!
printf("an unsigned as a float: %g", x); // may trap

The remedy using features from the above papers looks as follows

constexpr void [[bind_var(A)]]* (*cpy_typed)(void [[bind_var(A)]]* restrict s1, const void [[bind_var(A)]]*restrict s2, size_t n) = memcpy;
...
uint32_t a;
float x = 45.9;
cpy_typed(&a, &x, sizeof(a)); // Contraint violation, diagnosis required!

So what had been undefined behavior above, now becomes a hard error (constraint violation as we call it in C), but the function that is used underneath, memcpy, is still exactly the same.

The trick is to annotate the void* pointers of the orginal interface with attributes a new C23 feature (borrowed from C++) that allows to annotate precice spots of the syntax. Here it is three times the same attribute [[bind_var(A)]], indicating that the code that uses this interface is expected to pass in two pointers that have the same base type named A (which ever that is) and also to return a pointer to an object of that same base type A.

Now all this specification is put on a function pointer cpy_typed that is constexpr and initialized to memcpy (scroll to the end of the line to see this.)

No extra code is generated, but type safety is improved!

In C23 with type-generic lambdas, if that is accepted, you could do something like this

#define cpy_typed(S1, S2, N)                            \
[](auto* s1, typeof(*s1) const* s2, size_t n) {         \
  return (typeof(*s1)*)memcpy(s1, s2, n); }(S1, S2, N); \
}(S1, S2, N)

Using such a lambda would provide the same type checks, but at the potential cost of efficiency and code size. We would be depending on the optimization capacities of the compiler, the call to memcpy would often have an additional indirection and converting the lambda to a function pointer could imply the generation of an additional stub each time.

The new feature becomes even more interesting when we look into functions that receive a callback to perform a task that depends on such a function. Again, let’s look at a function of the C library

void qsort(void *base, size_t nmemb, size_t size, int (*compar)(const void *, const void *));

int comp_double(void const* x, void const* y) {
    double const* X = x; double const* Y;
    return (*X < *Y) ? -1 : (*X == *Y ? 0 : +1);
}
...
double   dArr[5] = { 4.0, 2.2, 5.1, 6.3, 9.4, };
unsigned uArr[5] = { 4, 2, 5, 6, 9, };

qsort(dArr, 5, sizeof dArr[0], comp_double); // ok
qsort(uArr, 5, sizeof uArr[0], comp_double); // undefined

If you are not used to this kind of syntax, this is a function that sorts the array to which base points. That array is supposed to be composed of nmemb elements of size size and compar is a pointer to a comparison function that receives two pointers, compares the objects and returns values less, equal or greater then 0 if the first is smaller, equal or greater. The underlying type for base and the sort order that compar describes can be arbitrary.

Now, again C just puts the responsibility that all of this is consistent on your (the programmers) shoulders. For now there is no mechanism in C that could provide even the simplest consistency checks. On the other hand we gain one function, one single external symbol in the C library that is capable of doing a generic sort. Usually it is relatively efficient and small.

How this changes with the proposed papers? First this could be augmented in a similar way by adding attributes

constexpr void (*sort_typed)(void [[bind_var(A)]] *base, size_t nmemb, size_t size, int (*compar)(const void [[bind_var(A)]]*, const void [[bind_var(A)]]*)) = qsort;

So now this is a function pointer that points to the qsort function, but that ensures that it can only be called with consistent types.

int comp_double(void const [[bind_type(double)]]* x, void const [[bind_type(double)]]* y) {
    double const* X = x; double const* Y;
    return (*X < *Y) ? -1 : (*X == *Y ? 0 : +1);
}
...
double   dArr[5] = { 4.0, 2.2, 5.1, 6.3, 9.4, };
unsigned uArr[5] = { 4, 2, 5, 6, 9, };

sort_typed(dArr, 5, sizeof dArr[0], comp_double); // ok
sort_typed(uArr, 5, sizeof uArr[0], comp_double); // hard error

So now the compiler knows that a function is expected that receives pointer the same pointer target type than dArr and uArr. For dArr this base type is double so everything works out fine. For uArr it would be expecting unsigned and so it is able to alert the programmer that the specific call is inconsistent.

Here we have the second attribute syntax [[bind_type(double)]] that is introduced by the paper. It ensures that comp_double can only be called with pointers coming from double. For the function definition it also guarantees that the parameters can only be silently converted to pointers to double and nothing else.

So instead of blowing up the C library by an unbounded number of functions (as for example templates in C++ or type-specific functions and type-generic interfaces to them in C) Alex’ trick helps us to maintain exactly the one function that has been in the C library since the epoch (no ABI change), and to augment the capacity of the compiler to detect inconsistencies (a change in the API).

14 thoughts on “type-safe parametric polymorphism”

Dusan Jovanovic (DBJ) says:

October 18, 2021 at 09:03

All well and good. Contract programming, well and good. The syntax will be here an endless debate, endangering the good proposal itself. One might for example argue this is “cleaner and better”:

[[requires typeof(X) == typeof(Y) ]] int comp_double(void const * x, void const * y) { double const* X = x; double const* Y; return (X < *Y) ? -1 : (X == *Y ? 0 : +1); }

And. The undoing of C++ “requires” might be its arcane syntax. WG14 might learn from that WG21 mistake. Whatever faith is awaiting this good proposal.
1. Jens Gustedt says:
  
  October 18, 2021 at 09:19
  
  I don’t think that Alex is much attached to the syntax of the attribute itself, we’ll see. The proposed syntax for bind_var(A) reminds me a bit of Rust’s lifetime syntax 'a.
  1. Dusan Jovanovic (DBJ) says:
    
    October 18, 2021 at 10:23
    
    I hate to be explicit 🙂 Bjarne (in my opinion) has made the mistake of not “dumbing it down” for C++ “requires” syntax. Although the situation in the two “plays” is much different.
  2. John Viega says:
    
    October 30, 2021 at 05:55
    
    DBJ’s syntax is MUCH more lucid. Please get the syntax right… it’s too important to build it and then have people not use it.
    1. Jens Gustedt says:
      
      October 30, 2021 at 07:15
      
      Basically the only possible choice for syntax here is how these attributes are named. Then, valid attribute syntax is always of the form [[identifer( balanced-token-sequence)]]. That’s all we can do for C23. We are not inventing a new language here. This is C with all its baggage and slow process.
      
      And then there is also the visibility scope of parameters. So we will not have a feature that uses parameter names before they are syntactically introduced.
2. Thiago Adams (@thradams) says:
  
  October 18, 2021 at 12:38
  
  Interesting.. I was going to say something with contracts as well.
  The motivation is fantastic, I hope C standard goes in this direction.
  
  The problem with C++ contracts or these bind annotations in my view is that it is
  hard to imagine all possible uses and this may cause too many (and confusing)
  types of attributes. So I have a suggestion for the syntax and it also extends the
  possibilities making use of basic features.
  
  My suggestion is to add a compile-time contract for functions.
  
  Syntax could be something like:
  
  int comp_double(void const * x, void const * y)
  _Contract
  {
  _Static_assert(typeof(x) == typeof(double*), “x must be a pointer to double”);
  _Static_assert(typeof(x) == typeof(double*), “y must be a pointer to double”);
  }
  
  The important point is that x is not the type as seen by the function, but it is the type
  at the caller point.
  
  For instance:
  
  char * safe_itoa( int value, char *buffer, int radix )
  _Contract
  {
  _Static_assert(sizeof(buffer) >= 33, “buffer must be at least 33 chars”);
  }
  
  buffer variable inside safe_itoa is a pointer but for the contract evaluation
  buffer variable always refers to the variable at the caller side.
  
  char buffer[12];
  safe_itoa( 123, buffer, 10); //error : buffer must be at least 33 chars
  
  Initially _Contract statements could be _Static_asserts only but prepared in the future to
  be extended (like ifs) to anything that can be done in compile time.
  
  The interdependency of contracts (that is very interesting on this bind_var) is that
  can be a problem in this _Contract idea.
  
  So I think we should be able to test a contract of some function insider another contract.
  
  void qsort( void ptr,
  size_t count,
  size_t size,
  int (comp)(const void * a, const void * b) )
  _Contract
  {
  // I am calling the contract check of comp using the ptr variable
  _Static_assert(_Contract(comp(ptr, ptr)));
  };
  
  int comp_double(void const * x, void const * y) {
  double const* X = x; double const* Y;
  return (X < *Y) ? -1 : (X == Y ? 0 : +1);
  }
  _Contract
  {
  _Static_assert(typeof(x) == typeof(y), “x and y must be the same type”);
  _Static_assert(typeof(y) == typeof(double), “x and y must be a pointer to double”);
  }
  
  unsigned uArr[5] = { 4, 2, 5, 6, 9, };
  sort_typed(uArr, 5, sizeof uArr[0], comp_double); //error: “comp: x and y must be a pointer to double”
  1. Jens Gustedt says:
    
    October 18, 2021 at 13:08
    
    Several remarks. This is not my proposal. Then, we are closed for new proposals and some form of contract would certainly be seen as a new proposal. Also, we have not even yet formally agreed on typeof but using it to compare types, is completely out of reach. We have been able to add attributes to C23 and this is all we got. Alex’ paper manoeuvers very well in the concrete design space that we have.
    1. Dusan Jovanovic (DBJ) says:
      
      October 18, 2021 at 13:59
      
      I agree. Having typeof in the language is the “deal breaker”. (https://stackoverflow.com/a/12082050/10870835)
      
      I might be somewhat dumb as a compiler designer, but I might expect typeof could deliver the actual type from the dereferenced void *? As soon as I have written it, I am not hopeful.
      1. Jens Gustedt says:
        
        October 18, 2021 at 15:00
        
        That also would be completely different proposal.
  2. Dusan Jovanovic (DBJ) says:
    
    October 18, 2021 at 13:11
    
    Now WG14 and WG21, just have to agree on the same _Contract syntax, for the functions.
    
    The rest is history.
Tomasz Stanislawski says:

October 18, 2021 at 14:53

Thank you for the article. I have a little question. Are you sure that memcpy-ing between 32-bit integer and float is UB? Doing memcpy is accessing both L-values via a pointer to char. This operation is allowed as an explicit exception to “strict-aliasing rule”
Jens Gustedt says:

October 18, 2021 at 15:13

Thanks Tomasz, yes indeed I got it the wrong way around as always. Updated.
Tomasz Stanislawski says:

October 19, 2021 at 09:55

Could this machinery be used to have a type save version of malloc/calloc. To avoid errors like p = malloc(sizeof(T)) when the type of p is changed from T to something else. Would it be possible to have a safer realloc ?
1. Jens Gustedt says:
  
  October 19, 2021 at 11:40
  
  For malloc I don’t see much more possibilities than wrapping it in a macro of the form (typeof(TYPEOREXPR)*)malloc(sizeof(typeof(TYPEOREXPR))) and that doesn’t need that new feature. malloc and calloc themselves do not have pointer parameters, so there is nothing to be annotated, here.
  
  For realloc indeed that is different. We could annotate the void* pointers in the interface and that would provide a safer version of the function interface.