Compile-time string literal processing, but why?

Since the addition of constexpr in C++11 we can do many things at compile time. One of those things is processing string literals.

The C++ standard has a section on string literals (5.13.5). The part that matters for this post is: evaluating a string-literal results in a string literal object with static storage duration.
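A small sketch of what static storage duration means in practice: a pointer to a string literal stays valid after the function that produced it has returned, and the literal can be read at compile time. The function name greeting is made up for this illustration.

```cpp
// Hypothetical helper for illustration: returns a pointer to a string literal.
// The literal has static storage duration, so the pointer does not dangle.
constexpr const char* greeting() {
  return "Hello World";
}

// The characters of the literal are readable in a constant expression.
static_assert(greeting()[0] == 'H', "the literal is readable at compile time");
```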

Finding strings in your binary

If you declare a variable like const char* str = "Hello World";, the string "Hello World" will be in your binary.

Linux, macOS and Windows all have a strings command (on Windows you need to install the Sysinternals Suite).

Running strings [your_binary] will list all the strings in your binary. "Hello World" should be among them, provided you do something with it; otherwise the compiler might optimize it away.

You can keep this in mind if you experiment with the code below, and want to verify the results.

Using string literals at compile-time

In C++11, constexpr functions came with a lot of restrictions. One of them was that you could not use if statements. This was relaxed in C++14, and today, with C++20 or even C++23, you can use almost everything in a constexpr function.

Example (C++11)

Let’s look at what a constexpr function that tells us the length of a string literal at compile time could look like in C++11.

#include <cstddef>

constexpr std::size_t my_str_len(const char* str) {
  return *str ? 1 + my_str_len(str + 1) : 0;
}

Example (C++14 or later)

Luckily, today we are no longer restricted to return statements and recursion in constexpr functions.

constexpr std::size_t my_str_len(const char* str) {
  std::size_t len = 0;
  while (*(str + len) != '\0') {
    ++len;
  }
  return len;
}

Using the constexpr function

Either way, it works.

int main() {
  constexpr const char* s = "Hello World";
  static_assert(my_str_len(s) == 11, "unexpected length");  // the message is required before C++17
  return 0;
}

You can test this by changing the value in the static_assert to something other than 11. The code will no longer compile.

The C++14 example is of course nicer to write and read.

With if, while and for statements, it becomes more convenient to do more, like finding a character in a string literal, finding its last occurrence, or its Nth occurrence. We will come to that.
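As a rough sketch of what such helpers could look like with C++14 loops (the names find_first, find_last, find_nth and not_found are made up for this illustration):

```cpp
#include <cstddef>

// Sentinel for "not found" (name chosen for this sketch).
constexpr std::size_t not_found = static_cast<std::size_t>(-1);

// Index of the first occurrence of ch in a null-terminated string.
constexpr std::size_t find_first(const char* str, char ch) {
  for (std::size_t i = 0; str[i] != '\0'; ++i) {
    if (str[i] == ch) {
      return i;
    }
  }
  return not_found;
}

// Index of the last occurrence of ch in a null-terminated string.
constexpr std::size_t find_last(const char* str, char ch) {
  std::size_t pos = not_found;
  for (std::size_t i = 0; str[i] != '\0'; ++i) {
    if (str[i] == ch) {
      pos = i;
    }
  }
  return pos;
}

// Index of the nth occurrence of ch, counting from 1.
constexpr std::size_t find_nth(const char* str, char ch, std::size_t n) {
  std::size_t seen = 0;
  for (std::size_t i = 0; str[i] != '\0'; ++i) {
    if (str[i] == ch && ++seen == n) {
      return i;
    }
  }
  return not_found;
}

static_assert(find_first("Hello World", 'o') == 4, "first 'o'");
static_assert(find_last("Hello World", 'o') == 7, "last 'o'");
static_assert(find_nth("Hello World", 'l', 3) == 9, "third 'l'");
static_assert(find_first("Hello World", 'x') == not_found, "no 'x'");
```

All checks run during compilation; nothing here needs to exist at run time.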

A string literal is an array

In the previous example, "Hello World" can be written in two ways.

constexpr const char* s = "Hello World";

or

constexpr const char s[12] = "Hello World";

Yes, C arrays are not very nice to work with, and that they decay to pointers is a topic for another post.
For now it’s important to remember that a string literal is an array.

The size of the Hello World array is 12, since the compiler adds a null terminator at the end of the string literal.

The static_assert will work with both versions.

static_assert(my_str_len(s) == 11, "unexpected length");
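The array nature of a string literal can be made visible with a few compile-time checks. This is only a sketch; literal_size is a made-up helper that lets the compiler deduce the array size.

```cpp
#include <cstddef>

// Hypothetical helper: N is deduced from the array type, terminator included.
template <std::size_t N>
constexpr std::size_t literal_size(const char (&)[N]) {
  return N;
}

static_assert(sizeof("Hello World") == 12, "11 characters plus the terminator");
static_assert(literal_size("Hello World") == 12, "N includes the terminator");
static_assert("Hello World"[11] == '\0', "the last element is the terminator");
```

Taking the literal by reference to an array keeps the size information that is lost once it decays to const char*.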

Interesting, but …​ why?

So, we can do a lot of things with string literals at compile time. But why would we want to do that?

Well, we could deal with a problem of __FILE__ and/or std::source_location.
Both are there to give us information about the current file name, but they can give too much information.

The problem of __FILE__ and std::source_location

The C standard says

The following macro names shall be defined by the implementation:
__FILE__ The presumed name of the current source file (a character string literal). The precise details of how the file name is represented (whether it’s a full path, a relative path, or just the file name) are implementation-defined.

I do not like the words implementation-defined. Those are bad words.

In practice, both show what the compiler sees for the current translation unit. Whatever file name is passed to the compiler is used. This can be just a file name, a relative path, or a full path, depending on your build system and the compiler.

With CMake as a build system, I always end up having the whole path of the translation unit in __FILE__. And this has some negative side effects. If the file name is used in log messages, it can leak unwanted information. And it can be annoying to read.

How std::source_location should work

If we can assume the compiler sees the full path of the translation unit, then we can assume that std::source_location will give us the full path of the current file.

Example:

/Users/a4z/work/cpp/project/src/part/one.cpp

It would be nice if we could configure how much of the path is returned, for example

part/one.cpp

The whole prefix /Users/a4z/work/cpp/project/src is neither needed nor wanted.
In the best case it’s useless; in the worst case it’s annoying in log messages, or even leaks unwanted information.

Compiler flags can help

For GCC, the -fmacro-prefix-map=old=new flag tells the compiler to replace a path prefix with something else when expanding __FILE__. This can be used to replace /Users/a4z/work/cpp/project/src with . (a single dot), for example.
Then the file /Users/a4z/work/cpp/project/src/part/one.cpp will become ./part/one.cpp

Such flags are compiler-specific and not portable. We depend on the compiler and the build system to do the right thing. And things can go wrong. I still prefer some compile-time checks in the code, just to be sure.

And even if we do trust the build system to not pass full paths to the compiler, it might be a good idea to have a compile-time check in the code that this is the case, since release builds might change without you noticing.
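Such a check could be as small as the following sketch. The helper looks_absolute is a made-up name, and its notion of "absolute" (a leading separator or a drive prefix like C:) is a simplifying assumption.

```cpp
// Hypothetical check: true if the path starts with a separator (POSIX style)
// or has a drive prefix such as "C:" (Windows style).
constexpr bool looks_absolute(const char* path) {
  return path[0] == '/' || path[0] == '\\' ||
         (path[0] != '\0' && path[1] == ':');
}

static_assert(looks_absolute("/Users/a4z/work/cpp/project/src/part/one.cpp"),
              "full path is detected");
static_assert(!looks_absolute("./part/one.cpp"), "mapped path passes");
static_assert(!looks_absolute("part/one.cpp"), "relative path passes");

// In a real source file the check would be applied to the macro itself:
// static_assert(!looks_absolute(__FILE__), "full path leaked into __FILE__");
```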

Checking __FILE__ with code

The basic idea is simple:

  • Assume we always have the full path in __FILE__

  • Find the Nth occurrence of / (on Windows \), counted from the end of the string literal

  • Get the length of the part of the string literal that we actually want

  • Create a new character array with the wanted length

  • Copy the wanted part of the string literal into the new array

  • Do all that in a constexpr context so the compiler throws away __FILE__ and keeps only the new array.

Example

This code is not the nicest one and is quite dated, but it demonstrates the idea.
Please feel free to explore what can be done with C++20 or newer, and let me know in the comments.

We take __FILE__ and extract folder/file.ext from it.

// -- library part

#include <cstddef>
#include <limits>
namespace a4z {

  constexpr std::size_t cstr_len(const char* str) {
    std::size_t len = 0;
    while (*(str + len) != '\0') {
      ++len;
    }
    return len;
  }

  constexpr auto npos = std::numeric_limits<std::size_t>::max();

  template <std::size_t N>
  constexpr std::size_t find_nth_r_occurrence(const char (&str)[N],
                                              char ch,
                                              std::size_t n) {
    std::size_t count = 0;
    for (std::size_t i = N; i-- > 0;) {
      if (str[i] == ch) {
        ++count;
        if (count == n) {
          return i;
        }
      }
    }
    return npos;  // Not found
  }

  constexpr bool on_windows() {
#ifdef _WIN32
    return true;
#else
    return false;
#endif
  }

  template <std::size_t N>
  struct astr {
    char data[N] = {0};

    constexpr astr(const char (&arr)[N]) noexcept {
      for (std::size_t i = 0; i < N; ++i) {
        data[i] = *(arr + i);
      }
    }

    constexpr const char* c_str() const noexcept { return &data[0]; }

    constexpr std::size_t size() const noexcept { return N - 1; }
  };

  constexpr char slash = on_windows() ? '\\' : '/';

#define a4z_file_name                                                          \
  []() consteval {                                                             \
    constexpr const char* str = __FILE__;                                      \
    constexpr auto len = a4z::cstr_len(str);                                   \
    constexpr auto start{a4z::find_nth_r_occurrence(__FILE__, a4z::slash, 2)}; \
    static_assert(start != a4z::npos);                                         \
    constexpr std::size_t astr_len = len - start;                              \
    char data[astr_len];                                                       \
    for (std::size_t i = 0; i < astr_len; ++i) {                               \
      data[i] = *(str + start + i + 1);                                        \
    }                                                                          \
    return a4z::astr<astr_len>{data};                                          \
  }

}  // namespace a4z

// -- Application part

#include <cstdio>

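int main() {
  // Prints only folder/file.ext of this translation unit
  fprintf(stdout, "%s\n", a4z_file_name().c_str());
  // The exit code is the length of that string
  return a4z_file_name().size();
}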

This code is not pretty. I wrote it a while back. It might be possible to do it more nicely with C++20 or newer. But it demonstrates the idea, and it works.

The result

It’s possible to use code to get the filename of the current translation unit, without the full path. And to end up with a predictable string literal. This can be very convenient.

Even if we trust the build system to not pass full paths, it is still possible to use static_assert to verify that __FILE__ does not contain an overly long path. This check would also be done with compile-time string processing, and the code is very similar.
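A sketch of such a check, assuming that "too long" is measured by the number of path separators. The helper count_char and the limit of 2 are made up for this illustration.

```cpp
#include <cstddef>

// Hypothetical helper: counts how often ch occurs in a null-terminated string.
constexpr std::size_t count_char(const char* str, char ch) {
  std::size_t count = 0;
  for (std::size_t i = 0; str[i] != '\0'; ++i) {
    if (str[i] == ch) {
      ++count;
    }
  }
  return count;
}

static_assert(count_char("./part/one.cpp", '/') == 2, "mapped path");
static_assert(count_char("/Users/a4z/work/cpp/project/src/part/one.cpp", '/') == 8,
              "full path");

// In a real source file, with a limit that fits the project layout:
// static_assert(count_char(__FILE__, '/') <= 2, "__FILE__ contains too much path");
```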

Other things that can be done

In this talk I mention a few hacks that can be done with string literals at compile time.

One is to (ab)use __PRETTY_FUNCTION__ (__FUNCSIG__ on MSVC) to get the name of a type at compile time, an ugly hack that is required due to the lack of reflection in C++.
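The starting point of that hack can be sketched as follows. The function raw_signature is a made-up name. The exact text of the signature is compiler-specific, so this sketch only shows that the type name is contained in it and does not try to cut it out.

```cpp
#include <string_view>

// Hypothetical helper: returns the compiler-generated signature of this
// function template instantiation, which mentions the type T.
template <typename T>
constexpr std::string_view raw_signature() {
#if defined(_MSC_VER)
  return __FUNCSIG__;
#else
  return __PRETTY_FUNCTION__;
#endif
}

// The type name appears somewhere in the signature; where exactly depends on
// the compiler, which is what makes the extraction a hack.
static_assert(raw_signature<int>().find("int") != std::string_view::npos,
              "signature mentions the type");
```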

magic_enum is a very popular library that does similar things, and much more.

Some less hacky things might be the compile time string checks libfmt or std::format do.

Summary

There are some more or less useful things that can be done with string literals at compile time. Whether you find them useful or not depends on the use case and what you want to achieve.

In any case, if you want to start learning about compile time programming, working with string literals can be a good start. It’s a good playground to apply algorithms, and the results are human-readable and verifiable.