Understand Basic Lifetime Annotation in Rust

#rust

Lifetime annotation in Rust is a fairly unusual concept, and it is hard to understand (at least for me). I spent some time on it and want to share my understanding.

TL;DR

Every reference has a lifetime.
The compiler wants to know the lifetime of every reference.
When a function returns a reference, the compiler may not be able to work out its lifetime.
So we have to specify it.

Ok, let's start with the example from The Book.

fn longest(x: &str, y: &str) -> &str {
    if x.len() > y.len() {
        x
    } else {
        y
    }
}

This code does not compile. The error message is as follows:

error[E0106]: missing lifetime specifier
 --> src/main.rs:1:33
  |
1 | fn longest(x: &str, y: &str) -> &str {
  |                                 ^ expected lifetime parameter
  |
  = help: this function's return type contains a borrowed value, but the signature does not say whether it is borrowed from `x` or `y`

Well, according to The Book, the reason is:

Rust can’t tell whether the reference being returned refers to x or y

Based on this statement, I started to wonder: what if I returned something unrelated to x and y? So I wrote this code:

fn longest(x: &str, d: &str) -> &str {
    "ddd"
}

This time I got the same error and help message as before. So I started to think about what is behind the help message. Then I noticed that it begins with

this function's return type contains a borrowed value ...

So maybe this is the main reason, rather than the x and y stuff 🤔? OK, it mentions a borrowed value, and yes, I intended to return a borrowed value, &str. But where could the borrowed value come from, and what lifetime should the returned value have? Remember that, according to The Book, every reference has a lifetime.

I guess that, from the compiler's perspective, this borrowed value can come from two main sources. The first is the parameters the function receives, and the second is a value created within the function (it could also come from a global, e.g. a constant). So what is the lifetime in each situation? Let's investigate them one by one, starting with a value created within the function.

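If we want to return a reference to something that originates inside the function, the data it points to must outlive the function call. Otherwise, the data would be dropped when the function scope ends and the reference would dangle. This reminds me of the 'static lifetime (a generic 'a on the return type would also work here, since a 'static reference can be shortened to any lifetime). According to the documentation, a reference with the 'static lifetime can live for the entire duration of the program, and all string literals have it because their text is stored directly in the program's binary. So I changed the code to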

fn longest(x: &str, d: &str) -> &'static str {
    "ddd"
}

Luckily, this program compiled and returned the expected result 🆒. But what if I want to return a reference to a String? So I changed the code to

fn longest(x: &str, d: &str) -> &'static str {
    &String::from("ddd")
}

This time I got a different error message:

2 |     &String::from("ddd")
  |     ^-------------------
  |     ||
  |     |temporary value created here
  |     returns a reference to data owned by the current function

This error message is easy to understand: the String is owned by the function, so it (and its heap buffer) is dropped when the function's scope ends. The returned reference would then be a dangling reference, which the compiler does not allow.
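To make the contrast concrete, here is a small sketch of the two versions that do work. The function names static_label and owned_label are my own, purely for illustration. The first returns a string literal, which is 'static. The second gives up on returning a reference and hands the owned String to the caller instead, which is one common way around the dangling-reference error.

```rust
// A string literal is stored in the program binary, so a reference to it
// stays valid for the whole run of the program: `&'static str`.
fn static_label() -> &'static str {
    "ddd"
}

// Returning the owned `String` moves the value out to the caller,
// so nothing is left dangling when the function scope ends.
fn owned_label() -> String {
    String::from("ddd")
}
```

Whether returning an owned String is acceptable depends on the situation, since the caller now owns a heap allocation rather than borrowing existing data.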

OK, now let's look at returning a reference that comes from the parameters. Again, let's think about what the compiler sees in the signature. It sees two reference parameters, but these two parameters might have different lifetimes. Which lifetime should the compiler assign to the returned reference? We don't know, and neither does the compiler, which is why it complains. So we should specify the lifetimes ourselves. We can just paste the code provided in The Book.

fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() > y.len() {
        x
    } else {
        y
    }
}

Based on the code above, we are telling the compiler that x, y, and the returned value all share the lifetime 'a. 'a is a generic lifetime parameter: at each call site it stands for a span during which both x and y are valid, which in practice is the shorter of the two lifetimes. The signature also tells the compiler that the returned reference is valid for that same lifetime 'a. Hence, the compiler is happy. It has all the information it needs.
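Here is a sketch of what that means for the caller. longest is the function from The Book; longest_demo and its variable names are my own, added only to show the scopes. Because 'a has to fit both arguments, the result can only be used while the shorter-lived one (inner) is still alive.

```rust
fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() > y.len() {
        x
    } else {
        y
    }
}

fn longest_demo() -> String {
    let outer = String::from("long string is long");
    let copied;
    {
        let inner = String::from("xyz");
        // `result` may only be used while both `outer` and `inner` are alive.
        let result = longest(outer.as_str(), inner.as_str());
        // Copy the text into an owned String so it can leave this block.
        copied = result.to_string();
    }
    // Using `result` itself out here would be rejected by the borrow checker,
    // because `inner` has already been dropped.
    copied
}
```

The copy via to_string() is just a convenient way to get the value out of the inner block; it is not required by longest itself.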

One thing to note is that we often add 'a to every parameter, which is not always necessary. In the case above it was needed because the returned value could be either x or y, and no one knows which until runtime. But if we are sure which one is returned, say x, we can annotate only x:

fn longest<'a>(x: &'a str, y: &str) -> &'a str {
    x
}
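A quick sketch of why this matters, using hypothetical names of my own (first_of and first_of_demo). Since the return type is tied only to the first parameter, the result can keep living after the second argument has been dropped.

```rust
// Only `x` carries 'a; the second parameter gets its own, unrelated lifetime.
fn first_of<'a>(x: &'a str, _y: &str) -> &'a str {
    x
}

fn first_of_demo() -> String {
    let outer = String::from("outer");
    let result;
    {
        let inner = String::from("inner");
        result = first_of(outer.as_str(), inner.as_str());
    } // `inner` is dropped here, but `result` borrows only from `outer`.
    result.to_string()
}
```

If both parameters shared 'a, as in the version from The Book, the same body of first_of_demo would not compile, because the result would also be tied to inner.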

We can also give different parameters different lifetimes, but note that the returned reference has exactly one lifetime.

fn longest<'a, 'b>(x: &'a str, y: &'b str) -> &'b str {
    y
}

So, in conclusion, the compiler needs to know every reference's lifetime, and it complains when it cannot work one out. In many cases the compiler can infer it through lifetime elision, but sometimes we have to annotate it explicitly.
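As an example of the case where the compiler can infer the lifetime, here is a sketch with a single reference parameter. The function bodies and the name first_word_explicit are my own illustration. With only one input reference, there is only one place the returned reference could borrow from, so the annotation can be left out, and the two signatures below mean the same thing.

```rust
// Elided form: one reference parameter, so the output borrows from it.
fn first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}

// The same signature with the lifetime written out explicitly.
fn first_word_explicit<'a>(s: &'a str) -> &'a str {
    s.split_whitespace().next().unwrap_or("")
}
```

The longest function at the start has two reference parameters, so this shortcut does not apply and the annotation is required.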

Rust Tutorial - Lifetime Specifiers Explained

Rust Lifetimes

Validating References with Lifetimes