crwen's Blog - Rustonomicon Lifetimes

Lifetimes are named regions of code that a reference must be valid for. Those regions may be fairly complex, as they correspond to paths of execution in the program.

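A lifetime is a region of code within which a reference must be valid. Sometimes this region coincides with a variable's scope, but sometimes it does not, so lifetimes must not be confused with scopes. Consider the following function: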

fn main() {
    let mut data = [1, 2, 3];

    let mut x = &mut data[0];

    for item in data.iter_mut().skip(1) {
        x = item;
    }
    x = &mut data[0];
    println!("{}", x);
}

In Rust, every let statement implicitly introduces a scope. For example, the first snippet below desugars to the second:

let x = 0;
let z;
let y = &x;
z = y;

'a: {
    let x: i32 = 0;
    'b: {
        let z: &'b i32;
        'c: {
            // Must use 'b here because the reference to x is
            // being passed to the scope 'b.
            let y: &'b i32 = &'b x;
            z = y;
        }
    }
}

The examples below explore lifetimes in more detail.

Example

References that outlive referents

Consider the following program:

fn as_str<'a>(data: &'a u32) -> &'a str {
    'b: {
        let s = format!("{}", data);
        return &'a s;
    }
}

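The signature requires the function to return a reference that lives at least as long as 'a, a lifetime chosen by the caller that extends beyond the function body. But the returned reference points at s, which lives only in 'b, inside the function body ('b is shorter than 'a), so the compiler rejects the program with an error.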

This may be hard to follow in the abstract, so let's look at a call site to make it concrete:

fn main() {
    'c: {
        let x: u32 = 0;
        'd: {
            // An anonymous scope is introduced because the borrow does not
            // need to last for the whole scope x is valid for. The return
            // of as_str must find a str somewhere before this function
            // call. Obviously not happening.
            println!("{}", as_str::<'d>(&'d x));
        }
    }
}

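We call as_str expecting it to return a valid reference with lifetime 'd. However, the referent is destroyed before the function returns (its lifetime is shorter than 'd), so the returned reference would dangle.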
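One way to make the function above compile is to return an owned value instead of a reference, so nothing borrowed from the function's own stack frame escapes. The following is a minimal sketch of that idea; the name to_string is chosen here for illustration.

```rust
// Sketch of a fix: hand the caller an owned String rather than a
// reference to a local variable that dies when the function returns.
fn to_string(data: &u32) -> String {
    format!("{}", data)
}

fn main() {
    let x: u32 = 0;
    // `s` owns its buffer, so it stays valid independently of `x`.
    let s = to_string(&x);
    println!("{}", s);
}
```

The cost is an allocation per call, which is usually acceptable; avoiding it would require the caller to supply a buffer to write into.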

Aliasing a mutable reference

let mut data = vec![1, 2, 3];
let x = &data[0];
data.push(4);
println!("{}", x);

This program desugars to:

'a: {
    let mut data: Vec<i32> = vec![1, 2, 3];
    'b: {
        // 'b is as big as we need this borrow to be
        // (just need to get to `println!`)
        let x: &'b i32 = Index::index::<'b>(&'b data, 0);
        'c: {
            // Temporary scope because we don't need the
            // &mut to last any longer.
            Vec::push(&'c mut data, 4);
        }
        println!("{}", x);
    }
}

As we can see, x must be valid throughout 'b. Yet inside 'c we take a mutable reference to data while x, a shared reference derived from data, is still alive.

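You might ask: isn't x a reference to an element of data rather than to data itself? Rust does not know that x points into a subpart of data; it only knows that x must be valid throughout 'b. Let's look at the signature of Index::index() again: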

pub fn index(&self, index: Idx) -> &Self::Output              // elided

pub fn index<'a>(&'a self, index: Idx) -> &'a Self::Output    // expanded

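Index::index() returns a reference whose lifetime is the same as that of its self argument, so the borrow &self (here &data) must last at least as long as 'b. Now the problem is visible: when push is called, a mutable reference and a shared reference to the same memory would exist at the same time.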

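Rust is right to reject this particular program: push may reallocate the vector's buffer, which would leave x dangling. But note how it gets there. The lifetime system is coarser than the reference semantics we actually care about, so it sometimes also rejects programs that are correct. That coarseness is part of why Rust makes incorrect code harder to write.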
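If only the value of the first element is needed, one common workaround is to copy it out so that no borrow of data is alive when push runs. The sketch below assumes a hypothetical helper named first_then_push and relies on i32 being Copy.

```rust
// Hypothetical helper: read the first element, then push a new one.
// Panics if `data` is empty, like any out-of-bounds index.
fn first_then_push(data: &mut Vec<i32>, value: i32) -> i32 {
    // i32 is Copy, so this copies the value out; the shared borrow
    // created by indexing ends on this line.
    let first = data[0];
    // No reference into `data` is alive here, so the mutable borrow is fine.
    data.push(value);
    first
}

fn main() {
    let mut data = vec![1, 2, 3];
    let x = first_then_push(&mut data, 4);
    println!("{} {:?}", x, data);
}
```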

The area covered by a lifetime

let mut data = vec![1, 2, 3];
let x = &data[0];
println!("{}", x);
// This is OK, x is no longer needed
data.push(4);

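This program is much like the previous one, but with the println! moved above data.push(4) it compiles. A reference only needs to be valid from the point where it is created until its last use. After the println!, x is no longer needed, so the shared borrow of data can end, and the later call to data.push(4) is not an error.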

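However, if the type of x implements the Drop trait, the program fails to compile again: the destructor runs at the end of the scope and counts as a use of x. Calling drop(x) before data.push() makes it compile once more.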

#[derive(Debug)]
struct X<'a>(&'a i32);

impl Drop for X<'_> {
    fn drop(&mut self) {}
}

fn main() {
    let mut data = vec![1, 2, 3];
    // let x = &data[0];
    let x = X(&data[0]);
    println!("{:?}", x);
    // drop(x);
    data.push(4); // error
    // Here, the destructor is run and therefore this'll fail to compile.
}
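For comparison, here is a sketch of the variant that does compile, with the explicit drop placed before the push. The function name push_after_drop is an assumption made for this illustration; the X type is the same as above.

```rust
#[derive(Debug)]
struct X<'a>(&'a i32);

impl Drop for X<'_> {
    fn drop(&mut self) {}
}

// Hypothetical helper: borrow the first element through X, format it,
// then end the borrow explicitly before mutating the vector.
fn push_after_drop(data: &mut Vec<i32>) -> String {
    let x = X(&data[0]);
    let shown = format!("{:?}", x);
    // The destructor runs here, which is the last use of `x`,
    // so the shared borrow of `data` ends at this point.
    drop(x);
    data.push(4);
    shown
}

fn main() {
    let mut data = vec![1, 2, 3];
    println!("{}", push_after_drop(&mut data));
    println!("{:?}", data);
}
```

Wrapping the use of x in an inner block would work equally well, since the destructor then runs at the end of that block.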

Lifetime Elision in Functions

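In a function signature, every place where a lifetime can be written in a type (for example, each reference) is a lifetime position. Positions in the argument types are input positions, and positions in the return type are output positions. Lifetime annotations can sometimes be elided and sometimes cannot. The elision rules for functions are: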

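Each elided lifetime in an input position becomes a distinct lifetime parameter.
If there is exactly one input lifetime position, its lifetime is assigned to all elided output lifetimes.
If there are multiple input lifetime positions and one of them is &self or &mut self, the lifetime of self is assigned to all elided output lifetimes.
Otherwise, the output lifetime must be annotated explicitly.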

Here are some examples of lifetime elision:

fn print(s: &str);                                      // elided
fn print<'a>(s: &'a str);                               // expanded

fn debug(lvl: usize, s: &str);                          // elided
fn debug<'a>(lvl: usize, s: &'a str);                   // expanded

fn substr(s: &str, until: usize) -> &str;               // elided
fn substr<'a>(s: &'a str, until: usize) -> &'a str;     // expanded

fn get_str() -> &str;                                   // ILLEGAL

fn frob(s: &str, t: &str) -> &str;                      // ILLEGAL

fn get_mut(&mut self) -> &mut T;                        // elided
fn get_mut<'a>(&'a mut self) -> &'a mut T;              // expanded

fn args<T: ToCStr>(&mut self, args: &[T]) -> &mut Command                  // elided
fn args<'a, 'b, T: ToCStr>(&'a mut self, args: &'b [T]) -> &'a mut Command // expanded

fn new(buf: &mut [u8]) -> BufWriter;                    // elided
fn new(buf: &mut [u8]) -> BufWriter<'_>;                // elided (with `rust_2018_idioms`)
fn new<'a>(buf: &'a mut [u8]) -> BufWriter<'a>          // expanded
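The signatures above have no bodies, so here is a small runnable sketch of the same rules. The function bodies and the Holder type are assumptions made for illustration only.

```rust
// One input lifetime position: the elided output borrows from `s`.
// Slices by byte index, so `until` must fall on a char boundary.
fn substr(s: &str, until: usize) -> &str {
    &s[..until]
}

// The same signature with the lifetime written out.
fn substr_explicit<'a>(s: &'a str, until: usize) -> &'a str {
    &s[..until]
}

// Two input positions and no `self`: eliding the output lifetime is an
// error, so we state which argument the result borrows from.
fn frob<'a>(s: &'a str, _t: &str) -> &'a str {
    s
}

// Hypothetical type used to show the `&self` rule.
struct Holder {
    value: String,
}

impl Holder {
    // Multiple inputs, one of them `&self`: the output borrows from `self`,
    // not from `_other`.
    fn get(&self, _other: &str) -> &str {
        &self.value
    }
}

fn main() {
    println!("{}", substr("lifetime", 4));
    println!("{}", substr_explicit("lifetime", 4));
    println!("{}", frob("left", "right"));

    let h = Holder { value: String::from("kept") };
    // `tmp` is dropped at the end of the inner block, but the result is
    // still usable because it is tied to `h`, not to `tmp`.
    let r = {
        let tmp = String::from("temp");
        h.get(&tmp)
    };
    println!("{}", r);
}
```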

Unbounded Lifetimes

Unsafe code (such as dereferencing a raw pointer) can often produce a reference or lifetime that is unbounded. For example:

fn get_str<'a>(s: *const String) -> &'a str {
    unsafe { &*s }
}

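What would the output lifetime be if the annotation were elided? Under the elision rules, the output lifetime is derived from an input lifetime, but this function takes only a raw pointer, which carries no lifetime. We therefore have to name the output lifetime explicitly, and the lifetime we name is unbounded: nothing ties it to the input, so the caller can choose any lifetime at all, including 'static.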

For example, suppose we use this function as follows:

fn run() {
    let soon_dropped = String::from("hello");
    let dangling = get_str(&soon_dropped);
    drop(soon_dropped);
    println!("Invalid str: {}", dangling); // Invalid str: gӚ_`
}

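So how long should the lifetime of dangling be? The same as &soon_dropped? But that reference was coerced to a raw pointer when it was passed to get_str, and the connection was lost. The lifetime of dangling is in fact unbounded, so using dangling after drop(soon_dropped) compiles without error, even though it is undefined behavior. This is why unsafe code must be written with great care.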
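One way to avoid an unbounded lifetime is to let elision bound it at the function boundary: accept a reference, and do any raw-pointer work inside. The sketch below assumes a hypothetical variant named get_str_bounded; the raw pointer is kept only to mirror the example above.

```rust
// Hypothetical bounded variant: the elided output lifetime is tied to the
// input reference, so the result cannot outlive the String it points into.
fn get_str_bounded(s: &String) -> &str {
    let ptr: *const String = s;
    // SAFETY: `ptr` was just derived from a live reference, and the
    // returned lifetime is the lifetime of that reference.
    unsafe { &*ptr }
}

fn main() {
    let owned = String::from("hello");
    let s = get_str_bounded(&owned);
    println!("{}", s);
    // Dropping `owned` here and then using `s` would now be rejected by the
    // compiler, because `owned` is still borrowed through `s`.
}
```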

Reference: Rustonomicon