7

Suppose I'm trying to do a fancy zero-copy parser in Rust using &str, but sometimes I need to modify the text (e.g. to implement variable substitution). I really want to do something like this:

fn main() {
    let mut v: Vec<&str> = "Hello there $world!".split_whitespace().collect();

    for t in v.iter_mut() {
        if t.contains("$world") {
            *t = &t.replace("$world", "Earth");
        }
    }

    println!("{:?}", &v);
}

But of course the String returned by t.replace() doesn't live long enough. Is there a nice way around this? Perhaps there is a type which means "ideally a &str but if necessary a String"? Or maybe there is a way to use lifetime annotations to tell the compiler that the returned String should be kept alive until the end of main() (or have the same lifetime as v)?

2 Answers

9

Rust has exactly what you want in the form of the Cow (clone-on-write) type.

use std::borrow::Cow;

fn main() {
    let mut v: Vec<_> = "Hello there $world!".split_whitespace()
                                             .map(|s| Cow::Borrowed(s))
                                             .collect();

    for t in v.iter_mut() {
        if t.contains("$world") {
            *t.to_mut() = t.replace("$world", "Earth");
        }
    }

    println!("{:?}", &v);
}

As @sellibitze correctly notes, to_mut() first clones the borrowed value into a new String (a heap allocation), which the assignment then immediately overwrites. If you are sure you only have borrowed strings, you can instead use

*t = Cow::Owned(t.replace("$world", "Earth"));
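To see which elements end up allocated, here is a minimal self-contained sketch of that variant. The helper name substitute_tokens is made up for illustration and is not part of the original answer.

```rust
use std::borrow::Cow;

/// Hypothetical helper: splits `input` on whitespace and substitutes `$world`,
/// allocating only for the tokens that actually change.
fn substitute_tokens(input: &str) -> Vec<Cow<'_, str>> {
    let mut tokens: Vec<Cow<'_, str>> =
        input.split_whitespace().map(Cow::Borrowed).collect();

    for t in tokens.iter_mut() {
        if t.contains("$world") {
            // Overwrite the whole Cow instead of calling to_mut() first.
            *t = Cow::Owned(t.replace("$world", "Earth"));
        }
    }
    tokens
}

fn main() {
    for t in &substitute_tokens("Hello there $world!") {
        let kind = match t {
            Cow::Borrowed(_) => "borrowed",
            Cow::Owned(_) => "owned",
        };
        println!("{:?} is {}", t, kind);
    }
}
```

Only the token containing "$world" becomes a Cow::Owned; the others keep pointing into the input string.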

If the Vec contains Cow::Owned elements, assigning a fresh Cow::Owned still throws away their existing allocation. You can avoid that with the following very fragile and unsafe code inside your for loop. It manipulates the UTF-8 bytes of the string directly and relies on the fact that, once the $ sign is removed, the replacement "Earth" has exactly the same number of bytes as "world".

let mut last_pos = 0; // so we don't start at the beginning every time
while let Some(pos) = t[last_pos..].find("$world") {
    let p = pos + last_pos; // find returns an offset relative to last_pos
    last_pos = p + 5; // resume after "Earth", which is 5 bytes long
    unsafe {
        // SAFETY: only ASCII bytes are removed or written, so the buffer stays valid UTF-8
        let s = t.to_mut().as_mut_vec(); // operating on Vec is easier
        s.remove(p); // remove $ sign
        for (c, sc) in "Earth".bytes().zip(&mut s[p..]) {
            *sc = c; // overwrite "world" with "Earth"
        }
    }
}

Note that this is tailored exactly to the "$world" -> "Earth" mapping. Any other mappings require careful consideration inside the unsafe code.
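For arbitrary mappings, a safe alternative is possible with String::replace_range, at the cost of shifting the tail of the string on each replacement. This is a different technique from the unsafe code above, and replace_in_place is a hypothetical helper written for this sketch, not a std method.

```rust
use std::borrow::Cow;

/// Hypothetical helper (not part of std): replaces every occurrence of
/// `needle` in `s` in place, using only safe String APIs.
fn replace_in_place(s: &mut String, needle: &str, value: &str) {
    if needle.is_empty() {
        return; // an empty needle would match at every position
    }
    let mut start = 0;
    while let Some(pos) = s[start..].find(needle) {
        let begin = start + pos; // find() returns an offset relative to `start`
        s.replace_range(begin..begin + needle.len(), value);
        start = begin + value.len(); // continue after the inserted text
    }
}

fn main() {
    let mut t: Cow<'_, str> = Cow::Owned(String::from("$world, meet $world!"));
    // to_mut() hands out the existing String when the Cow is already owned.
    replace_in_place(t.to_mut(), "$world", "Earth");
    println!("{}", t);
}
```

The existing buffer is reused when the Cow is already owned, though the String may still reallocate if the replacement is longer than the needle.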


3 Comments

The to_mut here only creates an unnecessary String value (involves heap memory allocation) which is immediately overwritten (involves deallocation). I'd change the line to *t = Cow::Owned(t.replace("$world", "Earth")); to avoid this overhead.
Your last example probably should have more warnings beyond "careful consideration" placed around it. It does direct byte-based manipulation of UTF-8 strings and relies on the fact that the replacement happens to be exactly the same number of bytes. It's definitely an optimization, but not a universally applicable one.
Added more warnings and some bold text. I wonder whether a PR adding a replace(&mut self, needle, value) function to the String struct would be accepted.
8

Use std::borrow::Cow, specifically as Cow<'a, str>, where 'a is the lifetime of the string being parsed.

use std::borrow::Cow;

fn main() {
    let mut v: Vec<Cow<'static, str>> = vec![];
    v.push("oh hai".into());
    v.push(format!("there, {}.", "Mark").into());

    println!("{:?}", v);
}

Produces:

["oh hai", "there, Mark."]
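When the input is not 'static, the same idea can be written as a function whose return type ties the Cow to the parsed string. The following is only a sketch; the function name expand is an assumption made for this example.

```rust
use std::borrow::Cow;

/// Hypothetical helper: returns the token borrowed from the input unless it
/// contains `$world`, in which case an owned, substituted copy is returned.
fn expand<'a>(token: &'a str) -> Cow<'a, str> {
    if token.contains("$world") {
        Cow::Owned(token.replace("$world", "Earth"))
    } else {
        Cow::Borrowed(token)
    }
}

fn main() {
    // `source` is a runtime String, so the tokens cannot be 'static.
    let source = String::from("Hello there $world!");
    let v: Vec<Cow<'_, str>> = source.split_whitespace().map(expand).collect();
    println!("{:?}", v);
}
```

Here 'a is inferred from source, so the compiler rejects any attempt to keep v alive longer than the text it borrows from.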

2 Comments

60 seconds too late :(
@ker: I lost power some 10 seconds after submitting; just barely made it. :D
