6

I have an &str type string from which I want to remove a sub-string. I have an algorithm to calculate the start and end positions of the part to be removed. How can I now remove the sub-string?

To illustrate this more clearly, if I were using C++, I would do this:

#include <iostream>
#include <string>

int main() {
    std::string foo = "Hello";
    int start = 2, stop = 4;
    std::cout << foo;
    foo.erase(start, stop - start);
    std::cout << std::endl << foo << std::endl;
}

My code in Rust:

fn main(){
    let mut foo: &str = "hello";
    let start: i32 = 0;
    let stop: i32 = 4;
    //what goes here?
}
     
    You may need to align the range inclusive part but this is the idea: (&foo[..start]).to_string() + &foo[end..] Commented Jun 21, 2020 at 11:32
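The slicing idea from the comment above can be sketched as follows. This is only an illustration: the function name remove_by_slicing is hypothetical, and it assumes start and stop are byte offsets that lie on char boundaries with start <= stop <= s.len() (slicing panics otherwise).

```rust
/// Returns a copy of `s` without the bytes in `start..stop`.
/// Assumes both offsets lie on char boundaries; slicing panics otherwise.
fn remove_by_slicing(s: &str, start: usize, stop: usize) -> String {
    // Concatenate the part before the removed range with the part after it.
    format!("{}{}", &s[..start], &s[stop..])
}
```

For example, remove_by_slicing("hello", 2, 4) yields "heo". The original &str is left untouched; the result is a new String.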

3 Answers

11

&str is an immutable slice, somewhat similar to std::string_view, so you cannot modify it. Instead, you can use iterators and collect the result into a new String:

let removed: String = foo
    .chars()
    .take(start)
    .chain(foo.chars().skip(stop))
    .collect();

The other way is to modify a String in place:

let mut foo: String = "hello".to_string();

// ...

foo.replace_range(start..stop, "");

Keep in mind, however, that the last example is semantically different, because it operates on byte indices rather than char indices. It may therefore panic when used incorrectly (e.g. when the start offset lies in the middle of a multi-byte char).
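To make the difference concrete, here is a small sketch that wraps both approaches in functions. The names remove_chars and remove_bytes are hypothetical and only serve the comparison.

```rust
/// Removes the chars at positions `start..stop`, counted in Unicode scalar values.
fn remove_chars(s: &str, start: usize, stop: usize) -> String {
    s.chars().take(start).chain(s.chars().skip(stop)).collect()
}

/// Removes the bytes `start..stop`; panics if either offset is not a char boundary.
fn remove_bytes(s: &str, start: usize, stop: usize) -> String {
    let mut owned = s.to_string();
    owned.replace_range(start..stop, "");
    owned
}
```

For ASCII input the two agree: both give "heo" for ("hello", 2, 4). In "héllo", however, é is the char at index 1 but occupies bytes 1..3, so remove_chars("héllo", 1, 2) and remove_bytes("héllo", 1, 3) both give "hllo", while remove_bytes("héllo", 1, 2) panics because byte offset 2 is inside é.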

4 Comments

The replace_range way is likely faster. Also note that with chars the indexes are unlikely-to-be-useful codepoint indexes, while with replace_range they are byte-indexes, so the two examples are not completely equivalent.
Thanks, extended with a notice.
replace_range was exactly what I was looking for. Thanks so much! Could you clarify on this one more thing? As I understand, replace_range will only work for ASCII characters (those that have a fixed 1 byte size), and not with emojis in something like Unicode. Is this the only implication of using replace_range as opposed to chars?
String is a UTF-8 string, which can still be treated as a byte array as long as the invariant (valid UTF-8) is preserved. So you can get a panic quite easily: "я".to_string().replace_range((1..), "") panics, whereas "я".to_string().replace_range((0..), "") works. Therefore you can use replace_range if you know the proper char boundaries.
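If the offsets come from an untrusted calculation, one option is to check them with is_char_boundary before calling replace_range. The helper below is a hypothetical sketch (try_remove_range is not a standard library function); it returns false instead of panicking.

```rust
/// Hypothetical helper: removes bytes `start..stop` from `s` only if the range is valid.
/// Returns `false` and leaves `s` untouched instead of panicking.
fn try_remove_range(s: &mut String, start: usize, stop: usize) -> bool {
    // `is_char_boundary` is true for `s.len()` and false for any offset past the end,
    // so this check also rejects out-of-bounds ranges.
    let valid = start <= stop && s.is_char_boundary(start) && s.is_char_boundary(stop);
    if valid {
        s.replace_range(start..stop, "");
    }
    valid
}
```

With s = "я" (two bytes), try_remove_range(&mut s, 1, 2) returns false and leaves s unchanged, while try_remove_range(&mut s, 0, 2) returns true and empties the string.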
1

A solution that uses the character indexes of the beginning and the end of the substring to be removed with splice():

fn remove(start: usize, stop: usize, s: &str) -> String {
    let mut v: Vec<char> = s.chars().collect();
    v.splice(start..stop, vec![]);
    v.iter().collect()
}

Playground


Kitsu's solution without lambdas, written as an explicit loop:
fn remove(start: usize, stop: usize, s: &str) -> String {
    let mut rslt = String::new();
    for (i, c) in s.chars().enumerate() {
        // Keep only the chars outside start..stop.
        if i < start || i >= stop {
            rslt.push(c);
        }
    }
    rslt
}

…as fast as replace_range, but it handles Unicode characters without any character boundary calculations.

Playground

7 Comments

I don't think it's as fast as replace_range() (did you benchmark?), and replace_range() can certainly handle Unicode.
@Friedman Apart from the fact that you first have to figure out the byte indices of these Unicode characters, then replace_range() works with Unicode.
Even then, I'm not sure this will be faster than finding the indices then using replace_range().
@Friedman There is no "even then": you always have to find the byte indices first. And since there is no try_replace_range(), your program will panic if you make a mistake in finding them. Finding them also takes additional time, which must be taken into account.
Finding the byte indices is as simple as let mut c = s.char_indices(); let start_byte = c.nth(start).unwrap().0; let end_byte = c.nth(end - start - 1).unwrap().0; (assuming start < end and that both are valid char indices). And I know this will take some time, but I still think this + replace_range() will be faster than your approach.
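The conversion from char indices to byte offsets discussed in these comments can be sketched without unwrap(), so that out-of-range indices do not panic. The names byte_offset and remove_char_range are hypothetical. For simplicity the sketch walks the string once per index rather than reusing one iterator, and no benchmark is implied.

```rust
/// Converts a char index into a byte offset; an index at or past the end maps to `s.len()`.
fn byte_offset(s: &str, char_idx: usize) -> usize {
    s.char_indices().nth(char_idx).map_or(s.len(), |(i, _)| i)
}

/// Removes the chars at positions `start..stop` in place, using byte offsets internally.
fn remove_char_range(s: &mut String, start: usize, stop: usize) {
    let from = byte_offset(s.as_str(), start);
    // Clamp so that a reversed range removes nothing instead of panicking.
    let to = byte_offset(s.as_str(), stop).max(from);
    s.replace_range(from..to, "");
}
```

Because both offsets come from char_indices (or are s.len()), they always lie on char boundaries, so replace_range cannot panic here.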
0

If you prefer a slightly more performant alternative to replace_range that also works on Unicode, you can use this:

fn remove_range(text: &str, start: usize, end: usize) -> String {
    let start = text.floor_char_boundary(start);
    let end = text.ceil_char_boundary(end);
    [&text[..start], &text[end..]].concat()
}

Just a note: because working with variable-width characters can be troublesome, this solution uses the experimental feature #![feature(round_char_boundary)] to allow for "safe" string manipulation. The results might differ from what you expect with complex multi-char arrangements, but it will not panic.
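If that feature is not available on your toolchain, the same rounding can be approximated with is_char_boundary, which is stable. This is a sketch under that assumption, not the standard library's implementation; the names floor_boundary, ceil_boundary, and remove_range_stable are hypothetical.

```rust
/// Largest char boundary that is <= `idx` (after clamping `idx` to `s.len()`).
fn floor_boundary(s: &str, idx: usize) -> usize {
    let mut i = idx.min(s.len());
    // Offset 0 is always a boundary, so this loop terminates.
    while !s.is_char_boundary(i) {
        i -= 1;
    }
    i
}

/// Smallest char boundary that is >= `idx` (after clamping `idx` to `s.len()`).
fn ceil_boundary(s: &str, idx: usize) -> usize {
    let mut i = idx.min(s.len());
    // Offset `s.len()` is always a boundary, so this loop terminates.
    while !s.is_char_boundary(i) {
        i += 1;
    }
    i
}

/// Removes the byte range `start..end`, widened outward to the nearest char boundaries.
fn remove_range_stable(text: &str, start: usize, end: usize) -> String {
    let start = floor_boundary(text, start);
    // `max(start)` keeps a reversed range from duplicating text.
    let end = ceil_boundary(text, end).max(start);
    [&text[..start], &text[end..]].concat()
}
```

Note the widening: remove_range_stable("héllo", 2, 2) removes the whole é and returns "hllo", because offset 2 lies inside that char. This is the kind of unexpected result the note above refers to.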

3 Comments

If you are not using Unicode, just remove the xxx_char_boundary calls; it will work fine.
Why do you think it's more performant than replace_range()? I think the opposite.
I did not post any profiling because it "depends", but I would advise profiling for your specific architecture and instruction set. For me, yes, it is slightly faster; the worst-case scenario is the same performance.
