
Given a string s and a byte index i at which a character starts:

let s = "abc 好 def";
let i = 4;

What's the best way to get the index after that character, so that I can slice the string and get abc 好? In code:

let end = find_end(s, i);
assert_eq!("abc 好", &s[0..end]);

(Note that i + 1 doesn't work, because it assumes the character is only 1 byte long.)

I currently have the following:

fn find_end(s: &str, i: usize) -> usize {
    i + s[i..].chars().next().unwrap().len_utf8()
}

But I'm wondering if I'm missing something and there's a better way?

2 Answers


You could use char_indices to get the next index rather than using len_utf8 on the character, though that has a special case for the last character.
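As a rough sketch of that char_indices variant (the name find_end_via_indices is hypothetical, and i is assumed to lie on a character boundary, otherwise the slice panics): take the start of the second character in s[i..], and fall back to s.len() when there is none, which is the special case for the last character.

```rust
/// Sketch: end of the character starting at byte index `i`, via `char_indices`.
/// Assumes `i` is on a character boundary; slicing panics otherwise.
fn find_end_via_indices(s: &str, i: usize) -> usize {
    s[i..]
        .char_indices()
        // The second item is the character after the one at `i`.
        .nth(1)
        // Offsets are relative to the slice, so add `i` back.
        // If there is no following character, the end is `s.len()`.
        .map_or(s.len(), |(offset, _)| i + offset)
}
```

With s = "abc 好 def", find_end_via_indices(s, 4) gives 7, and calling it on the final character gives s.len().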

I would use the handy str::is_char_boundary() method. Here's an implementation using that:

fn find_end(s: &str, i: usize) -> usize {
    assert!(i < s.len());
    let mut end = i+1;
    while !s.is_char_boundary(end) {
        end += 1;
    }
    end
}


Normally I would make such a function return Option<usize> in case it's called with an index at the end of s, but for now I've just asserted.
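A possible Option-returning version might look like the following. This is only a sketch: the name find_end_checked is made up for illustration, and it keeps the same is_char_boundary search, expressed as a range scan instead of a while loop.

```rust
/// Sketch: like `find_end`, but returns `None` instead of panicking
/// when `i` is at or past the end of `s`.
fn find_end_checked(s: &str, i: usize) -> Option<usize> {
    if i >= s.len() {
        return None;
    }
    // `s.len()` is always a character boundary, so for a valid `i`
    // this search always finds something.
    (i + 1..=s.len()).find(|&end| s.is_char_boundary(end))
}
```

A caller can then write s.get(..find_end_checked(s, i)?) or match on the result rather than relying on the assertion.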

In many cases, instead of explicitly calling find_end, it may make sense to iterate using char_indices, which yields each character together with its starting byte index; it's slightly annoying, though, if you also want the end of the current character.
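One way to recover the end while iterating is to add the character's len_utf8() to its start index. A small sketch, where char_spans is a hypothetical helper name rather than anything from the standard library:

```rust
/// Sketch: (start, end, char) for every character in `s`, in byte offsets.
fn char_spans(s: &str) -> Vec<(usize, usize, char)> {
    s.char_indices()
        // The end of a character is its start plus its encoded length in UTF-8.
        .map(|(start, c)| (start, start + c.len_utf8(), c))
        .collect()
}
```

Each (start, end) pair can be used directly as a slice range, e.g. &s[start..end].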


1 Comment

Thanks for the answer! I didn't include it in the question, but I also had is_char_boundary at some point. With char_indices, if you stop at a char and want to get the next index, you can use i + c.len_utf8(), so that's a good idea too!

To complement @ChrisEmerson's answer, this is how one could implement a find_end that searches for the first occurrence of a character and returns the index just past it.

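fn find_end(s: &str, p: char) -> Option<usize> {
    let mut indices = s.char_indices();
    let mut found = false;
    // Advance the iterator until it has consumed the first occurrence of `p`.
    for (_, v) in &mut indices {
        if v == p {
            found = true;
            break;
        }
    }
    if found {
        // The next character starts where `p` ends; if `p` was last, that is `s.len()`.
        Some(indices.next().map_or(s.len(), |(i, _)| i))
    } else {
        None
    }
}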

Although it avoids the byte boundary loop, it is still not very elegant. Ideally, an iterator method for traversing until a predicate is met would simplify this.
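For what it's worth, Iterator::find may already cover the "traverse until a predicate is met" part. A minimal sketch under that assumption (find_end_of is a hypothetical name), which also handles a match at the very end of the string because it uses len_utf8 rather than the next item:

```rust
/// Sketch: byte index just past the first occurrence of `p` in `s`,
/// or `None` if `p` does not occur.
fn find_end_of(s: &str, p: char) -> Option<usize> {
    s.char_indices()
        // Stop at the first (index, char) pair whose char equals `p`.
        .find(|&(_, c)| c == p)
        // The end is the start index plus the character's UTF-8 length.
        .map(|(i, c)| i + c.len_utf8())
}
```

For example, find_end_of("abc 好 def", '好') yields Some(7), so &s[..7] is "abc 好".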

4 Comments

I'm a bit surprised there doesn't seem to be a next_char_boundary method!
Thanks as well! The downside with using next() is that it only works if there is another character after the current one.
@ChrisEmerson Yes, a next_char_boundary method would be perfect.
@robinst Characters at the end can be addressed easily. I've updated the function.
