166

I have a string and I need to scan for every occurrence of "foo" and read all the text following it until a second ". Since Rust does not have a contains function for strings, I need to iterate by characters scanning for it. How would I do this?

Edit: Rust's &str has contains() and find() methods.
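
As a rough sketch of what that edit makes possible, the scan can be written with the standard `str::match_indices` and `str::find` methods instead of a manual character loop. The helper name `text_after_foo` is hypothetical, and the sketch assumes the goal is to capture the text between each "foo" and the next double quote; occurrences with no later quote are skipped.

```rust
// Hypothetical helper: for each occurrence of "foo", return the text between
// it and the next double quote. This interpretation of the question is an
// assumption, not something the question states precisely.
fn text_after_foo(input: &str) -> Vec<&str> {
    let mut results = Vec::new();
    // match_indices yields (byte_offset, matched_slice) for every match.
    for (start, matched) in input.match_indices("foo") {
        // Slice off everything up to and including this "foo".
        let rest = &input[start + matched.len()..];
        // find returns the byte offset of the first quote, if there is one.
        if let Some(end) = rest.find('"') {
            results.push(&rest[..end]);
        }
    }
    results
}
```

Calling `text_after_foo("foo=bar\" x foo baz\" foo")` should yield the two slices `=bar` and ` baz`; the trailing "foo" has no quote after it and is dropped.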

4
  • Could you show an example of some inputs with your desired outputs? It will help us see more clearly what you are trying to accomplish exactly. Commented Mar 1, 2014 at 18:21
  • There's an example of such a loop at rustbyexample.org/loops.html, although I think there are easier ways to do that. Commented Mar 1, 2014 at 21:34
  • Note that there are many string search algorithms, and their time complexity is not that of a straightforward approach (O(n*m)). en.wikipedia.org/wiki/String_searching_algorithm Commented Mar 3, 2014 at 23:11
  • This sounds like regex. Commented Oct 9, 2015 at 22:54

3 Answers

252

I need to iterate by characters scanning for it.

The .chars() method returns an iterator over the characters in a string, e.g.:

for c in my_str.chars() { 
    // do something with `c`
}

for (i, c) in my_str.chars().enumerate() {
    // do something with character `c` and index `i`
}

If you are interested in the byte offsets of each char, you can use char_indices.
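
To illustrate the difference from `enumerate()`, here is a minimal sketch; the helper name `byte_offsets` is made up for this example. `char_indices` pairs each char with the byte position where it starts, so the offsets jump by more than one after a multi-byte character.

```rust
// Hypothetical helper: collect each char together with its starting byte offset.
fn byte_offsets(s: &str) -> Vec<(usize, char)> {
    // char_indices yields (byte_offset, char); the offset advances by the
    // UTF-8 length of each char, unlike the counter from enumerate().
    s.char_indices().collect()
}
```

For `"h\u{e9}llo"` (the word "héllo" with a precomposed é, which takes two bytes in UTF-8), this gives offsets 0, 1, 3, 4, 5, whereas `enumerate()` would count 0 through 4. The byte offsets are the ones that are valid for slicing the original &str.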

Look into .peekable(), and use peek() for looking ahead. Strings are exposed through iterators like this because a &str is UTF-8 encoded, so a single character can span several bytes; it is not a simple vector of characters.
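
A small sketch of the look-ahead pattern, assuming a made-up task of pulling the decimal numbers out of a string (the helper name `extract_numbers` is hypothetical). `peek()` lets the inner loop inspect the next char without consuming it, so the first non-digit is left in place for the outer loop.

```rust
// Hypothetical helper: collect every run of ASCII/Unicode decimal digits as a u32.
// Overflow is not handled; this is only meant to show peek() in use.
fn extract_numbers(s: &str) -> Vec<u32> {
    let mut numbers = Vec::new();
    let mut chars = s.chars().peekable();
    while let Some(c) = chars.next() {
        if let Some(first) = c.to_digit(10) {
            let mut value = first;
            // Look at the next char without consuming it.
            while let Some(&next) = chars.peek() {
                match next.to_digit(10) {
                    Some(d) => {
                        value = value * 10 + d;
                        chars.next(); // now actually consume the digit
                    }
                    None => break, // leave the non-digit for the outer loop
                }
            }
            numbers.push(value);
        }
    }
    numbers
}
```

For the input `"a12b3 456"` this should produce 12, 3 and 456.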

You could also create a vector of chars and work on it from there, but that's more time and space intensive:

let my_chars: Vec<_> = my_str.chars().collect();

3 Comments

Beware that the characters obtained this way might not correspond to an intuitive definition of a character as perceived by humans. See github.com/unicode-rs/unicode-segmentation for more details.
Does this destroy/consume the string?
@FreelanceConsultant: No, as the signature of the method shows, it takes an immutable reference to self, not self itself or a mutable reference to it. It iterates the underlying data directly, without changing it in any way.
38

The concept of a "character" is ambiguous and can mean different things depending on the type of data you are working with. The most obvious answer is the chars method. However, it may not do what you expect: chars iterates over Unicode code points (scalar values), and what looks like a single "character" to you may actually be made up of several of them, which can lead to unexpected results:

"a̐".chars() // => ['a', '\u{310}']

For a lot of string processing, you want to work with graphemes. A grapheme consists of one or more Unicode code points represented as a string slice. Graphemes map better to the human perception of "characters". To create an iterator of graphemes, you can use the unicode-segmentation crate:

use unicode_segmentation::UnicodeSegmentation;

for grapheme in my_str.graphemes(true) {
    // ...
}

If you are working with raw ASCII then none of the above applies to you, and you can simply use the bytes iterator:

for byte in my_str.bytes() {
    // ...
}

Although, if you are working with ASCII, then arguably you shouldn't be using String/&str at all and should instead use Vec<u8>/&[u8] directly.
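
As an illustration of working on bytes directly, here is a minimal sketch that counts occurrences of a byte pattern with the standard slice method `windows`. The helper name `count_ascii_matches` is hypothetical, and this is the straightforward O(n*m) comparison, not an optimized search algorithm.

```rust
// Hypothetical helper: count (possibly overlapping) occurrences of `needle`
// in `haystack` by comparing every window of the same length.
fn count_ascii_matches(haystack: &[u8], needle: &[u8]) -> usize {
    // windows() panics on a size of 0, so treat an empty needle as no match.
    if needle.is_empty() {
        return 0;
    }
    haystack
        .windows(needle.len())
        .filter(|window| *window == needle)
        .count()
}
```

With byte string literals, `count_ascii_matches(b"foo bar foo", b"foo")` should return 2. A &str can be passed in via `as_bytes()`, which borrows the underlying bytes without copying.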

2 Comments

Interesting last sentence statement. Any pointers to why String is not advised for simple ASCII? I'm learning Rust.
@DawidLaszuk A String cannot be indexed directly, and has to perform extra UTF-8 checks to be manipulated. It's easier, and has less overhead, to work with bytes directly.
6
fn main() {
    let s = "Rust is a programming language";
    for c in s.chars() {
        print!("{}", c);
    }
}

Output: Rust is a programming language

I use the chars() method to iterate over each element of the string.

1 Comment

Sorry, but what does this add to the accepted answer?
