
I am attempting to index a string in Rust, but the compiler throws an error. My code (Project Euler problem 4, playground):

fn is_palindrome(num: u64) -> bool {
    let num_string = num.to_string();
    let num_length = num_string.len();

    for i in 0 .. num_length / 2 {
        if num_string[i] != num_string[(num_length - 1) - i] {
            return false;
        }
    }
    
    true
}

The error:

error[E0277]: the trait bound `std::string::String: std::ops::Index<usize>` is not satisfied
 --> <anon>:7:12
  |
7 |         if num_string[i] != num_string[(num_length - 1) - i] {
  |            ^^^^^^^^^^^^^
  |
  = note: the type `std::string::String` cannot be indexed by `usize`

Is there a reason why String can not be indexed? How can I access the data then?


10 Answers

217

Yes, indexing into a string is not available in Rust. The reason is that Rust strings are stored internally as a contiguous UTF-8 encoded buffer, so the concept of indexing would be ambiguous, and people would misuse it. Byte indexing is fast but almost always incorrect: when your text contains non-ASCII symbols, a byte index may land inside a character/Unicode code point, which is really bad if you need text processing. Code point indexing is not free either: because UTF-8 is a variable-length encoding, you have to traverse the buffer from the start to find the required code point.
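To make the difference between bytes and code points concrete, here is a small sketch. `describe` is a hypothetical helper written for illustration. It compares a string's byte length (`len()`) with its code point count (`chars().count()`) and returns the first raw byte:

```rust
/// Hypothetical helper: (byte length, code point count, first raw byte).
fn describe(s: &str) -> (usize, usize, Option<u8>) {
    // len() counts UTF-8 bytes, not characters.
    let byte_len = s.len();
    // chars() decodes the bytes into Unicode scalar values (code points).
    let code_points = s.chars().count();
    // For non-ASCII text, the first byte is only a fragment of the first character.
    let first_byte = s.as_bytes().first().copied();
    (byte_len, code_points, first_byte)
}
```

For "é" (U+00E9) it returns (2, 1, Some(0xC3)). That is two bytes and one code point, and the first byte is not a character on its own.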

If you are certain that your strings contain ASCII characters only, you can use the as_bytes() method on &str which returns a byte slice, and then index into this slice:

let num_string = num.to_string();

// ...

let b: u8 = num_string.as_bytes()[i];
let c: char = b as char;  // if you need to get the character as a unicode code point
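Applied to the question, a minimal sketch of the byte-slice approach could look like this. It is only correct because `u64::to_string()` produces ASCII digits, so every byte is a whole character:

```rust
/// The question's loop, indexing a byte slice instead of the String.
/// Correct only because u64::to_string() yields ASCII digits (1 byte each).
fn is_palindrome(num: u64) -> bool {
    let num_string = num.to_string();
    let bytes = num_string.as_bytes();
    let len = bytes.len();
    for i in 0..len / 2 {
        // Compare the ith byte from the front with the ith byte from the back.
        if bytes[i] != bytes[len - 1 - i] {
            return false;
        }
    }
    true
}
```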

If you do need to index code points, you have to use the chars() iterator:

num_string.chars().nth(i).unwrap()

As I said above, this requires walking the iterator from the start up to the ith code point, so it is O(i), not O(1).
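You may also need to know where the ith code point sits in the underlying buffer, for example to slice around it. In that case `char_indices()` yields `(byte offset, char)` pairs. Here is a sketch, using `char_span` as a hypothetical helper name. It is still O(i):

```rust
use std::ops::Range;

/// Hypothetical helper: the byte range and value of the ith code point, if any.
fn char_span(s: &str, i: usize) -> Option<(Range<usize>, char)> {
    // char_indices() decodes from the start of the string, so this is O(i).
    let (start, c) = s.char_indices().nth(i)?;
    // len_utf8() is the number of bytes this code point occupies (1 to 4).
    Some((start..start + c.len_utf8(), c))
}
```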

Finally, in many cases of text processing, it is actually necessary to work with grapheme clusters rather than with code points or bytes. For example, many emojis are composed of multiple code points, but are perceived as one "character". With the help of the unicode-segmentation crate, you can index into grapheme clusters as well:

use unicode_segmentation::UnicodeSegmentation;

let string: String = ...;
string.graphemes(true).nth(i).unwrap()

Naturally, grapheme cluster indexing into the contiguous UTF-8 buffer has the same requirement of traversing the entire string as indexing into code points.


5 Comments

FWIW, String could never be indexed. The indexing removal was only for &str.
I think nowadays, char_at() was also removed... (rustc 1.23.0-nightly (79cfce3d3 2017-11-12))
Be aware that chars().nth(i) walks an iterator, so the operation is O(n), not O(1) as with Vec indexing.
I'm a bit confused by this snippet, which does index a String: google.github.io/comprehensive-rust/types-and-values/…
@yota It allows getting a substring/slice by byte indexes. Note that it will still panic if the start or end index is inside a code point to guarantee the returned string is also all valid UTF-8. It seems this was a practical decision because you need an (efficient/byte-based) substring method; maybe substring_unchecked() would have been clearer? The missing normal indexing still prevents the wrong but common ASCII-only assumption and has no good use in itself: use chars() or as_bytes() for all ASCII. See users.rust-lang.org/t/why-string-can-be-sliced-with-usize-index/…
54

The correct approach to doing this sort of thing in Rust is not indexing but iteration. The main problem here is that Rust's strings are encoded in UTF-8, a variable-length encoding for Unicode characters. Because the length varies, the byte position of the nth character can't be determined without scanning the string from the start. This also means that accessing the nth character takes O(n) time!

In this special case, you can iterate over the bytes, because your string is known to only contain the characters 0–9 (iterating over the characters is the more general solution but is a little less efficient).

Here is some idiomatic code to achieve this (playground):

fn is_palindrome(num: u64) -> bool {
    let num_string = num.to_string();
    let half = num_string.len() / 2;

    num_string.bytes().take(half).eq(num_string.bytes().rev().take(half))
}

We go through the bytes in the string both forwards (num_string.bytes().take(half)) and backwards (num_string.bytes().rev().take(half)) simultaneously; the .take(half) part is there to halve the amount of work done. We then simply compare one iterator to the other one to ensure at each step that the nth and nth last bytes are equivalent; if they are, it returns true; if not, false.
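For comparison, here is a sketch of the more general character-based variant mentioned above. It takes an arbitrary &str, and `is_palindrome_str` is a hypothetical name. It compares code points rather than bytes, though it still does not compare grapheme clusters:

```rust
/// Hypothetical &str variant: compares code points instead of bytes.
fn is_palindrome_str(s: &str) -> bool {
    // Counting code points is itself an O(n) pass, because UTF-8 is variable-length.
    let half = s.chars().count() / 2;
    // Chars is a DoubleEndedIterator, so rev() walks from the end of the string.
    s.chars().take(half).eq(s.chars().rev().take(half))
}
```

With "été", the byte-based version above would return false, because reversing the bytes of é (0xC3 0xA9) does not reproduce them.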

2 Comments

FWIW, String has a direct as_bytes. Furthermore, you can use std::iter::order::equals rather than the all: equals(iter.take(n), iter.rev().take(n)).
BTW, convention implies importing std::iter::order and calling order::equals(..., ...) (I only didn't do this in my comment because it would've been noisy).
40

If what you are looking for is something similar to an index, you can use .chars() and .nth() on a string.

  • .chars() returns an iterator over the chars of a string slice.
  • .nth() returns the nth element of the iterator, in an Option.


Now you can use the above in several ways, for example:

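let s: String = String::from("abc");
let x = 1; // the index you want
// If you are sure the index is in range (panics otherwise):
println!("{}", s.chars().nth(x).unwrap());
// If not, note that expect() also panics, just with a custom message:
println!("{}", s.chars().nth(x).expect("index out of range"));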
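Since expect() still panics (as the comments below note), here is a sketch of handling an out-of-range index without panicking. `describe_char_at` is a hypothetical name:

```rust
/// Hypothetical helper: looks up the xth char without panicking.
fn describe_char_at(s: &str, x: usize) -> String {
    match s.chars().nth(x) {
        Some(c) => format!("char {} is '{}'", x, c),
        // Out-of-range indexes are handled here instead of panicking.
        None => format!("no char at index {}", x),
    }
}
```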

2 Comments

It is important to note that Chars::nth(n) consumes n + 1 characters (all preceding elements as well as the returned one), rather than being plain indexing. As stated in the documentation, calling nth(0) multiple times on the same iterator will return different elements.
If you are indeed not sure whether the Nth character exists, using expect() versus unwrap() will not prevent a panic. The code will panic regardless, but expect will provide a custom panic message. See also: stackoverflow.com/questions/61301581/…
27

You can convert a String or &str to a Vec of chars and then index that Vec.

For example:

fn main() {
    let s = "Hello world!";
    let my_vec: Vec<char> = s.chars().collect();
    println!("my_vec[0]: {}", my_vec[0]);
    println!("my_vec[1]: {}", my_vec[1]);
}
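Applied to the question, here is a sketch of the Vec<char> approach. The string is decoded once, which is an O(n) pass that allocates 4 bytes per char, since char is 4 bytes. After that, each index is O(1):

```rust
/// Decode once into a Vec<char>, then index it in O(1) per access.
fn is_palindrome(num: u64) -> bool {
    let chars: Vec<char> = num.to_string().chars().collect();
    let n = chars.len();
    // Same idea as the question's loop, but indexing the Vec instead of the String.
    (0..n / 2).all(|i| chars[i] == chars[n - 1 - i])
}
```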

Here you have a live example

1 Comment

How about the performance? I think the string bytes are copied.
4

Indexing on String is not allowed because (please check the book):

  • it is not clear what the indexed value should be: a byte, a character, or a grapheme cluster (what we would commonly call a letter)
  • strings are vectors of bytes (u8) encoded with UTF-8, and UTF-8 is a variable-length encoding, i.e. each character can take a different number of bytes, from 1 to 4. So getting a character or grapheme cluster by index would require traversing the string from the beginning (O(n) in the average and worst cases) to determine the valid byte boundaries of the character or grapheme.

So if your input doesn't contain diacritics (which can be encoded as separate combining characters) and it's OK to approximate a letter with a char, you can use the chars() iterator and the DoubleEndedIterator trait for a two-pointer approach:

    fn is_palindrome(num: u64) -> bool {
        let s = num.to_string();
        let mut iterator = s.chars();
        // Take one char from each end per step; stop when the ends meet.
        while let (Some(ch), Some(ch_end)) = (iterator.next(), iterator.next_back()) {
            if ch != ch_end {
                return false;
            }
        }
        true
    }


2

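Indexing on strings is possible, just not with a single integer. Ranges work on both String and &str, including a range of length one (e.g. 6..7), which yields a &str rather than a char. Note that range bounds are byte offsets, and slicing panics if a bound falls inside a multi-byte character. Playground link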

fn main() {
    let str1 = "lorem ipsum";
    let string2 = String::from(str1);
    println!("{}:{} {}:{}", &str1[..1], &str1[1..5], &string2[6..7], &string2[7..]);
}
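Because range bounds are byte offsets, slicing can panic on non-ASCII text. If the offsets might not fall on character boundaries, str::get returns an Option instead, and str::is_char_boundary lets you check a single offset. Here is a sketch, where `try_slice` is a hypothetical wrapper:

```rust
/// Hypothetical wrapper: Some(slice) if both byte offsets are valid, otherwise None.
fn try_slice(s: &str, start: usize, end: usize) -> Option<&str> {
    // Unlike &s[start..end], get() returns None instead of panicking when an
    // offset is out of bounds or falls inside a multi-byte character.
    s.get(start..end)
}
```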


1

This is by no means suitable for all uses, but if you just need to reference the previous character (or, with a little rework, the next character), it's possible to do so without iterating through the entire str.

The scenario here is that there is a str slice, string, and pattern was found in the slice. I want to know the character immediately before the pattern.

Call prev_char like prev_char(string.as_bytes(), pattern_index), where pattern_index is the index of the first byte of pattern in string.

UTF-8 encoding is well defined, and this works simply by backing up until it finds a starting byte (either high-order bit 0, or high-order bits 11) and then converting that 1 to 4 byte [u8] slice to a str.

This code just unwraps the conversion because the pattern was found in a valid UTF-8 str to begin with, so no error is possible. If your data has not been validated, it might be best to return a Result rather than an Option.

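// States of the backwards scan.
enum PrevCharStates {
    Start,      // no byte examined yet
    InEncoding, // seen continuation bytes (10xxxxxx), looking for the lead byte
}

/// Returns the character (as a &str of 1 to 4 bytes) that ends just before
/// `starting_index`, or None if no previous character is found.
fn prev_char(bytes: &[u8], starting_index: usize) -> Option<&str> {
    let mut ix = starting_index;
    let mut state = PrevCharStates::Start;

    // Step backwards one byte at a time.
    while ix > 0 {
        ix -= 1;
        let byte = bytes[ix];
        match state {
            PrevCharStates::Start => {
                if byte & 0b10000000 == 0 {
                    // ASCII byte (0xxxxxxx): a complete one-byte character.
                    return Some(std::str::from_utf8(&bytes[ix..starting_index]).unwrap());
                } else if byte & 0b11000000 == 0b10000000 {
                    // Continuation byte (10xxxxxx): keep looking for the lead byte.
                    state = PrevCharStates::InEncoding;
                } else {
                    // A lead byte (11xxxxxx) cannot end a character: malformed input.
                    return None;
                }
            },
            PrevCharStates::InEncoding => {
                if byte & 0b11000000 == 0b11000000 {
                    // Lead byte found: the character spans ix..starting_index.
                    return Some(std::str::from_utf8(&bytes[ix..starting_index]).unwrap());
                } else if byte & 0b11000000 != 0b10000000 {
                    // ASCII byte before continuation bytes: malformed input.
                    return None;
                }
                // Otherwise it is another continuation byte; keep going.
            }
        }
    }
    None
}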

2 Comments

This function can be written, with a slightly different signature, as string[..index].chars().next_back() (playground)
Thanks. I'm pretty new to Rust and seem to learn something new every day.
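Here is a sketch of the shorter version suggested in the comment above. It takes the &str and a byte index instead of a byte slice, and `prev_char_simple` is a hypothetical name. Like any &str slicing, it panics if `index` is not on a character boundary:

```rust
/// Hypothetical helper: the char that ends right before byte offset `index`.
/// Panics if `index` is not on a char boundary, like any &str slicing.
fn prev_char_simple(s: &str, index: usize) -> Option<char> {
    // Decode backwards from the end of s[..index]; only the last char is decoded.
    s[..index].chars().next_back()
}
```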
1

The code below works fine; I'm not sure about its performance and complexity, so hopefully someone can add more information about this solution.

fn is_palindrome(num: u64) -> bool {
    let num_string = num.to_string(); // to_string() already returns a String
    let num_length = num_string.len();
    for i in 0..num_length / 2 {
        let left = &num_string[i..i + 1];
        let right = &num_string[((num_length - 1) - i)..num_length - i];
        if left != right {
            return false;
        }
    }
    true
}

1 Comment

I think [i..i + 1] slices the string by bytes. To test, try ¹⁄₂₂⁄¹ with a &str version of this function: it fails, because slicing in the middle of a multi-byte code point panics.
1

There are two reasons indexing does not work on strings in Rust:

  • In Rust, strings are stored as a collection of UTF-8 encoded bytes. In memory, strings are just collections of 1s and 0s; a program needs to be able to interpret those 1s and 0s and print out the correct characters. That's where encoding comes into play.

       fn main() {
           let sample: String = String::from("2bytesPerChar");
           // We could do this in higher-level languages; in Rust this does not
           // compile, because a String cannot be indexed by an integer.
           let c: char = sample[0];
       }
    

A String is a collection of bytes, so what is the length of our "2bytesPerChar"? Some chars can be 1 to 4 bytes long. Assume that the first character takes 2 bytes. If you tried to get the first char of the string by indexing, sample[0] would give you only the first byte, which is only half of the first character.

  • Another reason is that there are three relevant ways to look at a word in Unicode: bytes, scalar values, and grapheme clusters. If we used indexing, Rust would not know which of these we want to receive, so we have to use more specific methods.

How to access the characters in String

  • Return bytes

       for b in "dsfsd".bytes(){
           // bytes() returns an iterator over the bytes; here we print each one
           println!("{}",b)
       }
    
  • Return scalar values:

       // we can iterate over scalar values using the chars method
       for c in "kjdskj".chars() {
           println!("{}", c)
       }
  • Return grapheme clusters:

To keep the Rust standard library lean, the ability to iterate over grapheme clusters is not included by default; we need to add a crate:

   # in Cargo.toml
   [dependencies]
   unicode-segmentation = "1.7.1"

then:

   use unicode_segmentation::UnicodeSegmentation;

   fn main() {
       // we pass true to get extended grapheme clusters
       for g in "dada".graphemes(true) {
           println!("{}", g)
       }
   }


0

Disclaimer: this answer uses satire. Don't read it if you don't like satire.


Here's how, without the pedantry, where s is your string and i is your index:

(s.as_bytes()[i] as char) // Rust deems this safe! Is it, really? Good question.

Naturally, this only works as expected for ASCII strings, since, as you may already know, UTF-8 is backwards-compatible with ASCII. How do you know if you're dealing with ASCII? Use your brain (footnote 1). It's not simple at all; it's the sort of thing that makes you want to bang your head into a thousand desks, but you can get through it with a little courage, a little persistence, and maybe some drink to take the edge off. Rust makes it safer, but we don't live in a world where all intersections are roundabouts.
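For the record, the standard library can answer the ASCII question for you: str::is_ascii scans the string once. Here is a sketch that only byte-indexes when that check passes. `ascii_char_at` is a hypothetical name:

```rust
/// Hypothetical helper: byte-index as a char only if the whole string is ASCII.
fn ascii_char_at(s: &str, i: usize) -> Option<char> {
    // is_ascii() scans every byte once (O(n)). For ASCII text each byte is one
    // whole character, so a byte index is also a character index.
    if s.is_ascii() {
        s.as_bytes().get(i).map(|&b| b as char)
    } else {
        None
    }
}
```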

If you're curious, here's what it looks like to get this wrong. Spoiler:

Nobody dies.

fn print_string(s: &str) {
    for i in 0..s.len() {
        print!("{}", s.as_bytes()[i] as char);
    }

    // Alternatively...
    //  for c in s.as_bytes() {
    //      print!("{}", *c as char);
    //  }

    println!();
}

fn main() {
    print_string("🐶"); // Uh oh.
}

Take it for a spin


1 Otherwise, please consult the Am I a Computer Program or a Laterally Thinking Being? handbook that was provided when you took the programmer's oath.
