10

What is the correct way to find a substring when the search should not start at index 0?

I have this code:

fn SplitFile(reader: BufReader<File>) {
  for line in reader.lines() {
    let mut l = line.unwrap();
    // l contains "06:31:53.012   index0:2015-01-06 00:00:13.084"
    ...

I need to find the third : and parse the date that follows it. I still have no idea how to do this, because find doesn't take any parameter like begin - see https://doc.rust-lang.org/std/string/struct.String.html#method.find.

(I know I can use a regex. I already have that working, but I'd like to compare the performance and see whether parsing by hand might be quicker than using a regex.)

1
  • 3
    What kind of begin parameter are you thinking of? If you mean that begin is an offset, then you'd just slice first and then search: s[begin..].find(...). Commented Jul 8, 2015 at 8:44
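
The slice-then-find idea from the comment above can be sketched as follows. The helper name find_from is hypothetical (it is not a std method). Note that find reports an index relative to the slice, so the offset has to be added back to get a position in the original string.

```rust
/// Finds `needle` in `haystack`, starting the search at byte offset `begin`.
/// Returns the match position as an index into the whole `haystack`.
/// `find_from` is a hypothetical helper name, not part of std.
fn find_from(haystack: &str, needle: char, begin: usize) -> Option<usize> {
    // `get` returns None instead of panicking when `begin` is out of range
    // or does not fall on a UTF-8 character boundary.
    let tail = haystack.get(begin..)?;
    // `find` reports an index relative to `tail`, so add `begin` back.
    tail.find(needle).map(|idx| idx + begin)
}
```

Calling it repeatedly with the previous match position plus one walks from colon to colon without allocating anything.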

3 Answers

5

There is a much simpler solution to this problem, in my opinion: use the .splitn() method. It splits a string by a given pattern into at most n parts, leaving the remainder untouched in the last part. For example:

let s = "ab:bc:cd:de:ef".to_string();
println!("{:?}", s.splitn(3, ':').collect::<Vec<_>>());
// ^ prints ["ab", "bc", "cd:de:ef"]

In your case, you need to split the line into 4 parts separated by ':' and take the fourth one (index 3 when counting from 0):

// assuming the line is correctly formatted
let date = l.splitn(4, ':').nth(3).unwrap();

If you don't want to use unwrap (the line might not be correctly formatted):

if let Some(date) = l.splitn(4, ':').nth(3) {
    // parse the date and time
}
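
As a quick check against the sample line from the question, here is a minimal sketch that wraps the same call in a function. The name timestamp_part is hypothetical. It shows that the colons inside the time portion survive, because splitn stops splitting after the third separator.

```rust
/// Returns everything after the third ':' in `line`, or None if the line
/// contains fewer than three colons. `timestamp_part` is a hypothetical name.
fn timestamp_part(line: &str) -> Option<&str> {
    // splitn(4, ':') yields at most four pieces; the fourth is the
    // unsplit remainder, so later colons stay in place.
    line.splitn(4, ':').nth(3)
}
```

For the line in the question this gives the text "2015-01-06 00:00:13.084", and a line with fewer than three colons gives None.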

2 Comments

Good call. I remembered that Rust had split() but for some reason I didn't think of splitn(). BTW, it might be conceptually cleaner to use l.splitn(4, ':').last() instead of using .nth(3).
That sounds good too. I have already implemented the other solution, but I'll probably run some benchmarks that include this one as well.
4

You are right, there doesn't appear to be any trivial way of skipping several matches when searching a string. You can do it by hand though.

fn split_file(reader: BufReader<File>) {
    for line in reader.lines() {
        let mut l = &line.as_ref().unwrap()[..]; // get a slice
        for _ in 0..3 {
            if let Some(idx) = l.find(":") {
                l = &l[idx+1..]
            } else {
                panic!("the line didn't have enough colons"); // you probably shouldn't panic
            }
        }
        // l now contains the date
        ...
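
Since panicking is probably not what you want here, the same loop can be sketched as a function that returns an Option instead. The name skip_separators and its parameters are hypothetical, chosen only for illustration.

```rust
/// Skips past the first `n` occurrences of `sep` and returns the rest of
/// `line`, or None if there are fewer than `n` separators.
/// `skip_separators` is a hypothetical helper name.
fn skip_separators(line: &str, sep: char, n: usize) -> Option<&str> {
    let mut rest = line;
    for _ in 0..n {
        // `?` bails out with None as soon as a separator is missing.
        let idx = rest.find(sep)?;
        // Advance by the separator's UTF-8 length so that multi-byte
        // separators are handled as well.
        rest = &rest[idx + sep.len_utf8()..];
    }
    Some(rest)
}
```

The caller can then decide whether a malformed line should be skipped, logged, or treated as an error.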

Update:

As faiface points out below, you can do this a bit cleaner with splitn():

fn split_file(reader: BufReader<File>) {
    for line in reader.lines() {
        let l = line.unwrap();
        // nth(3) is None when the line has fewer than three colons
        if let Some(datetime) = l.splitn(4, ':').nth(3) {
            // datetime now contains the timestamp string
            ...
        } else {
            panic!("line doesn't contain a timestamp");
        }
    }
}

You should go upvote his answer.
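
One thing worth keeping in mind when choosing between .last() and .nth(3) on the splitn iterator: they agree on well-formed lines but differ on malformed ones. The sketch below uses two hypothetical wrapper functions to show the difference. splitn always yields at least one piece, so .last() never returns None.

```rust
/// Takes whatever piece comes last, even if fewer than four pieces exist.
fn with_last(line: &str) -> Option<&str> {
    line.splitn(4, ':').last()
}

/// Takes the fourth piece only, so short lines produce None.
fn with_nth(line: &str) -> Option<&str> {
    line.splitn(4, ':').nth(3)
}
```

On a line without any colons, with_last returns the whole line wrapped in Some, while with_nth returns None, which makes .nth(3) the safer choice if malformed lines need to be detected.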

3 Comments

Thank you, I'll try that. Can you also tell me how well that performs? What does l = &l[idx+1..] do? Does it create a new slice on the stack? Does it copy the corresponding bytes? I'm asking because I'm trying to process large files, and any such extra work might hurt performance significantly.
Slices are references; they never copy the underlying data, and they never allocate anything of their own.
@stej: measure, measure, measure :) In this specific case, it might well be that the bounds check in l[idx+1..] costs more than the assignment itself. You can check how a slice is implemented in the std::raw module: just an integer and a pointer.
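
To make the "no copy, no allocation" point concrete, here is a small sketch using two hypothetical helper functions. One computes where a sub-slice starts inside its parent by comparing pointers, the other reports the size of a &str in machine words.

```rust
use std::mem::size_of;

/// Returns the byte offset of `sub` inside `whole`.
/// Assumes `sub` was produced by slicing `whole`; the result is
/// meaningless otherwise.
fn offset_within(whole: &str, sub: &str) -> usize {
    sub.as_ptr() as usize - whole.as_ptr() as usize
}

/// Size of a `&str` measured in `usize` words (pointer plus length).
fn str_ref_size_in_words() -> usize {
    size_of::<&str>() / size_of::<usize>()
}
```

If tail is &line[idx + 1..], then offset_within(&line, tail) equals idx + 1, meaning the new slice points into the same buffer, and the reference itself occupies two words.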
2

Just the date and not also the time, right?

let test: String = "06:31:53.012   index0:2015-01-06 00:00:13.084".into();

let maybe_date = test.split_whitespace()
    .nth(1)
    .and_then(|substring| substring.split(':').nth(1));

assert_eq!(maybe_date, Some("2015-01-06"));

2 Comments

I assumed the time was part of the date that stej wanted to parse. Together they represent a fully-specified timestamp.
It's true, I wanted to parse the time as well. This code might be fine for small files, and it shows another way to solve my problem, so I upvoted it as well.
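
For completeness, here is a rough sketch of parsing both the date and the time by hand, using only the standard library. The Timestamp struct and parse_timestamp function are hypothetical names, the input format "YYYY-MM-DD HH:MM:SS.mmm" is assumed from the sample line, and no range validation is attempted. It uses str::split_once, which is newer than the posts in this thread.

```rust
/// Hypothetical container for the parsed fields; not from the original posts.
#[derive(Debug, PartialEq)]
struct Timestamp {
    year: u16,
    month: u8,
    day: u8,
    hour: u8,
    minute: u8,
    second: u8,
    millis: u16,
}

/// Parses "YYYY-MM-DD HH:MM:SS.mmm" and returns None on any malformed field.
/// Values are not range-checked (for example, month 13 is accepted), and the
/// fraction is assumed to have exactly three digits.
fn parse_timestamp(s: &str) -> Option<Timestamp> {
    let (date, time) = s.trim().split_once(' ')?;

    let mut d = date.splitn(3, '-');
    let year = d.next()?.parse::<u16>().ok()?;
    let month = d.next()?.parse::<u8>().ok()?;
    let day = d.next()?.parse::<u8>().ok()?;

    let (hms, frac) = time.split_once('.')?;
    let mut t = hms.splitn(3, ':');
    let hour = t.next()?.parse::<u8>().ok()?;
    let minute = t.next()?.parse::<u8>().ok()?;
    let second = t.next()?.parse::<u8>().ok()?;
    let millis = frac.parse::<u16>().ok()?;

    Some(Timestamp { year, month, day, hour, minute, second, millis })
}
```

Feeding it the part after the third colon of the sample line yields year 2015, month 1, day 6, and a time of 00:00:13 with 84 milliseconds. Whether this beats a regex is something only a benchmark on real data can tell.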
