20

I have a comma-delimited string; calling String.split(",") on it returns an array of about 60 elements. In one specific use case I only need the second element of that array. For example, given "Q,BAC,233,sdf,sdf,", all I want is the text after the first ',' and before the second ','. My question is about performance: am I better off parsing it myself using substring, or using the split method and then taking the second value from the array? Any input would be appreciated. This method will be called hundreds of times a second, so it's important that I understand the best approach regarding performance and memory allocation.

-Duncan

3
  • 6
    Profile it. Period. Commented Dec 21, 2012 at 21:23
  • The problem with my profiler is that I don't know where all the char[] arrays I see in the profiler come from. What is the use case to profile this and get metrics? Commented Dec 21, 2012 at 21:25
  • 1
    I didn't mean profile the number of char arrays created. Profile the time it takes to execute a somewhat realistic benchmark, before and after the "optimization", to see if it makes any noticeable difference. Commented Dec 21, 2012 at 21:27

5 Answers

43

Since String.split returns a String[], a 60-way split results in about sixty needless allocations per line. split goes through your entire string and creates sixty new objects plus the array object itself. Of these sixty-one objects you keep exactly one, and let the garbage collector deal with the remaining sixty.

If you are calling this in a tight loop, a substring would definitely be more efficient: it scans only the portion of your string up to the second comma, and then creates one new object that you keep.

String s = "quick,brown,fox,jumps,over,the,lazy,dog";
int from = s.indexOf(',');
int to = s.indexOf(',', from+1);
String brown = s.substring(from+1, to);
System.out.println(brown);

The above prints brown.

When you run this multiple times, the substring wins on time hands down: 1,000,000 iterations of split take 3.36s, while 1,000,000 iterations of substring take only 0.05s. And that's with only eight components in the string! The difference for sixty components would be even more drastic.

3 Comments

That is, of course, assuming it's actually performance critical and you couldn't achieve more speed up in other ways. Programmers have a tendency to make wildly inaccurate guesses about that.
I appreciate all the answers. Seems like the theme is substring and you explained it the best.
I wrote a method to retrieve a token at a desired index, pastebin.com/R9Z6uW6H
4

Of course. Why iterate through the whole string? Just use substring() and indexOf().

1 Comment

The possibility of off-by-one errors? The increased amount of less obvious code?
3

You are certainly better off doing it by hand for two reasons:

  • .split() takes a String as an argument, but this string is interpreted as a regular expression (a Pattern), and for your use case that is needlessly costly;
  • as you say, you only need the second element: the algorithm to grab that second element is simple enough to do by hand.
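If you would rather keep split for readability, one middle ground is the two-argument overload, String.split(regex, limit). The sketch below is only an illustration: the class and method names are made up, and returning null when there is no second field is an assumption, not something from the question. With a limit of 3, split stops after the second comma, so it creates at most three strings instead of sixty.

```java
final class SplitLimitSketch {
    // Hypothetical helper: returns the second comma-separated field, or null if there is none.
    static String secondFieldViaSplit(String line) {
        // A limit of 3 yields at most three elements:
        // field 0, field 1, and the rest of the line left unsplit.
        String[] parts = line.split(",", 3);
        return parts.length > 1 ? parts[1] : null;
    }

    public static void main(String[] args) {
        System.out.println(secondFieldViaSplit("Q,BAC,233,sdf,sdf,")); // prints BAC
    }
}
```

This still allocates an array plus a copy of the unsplit remainder, so the indexOf/substring approach should remain the lighter of the two.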

2

I would use something like:

final int first = searchString.indexOf(",");
final int second = searchString.indexOf(",", first+1);
String result = searchString.substring(first+1, second);
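Note that the three lines above throw a StringIndexOutOfBoundsException if the input has fewer than two commas, because indexOf returns -1. A more defensive version might look like the sketch below. The class and method names are hypothetical, and the choices to return null when there is no comma and to treat everything after a lone comma as the second field are assumptions you may want to change.

```java
final class SecondFieldSketch {
    // Hypothetical helper: returns the text between the first and second comma.
    // Returns null if the input contains no comma at all.
    static String secondField(String line) {
        int first = line.indexOf(',');
        if (first < 0) {
            return null; // no comma, so there is no second field
        }
        int second = line.indexOf(',', first + 1);
        // With only one comma, the second field runs to the end of the string.
        return second < 0
                ? line.substring(first + 1)
                : line.substring(first + 1, second);
    }

    public static void main(String[] args) {
        System.out.println(secondField("Q,BAC,233,sdf,sdf,")); // prints BAC
    }
}
```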

3 Comments

Thanks, what is the purpose of declaring the index values as final?
It's just a code convention I am used to. I make all variables final which are only assigned once.
Thanks, I upvoted you! Please upvote my question; I want to break 600!
1

My first inclination would be to find the index of the first and second commas and take the substring.

The only real way to tell for sure, though, is to test each in your particular scenario. Break out the appropriate stopwatch and measure the two.
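As a starting point, a crude stopwatch comparison could look like the sketch below. Everything in it (class name, method names, iteration count, sample string) is made up for illustration. It is a naive System.nanoTime loop with one warm-up pass, so treat the numbers as a rough indication only; a proper harness such as JMH would give more trustworthy results.

```java
final class SplitVsSubstringTiming {
    static String viaSplit(String s) {
        return s.split(",")[1];
    }

    static String viaSubstring(String s) {
        int from = s.indexOf(',');
        int to = s.indexOf(',', from + 1);
        return s.substring(from + 1, to);
    }

    // Returns the elapsed nanoseconds for the given number of calls.
    static long time(boolean useSplit, String s, int iterations) {
        long sink = 0; // accumulate something so the calls are not trivially dead code
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            String r = useSplit ? viaSplit(s) : viaSubstring(s);
            sink += r.length();
        }
        long elapsed = System.nanoTime() - start;
        if (sink < 0) {
            System.out.println(sink); // never true; only keeps sink in use
        }
        return elapsed;
    }

    public static void main(String[] args) {
        String s = "quick,brown,fox,jumps,over,the,lazy,dog";
        int n = 1000000;
        // Warm-up pass so the JIT has a chance to compile both paths.
        time(true, s, n);
        time(false, s, n);
        System.out.printf("split:     %d ms%n", time(true, s, n) / 1000000);
        System.out.printf("substring: %d ms%n", time(false, s, n) / 1000000);
    }
}
```

Run it with strings that resemble your real input (about sixty fields), since the gap depends on how much of the line split has to process.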
