4

I've gone through the documentation for String's split method, but the results are not what I expected. When we split a string with the limit argument set to a negative value, it always appends an empty string. Why does it do that? Consider some cases:

// Case 1
String str = "1#2#3#";
System.out.println(str.split("#").length); // Prints 3
System.out.println(str.split("#", -1).length); // Prints 4

What I would expect here is that both print 3.

// Case 2
str = "";
System.out.println(str.split("#").length); // Prints 1
System.out.println(str.split("#", -1).length); // Prints 1

Since no match is found, I expected the split method without a limit to print 0, but it creates an array containing one empty string.

// Case 3
str = "#";
System.out.println(str.split("#").length); // Prints 0
System.out.println(str.split("#", -1).length); // Prints 2

Now I have a match, and the split method without the limit argument works fine. This is my expected output, but why does it return an empty array here and not in Case 2?

// Case 4
str = "###";
System.out.println(str.split("#").length); // Prints 0
System.out.println(str.split("#", -1).length); // Prints 4

Here the first split call behaves as expected, but why does the second one give 4 instead of 3?

// Case 5
str = "1#2#3#";
System.out.println(str.split("#", 0).length); // Prints 3
System.out.println(str.split("#", 3).length); // Prints 3
System.out.println(str.split("#", 4).length); // Prints 4

Now the last case, with a positive limit. If the limit is <= the number of matches, the result is as expected. But if we give a higher positive limit, it again appends an empty string to the resulting array.

3 Answers

6

From the JavaDoc for String.split(String regex, int limit):

The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array. If the limit n is greater than zero then the pattern will be applied at most n - 1 times, the array's length will be no greater than n, and the array's last entry will contain all input beyond the last matched delimiter. If n is non-positive then the pattern will be applied as many times as possible and the array can have any length. If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.

Emphasis mine.

In the negative limit case trailing empty strings are not discarded, so if I represent the empty string with E:

1#2#3# -> 1 # 2 # 3 # E
E      -> E
#      -> E # E
###    -> E # E # E # E

In your last example (with a positive limit), the same applies: trailing empty strings are only discarded if n == 0.
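To see the empty strings rather than just count them, here is a minimal sketch that prints each element in quotes. The class name SplitLimitDemo and the helper show are made up for illustration; only String.split(String, int) comes from the JDK.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

class SplitLimitDemo {
    // Hypothetical helper: renders every element in quotes so that
    // empty strings remain visible in the printed output.
    static String show(String input, int limit) {
        return Arrays.stream(input.split("#", limit))
                .map(s -> "\"" + s + "\"")
                .collect(Collectors.joining(", ", "[", "]"));
    }

    public static void main(String[] args) {
        // The four inputs from the question, with a negative and a zero limit.
        for (String input : new String[] {"1#2#3#", "", "#", "###"}) {
            System.out.println("\"" + input + "\""
                    + "  limit -1: " + show(input, -1)
                    + "  limit 0: " + show(input, 0));
        }
    }
}
```

With a limit of -1 the trailing "" entries stay in the array; with a limit of 0 they are stripped, which is why "#" and "###" end up as empty arrays.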


3 Comments

If trailing empty strings are only discarded when n == 0, then in case 5, when I gave n as 3, why didn't it include the trailing empty string? When I gave n as 4, it did include it.
"... and the array's last entry will contain all input beyond the last matched delimiter." You are failing to see the empty string as a string. With a limit of 4, it matches 3 times, then appends the remaining input, which is the empty string. With a limit of 3, it matches twice, then puts the rest ("3#", which happens to contain another delimiter) in the last entry.
@SyamS in case 5 there are exactly 3 matches available to consume, so 0 and 3 give the same array length, as 3 is the limit on the number of results. A limit of 4 consumes the available 3 matches, then appends the remaining string to the output: the empty string.
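The point made in these comments is easier to follow when the array contents are printed instead of the lengths. A small sketch, assuming the same input as case 5; the class name PositiveLimitDemo and its split wrapper are hypothetical.

```java
import java.util.Arrays;

class PositiveLimitDemo {
    // Thin hypothetical wrapper so the result can be inspected or tested.
    static String[] split(String input, int limit) {
        return input.split("#", limit);
    }

    public static void main(String[] args) {
        String str = "1#2#3#";
        // A limit of n applies the pattern at most n - 1 times; whatever is
        // left over, delimiters included, becomes the last entry.
        for (int limit = 0; limit <= 5; limit++) {
            System.out.println("limit " + limit + ": "
                    + Arrays.toString(split(str, limit)));
        }
    }
}
```

Limits 0 and 3 both yield three elements, but the last element differs: "3" for limit 0 and "3#" for limit 3. From limit 4 upwards all three delimiters are consumed and the final entry is the empty string.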
3

The main source of confusion comes from an often-missed section of the doc:

... If n is zero then ..., and trailing empty strings will be discarded.

Once you get that, everything makes sense.

1 Comment

@SyamS - That is covered elsewhere in the documentation: "If the expression does not match any part of the input then the resulting array has just one element, namely this string."
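The two documented rules (no match returns the input itself; a zero limit drops trailing empty strings) together explain why "" gives length 1 while "#" gives length 0. The sketch below rebuilds the zero-limit result from the negative-limit one. It is an illustration of the documented behavior, not the JDK source; splitZero and the class name are hypothetical, and a non-empty literal delimiter is assumed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class ZeroLimitSketch {
    // Hypothetical helper: emulates input.split(delim, 0) for a non-empty
    // literal delimiter by post-processing input.split(delim, -1).
    static String[] splitZero(String input, String literalDelim) {
        // Rule 1: no match anywhere, so the result is just the input string.
        if (!input.contains(literalDelim)) {
            return new String[] {input};
        }
        List<String> parts = new ArrayList<>(
                Arrays.asList(input.split(Pattern.quote(literalDelim), -1)));
        // Rule 2: with a zero limit, trailing empty strings are discarded.
        while (!parts.isEmpty() && parts.get(parts.size() - 1).isEmpty()) {
            parts.remove(parts.size() - 1);
        }
        return parts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        for (String input : new String[] {"", "#", "###", "1#2#3#"}) {
            System.out.println("\"" + input + "\" -> length "
                    + splitZero(input, "#").length
                    + " (String.split gives " + input.split("#").length + ")");
        }
    }
}
```

For "" rule 1 fires and the single empty string is returned untouched. For "#" a match exists, so rule 2 runs and strips both empty pieces.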
2

From the documentation

The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array. If the limit n is greater than zero then the pattern will be applied at most n - 1 times, the array's length will be no greater than n, and the array's last entry will contain all input beyond the last matched delimiter. If n is non-positive then the pattern will be applied as many times as possible and the array can have any length. If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.

It appears the negative-limit behavior is defined as: apply the pattern as many times as possible, and store whatever remains as the last entry.

2 Comments

Agreed, but consider case 4: the pattern can be applied 3 times, so why would it add an additional empty string to make the array length 4?
There are 4 strings (all empty) in case 4: (before)#(middle1)#(middle2)#(after). The array holds the 3 strings that precede each match (the empty string 3 times), then everything after the last match (another empty string).
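The "n delimiters give n + 1 pieces" counting can be sketched with a hand-written loop. This is only an illustration for a single-character delimiter, not how String.split is implemented; the names PieceCounter and pieces are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PieceCounter {
    // Hypothetical helper: cuts the input at every occurrence of delim and
    // keeps every piece, including empty ones.
    static List<String> pieces(String input, char delim) {
        List<String> out = new ArrayList<>();
        int start = 0;
        int hit;
        while ((hit = input.indexOf(delim, start)) >= 0) {
            out.add(input.substring(start, hit)); // text before this delimiter
            start = hit + 1;
        }
        out.add(input.substring(start)); // everything after the last delimiter
        return out;
    }

    public static void main(String[] args) {
        System.out.println(pieces("###", '#').size());
        // For these inputs the loop agrees with a negative limit:
        System.out.println(pieces("###", '#')
                .equals(Arrays.asList("###".split("#", -1))));
    }
}
```

The final add after the loop is the "after" piece: it always runs, so three delimiters produce four entries.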
