I was experimenting with the split() method in Java when I ran into a result I couldn't explain. Where exactly does split start searching for regex matches: at the first character, before it, or after it?
Given String "test":
If the split method starts before the first character, then there should be an empty string before "test", and splitting on the empty string should return an array of length 6. But the array has length 5.
System.out.println("test".split("",-1).length);
So clearly the split method does not start before the given string.
If the split method starts at the first character of the given string, then shouldn't splitting with a regex of "Z*" return an array of length 6 with a leading empty string, since the first character is not Z (so Z* matches zero times there)? However, it returns an array of length 5.
System.out.println("test".split("Z*",-1).length);
So by elimination the split method starts after the first character... but clearly it does not, since the following code works as expected:
System.out.println("test".split("t",-1).length);
Output: 3
So where exactly does the split method start searching for regex matches? Or what exactly is the gap in my reasoning?
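One way to probe this is to drive the regex engine directly and print where each match starts and ends. The sketch below is only an illustration: the class name MatchPositions and the helper matchSpans are made up for this example, and it uses nothing beyond java.util.regex.Pattern and Matcher. It suggests that matching does begin at index 0, and that both "" and "Z*" find five zero-width matches in "test", including one before the first character. The missing leading empty string therefore seems to come from how split treats that first match, not from where the search starts: as far as I can tell, the String.split documentation for Java 8 and later says that a zero-width match at the beginning never produces an empty leading substring, while a positive-width match (such as "t") does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchPositions {
    // Hypothetical helper: collects "start-end" index pairs for every
    // match of regex that Matcher.find() reports in input.
    static List<String> matchSpans(String regex, String input) {
        List<String> spans = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            spans.add(m.start() + "-" + m.end());
        }
        return spans;
    }

    public static void main(String[] args) {
        // Zero-width matches at every position, including index 0 and the end.
        System.out.println(matchSpans("", "test"));   // [0-0, 1-1, 2-2, 3-3, 4-4]
        System.out.println(matchSpans("Z*", "test")); // [0-0, 1-1, 2-2, 3-3, 4-4]
        // Positive-width matches at the first and last character.
        System.out.println(matchSpans("t", "test"));  // [0-1, 3-4]
    }
}
```

Five matches would normally mean six pieces, so dropping the piece in front of the zero-width match at index 0 accounts for the length of 5.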
Number of matches + 1. "test" has 2 t's, so "t" gives 3. "test" has 4 characters, so matching nothing gives 5. Is that what you got?
The split method was optimised so that a single-character pattern which is not a regex special character will not actually engage the regex engine. So splitting on just the character "t" will not cause the regex engine to be engaged.
Z* will match nothing as well, equivalent to "" if there are no Z's in the sample. However, if you use Z+ on a string without Z's, you should get an array of 1 element, the original string.
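Printing the array contents instead of just the length makes these cases easier to compare. This is a minimal sketch, assuming Java 8 or later (earlier versions are reported to return a leading empty string for zero-width matches). The class name SplitDemo and the helper show are hypothetical names for this example only.

```java
import java.util.Arrays;

public class SplitDemo {
    // Hypothetical helper: splits with limit -1 so trailing empty strings
    // are kept, then renders the resulting array as text.
    static String show(String input, String regex) {
        return Arrays.toString(input.split(regex, -1));
    }

    public static void main(String[] args) {
        // Zero-width matches: no leading empty string, one trailing empty string.
        System.out.println(show("test", ""));   // [t, e, s, t, ]
        System.out.println(show("test", "Z*")); // [t, e, s, t, ]
        // Positive-width match at index 0: the leading empty string is kept.
        System.out.println(show("test", "t"));  // [, es, ]
        // Z+ never matches here, so the original string comes back whole.
        System.out.println(show("test", "Z+")); // [test]
    }
}
```

Note that the "t" case does contain a leading empty string, which is why its length is 3.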