
In Dart, I would like to split a string using a regular expression and include the matching delimiters in the resulting list. So with the delimiter ., I want the string 123.456.789 to get split into [ 123, ., 456, ., 789 ].

In some languages, like C#, JavaScript, Python and Perl, according to https://stackoverflow.com/a/15668433, this can be done by simply including the delimiters in capturing parentheses. The behaviour seems to be documented at https://ecma-international.org/ecma-262/9.0/#sec-regexp.prototype-@@split.

This doesn't seem to work in Dart, however:

print("123.456.789".split(new RegExp(r"(\.)")));

yields exactly the same thing as without the parentheses. Is there a way to get split() to work like this in Dart? Otherwise I guess it will have to be an allMatches() implementation.
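A minimal check of this behaviour, using only `dart:core`: `String.split` returns the same pieces whether or not the delimiter is wrapped in a capturing group, because the text captured by the group is not added to the result. The helper names `splitPlain` and `splitGrouped` are hypothetical, introduced only for the comparison.

```dart
// Hypothetical helpers comparing split with and without a capturing group.
List<String> splitPlain(String s) => s.split(RegExp(r'\.'));
List<String> splitGrouped(String s) => s.split(RegExp(r'(\.)'));

void main() {
  print(splitPlain('123.456.789')); // [123, 456, 789]
  print(splitGrouped('123.456.789')); // [123, 456, 789]
}
```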

Edit: Using ((?<=\.)|(?=\.)) as the regex, with a lookbehind and a lookahead, apparently does the job for a single delimiter. I will actually have a bunch of delimiters, though, and I'm not sure about the efficiency of this method. Can someone advise whether it's fine? Legibility certainly suffers: to allow the delimiters . and ;, one would need either ((?<=\.)|(?=\.)|(?<=;)|(?=;)) or ((?<=\.|;)|(?=\.|;)). Testing

print("123.456.789;abc;.xyz.;ABC".split(new RegExp(r"((?<=\.|;)|(?=\.|;))")));

indicates that both work.
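If the delimiters come from a list, the lookaround pattern from the edit can be generated rather than written by hand. This is a sketch under the assumption that every delimiter is a literal string (not a regex), so each one goes through `RegExp.escape`; the helper name `lookaroundSplitter` is hypothetical. It builds the same `(?<=\.|;)|(?=\.|;)` shape as the edit, minus the outer group.

```dart
/// Hypothetical helper: builds `(?<=d1|d2|...)|(?=d1|d2|...)` from literal
/// delimiters, so `split` breaks the string just before and just after each one.
RegExp lookaroundSplitter(List<String> delimiters) {
  // Escape each delimiter so characters like '.' are matched literally.
  final alternatives = delimiters.map(RegExp.escape).join('|');
  return RegExp('(?<=$alternatives)|(?=$alternatives)');
}

void main() {
  final pattern = lookaroundSplitter(['.', ';']);
  print('123.456.789;abc;.xyz.;ABC'.split(pattern));
  // [123, ., 456, ., 789, ;, abc, ;, ., xyz, ., ;, ABC]
}
```

Because the pattern only matches positions, nothing is consumed, and adjacent delimiters such as `;.` still come out as separate items.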

  • Split on (?!^|$)\b Commented Dec 31, 2019 at 17:37
  • The delimiter isn't always going to be . - it could be one of a bunch of expressions. Commented Dec 31, 2019 at 17:56
  • That's fine; I didn't specify ., it'll split on word boundary locations. Commented Dec 31, 2019 at 17:56
  • What's expected from 123.456.789;abc;.xyz.;ABC? Commented Dec 31, 2019 at 18:00
  • You need to write a custom method for it; String.split does not allow this in Dart. Commented Dec 31, 2019 at 18:29

2 Answers


There is no direct support for it in the standard library, but it is fairly straightforward to roll your own implementation based on RegExp.allMatches(). For example:

extension RegExpExtension on RegExp {
  /// Like split, but keeps every matched separator in the result list.
  /// Text before [start] is skipped.
  List<String> allMatchesWithSep(String input, [int start = 0]) {
    var result = <String>[];
    for (var match in allMatches(input, start)) {
      result.add(input.substring(start, match.start)); // text before the separator
      result.add(match[0]!); // the separator itself
      start = match.end;
    }
    result.add(input.substring(start)); // text after the last separator
    return result;
  }
}

extension StringExtension on String {
  List<String> splitWithDelim(RegExp pattern) =>
      pattern.allMatchesWithSep(this);
}

void main() {
  print("123.456.789".splitWithDelim(RegExp(r"\."))); // [123, ., 456, ., 789]
  print(RegExp(r" ").allMatchesWithSep("lorem ipsum dolor sit amet"));
}
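One thing to keep in mind with this approach: a delimiter at the start or end of the input, or two delimiters in a row, produces empty strings in the list. Below is a sketch of a standalone variant with an optional filter; the function name `splitWithDelimiters` and the `dropEmpty` parameter are hypothetical, not part of any library.

```dart
/// Hypothetical variant of allMatchesWithSep that can optionally drop the
/// empty pieces produced by leading, trailing or adjacent delimiters.
List<String> splitWithDelimiters(String input, RegExp pattern,
    {bool dropEmpty = false}) {
  final result = <String>[];
  var start = 0;
  for (final match in pattern.allMatches(input)) {
    result.add(input.substring(start, match.start)); // text before the match
    result.add(match[0]!); // the delimiter itself
    start = match.end;
  }
  result.add(input.substring(start)); // text after the last match
  return dropEmpty ? result.where((s) => s.isNotEmpty).toList() : result;
}

void main() {
  final dot = RegExp(r'\.');
  print(splitWithDelimiters('.a..b.', dot).length); // 9
  print(splitWithDelimiters('.a..b.', dot, dropEmpty: true)); // [., a, ., ., b, .]
}
```

Note that `dropEmpty` would also remove zero-width matches, since their text is empty; whether that is wanted depends on the application.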

5 Comments

Excellent - I didn't know about extensions. This fits the bill well. One might want to check for empty strings in some places, for example when adding in the final part of input, but that depends on the application.
Thanks. (The following is obvious to those who know regexps.) If, for instance, you have several possible separators, like '.' and ':', you need to use a regexp like '[\.:]'.
Amazing, thank you very much, saved me from a real headache.
As mentioned in the question, how can we use multiple delimiters with this method?
Multiple delimiters can simply be encoded in a regular expression using character classes (for single-character delimiters) or alternation, e.g. [,;] or ,|;|\.\.
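A sketch of both options with the same allMatches-based approach, written as a plain function so it runs on its own; the name `splitKeepingDelimiters` is hypothetical.

```dart
/// Same allMatches-based idea as the answer, as a standalone function.
List<String> splitKeepingDelimiters(String input, RegExp pattern) {
  final result = <String>[];
  var start = 0;
  for (final match in pattern.allMatches(input)) {
    // Add the text before the delimiter, then the delimiter itself.
    result..add(input.substring(start, match.start))..add(match[0]!);
    start = match.end;
  }
  return result..add(input.substring(start)); // trailing text
}

void main() {
  // Character class: any one of several single-character delimiters.
  print(splitKeepingDelimiters('a,b;c', RegExp(r'[,;]'))); // [a, ,, b, ;, c]

  // Alternation: needed once a delimiter is longer than one character.
  print(splitKeepingDelimiters('a..b,c', RegExp(r',|;|\.\.'))); // [a, .., b, ,, c]
}
```

Alternatives are tried left to right, so if both `.` and `..` were delimiters, `\.\.` would have to come before `\.` in the alternation.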

Splitting on a single delimiter

Given your initial string:

123.456.789

And expected results (split on and including delimiters):

[123, ., 456, ., 789]

You can come up with the following regex:

(?!^|$)\b

Matches positions that are word boundaries, except at the start or end of the string.
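In Dart this zero-width pattern can be passed straight to `String.split`. Since it matches positions rather than characters, nothing is consumed and each delimiter ends up as its own item. A minimal check follows (the helper name `splitOnInnerBoundaries` is hypothetical); the second call shows that adjacent non-word characters stay together, because there is no word boundary between them.

```dart
// Hypothetical helper: splits at every word boundary except the very
// start and end of the input.
List<String> splitOnInnerBoundaries(String s) =>
    s.split(RegExp(r'(?!^|$)\b'));

void main() {
  print(splitOnInnerBoundaries('123.456.789')); // [123, ., 456, ., 789]
  print(splitOnInnerBoundaries('abc;.xyz')); // [abc, ;., xyz]
}
```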


Splitting on multiple delimiters

Now for your edit, given the following string:

123.456.789;abc;.xyz.;ABC

You'd like the expected results (split on and including multiple delimiters):

[123, ., 456, ., 789, ;, abc, ;, ., xyz, ., ;, ABC]

You can use either of the following regexes (adapted from the first one by adding an alternation). See regex sample here (I simulate the split by substituting a newline character, for display purposes).

(?!^|$)\b|(?!\w)\B(?!\w)
(?!^|$)\b|(?=\W)\B(?=\W)

# the long way (with case-insensitive matching) - allows underscore _ as delimiter
(?!^|$)(?:(?<=[a-z\d])(?![a-z\d])|(?<![a-z\d])(?=[a-z\d])|(?<![a-z\d])(?![a-z\d]))

Matches positions that are word boundaries, except at the start or end of the string; or matches a position that is not a word boundary and is not followed by a word character, i.e. one that sits between two non-word characters (such as between ; and .).

Note: This will work in Dart 2.3.0 and up since lookbehind support was added (see here for more info).
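A quick Dart check that the three patterns above produce the list expected for the sample string. The names `sample` and `multiDelimiterPatterns` are hypothetical; the long form needs `caseSensitive: false` so that `[a-z]` also covers upper-case letters, as its comment says.

```dart
const sample = '123.456.789;abc;.xyz.;ABC';

// The three patterns from the answer, unchanged.
final multiDelimiterPatterns = <RegExp>[
  RegExp(r'(?!^|$)\b|(?!\w)\B(?!\w)'),
  RegExp(r'(?!^|$)\b|(?=\W)\B(?=\W)'),
  RegExp(
    r'(?!^|$)(?:(?<=[a-z\d])(?![a-z\d])|(?<![a-z\d])(?=[a-z\d])|(?<![a-z\d])(?![a-z\d]))',
    caseSensitive: false, // the long form relies on case-insensitive [a-z]
  ),
];

void main() {
  for (final pattern in multiDelimiterPatterns) {
    // Each line prints [123, ., 456, ., 789, ;, abc, ;, ., xyz, ., ;, ABC]
    print(sample.split(pattern));
  }
}
```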

1 Comment

I wanted to allow splitting by any regular expression (determined by the user); the . was just an example. It's not clear to me whether this allows for that. The look{ahead|behind} code I posted in the edit works for that, but I'm not sure about its performance.
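On using an arbitrary user-supplied regex: wrapping it in lookarounds, as in the question's edit, splits at every position where some match could begin or end, which is not the same as splitting on the matches `allMatches` actually finds once the regex can match more than one character. The sketch below uses `\d+` as a stand-in for the user's regex (an assumption for illustration); `splitKeepingMatches` is a hypothetical name for the allMatches-based approach from the first answer.

```dart
/// allMatches-based split that keeps each match whole.
List<String> splitKeepingMatches(String input, RegExp pattern) {
  final result = <String>[];
  var start = 0;
  for (final match in pattern.allMatches(input)) {
    result..add(input.substring(start, match.start))..add(match[0]!);
    start = match.end;
  }
  return result..add(input.substring(start));
}

void main() {
  const input = 'a12b';
  const userPattern = r'\d+'; // stand-in for a user-supplied regex

  // Lookaround wrapping: "1" and "2" are separated, because a match of \d+
  // can end after "1" and another can begin before "2".
  final lookaround = RegExp('(?<=$userPattern)|(?=$userPattern)');
  print(input.split(lookaround)); // [a, 1, 2, b]

  // allMatches keeps the whole match "12" together.
  print(splitKeepingMatches(input, RegExp(userPattern))); // [a, 12, b]
}
```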
