41

I'd like to split a string using the Split method of the Regex class. The problem is that it removes the delimiters, and I'd like to keep them, preferably as separate elements in the resulting array.

According to other discussions that I've found, there are only inconvenient ways to achieve that.

Any suggestions?

  • 6
    Input string? your regex? expected output? Commented Mar 27, 2013 at 19:41
  • 5
    @AndreasJohansson: to the contrary, there was sample code to be posted. You wrote the problem is that it removes... What is "it" in this situation? This is a classic question of "I can get this output, but I'd like to get this output"- a great kind of question, but one made much easier to answer if the original code (that gives close to, but not exactly, the desired output) is shown. Commented Mar 28, 2013 at 23:19
  • 2
    This question has triggered a discussion on Meta. Commented Mar 28, 2013 at 23:23
  • 3
    @AndreasJohansson - Don't repost. edit. If there's a problem with your post, reposting it may lead to an automatic question ban. Instead, I think the people here are just simply asking you to post an example of the code that doesn't work, so that it can help them tailor a solution for you that builds on what you already know instead of guessing what you know and then have you come back with a comment saying "No, that's not what I meant.". Remember, people here are volunteering their time to help you, so it's wise to help them by posting what they ask for. Hope this helps! :) Commented Mar 29, 2013 at 0:30
  • 1
    @jmort253 I really tried to reformulate the question, but I couldn't find any way to do that without damaging the question I was asking. I'm really sorry. I'm going to disregard this question entirely because it has drawn way too much attention. Please don't take that as me ignoring you; I'm just cutting off the infected thread. Commented Mar 29, 2013 at 1:02

6 Answers

101

Just put the pattern into a capturing group: Regex.Split then includes the captured delimiters in the result array, between the pieces they separated.

using System.Text.RegularExpressions;

string[] result = Regex.Split("123.456.789", @"(\.)");

Result:

{ "123", ".", "456", ".", "789" }

This also works for many other languages:

  • JavaScript: "123.456.789".split(/(\.)/g)
  • Python: re.split(r"(\.)", "123.456.789")
  • Perl: split(/(\.)/g, "123.456.789")

(Not Java, though: Java's String.split discards captured groups.)
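A minimal Python sketch of the same capture-group trick with the standard `re` module. The helper name `split_keep` is hypothetical. Note that a delimiter at the very start or end of the input produces an empty string at that end of the list, and a pattern that already contains its own groups would add extra items.

```python
import re

def split_keep(pattern, text):
    # Wrapping the whole pattern in one capturing group makes re.split
    # return each matched delimiter between the surrounding pieces.
    return re.split(f"({pattern})", text)

print(split_keep(r"\.", "123.456.789"))  # ['123', '.', '456', '.', '789']
print(split_keep(r"[,.;]", "a,b;c.d"))   # ['a', ',', 'b', ';', 'c', '.', 'd']
print(split_keep(r"\.", ".abc"))         # ['', '.', 'abc']
```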


5 Comments

Oh, this was even better! Funny example: you escape the period so that it matches a literal period rather than any character. +1 for the neat syntax! However, for some reason it doesn't catch the last element, so I get exactly what you described except for the 789 part.
While reading about lookahead, I found that the lookahead text is not included in the result, e.g. Regex.Match("say 25 miles more", @"\d+\s(?=miles)"); // OUTPUT: 25. Yet another source says that to keep the separator when splitting you should wrap the pattern in a positive lookahead, e.g. Regex.Split("oneTwoThree", @"(?=[A-Z])"); // OUTPUT: one Two Three. I'm confused.
@sortednoun The look-ahead matches zero characters, only if the body would match from that position. The look-ahead body is not part of the match, so there is nothing extra to include. The text matched by the body would instead be included in the next array item, when splitting. (?=([A-Z])) would both create an extra item with the letter AND include it in the next item.
Is it safe to say that the odd-indexed items in the result are always the delimiters?
@Mr.Squirrel.Downy If there is exactly one capture group, yes.
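The lookahead behavior described in these comments can be checked with Python's `re` module as a stand-in for .NET (splitting on empty matches needs Python 3.7+). This is a sketch of the same regex semantics, not the .NET API itself.

```python
import re

text = "oneTwoThree"

# A bare lookahead matches the empty string before each capital letter,
# so the letter stays at the start of the following piece.
print(re.split(r"(?=[A-Z])", text))    # ['one', 'Two', 'Three']

# A capture group inside the lookahead also emits the letter as its own
# item, so it appears twice: once alone and once in the next piece.
print(re.split(r"(?=([A-Z]))", text))  # ['one', 'T', 'Two', 'T', 'Three']

# The lookahead body is not part of the match; the match is "25 ".
matched = re.search(r"\d+\s(?=miles)", "say 25 miles more").group()
print(repr(matched))                   # '25 '

# With exactly one capture group, the odd indices hold the delimiters.
parts = re.split(r"(\.)", "123.456.789")
print(parts[1::2])                     # ['.', '.']
```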
8

Use Regex.Matches to find the separators, then collect both the text between consecutive matches and each separator itself.

Example:

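using System.Collections.Generic;
using System.Text.RegularExpressions;

string input = "asdf,asdf;asdf.asdf,asdf,asdf";

var values = new List<string>();
int pos = 0;  // start of the text after the previous separator
foreach (Match m in Regex.Matches(input, "[,.;]")) {
  values.Add(input.Substring(pos, m.Index - pos));  // text before this separator
  values.Add(m.Value);                              // the separator itself
  pos = m.Index + m.Length;
}
values.Add(input.Substring(pos));  // text after the last separator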
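The same match-walking approach as a Python sketch, using `re.finditer` in place of `Regex.Matches`; the function name `split_keep_delims` is hypothetical.

```python
import re

def split_keep_delims(text, pattern):
    values = []
    pos = 0  # start of the text after the previous separator
    for m in re.finditer(pattern, text):
        values.append(text[pos:m.start()])  # text before this separator
        values.append(m.group())            # the separator itself
        pos = m.end()
    values.append(text[pos:])               # text after the last separator
    return values

print(split_keep_delims("asdf,asdf;asdf.asdf,asdf,asdf", "[,.;]"))
```

Because each separator is taken from the match, this works even when the pattern can match several different delimiters.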


4

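Suppose the input is "abc1defg2hi3jkl" and the regex should pick out the digits. Match either a run of digits or a run of non-digits, so every character ends up in exactly one part: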

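using System.Linq;
using System.Text.RegularExpressions;

string input = "abc1defg2hi3jkl";
// Each match is either a run of digits or a run of non-digits, in order.
var parts = Regex.Matches(input, @"\d+|\D+")
            .Cast<Match>()
            .Select(m => m.Value)
            .ToList();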

parts is then { "abc", "1", "defg", "2", "hi", "3", "jkl" }
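The same alternation idea in Python, sketched with `re.findall`. It relies on the delimiter pattern having a simple complement (`\d+` versus `\D+`), which is not available for arbitrary delimiter regexes.

```python
import re

text = "abc1defg2hi3jkl"
# Each match is either a run of digits or a run of non-digits,
# so together the matches cover the whole string in order.
parts = re.findall(r"\d+|\D+", text)
print(parts)  # ['abc', '1', 'defg', '2', 'hi', '3', 'jkl']
```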


1

For Java:

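import java.util.Arrays;

// Split at the empty positions just after or just before each '.',
// so every '.' becomes its own element.
Arrays.stream("123.456.789".split("(?<=\\.)|(?=\\.)"))
      .forEach(System.out::println);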

outputs:

123
.
456
.
789

Inspired by this post (How to split string but keep delimiters in java?)
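The lookbehind/lookahead split also works in Python's `re` module from Python 3.7 on, where splitting on empty matches is supported. A minimal sketch:

```python
import re

# Split at the empty positions just after or just before each '.'.
parts = re.split(r"(?<=\.)|(?=\.)", "123.456.789")
for p in parts:
    print(p)
```

Unlike the capture-group approach, this needs no group, so it is also the usual workaround in languages whose split ignores captures.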


0

Split with string.Split, then add the delimiters back:

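    // Split on the delimiter, then interleave it back between the parts.
    string[] Parts = "A,B,C,D,E".Split(',');
    string[] Parts2 = new string[Parts.Length * 2 - 1];
    for (int i = 0; i < Parts.Length; i++)
    {
        Parts2[i * 2] = Parts[i];          // even slots: the values
        if (i < Parts.Length - 1)
            Parts2[i * 2 + 1] = ",";       // odd slots: the delimiter
    }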
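The same "split, then reinsert" idea as a Python sketch; the function name `split_and_reinsert` is hypothetical. It only works for a single, known delimiter string, because plain `str.split` does not report which delimiter it removed.

```python
def split_and_reinsert(text, delim):
    parts = text.split(delim)
    result = []
    for i, part in enumerate(parts):
        if i > 0:
            result.append(delim)  # put back the delimiter removed by split
        result.append(part)
    return result

print(split_and_reinsert("A,B,C,D,E", ","))
# ['A', ',', 'B', ',', 'C', ',', 'D', ',', 'E']
```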

2 Comments

But that doesn't work when the regex can match more than one kind of delimiter.
What do you do if you don't know which delimiter was used? Can you redo the example using the Regex class?
0

For C#: split a paragraph into sentences while keeping the delimiters. A sentence ends with ., ? or ! followed by one space (requiring the space keeps e-mail addresses in a sentence from being split).

using System.Text.RegularExpressions;

string data = "first. second! third? ";
Regex delimiter = new Regex("(?<=[.?!] )"); // note the space between ] and )
string[] afterRegex = delimiter.Split(data);

Result

{ "first. ", "second! ", "third? ", "" } (the last, empty element comes from the match at the very end of the string)
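A Python sketch of the same lookbehind split with the standard `re` module (Python 3.7+ for splitting on empty matches). It shows the trailing empty string produced by the match at the end of the input and one way to filter it out; the e-mail example string is made up for illustration.

```python
import re

data = "first. second! third? "
# Split at the empty position right after ., ? or ! plus one space.
sentences = re.split(r"(?<=[.?!] )", data)
print(sentences)                    # ['first. ', 'second! ', 'third? ', '']
print([s for s in sentences if s])  # ['first. ', 'second! ', 'third? ']

# A period not followed by a space (as in an e-mail address) is not a split point.
print(re.split(r"(?<=[.?!] )", "mail a.b@x.com now. ok"))
```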
