3

Is there a way to split a string on several separators while keeping some of those separators in the resulting array? For example, given the string "This is a-weird string,right?", I would like to get

["This", "is", "a", "-", "weird", "string", ",", "right", "?"]

I have tried using string.split(/([^a-zA-Z])/g), but I don't want to keep the whitespace. This guide looks like something I could use, but I don't understand regex well enough to combine the two.

3 Answers

4

You can use

console.log("This is a-weird string,right?".match(/[^\W_]+|[^\w\s]|_/g))

The regex matches:

  • [^\W_]+ - one or more ASCII letters or digits
  • | - or
  • [^\w\s] - any single char other than a word or whitespace char
  • | - or
  • _ - an underscore.

See the regex demo.
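
As a minimal sketch of how this pattern could be wrapped for reuse: the helper name tokenize is hypothetical (it is not part of the answer), and the `?? []` fallback is added because match() returns null when nothing matches.

```javascript
// Hypothetical helper wrapping the match-based pattern shown above.
function tokenize(text) {
  // [^\W_]+ : a run of ASCII letters or digits
  // [^\w\s] : one char that is neither a word char nor whitespace
  // _       : a literal underscore
  // match() returns null when there is no match, so fall back to [].
  return text.match(/[^\W_]+|[^\w\s]|_/g) ?? [];
}

console.log(tokenize("This is a-weird string,right?"));
// ["This", "is", "a", "-", "weird", "string", ",", "right", "?"]
console.log(tokenize("snake_case, ok"));
// ["snake", "_", "case", ",", "ok"]
console.log(tokenize("   "));
// []
```

Because the pattern matches the tokens rather than the gaps between them, whitespace never shows up in the result, with or without a space after a separator.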

A fully Unicode-aware regex would be

console.log("This is ą-węird string,right?".match(/[\p{L}\p{M}\p{N}]+|[\p{P}\p{S}]/gu))

Here,

  • [\p{L}\p{M}\p{N}]+ - one or more Unicode letters, combining marks (diacritics) or digits
  • | - or
  • [\p{P}\p{S}] - a single punctuation or symbol char.

See this regex demo.
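
To see why the Unicode-aware version matters, here is a small comparison of the two patterns on a word with non-ASCII letters. The variable names are only for illustration, and the sample uses \u escapes so the letters are certain to be precomposed.

```javascript
// The ASCII-oriented pattern and the Unicode-aware pattern from this answer.
const asciiPattern = /[^\W_]+|[^\w\s]|_/g;
const unicodePattern = /[\p{L}\p{M}\p{N}]+|[\p{P}\p{S}]/gu; // \p{...} needs the u flag

// "ą-węird" written with escapes (U+0105 and U+0119, both precomposed).
const sample = "\u0105-w\u0119ird";

// Without the u flag, \w only covers [A-Za-z0-9_], so "ą" and "ę"
// are treated like punctuation and break the word apart.
const asciiTokens = sample.match(asciiPattern);
console.log(asciiTokens); // ["ą", "-", "w", "ę", "ird"]

// With Unicode property escapes the letters stay inside the word.
const unicodeTokens = sample.match(unicodePattern);
console.log(unicodeTokens); // ["ą", "-", "węird"]
```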

2

Here is a regex splitting approach. We can try splitting on the following pattern:

\s+|(?<=\w)(?=\W)|(?<=\W)(?=\w)

Code snippet:

var input = "This is a-weird string,right?";
var parts = input.split(/\s+|(?<=\w)(?=\W)|(?<=\W)(?=\w)/);
console.log(parts);

Here is an explanation of the regex pattern used, which says to split on:

\s+            whitespace
|              OR
(?<=\w)(?=\W)  the boundary between a preceding word character and a
               following non-word character
|              OR
(?<=\W)(?=\w)  the boundary between a preceding non-word character and a
               following word character
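
A short sketch of how the split could behave on a few edge cases. The wrapper name splitTokens and the filter(Boolean) step are my additions, not part of the answer above; the filter drops the empty strings that split() produces when the input starts or ends with whitespace. Lookbehind needs an engine with ES2018 regex support.

```javascript
// Hypothetical wrapper around the split pattern shown above.
function splitTokens(text) {
  return text
    .split(/\s+|(?<=\w)(?=\W)|(?<=\W)(?=\w)/)
    .filter(Boolean); // drop "" from leading or trailing whitespace
}

console.log(splitTokens("This is a-weird string,right?"));
// ["This", "is", "a", "-", "weird", "string", ",", "right", "?"]

console.log(splitTokens("  padded, text "));
// ["padded", ",", "text"]

// Adjacent punctuation has no word boundary between its chars,
// so it stays together as one token.
console.log(splitTokens("wait?!"));
// ["wait", "?!"]
```

Note the last case: unlike a match-based pattern that takes punctuation one char at a time, this split keeps runs of punctuation in a single element.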

1

Try it like this:

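const str = "This is a-weird string,right?";

// Put a space on both sides of each "," or "-", then split on single spaces.
const arr = str
  .replace(/(\S)([,-])/g, "$1 $2") // space before the delimiter
  .replace(/([,-])(\S)/g, "$1 $2") // space after the delimiter
  .split(" ");

console.log(arr);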

The idea is to pad each delimiter you're interested in (here , and -) so that it has a space on each side, then split on spaces to get the array. Other delimiters, such as ?, can be added to the character class in the same way.
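
For reference, here is a rough variant of the same padding idea that takes the delimiters as a parameter. It is a sketch, not the code above: the function name splitKeeping is made up, it pads every delimiter unconditionally, and it splits on runs of whitespace and drops empty strings so that an existing space next to a delimiter does no harm.

```javascript
// Hypothetical generalisation of the "pad, then split" approach.
function splitKeeping(text, delimiters) {
  // Escape the chars that are special inside a regex character class.
  const cls = delimiters
    .map((d) => d.replace(/[\\\]^-]/g, "\\$&"))
    .join("");
  const pattern = new RegExp(`[${cls}]`, "g");

  return text
    .replace(pattern, " $& ") // surround each delimiter with spaces
    .split(/\s+/)             // split on any run of whitespace
    .filter(Boolean);         // drop empty strings at the edges
}

console.log(splitKeeping("This is a-weird string, right?", [",", "-", "?"]));
// ["This", "is", "a", "-", "weird", "string", ",", "right", "?"]
```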

2 Comments

This solves the case, but if the , or - is followed by a space, as in "... string, right?", an empty string ends up in the array: [... "string", ",", "", "right?"].
That's true! I've since corrected that. This option might be slightly easier to understand with less regex experience, but in hindsight an all-regex solution is simply more efficient. I think you chose the best answer :)
