I am trying to split the string "Hello|World" in Clojure, but when I use the split function, (clojure.string/split x #"|"), I get a weird result: [H e l l o | W o r l d]. Can anyone tell me why it does this, and how I can split it to get ["Hello" "World"]?

  • Use (clojure.string/split "Hello|World" (re-pattern (. java.util.regex.Pattern quote "|"))). This 1) calls Pattern.quote, which turns the string "|" into a quoted pattern string that matches "|" literally, then 2) uses re-pattern to compile that string into a regular expression, which is passed as the second argument to clojure.string/split and produces the desired result ["Hello" "World"]. To make this a bit prettier, define (defn re-quoted-pattern [s] (re-pattern (. java.util.regex.Pattern quote s))); your code then becomes (clojure.string/split "Hello|World" (re-quoted-pattern "|")). Commented May 25, 2018 at 15:24
  • @WiktorStribiżew - if you could please remove your close vote on this, I could post the comment above as an answer. You may be correct that from a Java point of view the question is a dup, but this question is not tagged for java, and from a Clojure point of view no one has addressed the issue of how to invoke Pattern.quote from Clojure; thus, I believe your close-as-duplicate should be undone. Thanks. Commented May 25, 2018 at 15:36
  • This is a question tagged with regex. The | symbol is a well-known char that requires escaping if one wants to treat it as a literal char. No need to reopen. Commented May 25, 2018 at 16:37
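The helper suggested in the first comment can be sketched as follows. This is a minimal sketch: `re-quoted-pattern` is the commenter's own helper name, not part of clojure.core, and the code uses the static-method form `Pattern/quote`, which is equivalent to `(. java.util.regex.Pattern quote s)`.

```clojure
(ns example.split
  (:require [clojure.string :as str])
  (:import (java.util.regex Pattern)))

;; Helper from the comment (not in clojure.core): builds a regex that matches
;; the string s literally. Pattern/quote wraps s in \Q...\E, so regex
;; metacharacters such as | or . lose their special meaning.
(defn re-quoted-pattern [s]
  (re-pattern (Pattern/quote s)))

(str/split "Hello|World" (re-quoted-pattern "|"))
;; => ["Hello" "World"]

;; The same helper works for other metacharacters used as separators.
(str/split "a.b.c" (re-quoted-pattern "."))
;; => ["a" "b" "c"]
```

Quoting is mainly useful when the separator comes from data rather than being written by hand, since you then cannot escape it yourself.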

1 Answer


Here is the answer:

(require '[clojure.string :as str])

(str/split "Hello|World" #"|")   ;=> ["H" "e" "l" "l" "o" "|" "W" "o" "r" "l" "d"]
(str/split "Hello World" #" ")   ;=> ["Hello" "World"]
(str/split "Hello|World" #"\|")  ;=> ["Hello" "World"]

In a regular expression, the | character is special and must be escaped with a backslash \ to match it literally.
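A few equivalent ways to write a pattern that matches a literal |. In a Clojure regex literal `#"..."` a single backslash is enough, whereas a plain string passed to `re-pattern` needs the backslash doubled because the string reader consumes one. Inside a character class such as `[|]`, | is also treated literally.

```clojure
(require '[clojure.string :as str])

;; Regex literal: one backslash escapes the metacharacter.
(str/split "Hello|World" #"\|")              ;=> ["Hello" "World"]

;; String passed to re-pattern: the backslash itself must be escaped.
(str/split "Hello|World" (re-pattern "\\|")) ;=> ["Hello" "World"]

;; Inside a character class, | has no special meaning.
(str/split "Hello|World" #"[|]")             ;=> ["Hello" "World"]
```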

In regex, the | character is the alternation operator and means "or", as in "abc|def":

(str/split "Hello|World" #"e|o") ;=> ["H" "ll" "|W" "rld"]

With nothing on either side, #"|" means "empty string OR empty string". That matches the zero-width position between every pair of characters, so split breaks the string apart at every character.
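One way to see this in the REPL: the bare pattern #"|" only ever matches the empty string, so `re-seq` finds a zero-width match at every position, including both ends of the input. The note about the leading empty string reflects `java.util.regex.Pattern.split` behaviour on Java 8 and later.

```clojure
(require '[clojure.string :as str])

;; #"|" is "empty OR empty": every match is the empty string.
(re-find #"|" "abc")    ;=> ""
(re-seq #"|" "abc")     ;=> ("" "" "" ""), one match at each of positions 0 to 3

;; split cuts the input at each of those zero-width positions. On Java 8+ the
;; match at position 0 yields no leading "" and trailing "" values are dropped.
(str/split "abc" #"|")  ;=> ["a" "b" "c"]
```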

See the Java documentation for java.util.regex.Pattern for more information; Clojure regex literals are Java Pattern objects.
