I am trying to split the string "Hello|World" in Clojure, but when I use the split function, (clojure.string/split x #"|"), I get a weird result: [h e l l o | w o r l d]. Can anyone tell me why it does this, and how I can split the string to get [hello world]?
1 Answer
Here is the answer (the examples assume clojure.string has been required with the alias str):
(str/split "Hello|World" #"|") => ["H" "e" "l" "l" "o" "|" "W" "o" "r" "l" "d"]
(str/split "Hello World" #" ") => ["Hello" "World"]
(str/split "Hello|World" #"\|") => ["Hello" "World"]
In a regular expression, the | character is special, and needs to be escaped with a backslash \.
The | character is a logical operator in regex and is normally used to mean "or", like "abc|def":
(str/split "Hello|World" #"e|o") => ["H" "ll" "|W" "rld"]
Since nothing appears on either side of the |, the pattern means "empty string OR empty string". The empty string matches at the boundary between every pair of characters, so split cuts the input at each of those positions.
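The difference between the unescaped and escaped pipe can be checked directly at the REPL. The following is only a minimal sketch; the character-class form #"[|]" is an additional way to write a literal pipe that is not mentioned in the answer above.

```clojure
(require '[clojure.string :as str])

;; An unescaped | is an alternation between two empty patterns,
;; so every match it finds is the empty string "".
(re-seq #"|" "ab")

;; Escaped with a backslash, it matches only the literal pipe character.
(re-seq #"\|" "a|b")               ; => ("|")

;; Inside a character class the pipe is also treated literally.
(str/split "Hello|World" #"[|]")   ; => ["Hello" "World"]
```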
See the Java docs for java.util.regex.Pattern for more information.
(clojure.string/split "Hello|World" (re-pattern (. java.util.regex.Pattern quote "|"))) which 1) invokes Pattern.quote to create a quoted literal pattern string from the string "|", then 2) uses re-pattern to create a regular expression from the quoted string, which is then passed as the second argument to clojure.string/split, which then produces the desired result ["Hello" "World"]. If you want to make this a bit prettier, use (defn re-quoted-pattern [s] (re-pattern (. java.util.regex.Pattern quote s))), and your code then becomes (clojure.string/split "Hello|World" (re-quoted-pattern "|")).
[...] Pattern.quote from Clojure; thus, I believe your close-as-duplicate should be undone. Thanks.
[...] regex. The | symbol is a well-known char that requires escaping if one wants to treat it as a literal char. No need to reopen.
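The Pattern.quote approach from the comment above can be wrapped in a small helper so that any delimiter is treated as literal text. This is a sketch only: split-literal is a hypothetical name chosen for illustration and is not part of clojure.string.

```clojure
(require '[clojure.string :as str])
(import 'java.util.regex.Pattern)

(defn split-literal
  "Hypothetical helper: splits s on delim, treating delim as literal text
  rather than as a regular expression."
  [s delim]
  ;; Pattern/quote returns a String in which every character is literal;
  ;; re-pattern compiles that String into a java.util.regex.Pattern.
  (str/split s (re-pattern (Pattern/quote delim))))

(split-literal "Hello|World" "|")   ; => ["Hello" "World"]
(split-literal "a.b.c" ".")         ; => ["a" "b" "c"]
```

Quoting is mostly useful when the delimiter arrives at run time (for example from user input or configuration) and cannot be escaped by hand; for a fixed delimiter, #"\|" is shorter.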