202

Is there a standard way to split a string in Haskell?

lines and words work great for splitting on a space or newline, but surely there is a standard way to split on a comma?

I couldn't find it on Hoogle.

To be specific, I'm looking for something where split "," "my,comma,separated,list" returns ["my","comma","separated","list"].

1
  • 28
    I would really like to see such a function in a future release of Data.List or even Prelude. It's so common, and nasty for code-golf if not available. Commented Feb 12, 2011 at 15:08

15 Answers 15

198

Remember that you can look up the definition of Prelude functions!

http://www.haskell.org/onlinereport/standard-prelude.html

Looking there, the definition of words is,

words   :: String -> [String]
words s =  case dropWhile Char.isSpace s of
                      "" -> []
                      s' -> w : words s''
                            where (w, s'') = break Char.isSpace s'

So, change it for a function that takes a predicate:

wordsWhen     :: (Char -> Bool) -> String -> [String]
wordsWhen p s =  case dropWhile p s of
                      "" -> []
                      s' -> w : wordsWhen p s''
                            where (w, s'') = break p s'

Then call it with whatever predicate you want!

main = print $ wordsWhen (==',') "break,this,string,at,commas"
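
One property worth knowing: because it is modelled on words, wordsWhen skips runs of separators, so it never yields empty fields. A small sketch of that behaviour, repeating the definition so it stands alone (the names collapsed and mixed and the example inputs are made up for illustration):

```haskell
-- Same definition as above, repeated so the snippet is self-contained.
wordsWhen :: (Char -> Bool) -> String -> [String]
wordsWhen p s = case dropWhile p s of
  "" -> []
  s' -> w : wordsWhen p s''
    where (w, s'') = break p s'

-- Leading, trailing and consecutive separators are all collapsed.
collapsed :: [String]
collapsed = wordsWhen (== ',') ",a,,b,"   -- ["a","b"]

-- Any predicate works, e.g. splitting on commas or semicolons.
mixed :: [String]
mixed = wordsWhen (`elem` ",;") "x,y;z"   -- ["x","y","z"]
```

If empty fields matter (as in CSV-like data), this is probably not the variant you want.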

161

There is a package for this called split.

cabal install split

Use it like this:

ghci> import Data.List.Split
ghci> splitOn "," "my,comma,separated,list"
["my","comma","separated","list"]

It comes with a lot of other functions for splitting on matching delimiters or having several delimiters.

7 Comments

Cool. I wasn't aware of this package. This is the ultimate split package, as it gives a lot of control over the operation (trim spaces in results, leave separators in the result, remove consecutive separators, etc.). There are so many ways of splitting lists that no single split function can answer every need; you really need that kind of package.
otherwise if external packages are acceptable, MissingH also provides a split function: hackage.haskell.org/packages/archive/MissingH/1.2.0.0/doc/html/… That package also provides plenty of other "nice-to-have" functions and I find that quite some packages depend on it.
The split package is now a part of the Haskell Platform as of the most recent release.
import Data.List.Split (splitOn) and go to town. splitOn :: Eq a => [a] -> [a] -> [[a]]
@RussAbbott the split package is included in the Haskell Platform when you download it (haskell.org/platform/contents.html), but it is not automatically loaded when building your project. Add split to the build-depends list in your cabal file, e.g. if your project is called hello, then in the hello.cabal file below the executable hello line put a line like ` build-depends: base, split` (note two space indent). Then build using the cabal build command. Cf. haskell.org/cabal/users-guide/…
47

If you use Data.Text, there is splitOn:

http://hackage.haskell.org/packages/archive/text/0.11.2.0/doc/html/Data-Text.html#v:splitOn

This is included in the Haskell Platform.

So for instance:

import qualified Data.Text as T
main = print $ T.splitOn (T.pack " ") (T.pack "this is a test")

or:

{-# LANGUAGE OverloadedStrings #-}

import qualified Data.Text as T
main = print $ T.splitOn " " "this is a test"
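
Note that T.splitOn returns a list of Text values rather than Strings, so handing the result to String-based code gives a type error. A minimal sketch of converting at the boundaries, assuming the text package is available (splitOnString is a hypothetical helper name, not part of the library):

```haskell
import qualified Data.Text as T

-- Hypothetical helper: split a String on a String delimiter by
-- converting to Text and back. T.splitOn raises an error when the
-- delimiter is empty, and so does this wrapper.
splitOnString :: String -> String -> [String]
splitOnString delim = map T.unpack . T.splitOn (T.pack delim) . T.pack
```

If the rest of the program already works with Text, it is usually simpler to stay in Text and skip the conversions.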

2 Comments

@RussAbbott you probably need to add a dependency on the text package or install it. That would belong in another question, though.
Couldn't match type ‘T.Text’ with ‘Char’ Expected type: [Char] Actual type: [T.Text]
23

Use Data.List.Split, which is provided by the split package:

[me@localhost]$ ghci
Prelude> import Data.List.Split
Prelude Data.List.Split> let l = splitOn "," "1,2,3,4"
Prelude Data.List.Split> :t l
l :: [[Char]]
Prelude Data.List.Split> l
["1","2","3","4"]
Prelude Data.List.Split> let { convert :: [String] -> [Integer]; convert = map read }
Prelude Data.List.Split> let l2 = convert l
Prelude Data.List.Split> :t l2
l2 :: [Integer]
Prelude Data.List.Split> l2
[1,2,3,4]

1 Comment

Note that this is part of the split package, which must be installed: stackoverflow.com/a/34175246/388951
20

Without importing anything, you can simply substitute a space for the separator character, since the separator that words targets is whitespace. Something like:

words [if c == ',' then ' ' else c|c <- "my,comma,separated,list"]

or

words (let f ',' = ' '; f c = c in map f "my,comma,separated,list")

You can make this into a function with parameters. You can eliminate the character-to-match parameter by matching many characters, as in:

words [if elem c ";,.:-+@!$#?" then ' ' else c | c <- "my,comma;separated!list"]
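
Turned into a function with parameters, that might look like the sketch below. splitOnAny is a made-up name, and the caveat about whitespace already present in the input still applies:

```haskell
-- Hypothetical helper: treat every character in `seps` as a separator
-- by mapping it to a space and letting `words` do the splitting.
-- Caveat: whitespace already present in the input also splits.
splitOnAny :: [Char] -> String -> [String]
splitOnAny seps str = words [if c `elem` seps then ' ' else c | c <- str]
```

For example, splitOnAny ",;!" "my,comma;separated!list" gives the same four words as the expression above.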

3 Comments

That does not distinguish between newly added spaces and spaces that were there originally, so for "my,comma separated,list" it will see 4 parts instead of the intended 3.
@Yuri Kovalenko words does; try words [if c == ',' then ' ' else c|c <- "my, comma, separated, list "]
Yuri Kovalenko The question was a comma separated string. Are you referring to another question?
19

In the module Text.Regex (part of the Haskell Platform), there is a function:

splitRegex :: Regex -> String -> [String]

which splits a string based on a regular expression. The API can be found at Hackage.

2 Comments

Could not find module ‘Text.Regex’ Perhaps you meant Text.Read (from base-4.10.1.0)
It may be in the module regex-compat-tdfa (but I'm a haskell newb)
14

Try this one:

import Data.List (unfoldr)

separateBy :: Eq a => a -> [a] -> [[a]]
separateBy chr = unfoldr sep where
  sep [] = Nothing
  sep l  = Just . fmap (drop 1) . break (== chr) $ l

Only works for a single char, but should be easily extendable.

13
split :: Eq a => a -> [a] -> [[a]]
split d [] = []
split d s = x : split d (drop 1 y) where (x,y) = span (/= d) s

E.g.

split ';' "a;bb;ccc;;d"
> ["a","bb","ccc","","d"]

A single trailing delimiter will be dropped:

split ';' "a;bb;ccc;;d;"
> ["a","bb","ccc","","d"]

9

I find this simpler to understand:

split :: Char -> String -> [String]
split c xs = case break (==c) xs of 
  (ls, "") -> [ls]
  (ls, x:rs) -> ls : split c rs

1 Comment

...simpler than what? Which of the other answers is your solution better than? Background: there are already several other answers.
6

I started learning Haskell yesterday, so correct me if I'm wrong but:

split :: Eq a => a -> [a] -> [[a]]
split x y = func x y [[]]
    where
        func x [] z = reverse $ map (reverse) z
        func x (y:ys) (z:zs) = if y==x then 
            func x ys ([]:(z:zs)) 
        else 
            func x ys ((y:z):zs)

gives:

*Main> split ' ' "this is a test"
["this","is","a","test"]

or maybe you wanted

*Main> splitWithStr  " and " "this and is and a and test"
["this","is","a","test"]

which would be:

splitWithStr :: Eq a => [a] -> [a] -> [[a]]
splitWithStr x y = func x y [[]]
    where
        func x [] z = reverse $ map (reverse) z
        func x (y:ys) (z:zs) = if (take (length x) (y:ys)) == x then
            func x (drop (length x) (y:ys)) ([]:(z:zs))
        else
            func x ys ((y:z):zs)

2 Comments

I was looking for a built-in split, being spoiled by languages with well-developed libraries. But thanks anyway.
You wrote this in June, so I assume you've moved on in your journey :) As an exercise, try rewriting this function without reverse or length, as the use of these functions incurs an algorithmic complexity penalty and also prevents application to an infinite list. Have fun!
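
One possible take on that exercise, as a sketch rather than a definitive answer: splitLazy and breakOn are made-up names, stripPrefix comes from Data.List, and the delimiter is assumed to be non-empty (an empty one would yield empty chunks forever).

```haskell
import Data.List (stripPrefix)

-- Hypothetical rewrite: split on a sublist delimiter without using
-- reverse or length. Chunks are produced lazily, so this also works
-- on infinite input. Assumes a non-empty delimiter.
splitLazy :: Eq a => [a] -> [a] -> [[a]]
splitLazy delim = go
  where
    go xs = let (chunk, rest) = breakOn xs in chunk : maybe [] go rest

    -- Returns the elements before the first delimiter, plus the input
    -- after that delimiter (Nothing if no delimiter is left).
    breakOn xs = case stripPrefix delim xs of
      Just rest' -> ([], Just rest')
      Nothing -> case xs of
        []       -> ([], Nothing)
        (y : ys) -> let (chunk, rest) = breakOn ys in (y : chunk, rest)
```

Because each chunk is built front to back, something like take 3 (splitLazy "," (cycle "ab,")) terminates, which the accumulator-and-reverse version cannot do.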
5

I don’t know how to add a comment onto Steve’s answer, but I would like to recommend the GHC libraries documentation, and in there specifically the Sublist functions in Data.List, which are much better as a reference than just reading the plain Haskell report.

More generally, a fold with a rule for when to start a new sublist should solve it too.
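
A minimal sketch of the fold idea, using a right fold over the input (splitFold and step are made-up names; this is one way to do it, not the only one):

```haskell
-- Hypothetical helper: a right fold that starts a new sublist
-- whenever it meets the delimiter. Empty fields are kept.
splitFold :: Eq a => a -> [a] -> [[a]]
splitFold d = foldr step [[]]
  where
    step x acc@(cur : rest)
      | x == d    = [] : acc          -- delimiter: open a new sublist
      | otherwise = (x : cur) : rest  -- otherwise: extend the current one
    step _ []     = [[]]              -- unreachable: acc is never empty
```

With this definition, splitFold ',' "a,b,,c" evaluates to ["a","b","","c"], and an empty input gives [""].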

4

Example in the ghci:

> import qualified Text.Regex as R
> R.splitRegex (R.mkRegex "x") "2x3x777"
["2","3","777"]

5 Comments

Please, don’t use regular expressions to split strings. Thank you.
@kirelagin, why this comment? I'm learning Haskell, and I'd like to know the rationale behind your comment.
@Andrey, is there a reason why I cannot even run the first line in my ghci?
@EnricoMariaDeAngelis Regular expressions are a powerful tool for string matching. It makes sense to use them when you are matching something non-trivial. If you just want to split a string on something as trivial as another fixed string, there is absolutely no need to use regular expressions – it will only make the code more complex and, likely, slower.
"Please, don’t use regular expressions to split strings." WTF, why not??? Splitting a string with a regular expression is a perfectly reasonable thing to do. There are lots of trivial cases where a string needs to be split but the delimiter isn't always exactly the same.
3

In addition to the efficient, pre-built functions given in other answers, I'll add my own, which are simply part of a repertoire of Haskell functions I wrote to learn the language in my own time:

-- Correct but inefficient implementation
wordsBy :: String -> Char -> [String]
wordsBy s c = reverse (go s []) where
    go s' ws = case (dropWhile (\c' -> c' == c) s') of
        "" -> ws
        rem -> go ((dropWhile (\c' -> c' /= c) rem)) ((takeWhile (\c' -> c' /= c) rem) : ws)

-- Breaks up by predicate function to allow for more complex conditions (\c -> c == ',' || c == ';')
wordsByF :: String -> (Char -> Bool) -> [String]
wordsByF s f = reverse (go s []) where
    go s' ws = case ((dropWhile (\c' -> f c')) s') of
        "" -> ws
        rem -> go ((dropWhile (\c' -> (f c') == False)) rem) (((takeWhile (\c' -> (f c') == False)) rem) : ws)

The solutions are at least tail-recursive so they won't incur a stack overflow.

0

I am very late, but I would like to add this here for those interested, if you're looking for a simple solution that doesn't rely on any bloated packages:

split :: String -> String -> [String]
split _ "" = []
split delim str =
  split' "" str []
  where
    dl = length delim

    split' :: String -> String -> [String] -> [String]
    split' h t f
      | dl > length t = f ++ [h ++ t]
      | delim == take dl t = split' "" (drop dl t) (f ++ [h])
      | otherwise = split' (h ++ take 1 t) (drop 1 t) f

3 Comments

Oh come on... Ultimately what matters is not that something is liked by thousands of people. I am NOT forcing you to use it. It's ONLY there for those interested. Sounds like you're none of them.
You say "liked by" -- I say "battle tested". It's fine if you enjoy sharing it. My question was for the standard way to do it, and that has been answered.
Haskell does not come with the split function out of the box. Remember you asked a function that splits a string by a string (String -> String -> [String]), not by a char (Char->String->[String]). You have to install the split package, which is NOT a standard way EITHER. Installing the split package will also include a bunch of redundant functions. You only asked for a split function, and I gave exactly that to you and NO MORE.
0

So many answers, but I don't like any of them. I don't actually know Haskell, but I wrote a much shorter and (I think) cleaner version in 5 minutes:

splitString :: Char -> [Char] -> [[Char]]
splitString _ [] = []
splitString sep str = 
    let (left, right) = break (==sep) str 
    in left : splitString sep (drop 1 right)
