How can I use this regex in Haskell?

Question

I'm trying to write a simple Haskell program that takes any line that looks like someFilenameHere0035.xml and returns 0035. My sample input file, input.txt, looks like this:

someFilenameHere0035.xml
anotherFilenameHere4465.xml

And running: cat input.txt | runhaskell getID.hs should return:

0035
4465

I'm having so much difficulty figuring this out. Here's what I have so far:

import Text.Regex.PCRE

getID :: String -> [String]
getID str = str =~ "([0-9]+)\\.xml" :: [String]

main :: IO ()
main = interact $ unlines . getID

But I get an error message I don't understand at all:

• No instance for (RegexContext Regex String [String])
 arising from a use of ‘=~’
• In the expression: str =~ "([0-9]+)\\.xml" :: [String]
   In an equation for ‘getID’:
   getID str = str =~ "([0-9]+)\\.xml" :: [String] (haskell-stack-ghc)

I feel like I'm really close, but I don't know where to go from here. What am I doing wrong?

If this is for learning Haskell: great! Otherwise it seems like you might just want to throw the standard command line tools at this. grep -o '[0-9]\+\.xml' | sed 's/.xml//' seems to work, and you could probably do it with a single sed command if you don't mind it looking a bit less comprehensible.
– Daniel Wagner, commented Sep 17, 2017 at 2:26

James Burton · Accepted Answer · 2017-09-17 00:59:06Z


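First off, you only want the number part, so we can get rid of the \\.xml. With a String result, =~ returns the text of the whole match rather than the capture group, so keeping the suffix would give 0035.xml.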

The regex-pcre library (through the regex-base instances it re-exports) provides an instance for RegexContext Regex String String but not for RegexContext Regex String [String], hence the error.

So if we change the type signature to String -> String and drop the :: [String] annotation, that error is taken care of.
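As a rough illustration of how the requested result type picks the instance, the sketch below applies the question's pattern to one file name at a few result types that do have instances. The names sample, pat, wholeMatch, withGroups, parts and allMatches are made up for this example; only =~ and getAllTextMatches come from the library.

```haskell
import Text.Regex.PCRE

-- Hypothetical sample values, taken from the question's input and pattern.
sample, pat :: String
sample = "someFilenameHere0035.xml"
pat    = "([0-9]+)\\.xml"

-- String: the text of the first whole match ("" when nothing matches).
wholeMatch :: String
wholeMatch = sample =~ pat

-- [[String]]: one inner list per match; whole match first, then its groups.
withGroups :: [[String]]
withGroups = sample =~ pat

-- 4-tuple: (text before, whole match, text after, capture groups).
parts :: (String, String, String, [String])
parts = sample =~ pat

-- A flat [String] of whole matches needs the AllTextMatches wrapper.
allMatches :: [String]
allMatches = getAllTextMatches (sample =~ pat)

main :: IO ()
main = do
  print wholeMatch  -- "0035.xml"
  print withGroups  -- [["0035.xml","0035"]]
  print parts       -- ("someFilenameHere","0035.xml","",["0035"])
  print allMatches  -- ["0035.xml"]
```

Note that the String result is the whole match, not the capture group, which is why the pattern matters once the result type is String.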

unlines expects a [String], so to test what we had at this point I wrote a quick function that wraps its argument in a list (there are shorter ways to do that, such as pure or (:[]), but that's not the point of the question):

toList :: a -> [a]
toList a = [a]

Running your command with main = interact $ unlines . toList . getID prints 0035, which is only the first match, so we're almost there.

getID is passed the whole file contents as a single String, in which the file names are conveniently separated by the \n character. So we can use splitOn "\n" from Data.List.Split (in the split package) to get our list of .xml file names.

Then we simply need to map getID over that list (toList is no longer needed).
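One detail worth checking at this step is what happens at the final newline. The following is a small sketch, assuming the split package is installed and that input.txt ends with a newline (the question does not say either way); input is a made-up sample value.

```haskell
import Data.List.Split (splitOn)

-- Hypothetical file contents, ending with a newline.
input :: String
input = "someFilenameHere0035.xml\nanotherFilenameHere4465.xml\n"

main :: IO ()
main = do
  -- splitOn keeps the empty piece after the last "\n".
  print (splitOn "\n" input)
  -- ["someFilenameHere0035.xml","anotherFilenameHere4465.xml",""]

  -- lines (from the Prelude) does not.
  print (lines input)
  -- ["someFilenameHere0035.xml","anotherFilenameHere4465.xml"]
```

With the extra empty piece, mapping getID over the list yields one extra empty line at the end of the output, so lines may be the more convenient choice if that matters.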

This gives us:

import Text.Regex.PCRE
import Data.List.Split

getID :: String -> String
getID str = str =~ "([0-9]+)"

main :: IO ()
main = interact $ unlines . map getID . splitOn "\n"

This gives me the desired output when I run your command.
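If the \\.xml suffix matters (for example, for a name such as foo3bar4.xml, where only the trailing 4 is wanted), one possible variant keeps the original pattern and reads the capture group from the 4-tuple result. This is a sketch rather than the only way to do it: it also swaps splitOn "\n" for lines and skips lines that do not match, which are choices of this example and not part of the code above.

```haskell
import Data.Maybe (mapMaybe)
import Text.Regex.PCRE

-- Return the digits directly before ".xml", or Nothing if the line
-- does not match. The 4-tuple result is
-- (text before, whole match, text after, capture groups).
getID :: String -> Maybe String
getID str =
  case str =~ "([0-9]+)\\.xml" :: (String, String, String, [String]) of
    (_, _, _, [digits]) -> Just digits
    _                   -> Nothing

main :: IO ()
main = interact $ unlines . mapMaybe getID . lines
```

Run the same way as before, with cat input.txt | runhaskell getID.hs.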

Hopefully this helps :)


2 Comments

Daniel Wagner Over a year ago

The original regex may have been deliberately chosen, e.g. if they were expecting one possible filename to be foo3bar4.xml.

James Burton Over a year ago

@DanielWagner If that were the case then the example input.txt file should have included such a case. I just provided something that gave the desired output for the example input.
