
I'm trying to write a simple Haskell program that takes each line that looks like someFilenameHere0035.xml and returns 0035. My sample input file, input.txt, looks like this:

someFilenameHere0035.xml
anotherFilenameHere4465.xml

And running: cat input.txt | runhaskell getID.hs should return:

0035
4465

I'm having so much difficulty figuring this out. Here's what I have so far:

import Text.Regex.PCRE

getID :: String -> [String]
getID str = str =~ "([0-9]+)\\.xml" :: [String]

main :: IO ()
main = interact $ unlines . getID

But I get an error message I don't understand at all:

• No instance for (RegexContext Regex String [String])
 arising from a use of ‘=~’
• In the expression: str =~ "([0-9]+)\\.xml" :: [String]
   In an equation for ‘getID’:
   getID str = str =~ "([0-9]+)\\.xml" :: [String] (haskell-stack-ghc)

I feel like I'm really close, but I don't know where to go from here. What am I doing wrong?

    If this is for learning Haskell: great! Otherwise it seems like you might just want to throw the standard command line tools at this. grep -o '[0-9]\+\.xml' | sed 's/.xml//' seems to work, and you could probably do it with a single sed command if you don't mind it looking a bit less comprehensible. Commented Sep 17, 2017 at 2:26

1 Answer


First off, you only want the number part, so we can drop the \\.xml from the pattern.

The regex-pcre library defines an instance for RegexContext Regex String String but not for RegexContext Regex String [String], hence the error: the result type you ask =~ for has to be one that has an instance.
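As a rough illustration of how the requested result type selects the instance, here is a small sketch. The names firstMatch and allMatches are made up for this example; getAllTextMatches is the wrapper from regex-base (re-exported by Text.Regex.PCRE) that provides a list-of-matches result, which is probably the closest thing to the [String] the question asked for.

```haskell
import Text.Regex.PCRE ((=~), getAllTextMatches)

-- Hypothetical helper: a plain String result gives the first match,
-- or "" when nothing matches.
firstMatch :: String -> String
firstMatch str = str =~ "[0-9]+"

-- Hypothetical helper: a bare [String] has no instance, but the
-- AllTextMatches wrapper does, and getAllTextMatches unwraps it.
allMatches :: String -> [String]
allMatches str = getAllTextMatches (str =~ "[0-9]+")

main :: IO ()
main = do
  print (firstMatch "someFilenameHere0035.xml") -- "0035"
  print (allMatches "foo3bar4.xml")             -- ["3","4"]
```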

So if we change the type signature to String -> String then that error is taken care of.

unlines expects [String] so to test what we had at this point I wrote a quick function that wraps its argument in a list (there's probably a nicer way to do that but that's not the point of the question):

toList :: a -> [a]
toList a = [a]
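One possibly nicer spelling of the same helper, assuming nothing beyond the Prelude: for lists, pure builds a one-element list, so toList can be written point-free.

```haskell
-- Same behaviour as the hand-written version: wrap a value in a
-- one-element list. (:[]) would work equally well here.
toList :: a -> [a]
toList = pure

main :: IO ()
main = print (toList "0035") -- ["0035"]
```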

Running your command with main = interact $ unlines . toList . getID output 0035, so we're almost there.

interact passes the entire input to our function as a single String, with the lines conveniently separated by the \n character. So we can use splitOn "\n" from the Data.List.Split module (in the split package) to get our list of .xml filenames.

Then we simply need to map getID over that list (toList is no longer needed).

This gives us:

import Text.Regex.PCRE
import Data.List.Split

getID :: String -> String
getID str = str =~ "([0-9]+)"

main :: IO ()
main = interact $ unlines . map getID . splitOn "\n"

This gives me the desired output when I run your command.
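A variation on the program above, offered as a sketch rather than the one right way to do it. It makes three changes: it uses lines from the Prelude instead of splitOn "\n" (which avoids the extra empty string, and so the extra blank output line, that splitOn produces when the input ends with a newline), it keeps the question's original pattern with the \\.xml suffix, and it skips lines that do not match. The name extractID is made up for this example. Requesting a 4-tuple from =~ yields the text before the match, the match itself, the text after it, and the list of capture groups.

```haskell
import Data.Maybe (mapMaybe)
import Text.Regex.PCRE ((=~))

-- Hypothetical variant of getID: returns the digits captured just
-- before ".xml", or Nothing when the line does not match.
extractID :: String -> Maybe String
extractID str =
  case str =~ "([0-9]+)\\.xml" :: (String, String, String, [String]) of
    (_, _, _, [digits]) -> Just digits -- exactly one capture group
    _                   -> Nothing     -- no match: group list is empty

main :: IO ()
main = interact $ unlines . mapMaybe extractID . lines
```

With this pattern, a name such as foo3bar4.xml yields 4 rather than 3, because only the digits directly in front of .xml are captured.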

Hopefully this helps :)


2 Comments

The original regex may have been deliberately chosen, e.g. if they were expecting one possible filename to be foo3bar4.xml.
@DanielWagner If that were the case then the example input.txt file should have included such a case. I just provided something that gave the desired output for the example input.
