Designing a data type with lots of constructors in Haskell

Question

Is there an alternative way to write something like this:

data Message = Message1 Int Int ByteString
             | Message2 Double Int Int
             | Message3 Double Double
             .....
             | Message256 CustomType

There are far too many constructors, and it is difficult to use record syntax. What I really want is to write a parser. Are there alternative approaches for this?

parse :: ByteString -> Parser Message

Assuming that the types for each message do not have any deeper structure (i.e. they are listed in a specification somewhere), you will need to encode that scheme somewhere in the program, which will be a lot of constructions. The best way to do this depends mostly on how you intend to consume the Message type, i.e. who uses it and how? It is hard to give advice without more context.
– luqui, Commented Jul 24, 2013 at 21:12
Thanks for the help. I'm trying to re-implement Interactive Brokers' TWS API as an exercise to learn Haskell. There are about 40 different messages, and they are read from a socket. Depending on the message type and some internal state, a message may be 1) ignored, 2) used to modify internal state, or 3) processed further, say dumped into a SQLite database (depending on the message type, messages may end up in different tables). I'm considering using one of io-streams/conduit/pipes to separate the parsing part from the processing part. The parsers are easy, but something feels wrong about the message type.
– Kai, Commented Jul 24, 2013 at 21:39
If there are 40 different messages, then 40 constructors might be the way to go.
– augustss, Commented Jul 24, 2013 at 22:17
How do I use record syntax in this case? GHC won't let me compile if two or more constructors contain a field with the same name (e.g. a lot of the message types contain a 'requestId' field).
– Kai, Commented Jul 24, 2013 at 22:34
You can have the same field name in multiple constructors if they all have the same type.
– augustss, Commented Jul 25, 2013 at 0:46
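
A minimal sketch of the point in the last comment, assuming two hypothetical message constructors (`OrderStatus` and `TickPrice`, with invented fields that are not taken from the TWS API). Because `requestId` has type `Int` in both constructors, GHC accepts the declaration and generates one selector for the whole type:

```haskell
-- Hypothetical constructors and fields, chosen only for illustration.
data Message
  = OrderStatus { requestId :: Int, filled :: Double }
  | TickPrice   { requestId :: Int, price  :: Double }
  deriving Show

-- requestId is total here because every constructor carries the field.
sameRequest :: Message -> Message -> Bool
sameRequest a b = requestId a == requestId b
```

Note that `filled` and `price` are still partial selectors in this sketch: each exists in only one constructor, so applying it to the other constructor fails at runtime.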

nh2 · Accepted Answer · 2013-09-08 08:14:26Z

3

First off, there is a plan to implement overloaded record fields for Haskell, which would allow the same field name to be used in different records; you would only have to specify explicitly which one you mean in the cases where the compiler can't figure it out by itself.
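
For readers on newer compilers: GHC has since shipped a `DuplicateRecordFields` extension (available from GHC 8.0) that covers part of this. The following is only a sketch with hypothetical record types and fields; it reads the shared field through pattern matching, where the constructor makes clear which `requestId` is meant:

```haskell
{-# LANGUAGE DuplicateRecordFields #-}

-- Hypothetical record types; both declare a field called requestId.
data OrderStatus = OrderStatus { requestId :: Int, filled :: Double }
data TickPrice   = TickPrice   { requestId :: Int, price  :: Double }

-- The constructor in the pattern tells GHC which requestId is meant.
orderRequestId :: OrderStatus -> Int
orderRequestId OrderStatus { requestId = r } = r

tickRequestId :: TickPrice -> Int
tickRequestId TickPrice { requestId = r } = r
```

Using `requestId` as a plain function is ambiguous under this extension, which is why the sketch sticks to patterns.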

That being said ...

I have found that the most reliable and convenient way to deal with this is one Haskell type per message type.

You would have:

data Message1 = Message1 Int Int ByteString -- can use records here
data Message2 = Message2 Double Int Int
data Message3 = Message3 { m3_a :: Double, m3_b :: Double }
--          .....
data Message256 = Message256 CustomType

-- A sum type over all possible message types:
data AnyMessage = M1   Message1
                | M2   Message2
                | M3   Message3
                -- ...
                | M256 Message256

Benefits of this include:

You can use records (you still have to use different prefixes, but that is often good enough).

It is much safer than sharing records across constructors:

data T = A { field :: Int }
       | B { field :: Int }
       | C { bla :: Double } -- no field record

print (field (C 2.3)) -- will crash at runtime, no compiler warning

You can now write functions that only work on certain message types.
You can now write functions that only work on a subset (e.g. 3 of them) of message types: all you need is another sum type.
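
The last two points can be sketched as follows, reduced to `Message2` and `Message3` for brevity. The names `m3Sum`, `NumericMessage`, `NumericM2`, `NumericM3` and `firstDouble` are hypothetical and only serve the illustration:

```haskell
-- Two of the per-message types from above.
data Message2 = Message2 Double Int Int
data Message3 = Message3 { m3_a :: Double, m3_b :: Double }

-- Accepts Message3 only; passing any other message type is a compile error.
m3Sum :: Message3 -> Double
m3Sum (Message3 a b) = a + b

-- A second, smaller sum type covering just a subset of the message types.
data NumericMessage
  = NumericM2 Message2
  | NumericM3 Message3

-- Works on exactly that subset and nothing else.
firstDouble :: NumericMessage -> Double
firstDouble (NumericM2 (Message2 d _ _)) = d
firstDouble (NumericM3 m)                = m3_a m
```

Here `m3_a` is a total selector, because `Message3` has a single constructor; that is the safety gain over sharing record fields across constructors.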

Code dealing with this is still quite elegant:

process :: AnyMessage -> IO ()
process anyMsg = case anyMsg of
    M1 (Message1 x y bs) -> ...
    ...
    M3 Message3{ m3_a, m3_b } -> ... -- using NamedFieldPuns

I have used this pattern multiple times in production, and it leads to very robust code.
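
For a version that compiles as written, here is a reduced sketch with only two message types. The `describe` function is a hypothetical stand-in for the elided branches, kept pure so it is easy to check:

```haskell
{-# LANGUAGE NamedFieldPuns #-}

data Message2 = Message2 Double Int Int
data Message3 = Message3 { m3_a :: Double, m3_b :: Double }

-- Sum type over the (reduced) set of message types.
data AnyMessage
  = M2 Message2
  | M3 Message3

-- Hypothetical handler logic: one case branch per message type.
describe :: AnyMessage -> String
describe anyMsg = case anyMsg of
  M2 (Message2 d i j)       -> "Message2 " ++ show d ++ " " ++ show (i + j)
  M3 Message3{ m3_a, m3_b } -> "Message3 " ++ show (m3_a + m3_b) -- NamedFieldPuns

process :: AnyMessage -> IO ()
process = putStrLn . describe
```

For example, `process (M3 (Message3 1.5 2.5))` prints `Message3 4.0`. Compiling with `-Wall` (which enables `-Wincomplete-patterns`) makes GHC report any constructor that is missing from the `case`.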


2 Comments

AleXoundOS Over a year ago

What I personally dislike here is duplicating the message id, as in "M3 Message3 ...". Everything else is fine; only the syntax is redundant. Ironically, using union types in C offers more concise code. Is there any way to do the same without duplicating the message id in pattern matching or in the ADT declaration itself?

nh2 Over a year ago

@AleXoundOS I'm not convinced that C is more concise. In the above, you always have a tag (e.g. M3) and a payload (e.g. Message3). In C you'd need the same: the payloads in the union, and a tag (e.g. an enum or an integer). You could use TemplateHaskell to remove what you consider boilerplate, but I personally don't think it is a good idea, as it is good to have tags and bodies super clear. Also remember that the names M3/Message3 are examples; you could just as well have | DeletionRequest UserId, in which case it doesn't look like redundant syntax at all.
