How can I define algebraic data types in Python (2 or 3)?
-
1Python has a very loose type system in the first place; what exactly are you trying to get out of this?Amber– Amber2013-04-28 01:18:14 +00:00Commented Apr 28, 2013 at 1:18
-
12@Amber not loose. very strong, but duck.Elazar– Elazar2013-04-28 01:19:59 +00:00Commented Apr 28, 2013 at 1:19
-
6@Elazar When I say "loose", I mean things like functions not having particular type signatures. But you're right, Python is not weakly typed.Amber– Amber2013-04-28 01:25:46 +00:00Commented Apr 28, 2013 at 1:25
-
3@Amber I see what you mean.Elazar– Elazar2013-04-28 01:28:35 +00:00Commented Apr 28, 2013 at 1:28
-
4@Amber Rather than "loose" maybe saying it's dynamically typed rather than statically typed would have been clearer.Dwayne Crooks– Dwayne Crooks2017-01-08 16:23:23 +00:00Commented Jan 8, 2017 at 16:23
5 Answers
Python 3.10 version
Here is a Python 3.10 version of Brent's answer with pattern-matching and prettier union type syntax:
from dataclasses import dataclass
@dataclass
class Point:
x: float
y: float
@dataclass
class Circle:
x: float
y: float
r: float
@dataclass
class Rectangle:
x: float
y: float
w: float
h: float
Shape = Point | Circle | Rectangle
def print_shape(shape: Shape):
match shape:
case Point(x, y):
print(f"Point {x} {y}")
case Circle(x, y, r):
print(f"Circle {x} {y} {r}")
case Rectangle(x, y, w, h):
print(f"Rectangle {x} {y} {w} {h}")
print_shape(Point(1, 2))
print_shape(Circle(3, 5, 7))
print_shape(Rectangle(11, 13, 17, 19))
print_shape(4) # mypy type error
You can even do recursive types:
from __future__ import annotations
from dataclasses import dataclass
@dataclass
class Branch:
value: int
left: Tree
right: Tree
Tree = Branch | None
def contains(tree: Tree, value: int):
match tree:
case None:
return False
case Branch(x, left, right):
return x == value or contains(left, value) or contains(right, value)
tree = Branch(1, Branch(2, None, None), Branch(3, None, Branch(4, None, None)))
assert contains(tree, 1)
assert contains(tree, 2)
assert contains(tree, 3)
assert contains(tree, 4)
assert not contains(tree, 5)
Note the need for from __future__ import annotations in order to annotate with a type that hasn't been defined yet.
Exhaustiveness checking for ADTs can be enforced with mypy using typing.assert_never() in Python 3.11+ or as part of the typing-extensions backport for older versions of Python.
def print_shape(shape: Shape):
match shape:
case Point(x, y):
print(f"Point {x} {y}")
case Circle(x, y, r):
print(f"Circle {x} {y} {r}")
case _ as unreachable:
# mypy will throw a type checking error
# because Rectangle is not covered in the match.
assert_never(unreachable)
6 Comments
contains function will use one stack frame per tree level. Given the logarithmic height of binary trees, this should be fine, just something to be aware ofcontains branches twice, so only contains(right, value) is in tail position, while the prior one, contains(left, value) is not. The or operation follows it in any case and no tail can be eliminated (the locals are still required after returning from it). So this always uses a stack frame per level, regardless if Python supported TCO.The typing module provides Union which, dissimilar to C, is a sum type. You'll need to use mypy to do static type checking, and there's a notable lack of pattern matching, but combined with tuples (product types), that's the two common algebraic types.
from dataclasses import dataclass
from typing import Union
@dataclass
class Point:
x: float
y: float
@dataclass
class Circle:
x: float
y: float
r: float
@dataclass
class Rectangle:
x: float
y: float
w: float
h: float
Shape = Union[Point, Circle, Rectangle]
def print_shape(shape: Shape):
if isinstance(shape, Point):
print(f"Point {shape.x} {shape.y}")
elif isinstance(shape, Circle):
print(f"Circle {shape.x} {shape.y} {shape.r}")
elif isinstance(shape, Rectangle):
print(f"Rectangle {shape.x} {shape.y} {shape.w} {shape.h}")
print_shape(Point(1, 2))
print_shape(Circle(3, 5, 7))
print_shape(Rectangle(11, 13, 17, 19))
# print_shape(4) # mypy type error
5 Comments
assert_never idiom: github.com/python/typing/issues/735None | None, for example, is isomorphic (if not identical) to None itself. None + None (to use the obvious syntax for a hypothetical sum type) would be isomorphic to bool.there is a library that does exactly what you want called ADT, it even has mypy typechecking. However it is unmaintained so use it with caution ig.
the code using it looks like this
@adt
class Tree:
EMPTY: Case
LEAF: Case[int]
NODE: Case["Tree", "Tree"]
and matching works too
# Defined in some other module, perhaps
def some_operation() -> Either[Exception, int]:
return Either.RIGHT(22) # Example of building a constructor
# Run some_operation, and handle the success or failure
default_value = 5
unpacked_result = some_operation().match(
# In this case, we're going to ignore any exception we receive
left=lambda ex: default_value,
right=lambda result: result)
Comments
With the introduction of structural pattern matching in Python 3.10, ADTs in Python became practical
from typing import Union
from dataclasses import dataclass
@dataclass
class Circle:
radius: float
@dataclass
class Rectangle:
width: float
height: float
@dataclass
class Triangle:
base: float
height: float
Shape = Circle | Rectangle | Triangle
def area(shape: Shape) -> float:
match shape:
case Circle(radius):
return 3.14159 * radius ** 2
case Rectangle(width, height):
return width * height
case Triangle(base, height):
return 0.5 * base * height
# Example usage
circle = Circle(radius=5)
print("Area of circle:", area(circle))
rectangle = Rectangle(height=10, width=5)
print("Area of rectangle:", area(rectangle))
triangle = Triangle(base=10, height=5)
print("Area of triangle:", area(triangle))
I would argue @dataclass is how Python supports Product types. And in line 18th, you'll see Python already supports Union Types. That enables us to create a Shape relationship (Type) without resorting to inheritance.
The big problem before 3.10 was the lack of Pattern Matching, which meant discriminating between the concrete shape required chained if-elif-else/instanceOf chains.
With pattern matching, we can express the cases directly and cleanly. To my surprise, mypy will give you an error if the pattern matching is not exhaustive.
I am disappointed that Python implemented match as a statement, not an expression, but I am guessing that an expression might have looked weird to long-time Pythonistas.
The above shows Python's support for ADTs is adequate, given the limitations of a dynamically typed imperative language.
If interested in a longer explanation, you could look at my blog
Comments
Here's an implementation of sum types in relatively Pythonic way.
import attr
@attr.s(frozen=True)
class CombineMode(object):
kind = attr.ib(type=str)
params = attr.ib(factory=list)
def match(self, expected_kind, f):
if self.kind == expected_kind:
return f(*self.params)
else:
return None
@classmethod
def join(cls):
return cls("join")
@classmethod
def select(cls, column: str):
return cls("select", params=[column])
Crack open an interpreter and you'll see familiar behavior:
>>> CombineMode.join()
CombineMode(kind='join_by_entity', params=[])
>>> CombineMode.select('a') == CombineMode.select('b')
False
>>> CombineMode.select('a') == CombineMode.select('a')
True
>>> CombineMode.select('foo').match('select', print)
foo
Note: The @attr.s decorator comes from the attrs library, it implements __init__, __repr__, and __eq__, but it also freezes the object. I included it because it cuts down on the implementation size, but it's also widely available and quite stable.
Sum types are sometimes called tagged unions. Here I used the kind member to implement the tag. Additional per-variant parameters are implemented via a list. In true Pythonic fashion, this is duck-typed on the input & output sides but not strictly enforced internally.
I also included a match function that does basic pattern matching. Type safety is also implemented via duck typing, a TypeError will be raised if the passed lambda's function signature doesn't align with the actual variant you're trying to match on.
These sum types can be combined with product types (list or tuple) and still retain a lot of the critical functionality required for algebraic data types.
Problems
This doesn't strictly constrain the set of variants.