Ad hoc polymorphism in Python
Some statically typed languages have ad hoc polymorphism where a function can have multiple implementations depending on the types of its arguments. In languages like C++ and Java, it is called function overloading. In Haskell, it is accomplished with type classes.
Consider an example of writing a to_json
function in C++. The function
takes a single value and returns a string, but it must be implemented
differently for each different type of value:
std::string to_json(int i);
std::string to_json(double d);
std::string to_json(std::string s);
Some implementations may be "recursive", and call the implementation for another type:
template <typename T>
std::string to_json(std::vector<T> const& xs) {
...
for (T const& x : xs) {
... to_json(x) ...
}
...
}
Many dynamically typed languages, like Python and JavaScript, lack ad hoc polymorphism in the language, but developers can implement it by hand by inspecting the argument types and dispatching to implementations accordingly:
def to_json(value):
if isinstance(value, int):
return str(int)
if isinstance(value, float):
return str(int)
if isinstance(value, str):
return f'"{value}"'
if isinstance(value, list):
return '[' + ','.join(to_json(x) for x in value) + ']'
In addition to being a little uglier, this technique suffers from a limitation: once we've defined the function, we can't add any more overloads. Imagine we want to define a JSON serialization for our user-defined type:
from ... import to_json
@dataclass
class Person:
name: str
def to_json_person(person):
return f'{{"name":{to_json(person.name)}}}'
While this example works for serializing Person
, we won't be able to
serialize a list
of Person
because the implementation of to_json
for list
won't call to_json_person
.
Type Classes
In most languages with ad hoc polymorphism, two functions with the same name but different types are called overloads of the name. In Haskell, these overloads are not permitted: no two functions (or any other values for that matter) can have the same name in the same scope. However, type classes offer a way around this limitation.
A type class in Haskell is a group of polymorphic functions, called methods, parameterized by a single type variable. The type class only needs to declare the method signatures; it does not need to provide any definitions.
-- Declare a type class `Json` with type variable `t`.
class Json t where
toJson :: t -> String
An instance for a type class defines all the methods of the type class for a specific type argument in the place of the type variable. In other words, a type class has exactly one polymorphic declaration, but many monomorphic instances, one for every possible type argument. Thus, a method can have many definitions (i.e. implementations), one from each instance, which means it can be overloaded.
-- Define implementations of type class `Json`
-- for type arguments `Bool`, `Integer`, and `[]` (list).
instance Json Bool where
toJson :: Bool -> String
toJson value = if value then "true" else "false"
instance Json Integer where
toJson :: Integer -> String
toJson value = show value
instance (Json t) => Json [t] where
toJson :: (Json t) => [t] -> String
toJson values = "[" ++ (intercalate "," $ map toJson values) ++ "]"
At a method call site, how does Haskell know which overload, from which instance, to use? Haskell requires that the signature of the method in the type class declaration mentions the type variable in one of its parameters or its return type. It tries to unify that polymorphic declaration signature with the call site to fill in the type variable; if it succeeds, then it selects the monomorphic instance for that type argument.
Python
How can we replicate type classes in Python? First, let us dream up an ideal API.
API
We can decorate a method signature with a call to typeclass
, giving it
a type variable. The decorator will check the signature to make sure
that the type variable appears at least once in the type annotations of the
parameters. Unlike Haskell, Python cannot infer the return type at a call
site, so that path to instance discovery is impossible; the type variable
must be used as the type of at least one parameter.
T = typing.TypeVar('T')
@typeclass(T)
def to_json(value: T) -> str:
"""Serialize a value to JSON."""
We may optionally provide a default implementation. If we do not, the
default behavior is to raise a NotImplementedError
diagnosing
a missing instance for the specific type variable.
The typeclass
decorator will add an instance
attribute to the method.
Use that to decorate monomorphic implementations, giving it the type argument:
@to_json.instance(str)
def _to_json_str(s):
return f'"{s}"'
We can decorate an implementation multiple times if it can serve multiple instances:
@to_json.instance(int)
@to_json.instance(float)
def _to_json_number(n):
return str(n)
We can define an implementation for all types structurally matching a protocol. Because it is presently impossible to infer the difference between a protocol and a type, we must differentiate it for the decorator:
@to_json.instance(typing.Iterable, protocol=True)
def _to_json_iterable(xs):
return '[' + ','.join(to_json(x) for x in xs) + ']'
If a type argument matches multiple protocols, the instance that was first defined will be chosen.
Now we can define instances for types whether we defined the type or imported it.
@to_json.instance(Person)
def _to_json_person(person):
return f'{{"name":{to_json(person.name)}}}'
>>> to_json([Person(name='John')])
[{"name":"John"}]
Implementation
How is this implemented? Effectively, each method is given two mappings in which to hold instances:
- A mapping from types to instances. If the type argument is a key in this mapping, then we use its instance.
- A mapping from protocols to instances. If the type argument was not found in the first mapping, then this mapping is iterated in insertion order. We use the instance of the first protocol for which the type argument is a structural subtype.
If no known instance matches, then the default implementation is used.
I've published a library to PyPI called typeclasses
that delivers this
API.
It even has batteries included for a few type classes / methods:
to_json
(definition, examples): Serialize values to JSON.fmap
(definition, examples): Map functions over values inside of functors (recursively).afmap
(definition, examples):fmap
for asynchronous functions.binhash
andhexhash
(definition, examples): Content hashing for values.
Please try it out, and let me know what you think!