module Tutorial.DataTypes.GenericDataTypes
import Tutorial.DataTypes.Enumerations
Sometimes, a concept is general enough that we'd like to apply it not just to one type, but to all kinds of types. For instance, it would be highly inconvenient if we had to define a notion of a list over and over again for every type we might want to put in a list, a list of integers and a list of strings may have differently typed contents, but they share the same structure, and repeating that structure would lead to lots of code duplication. Instead we'd like to have a single generic list type parameterized by the type of value it stores.
In this section, we will explore the definition and use of generic types.
Suppose we want to parse a value of the Weekday
type,
described in the section on enumerations, from user input. Such a
parsing function should return Saturday
if the user
provided string was "Saturday"
, but what should it return
if the input were, say, "sdfkl332"
? We have another of
options here, for instance, we could return another "default" value,
such as Sunday
, but we must consider what behavior
programmers using our library might expect. However, silently continuing
with such a default value in the face of invalid user input is very
rarely the best option, and can lead to a lot of confusion.
If we were working in a traditional imperative language, our parsing
function would probably throw an exception upon encountering such
invalid input. Idris has the option to throw an exception too, through
the idris_crash
function in the Prelude , but we need to abandon totality to do
this, our function would no longer return a value for every possible
input. This is a high price to pay for something as commonplace as a
parsing error.
In a language like Java, C#, or C++, our function might also
return some sort of null
value (leading to the dreaded
NullPointerException
if not handled properly in the
consuming code). Our solution will be conceptually similar, but instead
of silently returning a null
, we will make the possibility
of failure visible in the type. We'll define a custom data type, which
encapsulates the possibility of failure.
Defining new data types in Idris is very cheap, in terms of the amount of code needed, so this pattern is very often encountered as a way of increasing type safety. Here's what such a bespoke type might look like for our use-case, along with the associated parsing function:
data MaybeWeekday = WD Weekday | NoWeekday
total
readWeekday : String -> MaybeWeekday
readWeekday "Monday" = WD Monday
readWeekday "Tuesday" = WD Tuesday
readWeekday "Wednesday" = WD Wednesday
readWeekday "Thursday" = WD Thursday
readWeekday "Friday" = WD Friday
readWeekday "Saturday" = WD Saturday
readWeekday "Sunday" = WD Sunday
readWeekday _ = NoWeekday
But what if we'd like also like to read a Bool
from user
input? We'd have to write another custom data type,
MaybeBool
, and so on for all the types we'd like to be able
to parse from a String
.
Idris, like many other programming languages, allows us to generalize
this behavior by using generic data types . Here's what the
generic version of our MaybeWeekday
might look
like:
While the prelude calls this type Maybe
, we are calling
it Option
to avoid conflict. We are also altering the names
of the data constructors Some
is equivlant to
Just
, and None
is equivalent to
Nothing
.
You may recognize the name Option
from other programming
languages, for instance, Rust uses the name Option
to refer
to this concept.
data Option a = Some a | None
total
readBool : String -> Option Bool
readBool "True" = Some True
readBool "False" = Some False
readBool _ = None
Let's go to the REPL and take a look at the types to get a feel for what we are working with here:
Tutorial.DataTypes.GenericDataTypes> :t Some
Tutorial.DataTypes.GenericDataTypes.Some : a -> Option a
Tutorial.DataTypes.GenericDataTypes> :t None
Tutorial.DataTypes.GenericDataTypes.None : Option a
Tutorial.DataTypes.GenericDataTypes> :t Option
Tutorial.DataTypes.GenericDataTypes.Option : Type -> Type
Option
is what we call a type constructor , it
is not yet a saturated type. It is, instead, a function from
Type
to Type
. Only when we provide it a
Type
argument does it actually become a type .
We say that Option
is a type constructor
parameterized over a parameter of type
Type
. While Option
may not be a type,
Option Bool
is a type, and so is
Option Weekday
. Even Option (Option Bool)
is a
valid type.
Some
and None
are Option
's
data constructors , they are functions used to create values of
type Option a
for a given type a
Let's see some other use cases for Option
. Below is a
safe division operation, returning None
on an attempt to
divide by zero, instead of throwing an exception:
total
safeDiv : Integer -> Integer -> Option Integer
safeDiv n 0 = None
safeDiv n k = Some (n `div` k)
It is important to understand the distinction between a function
returning, say, Maybe Integer
and returning
null
in a language like Java. In the former case, the
possibility of failure is visible in the types, and as a result the type
checker forces the programmer to deal with the possibility of the
function returning Nothing
you can't get an
Integer
out of a Maybe Integer
without pattern
matching on it and exposing the Nothing
, and Idris doesn't
let us forget to handle a case in a pattern match. This stands in stark
contrast with the Java-esq approach, where null
is
implicitly a valid value of every pointer-backed type, and
programmers may (and often will ) forget to handle the
null
case, leading to unexpected and often hard to debug
runtime exceptions.
While Maybe
is very useful for quickly encoding the
possibilty of failure, the one value it can provide for the failure
case, Nothing
, isn't very informative, it doesn't encode
any information about what went wrong. While there are many
cases where this is sensible, for instance, if you are attempting to
look up a value in a Map, a return value of Nothing
can
reasonably be assumed to mean the desired value wasn't present in the
Map, there are many more cases where multiple things could have gone
wrong and the user of or function would know to know which of the things
actually did go wrong.
As an example, in the case of our Weekday
parsing
function, it might be useful later on to know the value of the invalid
input string. Just like with Maybe
/Option
in
the previous section, this concept is general enough that we want to
consider other types of invalid data.
Let's build a data type to encode this:
data Validated e a = Invalid e | Valid a
Validated
is a type constructor parameterized over
two type parameters, e
and a
.
It's data constructors are Invalid
, which holds a value
describing some error condition, and Valid
, which describes
the result of a successful computation.
With our new Validated
type, we can augment our
readWeekday
function with some pretty useful error
reporting:
total
readWeekdayV : String -> Validated String Weekday
readWeekdayV "Monday" = Valid Monday
readWeekdayV "Tuesday" = Valid Tuesday
readWeekdayV "Wednesday" = Valid Wednesday
readWeekdayV "Thursday" = Valid Thursday
readWeekdayV "Friday" = Valid Friday
readWeekdayV "Saturday" = Valid Saturday
readWeekdayV "Sunday" = Valid Sunday
readWeekdayV s = Invalid ("Not a weekday: " ++ s)
Much like Maybe
, this is such a general and common
concept that the Prelude already contains a data type like
Validated
: Either
Either's Left
data constructor is equivalent to
Validated
's Invalid
, and Right
is
equivalent to Valid
.
It is very common for functions to encode the possibility of failure
by returning an Either err val
, where err
is
the error type and val
is the desired return type. This
provides a type safe (and total!) alternative to throwing a catchable
exception in an imperative language.
Use of Either
does not always imply that
Left
is an error and Right
is a success, a
function returning Either
just means that it can have two
different types of results, each of which are tagged with the
corresponding data constructor.
One of the most important data structures in pure functional
programming is the singly linked list, defined as follows (called
Seq
in order for it not to collide with List
,
which is of course already available from the Prelude):
data Seq a = Nil | (::) a (Seq a)
Seq
consists of two data constructors :
Nil
, representing the empty list(::)
, usually pronounced 'cons', which adds a new
element of type a
onto another list of values of type
a
.Notice how we are using operators as data constructors, this is sometimes quite useful, but it's usually more helpful to give your functions and data constructors clear names, and only use operators sparingly, when there is a compelling case that they would improve readability.
If we wanted to use the List
data constructors directly,
it would look something like this:
total
ints : List Int64
ints = 1 :: 2 :: -3 :: Nil
However, there is a more concise way of writing the above. Idris
accepts special syntax for constructing data types consisting exactly of
the two constructors Nil
and (::)
:
total
ints2 : List Int64
ints2 = [1, 2, -3]
total
ints3 : List Int64
ints3 = []
The two definitions ints
and ints2
are
treated identically by the compiler.
List syntax can also be used in pattern matching.
Seq
and List
both have another special
property, each of them is defined in terms of itself, due to the cons
operating taking a value and another
List
/Seq
as arguments. We call such data types
recursive data types, and as a result of their recursive
nature, decomposing or consuming them typically requires recursive
functions.
In a traditional imperative language, we might use a for loop, or
similar construct, to iterate over the values of a List
,
but traditional loops don't exist in a lanugage without in-place
mutation. Lets take a look at the recursive way of summing a list of
integers:
total
intSum : List Integer -> Integer
intSum Nil = 0
intSum (n :: ns) = n + intSum ns
Recursive functions can be hard to grasp at first, so we'll break
this down a bit. If we invoke intSum
with the empty list,
the first pattern matches and the function returns zero immediately. If,
however, we invoke intSum
with a non-empty list -
[7,5,9]
for instance - the following happens:
The second pattern matches and splits the list into two parts:
Its head (7
) is bound to variable n
and its
tail ([5,9]
) is bound to ns
:
7 + intSum [5,9]
In a second invocation, intSum
is called with a new
list: [5,9]
. The second pattern matches, n
is
bound to 5
, and ns
is bound to
[9]
:
7 + (5 + intSum [9])
In a third invocation intSum
is called with list
[9]
. The second pattern matches, n
is bound to
9
, and ns
is bound to []
:
7 + (5 + (9 + intSum [])
In a fourth invocation, intSum
is called with list
[]
and returns 0
immediately:
7 + (5 + (9 + 0)
In the third invocation, 9
and 0
are
added and 9
is returned:
7 + (5 + 9)
In the second invocation, 5
and 9
are
added and 14
is returned:
7 + 14
Finally, our initial invocation of intSum
adds
7
and 14
, returning 21
.
The recursive implementation of intSum
leads to a
sequence of nested calls to intSum
, terminating once the
argument is the empty list.
In order to fully appreciate the versatility that comes with generic data types, we also need to talk about generic functions. Like generic types, these are parameterized over one or more type parameters.
Consider for instance the case of breaking out of the
Option
data type. In case of a Some
, we'd like
to return the stored value, while for the None
case we
provide a default value. Here's how to do this, specialized to
Integer
s:
total
integerFromOption : Integer -> Option Integer -> Integer
integerFromOption _ (Some y) = y
integerFromOption x None = x
It's pretty obvious that this, again, is not general enough. Surely,
we'd also like to break out of Option Bool
or
Option String
in a similar fashion. That's exactly what the
generic function fromOption
does:
total
fromOption : a -> Option a -> a
fromOption _ (Some y) = y
fromOption x None = x
The lower-case a
is again a type parameter . You
can read the type signature as follows: "For any type a
,
given a value of type a
, and an
Option a
, we can return a value of type a
."
Note, that fromOption
knows nothing else about
a
, other than it being a type. It is therefore not
possible, to conjure a value of type a
out of thin air. We
must have a value available to deal with the None
case.
The pendant to fromOption
for Maybe
is
called fromMaybe
and is available from module
Data.Maybe
from the base library.
Sometimes, fromOption
is not general enough. Assume we'd
like to print the value of a freshly parsed Bool
, giving
some generic error message in case of a None
. We can't use
fromOption
for this, as we have an Option Bool
and we'd like to return a String
. Here's how to do
this:
total
option : b -> (a -> b) -> Option a -> b
option _ f (Some y) = f y
option x _ None = x
total
handleBool : Option Bool -> String
handleBool = option "Not a boolean value." show
Function option
is parameterized over two type
parameters: a
represents the type of values stored in the
Option
, while b
is the return type. In case of
a Just
, we need a way to convert the stored a
to a b
, an that's done using the function argument of type
a -> b
.
In Idris, lower-case identifiers in function types are treated as type parameters , while upper-case identifiers are treated as types or type constructors that must be in scope.