Generic Data Types

module  Tutorial.DataTypes.GenericDataTypes

import  Tutorial.DataTypes.Enumerations

Sometimes, a concept is general enough that we'd like to apply it not just to one type, but to all kinds of types. For instance, it would be highly inconvenient if we had to define a notion of a list over and over again for every type we might want to put in a list, a list of integers and a list of strings may have differently typed contents, but they share the same structure, and repeating that structure would lead to lots of code duplication. Instead we'd like to have a single generic list type parameterized by the type of value it stores.

In this section, we will explore the definition and use of generic types.

Maybe

Suppose we want to parse a value of the Weekday type, described in the section on enumerations, from user input. Such a parsing function should return Saturday if the user provided string was "Saturday" , but what should it return if the input were, say, "sdfkl332" ? We have another of options here, for instance, we could return another "default" value, such as Sunday , but we must consider what behavior programmers using our library might expect. However, silently continuing with such a default value in the face of invalid user input is very rarely the best option, and can lead to a lot of confusion.

If we were working in a traditional imperative language, our parsing function would probably throw an exception upon encountering such invalid input. Idris has the option to throw an exception too, through the idris_crash function in the Prelude , but we need to abandon totality to do this, our function would no longer return a value for every possible input. This is a high price to pay for something as commonplace as a parsing error.

In a language like Java, C#, or C++, our function might also return some sort of null value (leading to the dreaded NullPointerException if not handled properly in the consuming code). Our solution will be conceptually similar, but instead of silently returning a null , we will make the possibility of failure visible in the type. We'll define a custom data type, which encapsulates the possibility of failure.

Defining new data types in Idris is very cheap, in terms of the amount of code needed, so this pattern is very often encountered as a way of increasing type safety. Here's what such a bespoke type might look like for our use-case, along with the associated parsing function:

data  MaybeWeekday  =  WD  Weekday  |  NoWeekday

total
readWeekday  :  String  ->  MaybeWeekday
readWeekday  "Monday"     =  WD  Monday
readWeekday  "Tuesday"    =  WD  Tuesday
readWeekday  "Wednesday"  =  WD  Wednesday
readWeekday  "Thursday"   =  WD  Thursday
readWeekday  "Friday"     =  WD  Friday
readWeekday  "Saturday"   =  WD  Saturday
readWeekday  "Sunday"     =  WD  Sunday
readWeekday  _            =  NoWeekday

But what if we'd like also like to read a Bool from user input? We'd have to write another custom data type, MaybeBool , and so on for all the types we'd like to be able to parse from a String .

Idris, like many other programming languages, allows us to generalize this behavior by using generic data types . Here's what the generic version of our MaybeWeekday might look like:

Note

While the prelude calls this type Maybe , we are calling it Option to avoid conflict. We are also altering the names of the data constructors Some is equivlant to Just , and None is equivalent to Nothing .

You may recognize the name Option from other programming languages, for instance, Rust uses the name Option to refer to this concept.

data  Option  a  =  Some  a  |  None

total
readBool  :  String  ->  Option  Bool
readBool  "True"     =  Some  True
readBool  "False"    =  Some  False
readBool  _          =  None

Let's go to the REPL and take a look at the types to get a feel for what we are working with here:

Tutorial.DataTypes.GenericDataTypes> :t Some
Tutorial.DataTypes.GenericDataTypes.Some : a -> Option a
Tutorial.DataTypes.GenericDataTypes> :t None
Tutorial.DataTypes.GenericDataTypes.None : Option a
Tutorial.DataTypes.GenericDataTypes> :t Option
Tutorial.DataTypes.GenericDataTypes.Option : Type -> Type
Note

Option is what we call a type constructor , it is not yet a saturated type. It is, instead, a function from Type to Type . Only when we provide it a Type argument does it actually become a type .

We say that Option is a type constructor parameterized over a parameter of type Type . While Option may not be a type, Option Bool is a type, and so is Option Weekday . Even Option (Option Bool) is a valid type.

Some and None are Option 's data constructors , they are functions used to create values of type Option a for a given type a

Let's see some other use cases for Option . Below is a safe division operation, returning None on an attempt to divide by zero, instead of throwing an exception:

total
safeDiv  :  Integer  ->  Integer  ->  Option  Integer
safeDiv  n  0  =  None
safeDiv  n  k  =  Some  (n  `div`  k)

It is important to understand the distinction between a function returning, say, Maybe Integer and returning null in a language like Java. In the former case, the possibility of failure is visible in the types, and as a result the type checker forces the programmer to deal with the possibility of the function returning Nothing you can't get an Integer out of a Maybe Integer without pattern matching on it and exposing the Nothing , and Idris doesn't let us forget to handle a case in a pattern match. This stands in stark contrast with the Java-esq approach, where null is implicitly a valid value of every pointer-backed type, and programmers may (and often will ) forget to handle the null case, leading to unexpected and often hard to debug runtime exceptions.

Either

While Maybe is very useful for quickly encoding the possibilty of failure, the one value it can provide for the failure case, Nothing , isn't very informative, it doesn't encode any information about what went wrong. While there are many cases where this is sensible, for instance, if you are attempting to look up a value in a Map, a return value of Nothing can reasonably be assumed to mean the desired value wasn't present in the Map, there are many more cases where multiple things could have gone wrong and the user of or function would know to know which of the things actually did go wrong.

As an example, in the case of our Weekday parsing function, it might be useful later on to know the value of the invalid input string. Just like with Maybe /Option in the previous section, this concept is general enough that we want to consider other types of invalid data.

Let's build a data type to encode this:

data  Validated  e  a  =  Invalid  e  |  Valid  a
Note

Validated is a type constructor parameterized over two type parameters, e and a .

It's data constructors are Invalid , which holds a value describing some error condition, and Valid , which describes the result of a successful computation.

With our new Validated type, we can augment our readWeekday function with some pretty useful error reporting:

total
readWeekdayV  :  String  ->  Validated  String  Weekday
readWeekdayV  "Monday"     =  Valid  Monday
readWeekdayV  "Tuesday"    =  Valid  Tuesday
readWeekdayV  "Wednesday"  =  Valid  Wednesday
readWeekdayV  "Thursday"   =  Valid  Thursday
readWeekdayV  "Friday"     =  Valid  Friday
readWeekdayV  "Saturday"   =  Valid  Saturday
readWeekdayV  "Sunday"     =  Valid  Sunday
readWeekdayV  s            =  Invalid  ("Not a weekday: "  ++  s)
Note

Much like Maybe , this is such a general and common concept that the Prelude already contains a data type like Validated : Either

Either's Left data constructor is equivalent to Validated 's Invalid , and Right is equivalent to Valid .

It is very common for functions to encode the possibility of failure by returning an Either err val , where err is the error type and val is the desired return type. This provides a type safe (and total!) alternative to throwing a catchable exception in an imperative language.

Note

Use of Either does not always imply that Left is an error and Right is a success, a function returning Either just means that it can have two different types of results, each of which are tagged with the corresponding data constructor.

List

One of the most important data structures in pure functional programming is the singly linked list, defined as follows (called Seq in order for it not to collide with List , which is of course already available from the Prelude):

data  Seq  a  =  Nil  |  (::)  a  (Seq  a)
Note

Seq consists of two data constructors :

Notice how we are using operators as data constructors, this is sometimes quite useful, but it's usually more helpful to give your functions and data constructors clear names, and only use operators sparingly, when there is a compelling case that they would improve readability.

If we wanted to use the List data constructors directly, it would look something like this:

total
ints  :  List  Int64
ints  =  1  ::  2  ::  -3  ::  Nil

However, there is a more concise way of writing the above. Idris accepts special syntax for constructing data types consisting exactly of the two constructors Nil and (::) :

total
ints2  :  List  Int64
ints2  =  [1,  2,  -3]

total
ints3  :  List  Int64
ints3  =  []

The two definitions ints and ints2 are treated identically by the compiler.

Note

List syntax can also be used in pattern matching.

Seq and List both have another special property, each of them is defined in terms of itself, due to the cons operating taking a value and another List /Seq as arguments. We call such data types recursive data types, and as a result of their recursive nature, decomposing or consuming them typically requires recursive functions.

In a traditional imperative language, we might use a for loop, or similar construct, to iterate over the values of a List , but traditional loops don't exist in a lanugage without in-place mutation. Lets take a look at the recursive way of summing a list of integers:

total
intSum  :  List  Integer  ->  Integer
intSum  Nil        =  0
intSum  (n  ::  ns)  =  n  +  intSum  ns

Recursive functions can be hard to grasp at first, so we'll break this down a bit. If we invoke intSum with the empty list, the first pattern matches and the function returns zero immediately. If, however, we invoke intSum with a non-empty list - [7,5,9] for instance - the following happens:

  1. The second pattern matches and splits the list into two parts: Its head (7 ) is bound to variable n and its tail ([5,9] ) is bound to ns :

    7 + intSum [5,9]
  2. In a second invocation, intSum is called with a new list: [5,9] . The second pattern matches, n is bound to 5 , and ns is bound to [9] :

    7 + (5 + intSum [9])
  3. In a third invocation intSum is called with list [9] . The second pattern matches, n is bound to 9 , and ns is bound to [] :

    7 + (5 + (9 + intSum [])
  4. In a fourth invocation, intSum is called with list [] and returns 0 immediately:

    7 + (5 + (9 + 0)
  5. In the third invocation, 9 and 0 are added and 9 is returned:

    7 + (5 + 9)
  6. In the second invocation, 5 and 9 are added and 14 is returned:

    7 + 14
  7. Finally, our initial invocation of intSum adds 7 and 14 , returning 21 .

The recursive implementation of intSum leads to a sequence of nested calls to intSum , terminating once the argument is the empty list.

Generic Functions

In order to fully appreciate the versatility that comes with generic data types, we also need to talk about generic functions. Like generic types, these are parameterized over one or more type parameters.

Consider for instance the case of breaking out of the Option data type. In case of a Some , we'd like to return the stored value, while for the None case we provide a default value. Here's how to do this, specialized to Integer s:

total
integerFromOption  :  Integer  ->  Option  Integer  ->  Integer
integerFromOption  _  (Some  y)  =  y
integerFromOption  x  None      =  x

It's pretty obvious that this, again, is not general enough. Surely, we'd also like to break out of Option Bool or Option String in a similar fashion. That's exactly what the generic function fromOption does:

total
fromOption  :  a  ->  Option  a  ->  a
fromOption  _  (Some  y)  =  y
fromOption  x  None      =  x

The lower-case a is again a type parameter . You can read the type signature as follows: "For any type a , given a value of type a , and an Option a , we can return a value of type a ." Note, that fromOption knows nothing else about a , other than it being a type. It is therefore not possible, to conjure a value of type a out of thin air. We must have a value available to deal with the None case.

The pendant to fromOption for Maybe is called fromMaybe and is available from module Data.Maybe from the base library.

Sometimes, fromOption is not general enough. Assume we'd like to print the value of a freshly parsed Bool , giving some generic error message in case of a None . We can't use fromOption for this, as we have an Option Bool and we'd like to return a String . Here's how to do this:

total
option  :  b  ->  (a  ->  b)  ->  Option  a  ->  b
option  _  f  (Some  y)  =  f  y
option  x  _  None      =  x

total
handleBool  :  Option  Bool  ->  String
handleBool  =  option  "Not a boolean value."  show

Function option is parameterized over two type parameters: a represents the type of values stored in the Option , while b is the return type. In case of a Just , we need a way to convert the stored a to a b , an that's done using the function argument of type a -> b .

In Idris, lower-case identifiers in function types are treated as type parameters , while upper-case identifiers are treated as types or type constructors that must be in scope.