University of Zagreb Faculty of Electrical Engineering and Computing PROGRAMMING IN HASKELL Academic Year 2014/2015 LECTURE 9: Custom data types 2 v1.1 (c) 2014 Jan Šnajder ============================================================================== > import Data.List > import Control.Monad === INTRO ==================================================================== Last week we introduced the 'data' keyword for defining custom data types: algebraic data types, records, and polymorphic data types. Today we extend on this and look into recursive data types. In particular, we look into polymorphic recursive types, such as lists and trees, which are important because they serve as data containers. === RECURSIVE DATA STRUCTURES ================================================ Data structures can be recursive. In fact, the most useful data structures are recursive. > data Sex = Male | Female deriving (Show,Read,Ord,Eq) > data Person = Person { > idNumber :: String, > forename :: String, > surname :: String, > sex :: Sex, > age :: Int, > partner :: Maybe Person, > children :: [Person] } deriving (Show,Read,Ord) Let's look at one family situation: Pero and Ana are Marko's parents, Marko and Maja are dating... > pero = Person "2323" "Pero" "Perić" Male 45 (Just ana) [marko] > ana = Person "3244" "Ana" "Anić" Female 43 (Just pero) [marko,iva] > marko = Person "4341" "Marko" "Perić" Male 22 (Just maja) [] > maja = Person "7420" "Maja" "Majić" Female 20 (Just marko) [] > iva = Person "4642" "Iva" "Ivić" Female 16 Nothing [] Now try to print out the value of 'pero'. What's happening? 'pero' (and other values we've defined) are infinite recursive structure. This is because Pero is Ana's partner, while Ana's partner is Pero, who's partner is Ana again, etc. If we were to represent these relationships with a graph, it would have cycles. If a graph has cycles, the corresponding data structure is infinite! What's going on here? pero == ana pero == pero The first comparison works just fine, while the second one never terminates because 'pero' is an infinite structure. (Why does the first computation work then?) Let's write a function to return a name of one's partner, if such exists: > partnersForename :: Person -> Maybe String > partnersForename p = case partner p of > Just p -> Just $ forename p > Nothing -> Nothing or shorter: > partnersForename2 :: Person -> Maybe String > partnersForename2 = fmap forename . partner Children of both partners: > pairsChildren :: Person -> [Person] > pairsChildren p = nub $ children p ++ case partner p of > Nothing -> [] > Just r -> children r or shorter: > pairsChildren2 :: Person -> [Person] > pairsChildren2 p = nub $ children p ++ maybe [] children (partner p) Will this work? Nope. The problem is that 'nub' will compare the elements of the list. If there are two equal elements, this computation will never terminate. And now for something completely different. How would you go about defining the following function: partnersMother :: Person -> Maybe Person Unfortunately, there's no way to do this because we have no link back to one person's parents. We need to either add this backlink, or put all persons in a list and then search the list for parents of a given person. Here's the first approach: > data Person2 = Person2 { > personId2 :: String, > forename2 :: String, > surname2 :: String, > sex2 :: Sex, --- data Sex = Male | Female deriving (Show,Read,Eq,Ord) > mother2 :: Maybe Person2, > father2 :: Maybe Person2, > partner2 :: Maybe Person2, > children2 :: [Person2] } deriving (Show,Read,Eq,Ord) > john = Person2 "123" "John" "Doe" Male Nothing Nothing (Just jane) [] > jane = Person2 "623" "Jane" "Fox" Female (Just ann) Nothing (Just john) [] > ann = Person2 "343" "Ann" "Doe" Female Nothing Nothing Nothing [jane] === EXERCISE 1 =============================================================== 1.1. - Define a function 1.2. - Define a function parentCheck :: Person2 -> Bool that checks whether the given person is one of the children of its parents. 1.3. - Define a function sister :: Person2 -> Maybe Person2 that returns the sister of a person, if such exists. 1.4. - Define a function that returns all descendants of a person. descendant :: Person2 -> [Person2] ============================================================================== We already know that a list is also a recursive structure. Moreover, it is a parametrized recursive structure. Let's define our own list data type: > data MyList a = Empty | Cons a (MyList a) deriving (Show,Read,Ord,Eq) Now we can define some lists: > l1 = 1 `Cons` Empty > l2 = 1 `Cons` (2 `Cons` (3 `Cons` Empty)) To improve readability, we can define our own infix operator: > infixr 5 -+- > (-+-) = Cons Operator priority (set to 5 in the above example) range from 0 (lowest priority) to 9 (highest priority). E.g., ($) has a priority 0, while (.) has priority 9. You can find out more here: http://www.haskell.org/onlinereport/decls.html#fixity We can now write: > l3 = 1 -+- 2 -+- 3 -+- Empty === EXERCISE 2 =============================================================== 2.1. - Define listHead :: MyList a -> Maybe a 2.2. - Define a function that works like 'map' but works on a 'MyList' type: listMap :: (a -> b) -> MyList a -> MyList b ============================================================================== A prototypical example of a recursive data structure is a tree. Here's a binary tree that stores the values in its inner nodes: > data Tree a = Null | Node a (Tree a) (Tree a) deriving (Show,Eq) E.g., a binary tree of integers: > intTree :: Tree Int > intTree = Node 1 (Node 2 Null Null) (Node 3 Null Null) A function that sums the elements in a binary tree of integers: > sumTree :: Tree Int -> Int > sumTree Null = 0 > sumTree (Node x left right) = x + sumTree left + sumTree right A function that tests whether an element is contained in a tree: > treeElem :: Eq a => a -> Tree a -> Bool > treeElem _ Null = False > treeElem x (Node y left right) > | x == y = True > | otherwise = treeElem x left || treeElem x right === EXERCISE 3 =============================================================== 3.1. - Define a function treeMax :: Ord a => Tree a -> a that finds the maximum element in a tree. Return an error if the tree is empty. 3.2. - Define a function treeToList :: Ord a => Tree a -> [a] that will collect in a list all elements from inner nodes of a tree by doing an in-order (left-root-right) traversal. 3.3. - Define a function to prune the tree at a given level (root has level 0). levelCut :: Int -> Tree a -> Tree a ============================================================================== A sorted tree (binary search tree): for each node containing value 'x', the left subtree contains values that are less than 'x', while the right subtree contains values that are greater than 'x'. Insertion into a binary search tree: > treeInsert :: Ord a => a -> Tree a -> Tree a > treeInsert x Null = Node x Null Null > treeInsert x t@(Node y l r) > | x < y = Node y (treeInsert x l) r > | x > y = Node y l (treeInsert x r) > | otherwise = t === EXERCISE 4 =============================================================== 4.1. - Define a function that converts a list into a sorted tree. listToTree :: Ord a => [a] -> Tree a 4.2. - Using 'listToTree' and 'treeToList' defined previously, define these two functions, define: sortAndNub :: Ord a => [a] -> [a] === DERIVING TYPE CLASS INSTANCES ============================================ We've already seen that Haskell can automatically derive instances for main type classes: Eq, Ord, Show, Read, ... Recall an earlier example: data Person = Person { idNumber :: String, forename :: String, surname :: String, sex :: Sex, age :: Int, partner :: Maybe Person, children :: [Person] } deriving (Show,Read,Eq,Ord) We can do: > t1 = marko == ana > t2 = ana > marko > t3 = compare ana marko > ps = sort [marko,ana,pero] If a type is an instance of the 'Read' type class, we can read in its values from a string: > s = read "Male" :: Sex > p3 = read $ "Person {idNumber=\"111\",forename=\"Ivo\",surname=\"Ivic\"," ++ > "sex=Male,age=11,partner=Nothing,children=[]}" :: Person The built-in 'read' assumes that the string conforms to the Haskell syntax. Similarly, build-in 'show' outputs the strings in Haskell syntax. A user can of course redefine 'read' and 'show', but doing so is recommended only is a few cases (as we'll see later). Note that 'read' and 'show' are very useful for data serialization. The 'Enum' type class allows for enumerating: > data Weekday = > Monday | Tuesday | Wednesday | Thursday | Friday | Saturday | Sunday > deriving (Show,Enum) > yesterday :: Weekday -> Weekday > yesterday = pred > dayAfterYesterday :: Weekday -> Weekday > dayAfterYesterday = succ . pred > workDays = [Monday .. Friday] === DEFINING TYPE CLASS INSTANCES ============================================ First and foremost, let's remind ourselves: type classes have nothing to do with classes in OOP. Now let's look at an example. We'd like to define a different kind of equality test for our data type. E.g., we'd like to consider two persons to be identical if they have the same national identification number. In this case we would not need to check the other fields (which is good because the structure is infinite). Similarly, we could define an ordering based on this number. Let's have a look at how 'Eq' type class is defined: class Eq a where (==) :: a -> a -> Bool (/=) :: a -> a -> Bool x == y = not (x /= y) x /= y = not (x == y) There's the name of the type class, a type variable, and a list of functions that must be defined for this type, in this case functions (==) and (/=). The definitions themselves are not given here, only the type signatures. We then can (but most not) have default definitions of functions, like it is done in this case. Having default definitions means that, when defining and instance, one will not have to define all functions but can rely on default definitions. For example, in this case it suffices to define (==), because there's a default definition for (/=) that uses (==). So, for each type we only have to define those functions that have no default definitions. This is called a MINIMAL COMPLETE DEFINITION. In case of 'Eq', the minimal complete definition is either the (==) function or the (/=) function. Let's look now at how we can define that a type is an instance of the 'Eq' type class: instance Eq Weekday where Monday == Monday = True Tuesday == Tuesday = True Wednesday == Wednesday = True Thursday == Thursday = True Friday == Friday = True Saturday == Saturday = True Sunday == Sunday = True _ == _ = False Of course, this can become tedious, so that's why we can automatically derive an instance. We can now define our own 'Eq' instance for the 'Person' type: > instance Eq Person where > p1 == p2 = idNumber p1 == idNumber p2 REMARK: We must now remove 'deriving Eq' from the definition of the 'Person' type. Now 'pero == pero' will work (check it out!). Let's also define an instance 'Ord' type class. The minimal complete definition is (<=). So it suffices to define: instance Ord Person where p1 <= p2 = idNumber p1 <= idNumber p2 REMARK: We can use the ":info" command in ghci to see a definition of a type class and all its instances. === EXERCISE 5 =============================================================== 5.1. - Define an 'Eq' instance for the 'Weekday' type that works like (==), except that two Fridays are never identical. 5.2. - Define 'Person' as an instance of 'Show' type class so that instead of the values of partners and children only the respective person names are shown, which will enable the print out of an infinite structure of this type. ============================================================================== What if we want to define an instance of a parametrized type? That's not a problem. We simply have to provide a type variable together with the type constructor. E.g.: instance Eq (Maybe a) Just x == Just y = x == y Nothing == Nothing = True _ == _ = False A DIGRESSION: Why can't we simply write 'Maybe'? Because 'Eq' type class expects a type, whereas 'Maybe' is not a type but a type constructor. In other words, a type constructor 'Maybe' will produce a type only when given a type as input. E.g., 'Maybe Int' is a type. Similarly, 'Either' expects two types, before it produces a type. So these type constructors themselves are of different "types". We're talking about "types of types" here, which we call KINDS in Haskell. 'Int', 'Maybe', and 'Either' are type constructors of different kind. You can find out their kind using the ":kind" command in ghci: Int :: * Maybe :: * -> * Either :: * -> * -> * Kind '*' is just an ordinary type. Kind '* -> *' is a unary type constructor, that takes in an ordinary type and returns an ordinary type. == EXERCISE 6 ================================================================ 6.1. - Define an instance of 'Eq' for 'MyList a' so that two lists are considered equal if they have the same first element. 6.2. - Define an instance of 'Eq' for 'Tree a' so that two trees are considered equal if they store the same values, regardless of the position of these values in the trees, and regardless of duplicates. === NEXT ===================================================================== In the next lecture we'll look into custom types classes as well as standard data types, such as sets, maps, trees, and graphs.