Year 2015 Day 8

This day provides a more in depth introduction to writing effectful parsers, making use of state and non-determinism in our parsers.


import Data.String
import Data.Vect
import Data.List.Lazy
import Data.Either

Parsing

A "Parser" is an effectful computation that has access to a list of chars as state, can throw exceptions of type String, and has non-determinism through the Choose effect, which consumes some of the state to potentially return a value.

Basic operations

Get the next character out of the parser state, updating the state to consume that character.


nextChar : Has (State (List Char)) fs => Has (Except String) fs => Eff fs Char
nextChar = do
  c :: rest <- get
    | [] => throw "End of input"
  put rest
  pure c

Attempt to parse a single character based on the supplied predicate, consuming the character from the state and throwing the provided error if the predicate does not hold over the consumed character.


char : Has (State (List Char)) fs => Has (Except String) fs => 
  (pred : Char -> Bool) -> (err : Char -> String) -> Eff fs Char
char pred err = do
  c <- nextChar
  unless (pred c) (throw $ err c)
  pure c

Type alias for a parser


Parser : (res : Type) -> Type
Parser res = Eff [State (List Char), Except String, Choose, Logger] res

Parse 0+ of a thing, speculatively attempting to apply the given parser. In the event the supplied parser fails, catch the error, unwind changes to the state, and return an empty list, otherwise append the parsed element to the head of the returned list and recurse.

The rewinding of the state on failure could be handled implicitly by the effect stack if Except was structured a bit differently, but that's a topic for another day.


many : (f : Parser t) -> Parser (List t)
many f = do
  state <- get
  Just x <- lift $ catch (\ _ => pure Nothing) (map Just f)
    | Nothing => do
      put state
      pure []
  map (x ::) $ many f

"Parse" the end of the input, returning a unit if we are at the end of input, throwing an error otherwise.


endOfInput : Parser ()
endOfInput = do
  [] <- get
    | xs => throw "Expected end of input, state non empty: `\{pack xs}`"
  pure ()

Character Classes

Parse a single ", throwing an error if the current character is anything else. Returns a unit since there is only one possible character this parses, and this avoids us having to discard the character later in our string parser.


quote : Parser ()
quote = do
  _ <- char (== '"') (\x => "Expected `\"`, got `\{String.singleton x}`")
  pure ()

Parse any character except \ or "


normal : Parser Char
normal = 
  char 
    (\x => not $ any (== x) (the (List _) ['\\', '"'])) 
    (\x => "Expected normal, got `\{String.singleton x}`")

Escaped Characters

Parse the character sequence \", returning just the ". Despite the fact that can only return one possible character, like quote above, we return the parsed character, as we intend to provide all the escaped character parsers to the oneOfM combinator later.


eQuote : Parser Char
eQuote = do
  _ <- char (== '\\') (\x => "Expected `\\`, got `\{String.singleton x}`")
  char (== '"') (\x => "Expected `\"`, got `\{String.singleton x}`")

Parse the character sequence \, returning just the second \.


eSlash : Parser Char
eSlash = do
  _ <- char (== '\\') (\x => "Expected `\\`, got `\{String.singleton x}`")
  char (== '\\') (\x => "Expected `\\`, got `\{String.singleton x}`")

Convert a lowercase hexadecimal digit to its numerical value.


fromHex : Char -> Int
fromHex c = 
  if ord c >= 97
    then ord c - 87
    else ord c - 48

Parse a character sequence \xAB, where AB are hexadecimal digits, and decode the numerical value of AB, as a hexadecimal number, into its corresponding character.


eHex : Parser Char
eHex = do
  _ <- char (== '\\') (\x => "Expected `\\`, got `\{String.singleton x}`")
  _ <- char (== 'x') (\x => "Expected `x`, got `\{String.singleton x}`")
  [x, y] <- map (map $ fromHex . toLower) . 
            sequence . 
            Vect.replicate 2 $
            (char 
              isHexDigit 
              (\x => "Expected hex digit, got `\{String.singleton x}`"))
  pure . chr $ x * 0x10 + y

Use the oneOfM combinator to combine our escaped character parsers into a single character class. oneOfM uses the Choice effect to try all of the provided parsers, conceptually in parallel, returning all of the results.


escaped : Parser Char
escaped = oneOfM $ the (List _) [eQuote, eSlash, eHex]

Top Level Class

Combine our normal and escaped parsers into a single parser for non-quote characters.


character : Parser Char
character = oneOfM $ the (List _) [normal, escaped]

Parse a string literal by describing its layout as a quote ("), followed by any number of characters, then another quote, followed by the end of input. Return the characters between the outer quotes.


string : Parser (List Char)
string = do
  quote
  xs <- many character
  quote
  endOfInput
  pure xs

Running a parser

Run a parser, with some debug logging, by peeling the parsing effects of of the type. The order is important here, remember that function composition "runs" right to left.

We peel the state off the type first, so that we can get implicit "rewinding" of the state inside of our combinators, like oneOfM.

For a full understanding of what's going on here, we need to see how the type signature changes as we peel effects off the type, you can uncomment the commented lines, and comment out the rest of the function, modifying the let output = line to follow along yourself, though the types you get may look a little different due to idris evaluating type alaises like Eff for you, I am keeping them in aliased forms, and excluding Logger, which doesn't impact the semantic of parsing, to keep the examples concise:

  • runstate (unpack str) $ x

    This produces a value with type Eff [Except String, Choose] (List Char, List Char). In this tuple of values, the first element is the actual output of the parser, and the second element is the state after the parser has run.

  • runExcept . runState (unpack str) $ x

    This produces a value with type Eff [Choose] (Either String (List Char, List Char)). This corresponds to a computation that either returns our tuple of output and state from before, or an error. Important to note here is that, in the error case, we only return a String (our error type), our state is discarded.

  • runChoose {f = LazyList} . runExcept . runState (unpack str) $ x

    The Choose effect, works with any type that implements the Alternative interface, and the choice of type impacts the semantics. A full discussion of this is beyond the scope for today, but we chose to "run" Choose with LazyList, which effectively gives us an iterator over all the possible parsings of our input text.

    This produces a value with type Eff [] (LazyList (Either String (List Char, List Char))). When we hit an application of Choose, like oneOfM, all possibilities will be tried and each one will be added to our output LazyList. Because this is a LazyList, and not a List, only values we actually inspect are generated, allowing parsing to terminate after the first successful parse without having to generate the rest of the list.

    Note that we have an independent possible state value for each slot in the list, this speaks to this effect stack, when run in this order, providing a sort of branching behavior for states, allowing different branches in the Choose effect to modify their state without impacting the state of other branches.

From there, we run our lazyRights helper function over the outputs LazyList to discard parsing paths that result in an error, extract the first element of each tuple, get the head of the list, if one still exists, and use pack to convert the contents of the resulting Maybe (List Char) to a string. Then a little bit of debug logging, and return the output.


runParser : Has Logger fs => Parser (List Char) -> String 
  -> Eff fs $ Maybe String
runParser x str = do
  info "Parsing: \{str}"
  -- let outputs = 
  --   runChoose {f = LazyList} . runExcept . runState (unpack str) $ x
  -- ?parser_types
  outputs <- 
    lift . runChoose {f = LazyList} . runExcept . runState (unpack str) $ x
  let output = map pack . head' . map fst . lazyRights $ outputs
  case output of
    Nothing =>
      debug "Failed: \{show outputs}"
    Just y => do
      debug "Got: \{y}"
      trace "\{show outputs}"
  pure output
  where
    lazyRights : LazyList (Either a b) -> LazyList b
    lazyRights [] = []
    lazyRights (Left _ :: xs) = lazyRights xs
    lazyRights (Right x :: xs) = x :: lazyRights xs

Escaping characters

This is much more boring the the parsing, we just do simple recursive pattern matching on the characters of the provided string, escaping " and \, and surround the resulting string with quotes.


escape : String -> String
escape str = "\"\{pack . escape' . unpack $ str}\""
  where
  escape' : List Char -> List Char
  escape' [] = []
  escape' ('"' :: xs) = '\\' :: '"' :: escape' xs
  escape' ('\\' :: xs) = '\\' :: '\\' :: escape' xs
  escape' (x :: xs) = x :: escape' xs

Part Functions

Part 1

Convert the inputs into a list of lines, traverse our parser over it, deal with possible failures by sequencing the List (Maybe String) into a Maybe (List String), and the compute the difference in character counts.


part1 : Eff (PartEff String) (Int, List String)
part1 = do
  inputs <- map lines $ askAt "input"
  outputs <- traverse (runParser string) inputs
  Just outputs <- pure $ sequence outputs
    | _ => throw "Some strings failed to parse"
  let difference = 
    sum $ zipWith (\x, y => strLength x - strLength y) inputs outputs
  pure (difference, inputs)

Part 2

Map our character escaping function over our input string and compute the difference in character counts.

Make a little stop along the way to ensure that escape -> parse round trips without changing the content of the string.


part2 : List String -> Eff (PartEff String) Int
part2 inputs = do
  let outputs = map escape inputs
  Just reversed_outputs <- map sequence . traverse (runParser string) $ outputs
    | _ => throw "Reversing escaping of the inputs failed"
  unless (all id $ zipWith (==) inputs reversed_outputs) $ do
    debug . delay . joinBy "\n" . map show . filter (\(x, y, z) => x /= z) $ 
      zip3 inputs outputs reversed_outputs
    throw "Parsed outputs were not identical to inputs"
  let difference = 
    sum $ zipWith (\x, y => strLength y - strLength x) inputs outputs
  pure difference