Year 2015 Day 8
This day provides a more in depth introduction to writing effectful parsers, making use of state and non-determinism in our parsers.
import Data.String
import Data.Vect
import Data.List.Lazy
import Data.Either
Parsing
A "Parser" is an effectful computation that has access to a list of chars as
state, can throw exceptions of type String
, and has non-determinism through
the Choose
effect, which consumes some of the state to potentially return a
value.
Basic operations
Get the next character out of the parser state, updating the state to consume that character.
nextChar : Has (State (List Char)) fs => Has (Except String) fs => Eff fs Char
nextChar = do
c :: rest <- get
| [] => throw "End of input"
put rest
pure c
Attempt to parse a single character based on the supplied predicate, consuming the character from the state and throwing the provided error if the predicate does not hold over the consumed character.
char : Has (State (List Char)) fs => Has (Except String) fs =>
(pred : Char -> Bool) -> (err : Char -> String) -> Eff fs Char
char pred err = do
c <- nextChar
unless (pred c) (throw $ err c)
pure c
Type alias for a parser
Parser : (res : Type) -> Type
Parser res = Eff [State (List Char), Except String, Choose, Logger] res
Parse 0+ of a thing, speculatively attempting to apply the given parser. In the event the supplied parser fails, catch the error, unwind changes to the state, and return an empty list, otherwise append the parsed element to the head of the returned list and recurse.
The rewinding of the state on failure could be handled implicitly by the
effect stack if Except
was structured a bit differently, but that's a topic
for another day.
many : (f : Parser t) -> Parser (List t)
many f = do
state <- get
Just x <- lift $ catch (\ _ => pure Nothing) (map Just f)
| Nothing => do
put state
pure []
map (x ::) $ many f
"Parse" the end of the input, returning a unit if we are at the end of input, throwing an error otherwise.
endOfInput : Parser ()
endOfInput = do
[] <- get
| xs => throw "Expected end of input, state non empty: `\{pack xs}`"
pure ()
Character Classes
Parse a single "
, throwing an error if the current character is anything else.
Returns a unit since there is only one possible character this parses, and this
avoids us having to discard the character later in our string
parser.
quote : Parser ()
quote = do
_ <- char (== '"') (\x => "Expected `\"`, got `\{String.singleton x}`")
pure ()
Parse any character except \
or "
normal : Parser Char
normal =
char
(\x => not $ any (== x) (the (List _) ['\\', '"']))
(\x => "Expected normal, got `\{String.singleton x}`")
Escaped Characters
Parse the character sequence \"
, returning just the "
. Despite the fact that
can only return one possible character, like quote
above, we return the parsed
character, as we intend to provide all the escaped character parsers to the
oneOfM
combinator later.
eQuote : Parser Char
eQuote = do
_ <- char (== '\\') (\x => "Expected `\\`, got `\{String.singleton x}`")
char (== '"') (\x => "Expected `\"`, got `\{String.singleton x}`")
Parse the character sequence \
, returning just the second \
.
eSlash : Parser Char
eSlash = do
_ <- char (== '\\') (\x => "Expected `\\`, got `\{String.singleton x}`")
char (== '\\') (\x => "Expected `\\`, got `\{String.singleton x}`")
Convert a lowercase hexadecimal digit to its numerical value.
fromHex : Char -> Int
fromHex c =
if ord c >= 97
then ord c - 87
else ord c - 48
Parse a character sequence \xAB
, where AB
are hexadecimal digits, and decode
the numerical value of AB
, as a hexadecimal number, into its corresponding
character.
eHex : Parser Char
eHex = do
_ <- char (== '\\') (\x => "Expected `\\`, got `\{String.singleton x}`")
_ <- char (== 'x') (\x => "Expected `x`, got `\{String.singleton x}`")
[x, y] <- map (map $ fromHex . toLower) .
sequence .
Vect.replicate 2 $
(char
isHexDigit
(\x => "Expected hex digit, got `\{String.singleton x}`"))
pure . chr $ x * 0x10 + y
Use the oneOfM
combinator to combine our escaped character parsers into a
single character class. oneOfM
uses the Choice
effect to try all of the
provided parsers, conceptually in parallel, returning all of the results.
escaped : Parser Char
escaped = oneOfM $ the (List _) [eQuote, eSlash, eHex]
Top Level Class
Combine our normal
and escaped
parsers into a single parser for non-quote
characters.
character : Parser Char
character = oneOfM $ the (List _) [normal, escaped]
Parse a string literal by describing its layout as a quote ("
), followed by
any number of characters, then another quote, followed by the end of input.
Return the characters between the outer quotes.
string : Parser (List Char)
string = do
quote
xs <- many character
quote
endOfInput
pure xs
Running a parser
Run a parser, with some debug logging, by peeling the parsing effects of of the type. The order is important here, remember that function composition "runs" right to left.
We peel the state off the type first, so that we can get implicit "rewinding" of
the state inside of our combinators, like oneOfM
.
For a full understanding of what's going on here, we need to see how the type
signature changes as we peel effects off the type, you can uncomment the
commented lines, and comment out the rest of the function, modifying the
let output =
line to follow along yourself, though the types you get may look
a little different due to idris evaluating type alaises like Eff
for you, I am
keeping them in aliased forms, and excluding Logger
, which doesn't impact the
semantic of parsing, to keep the examples concise:
-
runstate (unpack str) $ x
This produces a value with type
Eff [Except String, Choose] (List Char, List Char)
. In this tuple of values, the first element is the actual output of the parser, and the second element is the state after the parser has run. -
runExcept . runState (unpack str) $ x
This produces a value with type
Eff [Choose] (Either String (List Char, List Char))
. This corresponds to a computation that either returns our tuple of output and state from before, or an error. Important to note here is that, in the error case, we only return aString
(our error type), our state is discarded. -
runChoose {f = LazyList} . runExcept . runState (unpack str) $ x
The
Choose
effect, works with any type that implements theAlternative
interface, and the choice of type impacts the semantics. A full discussion of this is beyond the scope for today, but we chose to "run"Choose
withLazyList
, which effectively gives us an iterator over all the possible parsings of our input text.This produces a value with type
Eff [] (LazyList (Either String (List Char, List Char)))
. When we hit an application ofChoose
, likeoneOfM
, all possibilities will be tried and each one will be added to our outputLazyList
. Because this is aLazyList
, and not aList
, only values we actually inspect are generated, allowing parsing to terminate after the first successful parse without having to generate the rest of the list.Note that we have an independent possible state value for each slot in the list, this speaks to this effect stack, when run in this order, providing a sort of branching behavior for states, allowing different branches in the
Choose
effect to modify their state without impacting the state of other branches.
From there, we run our lazyRights
helper function over the outputs LazyList
to discard parsing paths that result in an error, extract the first element of
each tuple, get the head of the list, if one still exists, and use pack
to
convert the contents of the resulting Maybe (List Char)
to a string. Then a
little bit of debug logging, and return the output.
runParser : Has Logger fs => Parser (List Char) -> String
-> Eff fs $ Maybe String
runParser x str = do
info "Parsing: \{str}"
-- let outputs =
-- runChoose {f = LazyList} . runExcept . runState (unpack str) $ x
-- ?parser_types
outputs <-
lift . runChoose {f = LazyList} . runExcept . runState (unpack str) $ x
let output = map pack . head' . map fst . lazyRights $ outputs
case output of
Nothing =>
debug "Failed: \{show outputs}"
Just y => do
debug "Got: \{y}"
trace "\{show outputs}"
pure output
where
lazyRights : LazyList (Either a b) -> LazyList b
lazyRights [] = []
lazyRights (Left _ :: xs) = lazyRights xs
lazyRights (Right x :: xs) = x :: lazyRights xs
Escaping characters
This is much more boring the the parsing, we just do simple recursive pattern
matching on the characters of the provided string, escaping "
and \
, and
surround the resulting string with quotes.
escape : String -> String
escape str = "\"\{pack . escape' . unpack $ str}\""
where
escape' : List Char -> List Char
escape' [] = []
escape' ('"' :: xs) = '\\' :: '"' :: escape' xs
escape' ('\\' :: xs) = '\\' :: '\\' :: escape' xs
escape' (x :: xs) = x :: escape' xs
Part Functions
Part 1
Convert the inputs into a list of lines, traverse our parser over it, deal with
possible failures by sequencing the List (Maybe String)
into a
Maybe (List String)
, and the compute the difference in character counts.
part1 : Eff (PartEff String) (Int, List String)
part1 = do
inputs <- map lines $ askAt "input"
outputs <- traverse (runParser string) inputs
Just outputs <- pure $ sequence outputs
| _ => throw "Some strings failed to parse"
let difference =
sum $ zipWith (\x, y => strLength x - strLength y) inputs outputs
pure (difference, inputs)
Part 2
Map our character escaping function over our input string and compute the difference in character counts.
Make a little stop along the way to ensure that escape -> parse round trips without changing the content of the string.
part2 : List String -> Eff (PartEff String) Int
part2 inputs = do
let outputs = map escape inputs
Just reversed_outputs <- map sequence . traverse (runParser string) $ outputs
| _ => throw "Reversing escaping of the inputs failed"
unless (all id $ zipWith (==) inputs reversed_outputs) $ do
debug . delay . joinBy "\n" . map show . filter (\(x, y, z) => x /= z) $
zip3 inputs outputs reversed_outputs
throw "Parsed outputs were not identical to inputs"
let difference =
sum $ zipWith (\x, y => strLength y - strLength x) inputs outputs
pure difference