paint-brush
Applicative Parsing I: Building the Foundationby@james_32022
277 reads

Applicative Parsing I: Building the Foundation

by James BowenFebruary 10th, 2018
Read on Terminal Reader
Read this story w/o Javascript
tldt arrow

Too Long; Didn't Read

<a href="https://mmhaskell.com/blog/2018/2/5/parsing-primer-gherkin-syntax" target="_blank">Last week</a> we prepared ourselves for parsing by going over the basics of the Gherkin Syntax. In this article and the next, we’ll be using the <a href="https://hackage.haskell.org/package/regex-applicative-0.3.3/docs/Text-Regex-Applicative.html#t:RE" target="_blank">applicative parsing</a> library to parse that syntax. This week, we’ll focus on the fundamentals of this library, and build up a vocabulary of combinators to use. We’ll make heavy use of the <code class="markup--code markup--p-code">Applicative</code> typeclass. If you need a refresher on that, check out <a href="https://mmhaskell.com/blog/2017/2/6/applicatives-one-step-further" target="_blank">this article</a>. As we start coding, you can also follow along with the examples on <a href="https://github.com/jhb563/GherkinParsing" target="_blank">Github here</a>! Most of the code here is in <a href="https://github.com/jhb563/GherkinParsing/blob/master/src/Parser.hs" target="_blank">Parser.hs</a>.

Companies Mentioned

Mention Thumbnail
Mention Thumbnail
featured image - Applicative Parsing I: Building the Foundation
James Bowen HackerNoon profile picture

Last week we prepared ourselves for parsing by going over the basics of the Gherkin Syntax. In this article and the next, we’ll be using the applicative parsing library to parse that syntax. This week, we’ll focus on the fundamentals of this library, and build up a vocabulary of combinators to use. We’ll make heavy use of the Applicative typeclass. If you need a refresher on that, check out this article. As we start coding, you can also follow along with the examples on Github here! Most of the code here is in Parser.hs.

In the coming weeks, we’ll be seeing a couple other parsing libraries as well. If you want to get some ideas about these and more, download our Production Checklist. It summarizes many other useful libraries for writing higher level Haskell.

If you’ve never started writing Haskell, now’s your chance! Get our free Beginner’s Checklist and learn the basics of getting started!

Getting Started

So to start parsing, let’s make some notes about our input format. First, we’ll treat our input feature document as a single string. We’ll remove all empty lines, and then trim leading and trailing whitespace from each line.

parseFeatureFromFile :: FilePath -> IO FeatureparseFeatureFromFile inputFile = do  fileContents <- lines <$> readFile inputFile  let nonEmptyLines = filter (not . isEmpty) fileContents  let trimmedLines = map trim nonEmptyLines  let finalString = unlines trimmedLines  case parseFeature finalString of    ...

…isEmpty :: String -> BoolisEmpty = all isSpace

trim :: String -> Stringtrim input = reverse flippedTrimmed  where    trimStart = dropWhile isSpace input    flipped = reverse trimStart    flippedTrimmed = dropWhile isSpace flipped

This means a few things for our syntax. First, we don’t care about indentation. Second, we ignore extra lines. This means our parsers might allow certain formats we don’t want. But that’s OK because we’re trying to keep things simple.

The RE Type

With applicative based parsing, the main data type we’ll be working with is called RE, for regular expression. This represents a parser, and it’s parameterized by two types:

data RE s a = ...

The s type refers to the fundamental unit we’ll be parsing. Since we're parsing our input as a single String, this will be Char. Then the a type is the result of the parsing element. This varies from parser to parser. The most basic combinator we can use is sym. This parses a single symbol of your choosing:

sym :: s - > RE s s

parseLowercaseA :: RE Char CharparseLowercaseA = sym ‘a’

To use an RE parser, we call the match function or its infix equivalent =~. These will return a Justvalue if we can match the entire input string, and Nothing otherwise:

>> match parseLowercaseA “a”Just ‘a’>> “b” =~ parseLowercaseANothing>> “ab” =~ parseLowercaseANothing -- (Needs to parse entire input)

Predicates and Strings

Naturally, we’ll want some more complicated functionality. Instead of parsing a single input character, we can parse any character that fits a particular predicate by using psym. So if we want to read any character that was not a newline, we could do:

parseNonNewline :: RE Char CharparseNonNewline = psym (/= ‘\n’)

The string combinator allows us to match a particular full string and then return it:

readFeatureWord :: RE Char StringreadFeatureWord = string “Feature”

We’ll use this for parsing keywords, though we’ll often end up discarding the “result”.

Applicative Combinators

Now the RE type is applicative. This means we can apply all kinds of applicative combinators over it. One of these is many, which allows us to apply a single parser several times. Here is one combinator that we’ll use a lot. It allows us to read everything up until a newline and return the resulting string:

readUntilEndOfLine :: RE Char StringreadUntilEndOfLine = many (psym (/= '\n'))

Beyond this, we’ll want to make use of the applicative <*> operator to combine different parsers. We can also apply a pure function (or constructor) on top of those by using <$>. Suppose we have a data type that stores two characters. Here’s how we can build a parser for it:

data TwoChars = TwoChars Char Char

parseTwoChars :: RE Char TwoCharsparseTwoChars = TwoChars <$> parseNonNewline <*> parseNonNewline

...

>> match parseTwoChars “ab”Just (TwoChars ‘a’ ‘b’)

We can also use <* and *>, which are cousins of the main applicative operator. The first one will parse but then ignore the right hand parse result. The second discards the left side result.

parseFirst :: RE Char CharparseFirst = parseNonNewline <* parseNonNewline

parseSecond :: RE Char CharparseSecond = parseNonNewline *> parseNonnewline

…

>> match parseFirst “ab”Just ‘a’>> match parseSecond “ab”Just ‘b’>> match parseFirst “a”Nothing

Notice the last one fails because the parser needs to have both inputs! We’ll come back to this idea of failure in a second. But now that we know this technique, we can write a couple other useful parsers:

readThroughEndOfLine :: RE Char StringreadThroughEndOfLine = readUntilEndOfLine <* sym '\n'

readThroughBar :: RE Char StringreadThroughBar = readUntilBar <* sym '|'

readUntilBar :: RE Char StringreadUntilBar = many (psym (\c -> c /= '|' && c /= '\n'))

The first will parse the rest of the line and then consume the newline character itself. The other parsers accomplish this same task, except with the vertical bar character. We’ll need these when we parse the Examples section next week.

Alternatives: Dealing with Parse Failures

We introduced the notion of a parser “failing” up above. Of course, we need to be able to offer alternatives when a parser fails! Otherwise our language will be very limited in its structure. Luckily, the RE type also implements Alternative. This means we can use the <|> operator to determine an alternative parser when one fails. Let’s see this in action:

parseFeatureTitle :: RE Char StringparseFeatureTitle = string “Feature: “ *> readThroughEndOfLine

parseScenarioTitle :: RE Char StringparseScenarioTitle = string “Scenario: “ *> readThroughEndOfLine

parseEither :: RE Char StringparseEither = parseFeatureTitle <|> parseScenarioTitle

…

>> match parseFeatureTitle “Feature: Login\n”Just “Login”>> match parseFeatureTitle “Scenario: Login\n”Nothing>> match parseEither “Scenario: Login\n”Just “Login”

Of course, if ALL the options fail, then we’ll still have a failing parser!

>> match parseEither “Random: Login\n”Nothing

We’ll need this to introduce some level of choice into our parsing system. For instance, it’s up to the user if they want to include a Background as part of their feature. So we need to be able to read the background if it’s there or else move onto parsing a scenario.

Cconclusion

That wraps up our introduction to the basic combinators of applicative parsing. Next week, we’ll take all the pieces we’ve developed here and put them to work on Gherkin syntax itself. Everything seems pretty small so far. But we’ll see that we can actually build up our results very rapidly once we have the basic pieces in place!

If you want to see some more libraries that are useful for important Haskell tasks, take a look at our Production Checklist. It will introduce you to some libraries for parsing, databases, APIs, and much more!

If you’re new to Haskell, there’s no better time to start! Download our free Beginners Checklist! It will help you download the right tools and start learning the language.