Sylver is a language-agnostic platform for building custom source code analyzers (think eslint for every language). This might be a lot to unpack, so let us explore this tool by solving a real-world problem: our application's configuration is stored in complex JSON documents, and we'd like to build a tool to automatically validate these documents against our business rules.
In this series of tutorials, we'll go from having zero knowledge of Sylver or static analysis to building a fully-fledged linter for our configuration files. We will use JSON as an example, but the tools and techniques presented apply to many data formats and even complete programming languages!
Also, note that while we will be building everything from scratch using Sylver's domain-specific languages (DSL), a catalog of built-in specifications for the most common languages will be included in future releases of the tool.
Sylver is distributed as a single static binary. Installing it is as simple as placing the sylver binary in a location in your $PATH.
Let us create a blank workspace for this tutorial! We'll start by creating a new folder and a json.syl file to write the Sylver specification for the JSON language.
mkdir sylver_getting_started
cd sylver_getting_started
touch json.syl
We will also store our test JSON file in config.json.
{
    "variables": [
        {
            "name": "country",
            "description": "Customer's country of residence",
            "values": ["us", "fr", "it"]
        },
        {
            "name": "age",
            "description": "Customer's age",
            "type": "number"
        }
    ]
}
This file specifies the variables used to describe customer profiles in a fictional customer database.
Variables are assigned a type or a list of potential values.
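To make the shape of this file concrete before we write any Sylver code, here is a small Python sketch that loads the same document with the standard json module and reports whether each variable is described by a type or by a list of values. This is only an illustration: the describe_variables helper is a hypothetical name, and nothing here is part of Sylver.

```python
import json

# The same document as config.json, inlined so the sketch is self-contained.
CONFIG_TEXT = """
{
    "variables": [
        {
            "name": "country",
            "description": "Customer's country of residence",
            "values": ["us", "fr", "it"]
        },
        {
            "name": "age",
            "description": "Customer's age",
            "type": "number"
        }
    ]
}
"""


def describe_variables(config_text):
    """Return (name, kind) pairs, where kind is 'type', 'values' or 'unspecified'."""
    config = json.loads(config_text)
    result = []
    for variable in config["variables"]:
        if "type" in variable:
            kind = "type"
        elif "values" in variable:
            kind = "values"
        else:
            kind = "unspecified"
        result.append((variable["name"], kind))
    return result


print(describe_variables(CONFIG_TEXT))
```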
Sylver parses raw text into typed structures called parse trees. The first step in defining a language spec is to define a set of types for our tree nodes.
We'll add the following node type declarations to our json.syl spec:
node JsonNode { }
node Null: JsonNode { }
node Bool: JsonNode { }
node Number: JsonNode { }
node String: JsonNode { }
node Array: JsonNode {
    elems: List<JsonNode>
}
node Object: JsonNode {
    members: List<Member>
}
node Member: JsonNode {
    key: String,
    value: JsonNode
}
These declarations resemble object-type declarations in many mainstream languages. The : syntax denotes inheritance.
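For readers who think in a mainstream language, the declarations above can be mirrored roughly as Python dataclasses. This is only an analogy, not how Sylver represents nodes internally: the text field on String is a hypothetical addition that stands in for the matched source text, and the other leaf nodes are left empty as in the spec.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class JsonNode:
    """Base type shared by every node, like `node JsonNode { }`."""


@dataclass
class Null(JsonNode):
    pass


@dataclass
class Bool(JsonNode):
    pass


@dataclass
class Number(JsonNode):
    pass


@dataclass
class String(JsonNode):
    text: str  # hypothetical field: the matched source text


@dataclass
class Array(JsonNode):
    elems: List[JsonNode] = field(default_factory=list)


@dataclass
class Member(JsonNode):
    key: String
    value: JsonNode


@dataclass
class Object(JsonNode):
    members: List[Member] = field(default_factory=list)


# A tiny tree for the document {"name": "age"}.
doc = Object(members=[Member(key=String('"name"'), value=String('"age"'))])
print(isinstance(doc.members[0], JsonNode))
```

Because every class derives from JsonNode, a field typed as JsonNode (such as Member.value) can hold any kind of node, which is the role inheritance plays in the spec as well.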
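Now that we have a set of types to describe JSON documents, we need to specify how to build a parse tree from a sequence of characters. This process is done in two steps: lexical analysis groups the input characters into tokens, then syntactic analysis builds tree nodes from those tokens.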
Tokens are described using declarations of the form term NAME = <term_content> where <term_content> is either a literal surrounded by single-quotes (') or a regex between backticks (`). The regexes use a syntax similar to Perl-style regular expressions. Characters in the input string that match one of the terminal literals or regexes will be grouped into a token of the given name.
term COMMA = ','
term COLON = ':'
term L_BRACE = '{'
term R_BRACE = '}'
term L_BRACKET = '['
term R_BRACKET = ']'
term NULL = 'null'
term BOOL_LIT = `true|false`
term NUMBER_LIT = `\-?(0|([1-9][0-9]*))(\.[0-9]+)?((e|E)(\+|-)?[0-9]+)?`
term STRING_LIT = `"([^"\\]|(\\[\\/bnfrt"])|(\\u[a-fA-F0-9]{4}))*"`
ignore term WHITESPACE = `\s`
The term rules for numbers and strings are slightly involved because they account for some of JSON's peculiarities, such as optional fractions and exponents in numbers and escape sequences in strings.
Note that the WHITESPACE term declaration (matching a single whitespace character) is prefixed with the ignore keyword. This means that WHITESPACE tokens do not affect the structure of the document and can be ignored during syntactic analysis.
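To see what these term declarations do in practice, here is a rough Python sketch of the lexical analysis step, using the standard re module. It is not Sylver's implementation. It assumes that terms are tried in declaration order and that the first match wins (how Sylver resolves conflicts between terms is not covered here), and the decimal point in the number pattern is escaped as \. so that it matches only a literal dot. The TERMS table and the tokenize function are hypothetical names.

```python
import re

# (name, regex, ignored), in the same order as the `term` declarations.
TERMS = [
    ("COMMA", r",", False),
    ("COLON", r":", False),
    ("L_BRACE", r"\{", False),
    ("R_BRACE", r"\}", False),
    ("L_BRACKET", r"\[", False),
    ("R_BRACKET", r"\]", False),
    ("NULL", r"null", False),
    ("BOOL_LIT", r"true|false", False),
    ("NUMBER_LIT", r"-?(0|([1-9][0-9]*))(\.[0-9]+)?((e|E)(\+|-)?[0-9]+)?", False),
    ("STRING_LIT", r'"([^"\\]|(\\[\\/bnfrt"])|(\\u[a-fA-F0-9]{4}))*"', False),
    ("WHITESPACE", r"\s", True),  # mirrors `ignore term WHITESPACE`
]
COMPILED = [(name, re.compile(regex), ignored) for name, regex, ignored in TERMS]


def tokenize(text):
    """Group characters into (name, text) tokens, dropping ignored terms."""
    tokens = []
    pos = 0
    while pos < len(text):
        for name, regex, ignored in COMPILED:
            match = regex.match(text, pos)
            if match:
                if not ignored:
                    tokens.append((name, match.group(0)))
                pos = match.end()
                break
        else:
            # No term matched at this position.
            raise ValueError(f"unexpected character at offset {pos}: {text[pos]!r}")
    return tokens


for token in tokenize('{"age": [1, true, null]}'):
    print(token)
```

Note that whitespace never reaches the token list, which is why the rules in the next section do not have to mention it.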
In this last part of the language spec, we write rules describing how tree nodes are built by matching tokens from the input stream.
For example, a rule specifying: "if the current token is a STRING_LIT, build a String node" can be written as follows:
rule string = String { STRING_LIT }
Rules can refer to other rules to construct nested nodes.
For example, here is a rule specifying that a Member node (corresponding to an object member in JSON) can be built by building a node using the string rule and then matching a COLON token followed by any valid JSON value:
rule member = Member { key@string COLON value@main }
Nested nodes are associated with a field using the @ syntax.
The main rule is the entry point for the parser, so in our case, it designates any valid JSON value.
A valid JSON document can be made of a 'null' literal, a number, a boolean value, a string, an array of JSON values, or a JSON object, which is reflected in the main rule:
rule main =
    Null { NULL }
    | Number { NUMBER_LIT }
    | Bool { BOOL_LIT }
    | string
    | Array { L_BRACKET elems@sepBy(COMMA, main) R_BRACKET }
    | Object { L_BRACE members@sepBy(COMMA, member) R_BRACE }
The sepBy(TOKEN, rule_name) syntax is used to parse a list of nodes using the rule_name rule, while matching a TOKEN token between every parsed node.
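The string, member and main rules map quite naturally onto a hand-written recursive-descent parser, which may help in reading them. The Python sketch below is an illustration under stated assumptions, not a description of Sylver's parsing algorithm: nodes are represented as plain tuples, sep_by is assumed to accept an empty list (as JSON's [] and {} require), and the extra closing argument of sep_by is a simplification used to detect that empty case. All function and class names are hypothetical.

```python
import re

# One alternative per term; WHITESPACE is matched but never emitted.
TOKEN_RE = re.compile(r"""
    (?P<COMMA>,) | (?P<COLON>:)
  | (?P<L_BRACE>\{) | (?P<R_BRACE>\})
  | (?P<L_BRACKET>\[) | (?P<R_BRACKET>\])
  | (?P<NULL>null) | (?P<BOOL_LIT>true|false)
  | (?P<NUMBER_LIT>-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?)
  | (?P<STRING_LIT>"(?:[^"\\]|\\[\\/bnfrt"]|\\u[a-fA-F0-9]{4})*")
  | (?P<WHITESPACE>\s)
""", re.VERBOSE)


def tokenize(text):
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at offset {pos}")
        if match.lastgroup != "WHITESPACE":
            tokens.append((match.lastgroup, match.group(0)))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        """Name of the current token, or None at the end of the input."""
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else None

    def expect(self, name):
        """Consume a token of the given name and return its text."""
        if self.peek() != name:
            raise ValueError(f"expected {name}, found {self.peek()}")
        text = self.tokens[self.pos][1]
        self.pos += 1
        return text

    def sep_by(self, separator, rule, closing):
        """Parse zero or more `rule` nodes with a `separator` token between them."""
        items = []
        if self.peek() == closing:  # empty list
            return items
        items.append(rule())
        while self.peek() == separator:
            self.expect(separator)
            items.append(rule())
        return items

    def string(self):  # rule string = String { STRING_LIT }
        return ("String", self.expect("STRING_LIT"))

    def member(self):  # rule member = Member { key@string COLON value@main }
        key = self.string()
        self.expect("COLON")
        return ("Member", key, self.main())

    def main(self):  # one branch per alternative of `rule main`
        kind = self.peek()
        if kind == "NULL":
            return ("Null", self.expect("NULL"))
        if kind == "NUMBER_LIT":
            return ("Number", self.expect("NUMBER_LIT"))
        if kind == "BOOL_LIT":
            return ("Bool", self.expect("BOOL_LIT"))
        if kind == "STRING_LIT":
            return self.string()
        if kind == "L_BRACKET":
            self.expect("L_BRACKET")
            elems = self.sep_by("COMMA", self.main, "R_BRACKET")
            self.expect("R_BRACKET")
            return ("Array", elems)
        if kind == "L_BRACE":
            self.expect("L_BRACE")
            members = self.sep_by("COMMA", self.member, "R_BRACE")
            self.expect("R_BRACE")
            return ("Object", members)
        raise ValueError(f"unexpected token {kind}")


def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.main()
    if parser.peek() is not None:
        raise ValueError("trailing tokens after the JSON value")
    return tree


print(parse('{"values": ["us", "fr"]}'))
```

Each alternative of the main rule becomes one branch, and each field@rule binding becomes a nested call whose result is stored in the node being built.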
We now have a complete language spec for the JSON language:
node JsonNode { }
node Null: JsonNode { }
node Bool: JsonNode { }
node Number: JsonNode { }
node String: JsonNode { }
node Array: JsonNode {
    elems: List<JsonNode>
}
node Object: JsonNode {
    members: List<Member>
}
node Member: JsonNode {
    key: String,
    value: JsonNode
}

term COMMA = ','
term COLON = ':'
term L_BRACE = '{'
term R_BRACE = '}'
term L_BRACKET = '['
term R_BRACKET = ']'
term NULL = 'null'
term BOOL_LIT = `true|false`
term NUMBER_LIT = `\-?(0|([1-9][0-9]*))(\.[0-9]+)?((e|E)(\+|-)?[0-9]+)?`
term STRING_LIT = `"([^"\\]|(\\[\\/bnfrt"])|(\\u[a-fA-F0-9]{4}))*"`
ignore term WHITESPACE = `\s`

rule string = String { STRING_LIT }
rule member = Member { key@string COLON value@main }
rule main =
    Null { NULL }
    | Number { NUMBER_LIT }
    | Bool { BOOL_LIT }
    | string
    | Array { L_BRACKET elems@sepBy(COMMA, main) R_BRACKET }
    | Object { L_BRACE members@sepBy(COMMA, member) R_BRACE }
The last step is to test it on our test file!
This is done by invoking the following command:
sylver parse --spec=json.syl --file=config.json
Which yields the following parse tree:
Object {
. ● members: List<Member> {
. . Member {
. . . ● key: String { "variables" }
. . . ● value: Array {
. . . . ● elems: List<JsonNode> {
. . . . . Object {
. . . . . . ● members: List<Member> {
. . . . . . . Member {
. . . . . . . . ● key: String { "name" }
. . . . . . . . ● value: String { "country" }
. . . . . . . }
. . . . . . . Member {
. . . . . . . . ● key: String { "description" }
. . . . . . . . ● value: String { "Customer's country of residence" }
. . . . . . . }
. . . . . . . Member {
. . . . . . . . ● key: String { "values" }
. . . . . . . . ● value: Array {
. . . . . . . . . ● elems: List<JsonNode> {
. . . . . . . . . . String { "us" }
. . . . . . . . . . String { "fr" }
. . . . . . . . . . String { "it" }
. . . . . . . . . }
. . . . . . . . }
. . . . . . . }
. . . . . . }
. . . . . }
. . . . . Object {
. . . . . . ● members: List<Member> {
. . . . . . . Member {
. . . . . . . . ● key: String { "name" }
. . . . . . . . ● value: String { "age" }
. . . . . . . }
. . . . . . . Member {
. . . . . . . . ● key: String { "description" }
. . . . . . . . ● value: String { "Customer's age" }
. . . . . . . }
. . . . . . . Member {
. . . . . . . . ● key: String { "type" }
. . . . . . . . ● value: String { "number" }
. . . . . . . }
. . . . . . }
. . . . . }
. . . . }
. . . }
. . }
. }
}
In the next part, we'll define business rules to validate our JSON configuration (for example, the possible values for each variable must be of the same type), and we will use a query DSL to identify the tree nodes that violate these rules.