In this part of creating your programming language, we will implement loops.
Please see the previous parts:
The full source code is available over on GitHub.
In the first section, we will cover lexical analysis. It’s a process to divide the source code into language lexemes, such as keyword, identifier/variable, operator, etc. To make our lexical analysis work with loop constructions, we add the following Keyword
lexemes: loop
, in
, by
to the TokenType enum:
package org.example.toylanguage.token;
...
public enum TokenType {
...
Keyword("(if|elif|else|then|end|print|input|struct|fun|return|loop|in|by)(?=\\s|$)"),
...
}
Then, in order to separate the lower and upper bounds of the For loop range, we add the two-dot GroupDivider
lexeme: [.]{2}
...
public enum TokenType {
...
GroupDivider("(\\[|\\]|\\,|\\{|}|[.]{2})"),
...
}
And finally, if we don’t want the Numeric
lexeme to catch the two-dot GroupDivider
lexeme declared after the loop’s lower bound, we add a negative look ahead construction before the dot expression: (?![.]{2})
...
public enum TokenType {
...
Numeric("([-]?(?=[.]?[0-9])[0-9]*(?![.]{2})[.]?[0-9]*)"),
...
}
Let’s move to the next section - the syntax analysis, which will convert the lexemes received from the previous step into the final statements following our language rules. As was said at the beginning, our language will support three types of loops: For, While and Iterable. Therefore, first we will declare the AbstractLoopStatement
class, which will contain common execution process for all the loop types:
package org.example.toylanguage.statement.loop;
public abstract class AbstractLoopStatement extends CompositeStatement {
protected abstract void init();
protected abstract boolean hasNext();
protected abstract void preIncrement();
protected abstract void postIncrement();
}
Inside this class, we declare the following abstract methods, which can have different implementation for each loop type:
init()
- it will be invoked to perform initialization operations, like initializing a counter variable when a loop starts the execution.hasNext()
- It will be called before each iteration to define that we still have remaining items to iterate.preIncrement()
- It’s a prefix operation that will be called before each iteration.postIncrement()
- It’s a postfix operation that will be called at the end of each iteration.
Before creating the loop implementations, we complete the Statement#execute()
method by utilizing declared abstract methods. To start the execution, we invoke the init()
method to initialize the loop state, but before that, we should open the new MemoryScope for the variables declared within the loop, as they shouldn’t be accessed after the loop ends. In the end of the execution, we release the earlier opened MemoryScope:
MemoryContext.newScope();
try {
init();
} finally {
MemoryContext.endScope();
}
Next, to iterate a loop, we utilize the abstract hasNext()
method. Inside each iteration, we invoke the loop’s inner statements, defined within the inherited CompositeStatement class. When the iteration starts, we invoke the preIncrement()
method to perform the prefix increment operations. At the end of each iteration, we call the postIncrement()
method to perform the postfix increment operations accordingly:
while (hasNext()) {
preIncrement();
try {
// execute inner statements
for (Statement statement : getStatements2Execute()) {
statement.execute();
}
} finally {
postIncrement();
}
}
Next, to make sure that variables declared inside the loop statements are isolated from each iteration, we open the inner MemoryScope during the statements' execution. And finally, if our loop is defined inside the function and one of the loop’s inner statements contains a return statement, we should stop the iteration immediately because it’s a sign to exit a function. Therefore, we validate that the function's return statement hasn’t been called after each statement execution. The complete AbstractLoopStatement#execute()
method:
public abstract class AbstractLoopStatement extends CompositeStatement {
...
@Override
public void execute() {
MemoryContext.newScope();
try {
init();
while (hasNext()) {
preIncrement();
MemoryContext.newScope();
try {
// execute inner statements
for (Statement statement : getStatements2Execute()) {
statement.execute();
if (ReturnContext.getScope().isInvoked())
return;
}
} finally {
MemoryContext.endScope(); // release each iteration memory
postIncrement();
}
}
} finally {
MemoryContext.endScope(); // release loop memory
}
}
}
Let’s begin with the first For loop implementation. By our language design, it will contain the variable expression, the lower bound, the upper bound, and the step expression:
package org.example.toylanguage.statement.loop;
@RequiredArgsConstructor
public class ForLoopStatement extends AbstractLoopStatement {
private final VariableExpression variable;
private final Expression lowerBound;
private final Expression uppedBound;
private final Expression step;
}
Next, we implement the abstract methods. Within the first init()
method, we assign the lower bound to the variable by evaluating an instance of AssignmentOperator, which accepts the variable
expression as a first operand and the lowerBound
expression as a second operand:
...
public class ForLoopStatement extends AbstractLoopStatement {
...
@Override
protected void init() {
new AssignmentOperator(variable, lowerBound)
.evaluate();
}
}
Next, we override the hasNext()
. It should return a boolean value if we remaining have items. To make a compare operation, we create an instance of the LessThenOperator by passing the variable
expression as a first operand, and the upperBound
as a second operand. As a result, we expect to get the LogicalValue with boolean value inside it:
...
public class ForLoopStatement extends AbstractLoopStatement {
...
@Override
protected boolean hasNext() {
LessThanOperator hasNext = new LessThanOperator(variable, uppedBound);
Value<?> value = hasNext.evaluate();
return value instanceof LogicalValue && ((LogicalValue) value).getValue();
}
}
Lastly, to perform the increment operations for each iteration, we need to implement the preIncrement()
and postIncrement()
methods. The For loop should use the step increment only at the end of each iteration. Therefore, we perform the increment only within the postIncrement()
method. To implement the increment operation, we create an instance of the AdditionOperator, which will sum up the variable
value with the step
value, and assign the result to the variable
expression again using an instance of the AssignmentOperator. If the step expression hasn’t been provided, we accumulate the NumericValue instance with the default value equal to 1.0
:
...
public class ForLoopStatement extends AbstractLoopStatement {
...
@Override
protected void preIncrement() {
}
@Override
protected void postIncrement() {
AdditionOperator stepOperator = new AdditionOperator(variable, Objects.requireNonNullElseGet(step, () -> new NumericValue(1.0)));
new AssignmentOperator(variable, stepOperator)
.evaluate();
}
}
The next loop implementation is the While loop. It’s more straightforward than the For loop, as we don’t need to perform addition and assignment operations at all. This operator will contain only the hasNext
expression, which will only serve to calculate the hasNext()
result. The other overridden methods will be empty:
package org.example.toylanguage.statement.loop;
@RequiredArgsConstructor
public class WhileLoopStatement extends AbstractLoopStatement {
private final Expression hasNext;
@Override
protected void init() {
}
@Override
protected boolean hasNext() {
Value<?> value = hasNext.evaluate();
return value instanceof LogicalValue && ((LogicalValue) value).getValue();
}
@Override
protected void preIncrement() {
}
@Override
protected void postIncrement() {
}
}
The last loop implementation is the Iterable loop. It will allow us to perform the for-each iterations with arrays and structures.
First, we define the new Value implementation, which will override the java.util.Iterable
interface and, accordingly, will provide us with the java.util.Iterator
implementation:
package org.example.toylanguage.expression.value;
public abstract class IterableValue<T> extends Value<T> implements Iterable<Value<?>> {
public IterableValue(T value) {
super(value);
}
}
Next, we extend the ArrayValue by implementing IterableValue and passing array values to the Iterator instance:
package org.example.toylanguage.expression.value;
public class ArrayValue extends IterableValue<List<Value<?>>> {
...
@Override
public Iterator<Value<?>> iterator() {
return getValue().iterator();
}
}
And we do the same extension for StructureValue as well. For now, we will iterate only the structure’s values. Later, we can introduce the dictionary data type to operate with key-value types:
package org.example.toylanguage.expression.value;
public class StructureValue extends IterableValue<Map<String, Value<?>>> {
...
@Override
public Iterator<Value<?>> iterator() {
return getValue().values().iterator();
}
}
Now we have the two IterableValue implementations which can provide us with the Iterator instance, we can complete the Iterable loop with the corresponding implementation. It will contain the variable
expression and the iterable
expression. Also, we declare an auxiliary non-final Iterator<Value<?>
field that will be used to iterate the loop:
package org.example.toylanguage.expression.value;
@RequiredArgsConstructor
public class IterableLoopStatement extends AbstractLoopStatement {
private final VariableExpression variable;
private final Expression iterableExpression;
private Iterator<Value<?>> iterator;
}
Next, we override and resolve the abstract methods. Inside the init()
method, we initialize the iterator
field using the IterableValue’s iterator()
method:
...
public class IterableLoopStatement extends AbstractLoopStatement {
...
@Override
protected void init() {
Value<?> value = iterableExpression.evaluate();
if (!(value instanceof IterableValue))
throw new ExecutionException(String.format("Unable to loop non IterableValue `%s`", value));
this.iterator = ((IterableValue<?>) value).iterator();
}
}
The hasNext()
method will utilize the Iterator#hasNext()
implementation. Lastly, we should perform the prefix increment operation to know the value that is currently being iterated. Within preIncrement()
, we create an instance of the AssignmentOperator, which will accept the variable
as a first operand, and the value received from the iterator as a second operand:
...
public class IterableLoopStatement extends AbstractLoopStatement {
...
@Override
protected boolean hasNext() {
return iterator.hasNext();
}
@Override
protected void preIncrement() {
Value<?> next = iterator.next();
new AssignmentOperator(variable, next)
.evaluate();
}
@Override
protected void postIncrement() {
}
}
To make the loop statements work, we need to map the loop lexemes to the defined statements using the StatementParser class. All the loops defined by the language grammar start with the loop
keyword. Therefore, inside the parseExpression() method, we add a new loop
case for the corresponding Keyword lexeme:
package org.example.toylanguage;
public class StatementParser {
...
private Statement parseExpression() {
Token token = tokens.next(TokenType.Keyword, TokenType.Variable, TokenType.Operator);
switch (token.getType()) {
case Variable:
case Operator:
...
case Keyword:
switch (token.getValue()) {
case "print":
...
case "input":
...
case "if":
...
case "struct":
...
case "fun":
...
case "return":
...
case "loop":
...
}
default:
...
}
}
}
After the loop
Keyword, we read the next expression using the Dijkstra Two-Stack algorithm defined within the ExpressionReader class. If we read the For or the Iterable loop, we expect this expression to be only a variable. If we read the While loop, this expression will act as a loop condition, which can be an instance of any Expression implementation, including operator or function invocation. Therefore, we split the execution. In case the expression is the VariableExpression instance and the next lexeme is the in
Keyword, we continue to read the For and Iterable loop. In the negative case, we build the WhileLoopStatement using the condition expression:
case "loop": {
Expression loopExpression = new ExpressionReader().readExpression();
AbstractLoopStatement loopStatement;
if (loopExpression instanceof VariableExpression && tokens.peek(TokenType.Keyword, "in")) {
// loop <variable> in <bounds>
} else {
// loop <condition>
loopStatement = new WhileLoopStatement(loopExpression);
}
Inside the For and Iterable loop clause, we skip the next in
Keyword lexeme, and then read the following bounds expression. This expression can be an instance of any Expression implementation which can be evaluated to the lower bound value or IterableValue only when the code will be executed. To define which loop type it is, we can rely on the next lexeme. In the case of the For loop, after the lower bound expression, we expect to read the two-dot GroupDivider as a spliterator for lower and upper bounds. In the negative condition, we build the IterableLoopStatement:
if (loopExpression instanceof VariableExpression && tokens.peek(TokenType.Keyword, "in")) {
// loop <variable> in <bounds>
VariableExpression variable = (VariableExpression) loopExpression;
tokens.next(TokenType.Keyword, "in");
Expression bounds = new ExpressionReader().readExpression();
if (tokens.peek(TokenType.GroupDivider, "..")) {
// loop <variable> in <lower_bound>..<upper_bound>
} else {
// loop <variable> in <iterable>
loopStatement = new IterableLoopStatement(variable, bounds);
}
}
To build the last For loop statement, we skip the two-dot GroupDivider and read the upper bound expression. In our language grammar, for the For loop, we can define the step expression with the by
keyword. Therefore, if the next expression is the by
lexeme, we read the following step expression. At the end, we build an instance of the ForLoopStatement:
if (tokens.peek(TokenType.GroupDivider, "..")) {
// loop <variable> in <lower_bound>..<upper_bound>
tokens.next(TokenType.GroupDivider, "..");
Expression upperBound = new ExpressionReader().readExpression();
Expression step = null;
if (tokens.peek(TokenType.Keyword, "by")) {
// loop <variable> in <lower_bound>..<upper_bound> by <step>
tokens.next(TokenType.Keyword, "by");
step = new ExpressionReader().readExpression();
}
loopStatement = new ForLoopStatement(variable, bounds, upperBound, step);
}
We have read every loop type declaration. Finally, we can read the body of the loop. To gather all internal statements, we invoke the recursive StatementParser#parseExpression()
method, which will return an instance of the Statement interface. We should stop the process when we reach the finalizing end
lexeme:
while (!tokens.peek(TokenType.Keyword, "end")) {
Statement statement = parseExpression();
loopStatement.addStatement(statement);
}
At the end, we skip this end
keyword and return the loopStatement. The complete loop
clause:
case "loop": {
Expression loopExpression = new ExpressionReader().readExpression();
AbstractLoopStatement loopStatement;
if (loopExpression instanceof VariableExpression && tokens.peek(TokenType.Keyword, "in")) {
// loop <variable> in <bounds>
VariableExpression variable = (VariableExpression) loopExpression;
tokens.next(TokenType.Keyword, "in");
Expression bounds = new ExpressionReader().readExpression();
if (tokens.peek(TokenType.GroupDivider, "..")) {
// loop <variable> in <lower_bound>..<upper_bound>
tokens.next(TokenType.GroupDivider, "..");
Expression upperBound = new ExpressionReader().readExpression();
Expression step = null;
if (tokens.peek(TokenType.Keyword, "by")) {
// loop <variable> in <lower_bound>..<upper_bound> by <step>
tokens.next(TokenType.Keyword, "by");
step = new ExpressionReader().readExpression();
}
loopStatement = new ForLoopStatement(variable, bounds, upperBound, step);
} else {
// loop <variable> in <iterable>
loopStatement = new IterableLoopStatement(variable, bounds);
}
} else {
// loop <condition>
loopStatement = new WhileLoopStatement(loopExpression);
}
while (!tokens.peek(TokenType.Keyword, "end")) {
Statement statement = parseExpression();
loopStatement.addStatement(statement);
}
tokens.next(TokenType.Keyword, "end");
return loopStatement;
}
We have finished embedding loops in both lexical and syntax analyzers. Now we should be able to perform For, While and Iterable loops. Let’s create and test a more real task - the bubble sort algorithm:
fun bubble_sort [ arr, n ]
loop i in 0..n - 1
loop j in 0..n - i - 1
if arr{j+1} < arr{j} then
temp = arr{j}
arr{j} = arr{j + 1}
arr{j + 1} = temp
end
end
end
end
fun is_sorted [ arr, n ]
loop i in 0..n - 1
if arr{i} > arr{i+1} then
return false
end
end
return true
end
arr = {}
arr_len = 20
loop i in 0..arr_len by 1
arr << i * 117 % 17 - 1
end
print arr # [-1, 14, 12, 10, 8, 6, 4, 2, 0, 15, 13, 11, 9, 7, 5, 3, 1, -1, 14, 12]
print is_sorted [ arr, arr_len ] # false
bubble_sort [ arr, arr_len ]
print arr # [-1, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 12, 13, 14, 14, 15]
print is_sorted [ arr, arr_len ] # true