Scripting languages allow dynamic features easily. Vassili Kaplan writes his own in C++ allowing keywords in any human language.
To iterate is human, to recurse divine.
~ L. Peter Deutsch
Why is yet another scripting language needed? Not only are there already quite a few scripting languages present but there are also a few frameworks to build new languages and parsers, notably ANTLR Framework [
ANTLR
] and the Boost Spirit library [
Spirit
]. Nevertheless, I see two main advantages in writing a language like the one presented in this article: the possibility of adding a new function, a control flow statement or a new data structure on the fly and to have keywords translated to any language with just a few configuration changes (or to add synonyms to existing keywords, so that, for instance,
ls
and
dir
could be used interchangeably). Also, as you will see from this article, the learning curve is much shorter to get started adding your own code in C++.
In C Vu [ Kaplan15a , Kaplan15b ] I published the split-and-merge algorithm to parse a string containing a mathematical expression. Later, in MSDN Magazine [ Kaplan15c , Kaplan16 ] I implemented this algorithm in C# and showed how one can implement a scripting language in C# based on the split-and-merge algorithm.
In this article, I am going to extend the split-and-merge algorithm presented earlier and show how one can use it to write a scripting language in C++. In MSDN Magazine [ Kaplan16 ] the scripting language was called CSCS (Customized Scripting in C#). For simplicity, I will continue calling the language I discuss here CSCS (Customized Scripting in C++ using the split-and-merge algorithm) as well.
The CSCS language, as described in MSDN Magazine [ Kaplan16 ], was still not very mature. In particular, there is a section towards the end of the article mentioning some of the important features usually present in a programming language that CSCS was still missing. In this article I’ll generalize the CSCS language and show how to implement most of those missing features and a few others in C++.
All the code discussed here is available for download [ Downloads ].
The split-and-merge algorithm to parse a language statement
Here we’ll generalize the split-and-merge algorithm to parse not only a mathematical expression but any CSCS language statement. A separation character must separate all CSCS statements. It is defined in the
Constants.h
file as
const char END_STATEMENT = ';'
.
The algorithm consists of two steps.
In the first step, we split a given string into a list of objects, called ‘Variables’. Each
Variable
consists of an intermediate result (a number, a string, or an array of other
Variable
s) and an ‘action’ that must be applied to this
Variable
. Previously we called this
Variable
a ‘Cell’ and it could hold only a numerical value.
The last element of the created list of
Variable
s has so called ‘null action’, which, for convenience, we denote by the character
")"
. It has the lowest priority of 0.
For numbers, an action can be any of
+
,
-
,
*
,
/
,
%
,
&&
,
||
, and so on. For strings, only
+
(concatenation), and logical operators
<
,
<=
,
>
,
>=
,
==
,
!=
are defined.
Listing 1 contains an excerpt from the
Variable
class definition.
class Variable { public: Variable(): type(Constants::NONE) {} Variable(double val): numValue(val), type(Constants::NUMBER) {} Variable(string str): strValue(str), type(Constants::STRING) {} string toString() const; bool canMergeWith(const Variable& right); void merge(const Variable& right); void mergeNumbers(const Variable& right); void mergeStrings(const Variable& right); private: double numValue; string strValue; vector<Variable> tuple; string action; string varname; Constants::Type type; }; |
Listing 1 |
The separation criteria for splitting a string into a list of
Variable
s are: an action, an expression in parentheses, or a function (including a variable, which is also treated as a function), previously registered with the parser. We are going to talk how to register a function with the parser in the next section. In Listing 2 you can see all of the actions defined for numbers and their priorities.
unordered_map<string, int> prio; prio["++"] = 10; prio["--"] = 10; prio["^"] = 9; prio["%"] = 8; prio["*"] = 8; prio["/"] = 8; prio["+"] = 7; prio["-"] = 7; prio["<"] = 6; prio[">"] = 6; prio["<="] = 6; prio[">="] = 6; prio["=="] = 5; prio["!="] = 5; prio["&&"] = 4; prio["||"] = 3; prio["+="] = 2; prio["-="] = 2; prio["*="] = 2; prio["/="] = 2; prio["%="] = 2; prio["="] = 2; |
Listing 2 |
In the case of an expression in parentheses or a function, we apply recursively the whole split-and-merge algorithm to that expression in parentheses or the function argument in order to get a
Variable
object as a result. At the end of the first step we are going to have a list of
Variable
s, each one having an action to be applied to the next
Variable
in the list. Thanks to the recursion there will be no functions left after step 1, just numbers, strings or lists of numbers, strings or lists of numbers, strings, … and so on recursively. We call these lists ‘tuples’. Internally they are implemented as vectors (see Listing 1).
The data structure holding the string with the expression to parse and a pointer to the character currently being parsed is
ParsingScript
. An excerpt from its definition is shown in Listing 3.
class ParsingScript { public: ParsingScript(const string& d): data(d), from(0) {} inline char operator()(size_t i) const { return data[i]; } inline size_t size() const { return data.size(); } inline bool stillValid() const { return from < data.size(); } inline size_t getPointer() const { return from; } inline char current() const { return data[from]; } inline char currentAndForward() { return data[from++]; } inline char tryNext() const { return from+1 < data.size() ? data[from+1] : Constants::NULL_CHAR; } inline void forward(size_t delta = 1) { from += delta; } private: string data; // contains the whole script size_t from; // a pointer to the // script data above }; |
Listing 3 |
The main parsing cycle of the first part of the algorithm is shown in Listing 4.
do // Main processing cycle of the first part. { char ch = script.currentAndForward (); // get the next character and move one forward bool keepCollecting = stillCollecting(script, parsingItem, to, action); if (keepCollecting) { // The char still belongs to the // previous operand. parsingItem += ch; // Parse until the next extracted character is // in the "to" string. bool goForMore = script.stillValid() && !Utils::contains(to, script.current())); if (goForMore) { continue; } } // Done getting the next token. The getValue() // call below may recursively call this method if // extracted item is a function or is starting // with a '('. ParserFunction func(script, parsingItem, ch, action); Variable current = func.getValue(script); if (action.empty()) { // find the next "action" // token or a null action '(' action = updateAction(script, to); } char next = script.current(); // we've already // moved forward bool done = listToMerge.empty() && (next == END_STATEMENT || (action == Constants::NULL_ACTION && current.getType() != NUMBER); if (done) { // If there is no numerical result, we are not // in a math expression. listToMerge.push_back(current); return listToMerge; } current.action = action; listToMerge.push_back(current); parsingItem.clear(); } while (script.stillValid() && !Utils::contains(to, script.current()); |
Listing 4 |
The second step consists in merging the list of
Variable
s created in the first step according to their priorities. Two
Variable
objects can be merged together only if the priority of the action of the
Variable
on the left is greater or equal than the priority of the action of the
Variable
on the right. If not, we jump to the
Variable
on the right and merge it with the
Variable
on its right first, and so on, recursively. As soon as the
Variable
on the right has been merged with the
Variable
next to it, we return back to the original
Variable
, the one we weren’t able to merge before, and try to remerge it with the newly created
Variable
on its right. Note that eventually we will be able to merge the list since the last
Variable
in this list has a null action with zero priority.
Check out the implementation of the second step of the algorithm in Listing 5. The function
merge()
is called from outside with the
mergeOneOnly
parameter set to
false
. That parameter is set to
true
when the function calls itself recursively, because we need to merge the
Variable
on the right with the
Variable
on its right just once, and return back in order to remerge the current
Variable
.
Variable Parser::merge(Variable& current, size_t& index, vector<Variable>& listToMerge, bool mergeOneOnly) { while (index < listToMerge.size()) { Variable& next = listToMerge[index++]; while (!current.canMergeWith(next)) { merge(next, index, listToMerge, true/*mergeOneOnly*/); } current.merge(next); if (mergeOneOnly) { break; } } return current; } |
Listing 5 |
Let’s see an example of applying the split-and-merge algorithm to the following CSCS language statements:
a = "blah"; b = 10; c = a == "blue" || b == 1; print(c);
We have four statements above, separated by the
;
character.
The parsing of the first two statements is analogous – as soon as the parser gets the
=
token, it will register the parameters on the left (
a
and
b
), mapping them to their corresponding values (
"blah"
and
10
). We will see how to register variables and functions with the parser in the next section (actually for the parser it doesn’t matter whether it is a variable or a function – they are all treated the same). Note that there is no need to declare any variables – they are all deduced from the context.
More interesting is the third statement:
c = a == "blue" || b == 1;
When parsing the right part of this statement (after the assignment), as soon as the parser gets a function or a variable (this can be anything that is not in quotes and is not a number) it will try to find if there is already a corresponding variable or a function registered with the parser. If not, a
ParsingException
will be thrown, but in our case we have already defined
a
and
b
in the previous two statements, so their values will be substituted, transforming the right side of the statement above to:
"blah" == "blue" || 10 == 1;
The first step of the split-and-merge algorithm produces the following list of
Variable
s from the above statements:
Split("blah" == "blue" || 10 == 1) → 1. Variable(strValue = "blah", action = "==") 2. Variable(strValue = "blue", action = "||") 3. Variable(numValue = 10, action = "==") 4. Variable(numValue = 1, action = ")")
In the second part of the algorithm we merge the list of
Variable
s above. The
Variable
s 1 and 2 can be merged on the first pass since the priority of the
"=="
action is higher than the priority of the
"||"
action according to the Listing 2. The merging of
Variable
s 1 and 2 consists in applying the action
"=="
of the first variable to the values of both variables, leading to a new
Variable
having the action of the
Variable
on its right:
Merge(Variable(strValue = "blah", action = "=="), Variable(strValue = "blue", action = "||")) = Variable("blah" == "blue", action = "||") = Variable(numValue = 0, action = "||")
Next, we merge
Variable(numValue = 0, action = "||")
with the next
Variable
in the list,
Variable(numValue = 10, action = "==")
. But now the action
= "||"
has a lower priority than the action
= "=="
, therefore we need to merge first the
Variable(numValue = 10, action = "==")
with the next
Variable(numValue = 1, action = ")")
. Since the null action
")"
has the lowest priority, the merge can be done, producing a new
Variable
:
Merge(Variable(numValue = 10, action = "=="),Variable(numValue = 1, action = ")")) = Variable(10 == 1, action = ")") = Variable(numValue = 0, action = ")")
Since this merge was successful we must now get back to the
Variable(numValue = 0, action = "||")
and remerge it with the newly created
Variable(numValue = 0, action = ")")
, producing:
Merge(Variable(numValue = 0, action = "||"),Variable(numValue = 0, action = ")")) = Variable(0 || 0, action = ")") = Variable(numValue = 0, action = ")")
Since there are no more
Variable
s left in the list, the result of merging is 0. This value will be assigned to the variable
"c"
and registered with the parser.
The last statement is
"print(c);"
. After extracting the token
print
the parser will look if there is a function named
print
already registered with the parser. Since there is one, the parser will recursively call the whole split-and-merge algorithm on the argument of the
print()
function,
"c"
. Since
"c"
was registered with the parser in the previous step, the parser will return back its value, 0, to the print function. Let’s see more closely how variables and functions can be implemented and registered with the parser taking as an example the
print()
function.
Registering functions and variables with the parser
All the functions that can be added to the parser must derive from the
ParserFunction
class.
The
Identity
is a special function which is called when we have an argument in parentheses. It just calls the main entry method of the split-and-merge algorithm to load the whole expression in parentheses:
class IdentityFunction : public ParserFunction { public: virtual Variable evaluate(ParsingScript& script) { return Parser::loadAndCalculate(script); } };
All split-and-merge functions and variables are implemented similarly.
There are three basic steps to register a function with the parser:
-
Define a function keyword token, i.e. the name of the function in the scripting language, CSCS, e.g.:
static const string PRINT; // in Constants.h const string Constants::PRINT = "print"; // in Constants.cpp
-
Implement the class to be mapped to the keyword from the previous step. Basically just the
evaluate()
method must be overridden. E.g. for theprint()
function:Variable PrintFunction::evaluate(ParsingScript& script) { vector<Variable> args = Utils::getArgs(script, Constants::START_ARG, Constants::END_ARG); for (size_t i = 0; i < args.size(); i++) { cout << args[i].toString(); } cout << endl; return Variable::emptyInstance; }
-
Map an object of the class implemented in the previous step with the previously defined keyword as follows:
ParserFunction::addGlobalFunction (Constants::PRINT, new PrintFunction());
Utils::getArgs()
auxiliary function parses the arguments of the print function and first gets a list of strings, separated by commas from
print(string1, string2, ..., stringN)
. Then it recursively applies the split-and-merge algorithm to each of these
string1, string2, ..., stringN
. Finally, it prints them out to the terminal.
The
addGlobalFunction()
method (see Listing 6) just adds a new entry to the global dictionary
s_functions
(implemented as an
unordered_map
) used by the parser to map keywords to functions.
Variable IfStatement::evaluate(ParsingScript& script) { size_t startIfCondition = script.getPointer(); Variable result = Parser::loadAndCalculate (script, Constants::END_ARG_STR); bool isTrue = result.numValue != 0; if (isTrue) { result = processBlock(script); if (result.type == Constants::BREAK_STATEMENT || result.type == Constants::CONTINUE_STATEMENT) { // Got here from the middle of the if-block. // Skip it. script.setPointer(startIfCondition); skipBlock(script); } skipRestBlocks(script); return result; } // We are in Else. Skip everything in the // If statement. skipBlock(script); ParsingScript nextData(script.getData(), script.getPointer()); string nextToken = Utils::getNextToken(nextData); if (Constants::ELSE_IF_LIST.find(nextToken) != Constants::ELSE_IF_LIST.end()) { script.setPointer(nextData.getPointer() + 1); result = processIf(script); } if (Constants::ELSE_LIST.find(nextToken) != Constants::ELSE_LIST.end()) { script.setPointer(nextData.getPointer() + 1); result = processBlock(script); } return Variable::emptyInstance; } |
Listing 6 |
void ParserFunction::addGlobalFunction (const string& name, ParserFunction* function) { auto tryInsert = s_functions.insert({name, function}); if (!tryInsert.second) { // The variable or function already exists. // Delete it and replace with the new one. delete tryInsert.first->second; tryInsert.first->second = function; } }
As we’ve mentioned before, we treat a variable as a function: as soon as the parser gets an expression like
"a = something"
, it will register a function with the keyword
"a"
and map it to the
Variable
"something"
(which will be calculated applying recursively the split-and-merge algorithm). In C++ the code for this is:
ParserFunction::addGlobalFunction("a", something /*Variable*/ );
Implementing the if – else if – else control flow statements
Let’s see how to implement the
if()
–
else if()
–
else()
functionality in CSCS.
The first and the third steps are clear: define the
if
constant and register a class implementing the
if
control flow statement with the parser, the same way we registered the
print()
function above.
The second step, the implementation of the
if
statement, is shown in Listing 6.
First we evaluate the condition inside of
if()
by recursively calling the split-and-merge algorithm on that condition. If the condition is true we process the whole
if()
block, recursively calling the split-and-merge algorithm on each statement inside of the
processBlock()
method. If the condition is false we first skip the whole
if()
block in the
skipBlock()
method. Then we evaluate the
else()
and
else if()
statements. The evaluation of
else if()
is basically same as the evaluation of the
if()
statement itself, so for
else if()
we recursively call the
if()
statement evaluation.
Note that we enhanced the execution of the
if
-statement here – as soon as there is a
break
or a
continue
statement, we get out of the
if ()
block – same way we get out from the
while()
block. This can be useful in case of nested
if
s.
Similarly, we can register any function with the parser, e.g.
while()
,
for()
,
try()
,
throw()
,
include()
, etc. We can also define local or global variables in the same way. In the next section we are going to see how to define functions in CSCS and add passed arguments as local variables to CSCS.
Implementing custom functions in CSCS
To write a custom function in the scripting language, let’s introduce two functions in C++,
FunctionCreator
and
CustomFunction
, both deriving from the
ParserFunction
base class. A
FunctionCreator
object is registered with the parser in the same way we registered
if()
and
print()
functions above.
As soon as the parser gets a token with the
"function"
keyword, an instance of the
FunctionCreator
will be executed, namely, its
evaluate()
method, see Listing 7.
Variable FunctionCreator::evaluate(ParsingScript& script) { string funcName = Utils::getToken(script, TOKEN_SEPARATION); vector<string> args = Utils::getFunctionSignature(script); string body = Utils::getBody(script, '{', '}'); CustomFunction* custFunc = new CustomFunction(funcName, body, args); ParserFunction::addGlobalFunction(funcName, custFunc); return Variable(funcName); } |
Listing 7 |
Basically, it just creates a new object, an instance of the
CustomFunction
, and initializes it with the extracted function body and the list of parameters. It also registers the name of the custom function with the parser, so the parser maps that name with the new
CustomFunction
object which will be called as soon as the parser encounters the function name keyword.
So all of the functions that we implement in the CSCS code correspond to different instances of the
CustomFunction
class. The custom function does primarily two things, see Listing 8. First, it extracts the function arguments and adds them as local variables to the parser (they will be removed from the parser as soon as the function execution is finished or an exception is thrown). It also checks that the number of actual parameters is equal to the number of the registered ones (this part is skipped for brevity).
Variable CustomFunction::evaluate(ParsingScript& script) { // 0. Extract function arguments. vector<Variable> args = Utils::getArgs(script); // 1. Add passed arguments as local variables. StackLevel stackLevel(m_name); for (size_t i = 0; i < m_args.size(); i++) { stackLevel.variables[m_args[i]] = args[i]; } ParserFunction::addLocalVariables(stackLevel); // 2. Execute the body of the function. Variable result; ParsingScript funcScript(m_body); while (funcScript.getPointer() < funcScript.size()-1 && !result.isReturn) { result = Parser::loadAndCalculate(funcScript); Utils::goToNextStatement(funcScript); } // 3. Return the last result of the execution. ParserFunction::popLocalVariables(); return result; } |
Listing 8 |
Second, the body of the function is evaluated, using the main parser entry point, the
loadAndCalculate()
method.
If the function body contains calls to other functions, or to itself, the calls to the
CustomFunction
can be recursive.
Let’s see this with an example in CSCS. It calculates recursively the Fibonacci numbers, see Listing 9.
cache["fib"] = 0; function fibonacci(n) { if (!isInteger(n)) { exc = "Fibonacci is for integers only" "(n="+ n +")"; throw (exc); } if (n < 0) { exc = "Negative number (n="+ n +") supplied"; throw (exc); } if (n <= 1) { return n; } if (contains(cache["fib"], n)) { result = cache["fib"][n]; print(" found in cache fib(", n, ")=", result); return result; } result = fibonacci(n - 2) + fibonacci(n - 1); cache["fib"][n] = result; return result; } |
Listing 9 |
The Fibonacci function above uses an auxiliary
isInteger()
function, also implemented in CSCS:
function isInteger(candidate) { return candidate == round(candidate); }
The
isInteger()
function calls yet another,
round()
function. The implementation of the
round()
function is already in the C++ code and is analogous to the implementation of the
print()
function that we saw in the previous section.
To execute the Fibonacci function with different arguments we can use the following CSCS code:
n = ...; print("Calculating Fibonacci(", n, ")..."); try { f = fibonacci(n); print("fibonacci(", n, ")=", f); } catch(exc) { print("Caught: " + exc); }
We get the output in Figure 1 for different values of
n
.
Calculating Fibonacci(10)... found in cache fib(2)=1 found in cache fib(3)=2 found in cache fib(4)=3 found in cache fib(5)=5 found in cache fib(6)=8 found in cache fib(7)=13 found in cache fib(8)=21 Fibonacci(10)=55 Calculating Fibonacci(-10)... Caught: Negative number (n=-10) supplied at fibonacci() Calculating Fibonacci(1.500000)... Caught: Fibonacci is for integers only (n=1.500000) at fibonacci() |
Figure 1 |
Since the exceptions happened at the global level, the exception stacks printed consisted only of the
fibonacci()
function itself. To keep track of the execution stack, i.e. CSCS functions being called, internally we use a C++ stack data structure, where we add every executing
CustomFunction
object as soon as we start function execution and remove it as soon as the execution is over. In case of an exception we just print out the contents of this stack. Then we clear the execution stack up to the level where the exception was caught. The implementation of the execution stack and of the
try()
and
throw()
functions can be consulted in [
Downloads
].
The implementation of the Fibonacci function in Listing 9 uses caching of already calculated Fibonacci numbers by using dictionaries – we will see how one can implement dictionaries next.
Arrays and other data structures
To declare an array and initialize it with some data we use the same CSCS language statement. The array elements for the initialization are declared between the curly braces. Here is an example in CSCS:
a = 10; arr = {++a-a--, ++a*exp(0)/a--, -2*(--a - ++a)}; i = 0; while(i < size(arr)) { print("a[", i, "]=", arr[i], ", expecting ", i); i++; }
The number of elements in the array is not explicitly declared since it can be deduced from the assignment.
The function
size()
is implemented in C++. It returns the number of elements in an array. In case the passed argument is not an array, it will return the number of characters in it.
Internally an array is implemented as a vector, so you can add elements to it on the fly. In CSCS we access elements of an array, or modify them, by using the squared brackets. As soon as the parser gets a token containing an opening squared bracket, it knows that it is for an array index, so it applies recursively the whole split-and-merge algorithm to the string between the squared brackets to extract the index value. There can be unlimited number of dimensions of an array (well, limited by the system resources) because the array is implemented as a
vector<Variable>
: the
Variable
class has a member called
"tuple"
and declared as
vector<Variable>
. For instance, accessing
a[i][j][k]
in the CSCS code means
a.tuple[i].tuple[j].tuple[k]
in the C++ underlying code (
"a"
is a
Variable
, see Listing 1 for
Variable
definition).
In other words, for each consequent index in squared brackets the parser will create a new
Variable
of type array or use an existing
"tuple"
member.
If we access an element of an array and that element has not been initialized yet, an exception will be thrown by the parser. However, it’s possible to assign a value to just one element of an array, even if the index used is greater than the number of elements in the array and even if the array has not been initialized yet. In this case the non-existing elements of the array will be initialized with the empty values. The CSCS code below is legal, even if the array has not been initialized before:
i = 10; while(--i > 0) { array[i] = 2*i; } print("array[9]=", array[9]);// prints 18 print("size(array)=", size(array));// prints 10
We can also add other data structures to the CSCS language. Let’s see an example of adding a dictionary, implemented internally as an
unordered_map
. We add the following member to the
Variable
class:
unordered_map<string, size_t> dictionary;
This is the mapping between a dictionary key and an index of already existing member
vector<Variable>
tuple, where the actual dictionary value will be stored. So every time a new key-value pair is added to the dictionary the following code is executed:
tuple.emplace_back(var); dictionary[key] = tuple.size() - 1;
Every time an existing key is accessed the following code is executed (a check for existence is skipped):
auto it = dictionary.find(key); size_t ptr = it->second; return tuple[ptr];
With a few changes one can use not only strings, but anything else as a dictionary key. Similarly, we can add other data structures to CSCS – as long as a data structure exists, or can be implemented in C++, it can be added to CSCS as well.
In Listing 6 we saw the implementation of the
if()
-
else if()
-
else
control flow statements. Towards the end of the listing you might have scratched your head, asking why we didn’t compare the extracted token with the
"else"
string, but we did a comparison with the
ELSE_LIST
? The reason is that the
ELSE_LIST
contains all possible synonyms and translations of the
"else"
keyword to any of the languages that the user might have supplied in the configuration file. How is a keyword translation added to the parser?
How to add keyword synonyms and language translations
One of the advantages of writing a custom programming language is a possibility to have the keywords in any language (besides the ‘base’ language, understandably chosen to be English). Also we can create our language in such a way, that there will be no need to remember that one must use
dir
,
copy
,
move
,
findstr
on Windows, but
ls
,
cp
,
mv
, and
grep
on Unix; and that a very nice shortcut
cd..
works only on Windows: in our language we can have both! And with just a few configuration changes.
Here is how we can add keyword synonyms and translations to the CSCS language.
First, we define them in a configuration file; Figure 2 is an incomplete example of a configuration file with Spanish translations. The same configuration file may contain an arbitrary number of languages. For example, we can also include the keyword synonyms in the same file (see Figure 3).
function = función include = incluir if = si else = sino elif = sino_si return = regresar print = imprimir size = tamaño while = mientras |
Figure 2 |
ls = dir cp = copy mv = move rm = rmdir grep = findstr |
Figure 3 |
After reading the keyword translations we add them to the parser one by one (see Listing 10).
void addTranslation(const string& origName, const string& translation) { ParserFunction* origFunction = ParserFunction::getFunction(origName); if (origFunction != 0) { ParserFunction::addGlobalFunction (translation, origFunction); } tryAddToSet(originalName, translation, CATCH, CATCH_LIST); tryAddToSet(originalName, translation, ELSE, ELSE_LIST); //… other sets } |
Listing 10 |
First, we try to add a translation to one of the registered functions (like
print()
,
sin()
,
cos()
,
round()
,
if()
,
while()
,
try()
,
throw()
, etc.). Basically, we map the new keyword to the
ParserFunction
, corresponding to the original keyword. Therefore, as soon as the parser extracts either the original keyword (say
"cp"
), or the one added from the configuration file (e.g.
"copy"
), the same C++ function
CopyFunction
, deriving from the
ParserFunction
class, will be called.
Then we try to add new keywords to the sets of additional keywords, that are not functions (e.g. a
"catch"
is processed only together with the try-block,
"else"
and
"else if"
are processed together with the
if
-block, etc.) The
tryAddToSet()
is an auxiliary template function that adds a translation to a set, in case the original keyword name belongs to that set (e.g.
CATCH = "catch"
belongs to the
CATCH_LIST
).
Listing 11 is an example of the CSCS code using Spanish keywords and functions.
palabras = {"Este", "sentido", "es", "en", "Español"}; tam = tamaño(palabras); i = 0; mientras(i < tam) { si (i % 2 == 0) { imprimir(palabras[i]); } i++; } |
Listing 11 |
Conclusions
Using the techniques presented in this article and consulting the source code in [ Downloads ] you can develop your own fully customized language using your own keywords and functions. The resulting language will be interpreted at runtime directly, statement by statement.
We saw that implementing a printing function and a control flow statement is basically the same: one needs to write a new class, deriving from the
ParserFunction
class and override its
evaluate()
method. Then one needs to register that function with the parser, mapping it to a keyword. The
evaluate()
method will be called by the parser as soon as the parser extracts the keyword corresponding to this function. For the lack of space we didn’t show how to implement the
while()
,
try
,
throw
,
break
,
continue
, and
return
control flow statements but they are all implemented analogously. The same applies to the prefix and postfix
++
and
--
operators that we did not have space to show but you can consult in [
Downloads
].
Using the above approach of adding a new function to the parser, anything can be added to the CSCS language as long as it can be implemented in C++.
You can also use this custom language as a shell language (like bash on Unix or PowerShell on Windows) to perform different file or operating system commands (find files, list directories or running processes, kill or start a new process from the command line, and so on). Stay tuned to see how to do that in our next article.
References
[ANTLR] ANTLR Framework, http://www.antlr.org
[Downloads] Implementation of the CSCS language in C++, http://www.ilanguage.ch/p/downloads.html
[Kaplan15a] V. Kaplan, ‘Split and Merge Algorithm for Parsing Mathematical Expressions’, ACCU CVu , 27-2, May 2015, http://accu.org/var/uploads/journals/CVu272.pdf
[Kaplan15b] V. Kaplan, ‘Split and Merge Revisited’, ACCU CVu , 27-3, July 2015, http://accu.org/var/uploads/journals/CVu273.pdf
[Kaplan15c] V. Kaplan, ‘A Split-and-Merge Expression Parser in C#’, MSDN Magazine , October 2015, https://msdn.microsoft.com/en-us/magazine/mt573716.aspx
[Kaplan16] V. Kaplan, ‘Customizable Scripting in C#’, MSDN Magazine , February 2016, https://msdn.microsoft.com/en-us/magazine/mt632273.aspx
[Spirit] Boost Spirit Library, http://boost-spirit.com
Acknowledgements
I’d like to thank Frances Buontempo and the Overload review team for providing their feedback which enabled me to elevate the content presented in this article.