OADL uses the UTF-8 (RFC 3629) input character encoding, which is compact for most uses and compatible with ASCII, while still allowing the full 21-bit range of Unicode character encodings in identifiers, string literals and comments. In addition, OADL uses UTF-8 for text-based output. Although OADL accepts extended Unicode characters as elements of identifiers, only the ASCII digits '0' through '9' are allowed in numeric constants.
Note that illegal UTF-8 sequences in a source file will produce a compile-time error.
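As a rough illustration of this check, the Python sketch below decodes source bytes strictly and reports the offset of the first malformed sequence. It is not OADL's implementation; the helper name `decode_source` and the use of `ValueError` to stand in for a compile-time error are assumptions made for the example.

```python
def decode_source(raw):
    """Decode source bytes as UTF-8, rejecting malformed sequences.

    Hypothetical helper: a ValueError here stands in for the
    compile-time error an OADL compiler would report.
    """
    try:
        return raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        raise ValueError(f"illegal UTF-8 sequence at byte {exc.start}") from exc
```

Python's strict decoder follows RFC 3629, so it also rejects overlong encodings and encoded surrogates.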
OADL is a token-based compiler. It uses a greedy algorithm to scan tokens; this means, for example, that the sequence of characters `===` is interpreted as the token `==` followed by the token `=` (which the compiler would then flag as a syntax error).
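The greedy rule can be sketched in a few lines of Python. This is only a model of longest-match scanning over a small, hand-picked subset of the punctuation tokens; the name `scan_punctuation` and the `TOKENS` list are assumptions for the example, not part of OADL.

```python
# A small subset of the OADL punctuation tokens, enough to show the idea.
TOKENS = ["<<=", ">>=", "==", "<=", ">=", "<<", ">>", "=", "<", ">"]


def scan_punctuation(text):
    """Split a run of punctuation by always taking the longest matching token."""
    by_length = sorted(TOKENS, key=len, reverse=True)
    out, pos = [], 0
    while pos < len(text):
        for tok in by_length:
            if text.startswith(tok, pos):
                out.append(tok)
                pos += len(tok)
                break
        else:
            # No token in the subset starts at this offset.
            raise ValueError(f"no token matches at offset {pos}")
    return out
```

With this sketch, `scan_punctuation("===")` yields `["==", "="]`, mirroring the example in the text; rejecting that sequence is left to the parser.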
Whitespace (ASCII space, carriage return, line feed, form feed, horizontal tab, and vertical tab) is ignored, except as it separates tokens. Comments (delimited by `/*` and `*/`) are considered whitespace and are otherwise ignored. End-of-line comments (introduced by `//`) are also ignored. For compatibility with some UTF-8 editors (e.g. Windows Notepad), the Unicode byte-order mark 0xFEFF (encoded in UTF-8 as 0xEF, 0xBB, 0xBF) is allowed at the beginning of an OADL source file and is treated as whitespace.
The following ASCII characters are recognized as whitespace:
ASCII code | Character name |
---|---|
9 | horizontal tab |
10 | line feed |
11 | vertical tab |
12 | form feed |
13 | carriage return |
32 | space |
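A minimal Python sketch of how a scanner might skip ignorable input follows. It covers the six whitespace characters in the table, both comment forms, and a leading byte-order mark. The function name `skip_ignored` is hypothetical, and treating an unterminated `/*` comment as an error is an assumption rather than documented OADL behavior.

```python
WHITESPACE = {"\t", "\n", "\v", "\f", "\r", " "}  # ASCII 9 through 13, and 32
BOM = "\ufeff"


def skip_ignored(text, pos=0):
    """Return the index of the next character that is not whitespace or a comment."""
    if pos == 0 and text.startswith(BOM):
        pos = 1  # a leading byte-order mark is treated as whitespace
    while pos < len(text):
        if text[pos] in WHITESPACE:
            pos += 1
        elif text.startswith("/*", pos):
            end = text.find("*/", pos + 2)
            if end < 0:
                raise ValueError("unterminated comment")  # assumed behavior
            pos = end + 2
        elif text.startswith("//", pos):
            end = text.find("\n", pos)
            pos = len(text) if end < 0 else end + 1
        else:
            break
    return pos
```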
An OADL compiler recognizes the following punctuation as single-character tokens:
& |
| |
^ |
~ |
+ |
- |
* |
/ |
% |
. |
, |
: |
; |
! |
( |
) |
{ |
} |
[ |
] |
< |
> |
= |
@ |
? |
` |
In addition, OADL recognizes the following two-character tokens:
## |
!= |
== |
<= |
>= |
<< |
>> |
+= |
-= |
*= |
/= |
%= |
&= |
|= |
^= |
++ |
-- |
=> |
?= |
#= |
&& |
|| |
~= |
?# |
:: |
** |
\| |
\^ |
\& |
\< |
\> |
\+ |
\- |
\* |
\/ |
\% |
!- |
@@ |
#[ |
#( |
?* |
?? |
-> |
:= |
#{ |
OADL also recognizes the following three-character tokens:
<<= |
>>= |
<<< |
>>> |
... |
\== |
\#= |
\!= |
\<= |
\>= |
\<< |
\>> |
\=> |
\~= |
\** |
There are several composite token types. The first is a character constant token, expressed as a single character between two single quote marks, thus: `'x'`
A string constant token is lexically similar, expressed as zero or more characters between two double quote marks, thus: `"abcdefg"`
In both character constant tokens and string constant tokens, the character `\` has special significance. Just as in C and C++, it is an "escape" character, which alters the interpretation of a number of characters following. Here is the complete list of escapes recognized by OADL:
Escape | Meaning |
---|---|
\0 | ASCII NUL character |
\a | ASCII bell character |
\b | ASCII backspace character |
\f | ASCII formfeed character |
\n | ASCII linefeed character |
\r | ASCII carriage return character |
\t | ASCII horizontal tab character |
\v | ASCII vertical tab character |
\x HEX | Hexadecimal character code (up to 8 digits) |
\' | Non-terminating single quote |
\" | Non-terminating double quote |
\\ | The backslash character |
\ any | The given character |
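The escape table can be modeled with a short Python decoder. This is a sketch, not OADL's implementation: the name `decode_escapes` is hypothetical, and two details are assumptions the table does not settle, namely that `\x` consumes as many hex digits as are available (up to 8) and that it requires at least one.

```python
SIMPLE = {"0": "\0", "a": "\a", "b": "\b", "f": "\f",
          "n": "\n", "r": "\r", "t": "\t", "v": "\v"}
HEX_DIGITS = "0123456789abcdefABCDEF"


def decode_escapes(body):
    """Decode the text between the quotes of a character or string constant."""
    out, i = [], 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("dangling backslash")
        nxt = body[i + 1]
        i += 2
        if nxt in SIMPLE:
            out.append(SIMPLE[nxt])
        elif nxt == "x":
            start = i
            # Assumed: take up to 8 consecutive hex digits.
            while i < len(body) and i - start < 8 and body[i] in HEX_DIGITS:
                i += 1
            if i == start:
                raise ValueError("\\x needs at least one hex digit")
            out.append(chr(int(body[start:i], 16)))
        else:
            # \' \" \\ and the catch-all "\ any" row yield the character itself.
            out.append(nxt)
    return "".join(out)
```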
If a character constant or string constant token is immediately preceded by the character `L`, the token is a wide character constant or wide string constant token. If any of the characters inside the token have an encoding greater than 127 (either due to specification via `\x`HEX or via UTF-8 sequences), then the constant is also considered to be a wide character constant or wide string constant token.
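The wide-constant rule reduces to a simple predicate. In the Python sketch below, `is_wide` is a hypothetical name, `prefix` is whatever immediately precedes the opening quote, and `decoded` is assumed to be the constant's contents after escape processing.

```python
def is_wide(prefix, decoded):
    """Return True for an explicit L prefix or any code point above 127."""
    return prefix == "L" or any(ord(ch) > 127 for ch in decoded)
```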
In OADL, an integer constant token takes one of three forms:
- A sequence of one or more decimal digits (the numbers `0` through `9`)
- `0x`HEX, where HEX is a sequence of one or more hexadecimal digits (the numbers `0` through `9` and the letters `a` through `f` and `A` through `F`)
- `0b`BIN, where BIN is a sequence of one or more binary digits (the numbers `0` and `1`)
The digits of an integer constant may be separated by an underscore `_` for enhanced readability (the underscore must be between digits, not at the beginning or the end of the number). An integer constant may also have one of the following suffixes to give it a type other than Int (the suffixes are case-independent):
Suffix | Resulting type |
---|---|
B | Byte (base-10 integer constants) |
SB | Byte (hexadecimal integer constants) |
UB | Ubyte |
S | Short |
US | Ushort |
U | Uint |
L | Long |
UL | Ulong |
Here are some examples of integer constant tokens:
Token | Description |
---|---|
0x1000 | The number 4096, in hex |
123 | The number 123, in decimal |
0x1FFF_FFFF | The largest positive integer, in hex |
0b111_1111b | The largest Byte value, in binary |
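The Python sketch below parses the three integer forms, the underscore separators, and the type suffixes. It is a model only: `parse_int_constant` is a hypothetical name, range checking against the resulting type is not modeled, and accepting `SB` on non-hexadecimal constants and rejecting doubled underscores are assumptions. Because `B` is itself a hexadecimal digit, a trailing `B` on a hex constant is read as a digit, which is presumably why `SB` exists.

```python
import re

SUFFIX_TYPES = {"": "Int", "B": "Byte", "SB": "Byte", "UB": "Ubyte",
                "S": "Short", "US": "Ushort", "U": "Uint",
                "L": "Long", "UL": "Ulong"}

_INT = re.compile(
    r"(?:0[xX](?P<hex>[0-9a-fA-F]+(?:_[0-9a-fA-F]+)*)"
    r"|0[bB](?P<bin>[01]+(?:_[01]+)*)"
    r"|(?P<dec>[0-9]+(?:_[0-9]+)*))"
    r"(?P<suffix>[A-Za-z]*)"
)


def parse_int_constant(token):
    """Return (value, type name) for an integer constant token."""
    m = _INT.fullmatch(token)
    if m is None:
        raise ValueError(f"not an integer constant: {token!r}")
    suffix = m.group("suffix").upper()  # suffixes are case-independent
    if suffix not in SUFFIX_TYPES:
        raise ValueError(f"unknown suffix: {suffix!r}")
    if m.group("hex") is not None:
        digits, base = m.group("hex"), 16
    elif m.group("bin") is not None:
        digits, base = m.group("bin"), 2
    else:
        digits, base = m.group("dec"), 10
    return int(digits.replace("_", ""), base), SUFFIX_TYPES[suffix]
```

For instance, `parse_int_constant("0x1B")` gives `(27, "Int")`, while `parse_int_constant("0x7Fsb")` gives `(127, "Byte")`.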
A floating point constant token in OADL is distinguished from an integer constant token by one of two things: either it has an exponent part (the character `e` or `E` followed by a signed integer exponent), or it has a fractional part (the character `.` followed by the digits comprising the fraction), or it has both. A floating point constant may also have one of the following suffixes to give it a type other than Float (the suffixes are case-independent):
Suffix | Resulting type |
---|---|
H | Half |
D | Double |
Hexadecimal floating point constants are supported; they are similar to hexadecimal integer constants but are distinguished from them by having an exponent part (the character `p` or `P` followed by a signed integer exponent signifying a power of two), or having a fractional part (the character `.` followed by the hexadecimal digits comprising the fraction), or both. A hexadecimal floating point constant may have one of the following suffixes to give it a type other than Float (the suffixes are case-independent):
Suffix | Resulting type |
---|---|
H | Half |
L | Double |
Like integer constants, numeric digits of a floating point constant can be separated by the underscore `_` (again, the underscore must be between two numeric digits, not at the beginning or the end of the string of digits).
Here are some examples of floating point constant tokens:
Token | Description |
---|---|
3.14159_26535_89793_24d | A Double approximation of π |
1.e38 | About the biggest Float representable |
1e-38 | About the smallest Float representable |
.0h | Zero as a Half |
0x1p-1L | The Double 0.5 |
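The decimal and hexadecimal floating point forms can be modeled together in Python. The sketch below is an approximation under stated assumptions: `parse_float_constant` is a hypothetical name, the value is returned as a Python float (double precision) with no rounding to Half or Float, and at least one mantissa digit is assumed to be required.

```python
import re

_DIGITS = r"[0-9]+(?:_[0-9]+)*"
_HEXDIGITS = r"[0-9a-fA-F]+(?:_[0-9a-fA-F]+)*"

_DEC_FLOAT = re.compile(
    rf"(?P<int>{_DIGITS})?(?P<dot>\.(?P<frac>{_DIGITS})?)?"
    r"(?P<exp>[eE][+-]?[0-9]+)?(?P<suffix>[hHdD]?)"
)
_HEX_FLOAT = re.compile(
    rf"0[xX](?P<int>{_HEXDIGITS})?(?P<dot>\.(?P<frac>{_HEXDIGITS})?)?"
    r"(?P<exp>[pP][+-]?[0-9]+)?(?P<suffix>[hHlL]?)"
)

# (pattern, suffix-to-type table, converter); hexadecimal is tried first.
_FORMS = (
    (_HEX_FLOAT, {"": "Float", "H": "Half", "L": "Double"}, float.fromhex),
    (_DEC_FLOAT, {"": "Float", "H": "Half", "D": "Double"}, float),
)


def parse_float_constant(token):
    """Return (value, type name) for a floating point constant token."""
    for pattern, types, convert in _FORMS:
        m = pattern.fullmatch(token)
        if m is None:
            continue
        has_digits = m.group("int") or m.group("frac")
        is_float = m.group("dot") or m.group("exp")
        if not (has_digits and is_float):
            continue  # e.g. a plain integer, or a lone "."
        suffix = m.group("suffix")
        body = token[: len(token) - len(suffix)].replace("_", "")
        return convert(body), types[suffix.upper()]
    raise ValueError(f"not a floating point constant: {token!r}")
```

Note that `D` cannot serve as the Double suffix in the hexadecimal form because it is a hex digit, which matches the use of `L` in the second suffix table.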
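OADL identifier tokens consist of a Unicode character with an alphabetic attribute or the character `_` or `$`, followed by zero or more of:
- Unicode characters with an alphabetic attribute
- Unicode characters with a numeric attribute
- `_` or `$`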
Unicode characters with any of the following attributes are recognized as alphabetic when part of an identifier:
Attribute | Abbreviation |
---|---|
Letter, upper case | Lu |
Letter, lower case | Ll |
Letter, title case | Lt |
Letter, other | Lo |
Unicode characters with any of the following attributes are recognized as numeric when part of an identifier:
Attribute | Abbreviation |
---|---|
Number, decimal digit | Nd |
Number, letter | Nl |
Number, other | No |
Here are some example identifier tokens: `Main` `$foo` `my_house` `x1` `bißchen`
Note that OADL is case-sensitive; that is, the identifier tokens `Main`, `main`, `mAIn`, and `MAIN` are all unique.
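The identifier rule maps directly onto Unicode general categories, which Python exposes through `unicodedata.category`. The sketch below is illustrative only; `is_identifier` is a hypothetical name, and it checks lexical shape without excluding reserved keywords.

```python
import unicodedata

ALPHA = {"Lu", "Ll", "Lt", "Lo"}   # categories treated as alphabetic
NUMERIC = {"Nd", "Nl", "No"}       # categories treated as numeric


def is_identifier(text):
    """Check whether text has the lexical shape of an identifier token."""
    def is_start(ch):
        return ch in "_$" or unicodedata.category(ch) in ALPHA

    def is_rest(ch):
        return is_start(ch) or unicodedata.category(ch) in NUMERIC

    if not text:
        return False
    return is_start(text[0]) and all(is_rest(ch) for ch in text[1:])
```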
OADL reserves several identifiers as keywords for syntactic purposes. This is the complete list:
assert |
break |
case |
catch |
class |
const |
continue |
default |
do |
else |
extern |
for |
forall |
foreach |
if |
match |
namespace |
new |
operator |
proc |
protected |
public |
return |
static |
switch |
throw |
try |
using |
var |
while |
with |
__FILE__ |
__LINE__ |
Keywords may not be used as user identifiers (variable names, procedure names, etc.) in OADL.
It is non-trivial to distinguish between the various integer constant, floating point constant, and identifier tokens. See the Token State Machine in the OADL Implementation Notes chapter for more information.
Note that the `__FILE__` and `__LINE__` keywords are special. The `__FILE__` keyword is replaced at compile time with a string constant token containing the name of the file currently being compiled. The `__LINE__` keyword is replaced at compile time with an integer constant token containing the line number where it is found.
In the context of a `match` statement, a match argument consisting of the character `?` followed by one or more decimal digits may be used. It indicates the nth match result of the pattern:

    match ("123 45") {
    case "([0-9]+) ([0-9]+)" :
        "First: ", ?1, "; second: ", ?2, '\n';
    }
    First: 123; second: 45

The number of match arguments present can be found by using the token `?#` (note that the zero'th argument is always the entire string matched):

    match ("123 456 789") {
    case "([0-9]*) ([0-9]*) ([0-9]*)" :
        "Found ", ?#, " matches:\n";
        for (var i = 0; i < ?#; i++) {
            "", oadl::matchvec()[i], '\n';
        }
    }
    Found 4 matches:
    123 456 789
    123
    456
    789
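For readers more familiar with other regular expression libraries, the numbering of match arguments can be compared with Python's `re` module. This is an analogy, not a description of OADL's matcher: `match_arguments` is a hypothetical helper, and using `re.search` and Python's pattern syntax are assumptions that happen to agree with the two examples above.

```python
import re


def match_arguments(pattern, subject):
    """Return [whole match, group 1, group 2, ...], or [] if there is no match."""
    m = re.search(pattern, subject)
    if m is None:
        return []
    # Index 0 is the entire matched string, like the zero'th match argument.
    return [m.group(0), *m.groups()]
```

Here `len(match_arguments(...))` plays the role of `?#`, and indexing the list plays the role of `?1`, `?2`, and so on.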
It is often convenient to break up a program into several source files. To enable this, the include statement is supported:
#include "filename"
Tokens are read from the given filename until it has been exhausted; at that point, lexical analysis returns to the current file. It is assumed that the filename will have a suffix of ".oah"; if so, it is not necessary to include the suffix in the #include statement. It is not required that files included have a ".oah" suffix, though.
It is implementation-dependent how many nested levels of includes are supported; however, all implementations will support at least 4 levels.
The OADL preprocessor supports macros. Macros are similar to those in C/C++; however, unlike C/C++, parentheses and braces in a macro must match. Additionally, instead of escaping the end-of-line as in C/C++, multi-token OADL macro definitions are completely enclosed in matching braces. For example:

    #define foo(x) { x = x + 1 }
    var a = 1
    foo(a)
    "a = ", a
    a = 2

The matching braces are not included in the expansion of the macro. Unlike C/C++ there are no token pasting or tokenizing capabilities in OADL macros.
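The expansion rule can be pictured as token substitution. The Python sketch below works on pre-split token lists and is only a model: `expand_macro` is a hypothetical name, and nested macro calls, preprocessor directives inside the body, and argument parsing are all left out.

```python
def expand_macro(params, body, args):
    """Expand a macro body (a list of tokens) with one token list per argument."""
    if len(params) != len(args):
        raise ValueError("wrong number of macro arguments")
    if len(body) >= 2 and body[0] == "{" and body[-1] == "}":
        body = body[1:-1]  # the enclosing braces are not part of the expansion
    binding = dict(zip(params, args))
    out = []
    for tok in body:
        # A parameter name is replaced by its argument tokens; anything else is kept.
        out.extend(binding.get(tok, [tok]))
    return out
```

Applied to the `foo(x)` definition from the example with the argument `a`, the sketch produces the tokens `a = a + 1`.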
Macros may define and undefine other macros; for example:

    #define bar(x) {
        #ifdef(foo) #undef foo #endif
        #define foo(y) {x + y}
    }
    bar(3)
    foo(4)
    7
    bar(5)
    foo(4)
    9

Unlike C/C++, macros may not be redefined. Instead they must be undefined using the `#undef` statement:

    #define foo(a) {a+1}
    foo(1)
    2
    #undef foo
    #define foo(a) {a+2}
    foo(1)
    3

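One way to picture the no-redefinition rule is a macro table that refuses a second definition of the same name. The Python sketch below is hypothetical throughout (`MacroTable` and its methods are invented for illustration), and silently ignoring an `#undef` of an unknown name is an assumption, not documented OADL behavior.

```python
class MacroTable:
    """Hypothetical macro table that forbids redefinition without #undef."""

    def __init__(self):
        self._macros = {}

    def define(self, name, params, body):
        if name in self._macros:
            raise ValueError(f"macro {name!r} is already defined; #undef it first")
        self._macros[name] = (tuple(params), tuple(body))

    def undef(self, name):
        # Assumed: undefining an unknown macro is not an error.
        self._macros.pop(name, None)

    def is_defined(self, name):
        return name in self._macros
```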
OADL also supports conditional compilation. The tokens `#if`, `#ifdef`, `#else`, `#elif`, and `#endif` are used for conditional compilation. As in C/C++, OADL conditional compilation statements may be nested. Unlike C/C++, OADL conditional compilation statements do not need to be on individual lines. This means that the condition must be enclosed in parentheses `( )`:

    #ifdef(foo)
        "This statement is not printed"
    #elif(0)
        "Neither is this statement"
    #else
        "This statement is printed"
    #endif
    This statement is printed

OADL also supports the `#defined` query, which may be used in conditional compilation expressions as well as in regular OADL expressions:

    #define foo(a) {a+1}
    "Is foo defined? ", #defined(foo)
    Is foo defined? true
    #if(0 || #defined(foo))
        "Foo is still defined"
    #endif
    Foo is still defined

Just as in C/C++, other than the conditional compilation tokens `#if`, `#ifdef`, `#else`, `#elif`, and `#endif`, tokens inside a non-compiled section of a conditional compilation statement are ignored, including `#include`, `#define`, and `#undef` tokens. Note that the input character stream is still processed as tokens; this can lead to subtle errors with unterminated comments:

    // Nothing is printed due to the unterminated comment inside
    // the #ifdef:
    #ifdef(foo)
        "This statement is not printed" /*
    #else
        "Neither is this one." */
    #endif

OADL reserves the following preprocessor keywords for future use:
#arg |
#args |
#nargs |
For unnamed non-numeric, non-array values, OADL accepts and prints the following special token sequences:
#OBJ(num) | Unnamed object number num |
#PRC(num) | Unnamed proc number num |
#PTR(num) | System-specific pointer number num |
Note that num must be a compile-time decimal integer constant. For example,

    a = proc() {"Hello a!\n";}
    a
    #PRC(7)
    b = #PRC(7)
    b()
    Hello a!
    i = 7
    c = #PRC(i)
    Decimal integer constant expected

The OADL desk calculator accepts the following tokens for special use:
#classes |
#consts |
#defines |
#edit |
#erase |
#externs |
#help |
#intrinsics |
#list |
#load |
#namespaces |
#object |
#procs |
#publics |
#quit |
#reset |
#save |
#vars |