OADL Programming Reference Manual

Lexical Elements

Character Set

OADL uses the UTF-8 (RFC 3629) character encoding for input. UTF-8 is compact for most uses and compatible with ASCII, yet it allows the full 21-bit range of Unicode characters to be used in identifiers, string literals, and comments. OADL also uses UTF-8 for text-based output. Although OADL accepts extended Unicode characters as elements of identifiers, only the ASCII digits '0' through '9' are allowed in numeric constants.

Note that illegal UTF-8 sequences in a source file will produce a compile-time error.

OADL is a token-based compiler. It uses a greedy algorithm to scan tokens; this means, for example, that the sequence of characters === is interpreted as the token == followed by the token = (which the compiler would then flag as a syntax error).
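
The greedy (longest-match) rule can be illustrated with a short Python sketch. This is not the OADL scanner: the `scan` function and the `TOKENS` subset are hypothetical names used only for illustration, and only a handful of the punctuation tokens listed later in this chapter are included.

```python
# Hypothetical subset of the punctuation tokens listed in this chapter.
TOKENS = ["=", "==", "<", "<<", "<<=", "<<<", ">", ">>", ">>=", ">>>",
          "+", "++", "+=", "-", "--", "->"]

def scan(text):
    """Split text into tokens, always taking the longest token that matches."""
    by_length = sorted(TOKENS, key=len, reverse=True)  # try longest first
    out = []
    i = 0
    while i < len(text):
        if text[i].isspace():          # whitespace only separates tokens
            i += 1
            continue
        for tok in by_length:
            if text.startswith(tok, i):
                out.append(tok)
                i += len(tok)
                break
        else:
            raise ValueError(f"no token matches at offset {i}")
    return out

print(scan("==="))    # ['==', '=']
print(scan("= = ="))  # ['=', '=', '=']
```

Because the scanner never backtracks, whitespace is the only way to force a different split of adjacent punctuation characters.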

Whitespace (ASCII space, carriage return, line feed, form feed, horizontal tab, and vertical tab) is ignored, except where it separates tokens. Comments (delimited by /* and */) are treated as whitespace and are otherwise ignored. End-of-line comments (introduced by //) are also ignored. For compatibility with some UTF-8 editors (e.g. Windows Notepad), the Unicode byte-order mark 0xFEFF (encoded in UTF-8 as the bytes 0xEF, 0xBB, 0xBF) is allowed at the beginning of an OADL source file and is treated as whitespace.

The following ASCII characters are recognized as whitespace:

ASCII code	Character name
9	horizontal tab
10	line feed
11	vertical tab
12	form feed
13	carriage return
32	space
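
As an illustration of the input rules above, here is a minimal Python sketch that decodes a source file strictly as UTF-8, drops a leading byte-order mark, and classifies whitespace. The names `load_source` and `is_whitespace` are hypothetical, and dropping the mark (rather than keeping it as a whitespace character) is a simplification made for this sketch.

```python
BOM = "\ufeff"
# ASCII codes from the whitespace table above.
WHITESPACE = {9, 10, 11, 12, 13, 32}

def load_source(data: bytes) -> str:
    """Decode source bytes; illegal UTF-8 raises UnicodeDecodeError."""
    text = data.decode("utf-8")        # strict decoding by default
    if text.startswith(BOM):           # mark is only allowed at the start
        text = text[len(BOM):]
    return text

def is_whitespace(ch: str) -> bool:
    """True only for the six ASCII whitespace characters."""
    return ord(ch) in WHITESPACE

print(load_source(b"\xef\xbb\xbfvar x"))   # var x
print(is_whitespace("\t"), is_whitespace("\u00a0"))  # True False
```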

Tokens

Punctuation

An OADL compiler recognizes the following punctuation as single-character tokens:

`&`	`\|`	`^`	`~`	`+`	`-`	`*`	`/`	`%`
`.`	`,`	`:`	`;`	`!`	`(`	`)`	`{`	`}`
`[`	`]`	`<`	`>`	`=`	`@`	`?`	`

In addition, OADL recognizes the following two-character tokens:

`##`	`!=`	`==`	`<=`	`>=`	`<<`
`>>`	`+=`	`-=`	`*=`	`/=`	`%=`
`&=`	`\|=`	`^=`	`++`	`--`	`=>`
`?=`	`#=`	`&&`	`\|\|`	`~=`	`?#`
`::`	`**`	`\\|`	`\^`	`\&`	`\<`
`\>`	`\+`	`\-`	`\*`	`\/`	`\%`
`!-`	`@@`	`#[`	`#(`	`?*`	`??`
`->`	`:=`	`#{`

OADL also recognizes the following three-character tokens:

`<<=`	`>>=`	`<<<`	`>>>`	`...`
`\==`	`\#=`	`\!=`	`\<=`	`\>=`
`\<<`	`\>>`	`\=>`	`\~=`	`\**`

Character and String Constants

There are several composite token types. The first is a character constant token, expressed as a single character between two single quote marks, thus: 'x'. A string constant token is lexically similar, expressed as zero or more characters between two double quote marks, thus: "abcdefg".

In both character constant tokens and string constant tokens, the character \ has special significance. Just as in C and C++, it is an "escape" character, which alters the interpretation of one or more of the characters that follow it. Here is the complete list of escapes recognized by OADL:

Escape	Meaning
`\0`	ASCII NUL character
`\a`	ASCII bell character
`\b`	ASCII backspace character
`\f`	ASCII formfeed character
`\n`	ASCII linefeed character
`\r`	ASCII carriage return character
`\t`	ASCII horizontal tab character
`\v`	ASCII vertical tab character
`\x`HEX	Hexadecimal character code (up to 8 digits)
`\'`	Non-terminating single quote
`\"`	Non-terminating double quote
`\\`	The backslash character
`\`any	The given character

If a character constant or string constant token is immediately preceded by the character L the token is a wide character constant or wide string constant token. If any of the characters inside the token have an encoding greater than 127 (either due to specification via \xHEX or via UTF-8 sequences), then the constant is also considered to be a wide character constant or wide string constant token.
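
The escape table and the wide-constant rule can be sketched in Python as follows. This is an illustration, not OADL's implementation: `decode_constant` is a hypothetical helper that takes the text between the quote marks, and treating `\x` with no hexadecimal digits as an error is an assumption that the text above does not settle.

```python
SIMPLE = {"0": "\0", "a": "\a", "b": "\b", "f": "\f",
          "n": "\n", "r": "\r", "t": "\t", "v": "\v"}
HEX_DIGITS = "0123456789abcdefABCDEF"

def decode_constant(body, l_prefix=False):
    """Return (decoded text, is_wide) for the characters between the quotes."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        i += 1
        if ch != "\\":
            out.append(ch)
            continue
        if i >= len(body):
            raise ValueError("backslash at end of constant")
        esc = body[i]
        i += 1
        if esc == "x":
            start = i
            # Consume up to 8 hexadecimal digits.
            while i < len(body) and i - start < 8 and body[i] in HEX_DIGITS:
                i += 1
            if i == start:
                raise ValueError("\\x needs a hex digit (assumption)")
            out.append(chr(int(body[start:i], 16)))
        else:
            # Named escape, or "\any" meaning the character itself.
            out.append(SIMPLE.get(esc, esc))
    text = "".join(out)
    # Wide if prefixed with L or if any character code exceeds 127.
    wide = l_prefix or any(ord(c) > 127 for c in text)
    return text, wide

print(decode_constant(r"\x41\tB"))   # ('A\tB', False)
print(decode_constant(r"\xe9")[1])   # True
```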

Integer Constants

In OADL, an integer constant token takes one of three forms:

A decimal constant expressed as one or more digits in the range 0 through 9
A hexadecimal constant of the form 0xHEX, where HEX is a sequence of one or more hexadecimal digits (the numbers 0 through 9 and the letters a through f and A through F)
A binary constant of the form 0bBIN, where BIN is a sequence of one or more binary digits (the numbers 0 and 1)

The digits of an integer constant may be separated by an underscore _ for enhanced readability (the underscore must be between digits, not at the beginning or the end of the number). An integer constant may also have one of the following suffixes to give it a type other than Int (the suffixes are case-independent):

Suffix	Resulting type
`B`	`Byte` (base-10 integer constants)
`SB`	`Byte` (hexadecimal integer constants)
`UB`	`Ubyte`
`S`	`Short`
`US`	`Ushort`
`U`	`Uint`
`L`	`Long`
`UL`	`Ulong`

Here are some examples of integer constant tokens:

Token	Description
`0x1000`	The number 4096, in hex
`123`	The number 123, in decimal
`0x1FFF_FFFF`	The largest positive integer, in hex
`0b111_1111b`	The largest `Byte` value, in binary
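
The three forms, the underscore rule, and the suffix table can be sketched as a small Python recognizer. It is an illustration under stated assumptions, not the OADL scanner: `parse_integer` is a hypothetical name, no range checking is done, and accepting `SB` on decimal and binary constants is an assumption (the table only documents it for hexadecimal constants). Note why `SB` exists: in a hexadecimal constant a trailing `B` is read as a digit.

```python
import re

SUFFIX_TYPES = {"": "Int", "B": "Byte", "SB": "Byte", "UB": "Ubyte",
                "S": "Short", "US": "Ushort", "U": "Uint",
                "L": "Long", "UL": "Ulong"}

# Digits may be separated by single underscores, never leading or trailing.
_HEX = r"[0-9a-fA-F]+(?:_[0-9a-fA-F]+)*"
_BIN = r"[01]+(?:_[01]+)*"
_DEC = r"[0-9]+(?:_[0-9]+)*"

_FORMS = [
    (16, re.compile(rf"0x({_HEX})((?i:sb|ub|us|ul|s|u|l))?")),   # no plain B
    (2, re.compile(rf"0b({_BIN})((?i:sb|ub|us|ul|b|s|u|l))?")),
    (10, re.compile(rf"({_DEC})((?i:sb|ub|us|ul|b|s|u|l))?")),
]

def parse_integer(token):
    """Return (value, type name) for an integer constant token."""
    for base, pattern in _FORMS:
        m = pattern.fullmatch(token)
        if m:
            digits = m.group(1).replace("_", "")
            suffix = (m.group(2) or "").upper()
            return int(digits, base), SUFFIX_TYPES[suffix]
    raise ValueError(f"not an integer constant: {token!r}")

print(parse_integer("0x1000"))       # (4096, 'Int')
print(parse_integer("0b111_1111b"))  # (127, 'Byte')
print(parse_integer("0x1B"))         # (27, 'Int'): B is a hex digit here
print(parse_integer("0x7Fsb"))       # (127, 'Byte')
```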

Floating Point Constants

A floating point constant token in OADL is distinguished from an integer constant token by one of two things: either it has an exponent part (the character e or E followed by a signed integer exponent), or it has a fractional part (the character . followed by the digits comprising the fraction), or it has both. A floating point constant may also have one of the following suffixes to give it a type other than Float (the suffixes are case-independent):

Suffix	Resulting type
`H`	`Half`
`D`	`Double`

Hexadecimal floating point constants are supported; they are similar to hexadecimal integer constants but are distinguished from them by having an exponent part (the character p or P followed by a signed integer exponent signifying a power of two), or having a fractional part (the character . followed by the hexadecimal digits comprising the fraction), or both. A hexadecimal floating point constant may have one of the following suffixes to give it a type other than Float (the suffixes are case-independent):

Suffix	Resulting type
`H`	`Half`
`L`	`Double`

Like integer constants, numeric digits of a floating point constant can be separated by the underscore _ (again, the underscore must be between two numeric digits, not at the beginning or the end of the string of digits).

Here are some examples of floating point constant tokens:

Token	Description
`3.14159_26535_89793_24d`	A `Double` approximation of π
`1.e38`	About the biggest `Float` representable
`1e-38`	About the smallest `Float` representable
`.0h`	Zero as a `Half`
`0x1p-1L`	The `Double` 0.5
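
The following Python sketch recognizes the decimal and hexadecimal floating point forms described above and returns the value with its type name. It is illustrative only: `parse_float` is a hypothetical name, the value is computed as a Python double regardless of the resulting type, and disallowing underscores inside the exponent is an assumption.

```python
import re

_D = r"[0-9]+(?:_[0-9]+)*"
_H = r"[0-9a-fA-F]+(?:_[0-9a-fA-F]+)*"

_DEC = re.compile(rf"(?P<int>{_D})?(?P<dot>\.(?P<frac>{_D})?)?"
                  rf"(?P<exp>[eE][+-]?[0-9]+)?(?P<suf>[hHdD])?")
_HEX = re.compile(rf"0x(?P<int>{_H})?(?P<dot>\.(?P<frac>{_H})?)?"
                  rf"(?P<exp>[pP][+-]?[0-9]+)?(?P<suf>[hHlL])?")
_TYPES = {"": "Float", "H": "Half", "D": "Double", "L": "Double"}

def parse_float(token):
    """Return (value, type name) for a floating point constant token."""
    for is_hex, pattern in ((True, _HEX), (False, _DEC)):
        m = pattern.fullmatch(token)
        if not m:
            continue
        # Needs some digits, plus a fractional part or an exponent;
        # otherwise the token is an integer constant (or nothing at all).
        if not (m["int"] or m["frac"]) or not (m["dot"] or m["exp"]):
            continue
        int_part = (m["int"] or "0").replace("_", "")
        frac = (m["frac"] or "0").replace("_", "")
        if is_hex:
            value = float.fromhex(f"0x{int_part}.{frac}{m['exp'] or 'p0'}")
        else:
            value = float(f"{int_part}.{frac}{m['exp'] or ''}")
        return value, _TYPES[(m["suf"] or "").upper()]
    raise ValueError(f"not a floating point constant: {token!r}")

print(parse_float("0x1p-1L"))  # (0.5, 'Double')
print(parse_float(".0h"))      # (0.0, 'Half')
```

In a hexadecimal constant the letter `d` is a digit, which is consistent with the hexadecimal suffix table using `L` rather than `D` for `Double`.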

Identifiers

OADL identifier tokens consist of a Unicode character with an alphabetic attribute or the character _ or $, followed by zero or more of:

Unicode characters with an alphabetic attribute
Unicode characters with a numeric attribute
The characters _ or $

Unicode characters with any of the following attributes are recognized as alphabetic when part of an identifier:

Attribute	Abbreviation
Letter, upper case	`Lu`
Letter, lower case	`Ll`
Letter, title case	`Lt`
Letter, other	`Lo`

Unicode characters with any of the following attributes are recognized as numeric when part of an identifier:

Attribute	Abbreviation
Number, decimal digit	`Nd`
Number, letter	`Nl`
Number, other	`No`

Here are some example identifier tokens:

    Main $foo my_house x1 bißchen

Note that OADL is case-sensitive; that is, the identifier tokens Main, main, mAIn, and MAIN are all distinct.

OADL reserves several identifiers as keywords for syntactic purposes. This is the complete list:

`assert`	`break`	`case`	`catch`
`class`	`const`	`continue`	`default`
`do`	`else`	`extern`	`for`
`forall`	`foreach`	`if`	`match`
`namespace`	`new`	`operator`	`proc`
`protected`	`public`	`return`	`static`
`switch`	`throw`	`try`	`using`
`var`	`while`	`with`	`__FILE__`
`__LINE__`

Keywords may not be used as user identifiers (variable names, procedure names, etc.) in OADL.
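
The identifier and keyword rules above can be checked with a short Python sketch built on the standard `unicodedata` module. The function names are hypothetical, and the Unicode database bundled with Python may differ in version from the one an OADL implementation uses.

```python
import unicodedata

ALPHABETIC = {"Lu", "Ll", "Lt", "Lo"}
NUMERIC = {"Nd", "Nl", "No"}
KEYWORDS = {
    "assert", "break", "case", "catch", "class", "const", "continue",
    "default", "do", "else", "extern", "for", "forall", "foreach", "if",
    "match", "namespace", "new", "operator", "proc", "protected", "public",
    "return", "static", "switch", "throw", "try", "using", "var", "while",
    "with", "__FILE__", "__LINE__",
}

def _is_start(ch):
    return ch in "_$" or unicodedata.category(ch) in ALPHABETIC

def _is_continue(ch):
    return _is_start(ch) or unicodedata.category(ch) in NUMERIC

def has_identifier_shape(text):
    """True if text is lexically shaped like an identifier token."""
    return (bool(text) and _is_start(text[0])
            and all(_is_continue(ch) for ch in text[1:]))

def is_user_identifier(text):
    """True if text may be used as a variable or procedure name."""
    return has_identifier_shape(text) and text not in KEYWORDS

for name in ["Main", "$foo", "my_house", "x1", "bißchen", "1x", "while"]:
    print(name, is_user_identifier(name))
```

The first five names are accepted; `1x` is rejected because it starts with a digit, and `while` is rejected because it is a keyword.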

It is non-trivial to distinguish between the various integer constant, floating point constant, and identifier tokens. See the Token State Machine in the OADL Implementation Notes chapter for more information.

Note that the __FILE__ and __LINE__ keywords are special. The __FILE__ keyword is replaced at compile time with a string constant token containing the name of the file currently being compiled. The __LINE__ keyword is replaced at compile time with an integer constant token containing the line number where it is found.

Local match arguments

In the context of a match statement, a match argument consisting of the character ? followed by one or more decimal digits may be used. The argument ?n refers to the nth match result of the pattern:

    match ("123 45") {
    case "([0-9]+) ([0-9]+)" :
        "First: ", ?1, "; second: ", ?2, '\n';
    }
First: 123; second: 45

The number of match arguments present can be found by using the token ?# (note that the zeroth argument is always the entire string matched):

    match ("123 456 789") {
    case "([0-9]*) ([0-9]*) ([0-9]*)" :
        "Found ", ?#, " matches:\n";
        for (var i = 0; i < ?#; i++) {
            "", oadl::matchvec()[i], '\n';
        }
    }
Found 4 matches:
123 456 789
123
456
789
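
For readers more familiar with other regular expression libraries, the numbering of match arguments can be compared with capture groups in Python's `re` module. This is only an analogy: `match_arguments` is a hypothetical helper, and OADL's pattern dialect and anchoring rules may differ from Python's.

```python
import re

def match_arguments(pattern, subject):
    """Return the list that plays the role of ?0, ?1, ... for a match."""
    m = re.match(pattern, subject)
    if m is None:
        return []
    # Index 0 is the entire matched string; groups follow in order.
    return [m.group(0), *m.groups()]

args = match_arguments(r"([0-9]*) ([0-9]*) ([0-9]*)", "123 456 789")
print(len(args))   # 4, the analogue of ?#
for arg in args:
    print(arg)
```

The count is one more than the number of parenthesized groups because the zeroth entry holds the whole match, which mirrors the "Found 4 matches" output above.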

OADL Preprocessing

It is often convenient to break up a program into several source files. To enable this, the include statement is supported:

#include "filename"

Tokens are read from the given filename until it has been exhausted; at that point, lexical analysis returns to the current file. It is assumed that the filename will have a suffix of ".oah"; if so, it is not necessary to include the suffix in the #include statement. It is not required that files included have a ".oah" suffix, though.

It is implementation-dependent how many nested levels of includes are supported; however, all implementations will support at least 4 levels.

The OADL preprocessor supports macros. Macros are similar to those in C/C++; however, unlike C/C++, parentheses and braces in a macro must match. Additionally, instead of escaping the end-of-line as in C/C++, multi-token OADL macro definitions are completely enclosed in matching braces. For example:

    #define foo(x) {
        x = x + 1
    }

    var a = 1
    foo(a)
    "a = ", a
a = 2

The matching braces are not included in the expansion of the macro. Unlike C/C++ there are no token pasting or tokenizing capabilities in OADL macros.

Macros may define and undefine other macros; for example:

    #define bar(x) {
        #ifdef(foo)
            #undef foo
        #endif
        #define foo(y) {x + y}
    }
    bar(3)
    foo(4)
7

    bar(5)
    foo(4)
9

Unlike C/C++, macros may not be redefined. Instead they must be undefined using the #undef statement:

    #define foo(a) {a+1}
    foo(1)
2
    #undef foo
    #define foo(a) {a+2}
    foo(1)
3
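
The define, undefine, and expand behavior described above can be modeled with a small Python sketch. It is a toy model, not the OADL preprocessor: the `MacroTable` class is hypothetical, macro bodies are plain strings split into rough tokens, and silently ignoring an `undef` of an unknown macro is an assumption.

```python
import re

class MacroTable:
    def __init__(self):
        self._macros = {}

    def define(self, name, params, body):
        # Macros may not be redefined; they must be undefined first.
        if name in self._macros:
            raise ValueError(f"macro {name!r} already defined; #undef it first")
        self._macros[name] = (list(params), body)

    def undef(self, name):
        self._macros.pop(name, None)   # assumption: unknown names are ignored

    def defined(self, name):
        return name in self._macros

    def expand(self, name, args):
        params, body = self._macros[name]
        if len(args) != len(params):
            raise ValueError("wrong number of macro arguments")
        binding = dict(zip(params, args))
        tokens = re.findall(r"\w+|\S", body)   # rough tokenization
        # The enclosing braces are not part of the body, so none appear here.
        return " ".join(binding.get(tok, tok) for tok in tokens)

table = MacroTable()
table.define("foo", ["a"], "a+1")
print(table.expand("foo", ["1"]))   # 1 + 1
table.undef("foo")
table.define("foo", ["a"], "a+2")
print(table.expand("foo", ["1"]))   # 1 + 2
```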

OADL also supports conditional compilation. The tokens #if, #ifdef, #else, #elif, and #endif are used for conditional compilation. As in C/C++, OADL conditional compilation statements may be nested. Unlike C/C++, OADL conditional compilation statements do not need to be on individual lines. This means that the condition must be enclosed in parentheses ( ):

    #ifdef(foo) "This statement is not printed"
    #elif(0) "Neither is this statement"
    #else "This statement is printed" #endif
This statement is printed
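
Nested conditional compilation can be modeled with a stack, as in the following Python sketch. It is not the OADL preprocessor: `filter_tokens` is a hypothetical function that works on an already tokenized list, and each `#if`, `#ifdef`, or `#elif` is represented as a tuple whose second element is the already evaluated condition.

```python
def filter_tokens(tokens):
    """Keep only the tokens that lie in compiled sections."""
    out = []
    stack = []      # one (parent_active, branch_already_taken) per open #if
    active = True
    for tok in tokens:
        kind = tok[0] if isinstance(tok, tuple) else tok
        if kind == "#if":
            cond = active and tok[1]
            stack.append((active, cond))
            active = cond
        elif kind in ("#elif", "#else", "#endif"):
            if not stack:
                raise ValueError(f"{kind} without matching #if")
            parent, taken = stack.pop()
            if kind == "#endif":
                active = parent
            else:
                cond = True if kind == "#else" else tok[1]
                active = parent and not taken and cond
                stack.append((parent, taken or active))
        elif active:
            out.append(tok)
    if stack:
        raise ValueError("missing #endif")
    return out

example = [("#if", False), "first", ("#elif", False), "second",
           "#else", "third", "#endif"]
print(filter_tokens(example))   # ['third']
```

The example list mirrors the sample above, with `#ifdef(foo)` represented as `("#if", False)` because `foo` is not defined at that point.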

OADL also supports the #defined query which may be used in conditional compilation expressions as well as in regular OADL expressions:

    #define foo(a) {a+1}
    "Is foo defined? ", #defined(foo)
Is foo defined? true

    #if(0 || #defined(foo)) "Foo is still defined" #endif
Foo is still defined

Just as in C/C++, other than the conditional compilation tokens #if, #ifdef, #else, #elif, and #endif, tokens inside a non-compiled section of a conditional compilation statement are ignored, including #include, #define, and #undef tokens. Note that the input character stream is still processed as tokens; this can lead to subtle errors with unterminated comments:

    // Nothing is printed due to the unterminated comment inside
    // the #ifdef:
    #ifdef(foo) "This statement is not printed" /*
    #else "Neither is this one." */
    #endif

OADL reserves the following preprocessor keywords for future use:

#arg #args #nargs

OADL Unnamed Value Tokens

For unnamed non-numeric, non-array values, OADL accepts and prints the following special token sequences:

`#OBJ(num)`	Unnamed object number num
`#PRC(num)`	Unnamed proc number num
`#PTR(num)`	System-specific pointer number num

Note that num must be a compile-time decimal integer constant.

For example,

    a = proc() {"Hello a!\n";}
    a
#PRC(7)

    b = #PRC(7)
    b()
Hello a!

    i = 7
    c = #PRC(i)
Decimal integer constant expected
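
The shape of these token sequences can be sketched as a small Python recognizer. This is an illustration only: `parse_unnamed` is a hypothetical name, and rejecting whitespace inside the parentheses is an assumption that the text above does not settle.

```python
import re

_SHAPE = re.compile(r"#(OBJ|PRC|PTR)\((.*)\)")
KINDS = {"OBJ": "object", "PRC": "proc", "PTR": "pointer"}

def parse_unnamed(token):
    """Return (kind, number) for an unnamed value token sequence."""
    m = _SHAPE.fullmatch(token)
    if m is None:
        raise ValueError(f"not an unnamed value token: {token!r}")
    num = m.group(2)
    # The number must be a literal decimal constant, not a variable name.
    if not (num.isascii() and num.isdigit()):
        raise ValueError("Decimal integer constant expected")
    return KINDS[m.group(1)], int(num)

print(parse_unnamed("#PRC(7)"))   # ('proc', 7)
```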

OADL Desk Calculator Tokens

The OADL desk calculator accepts the following tokens for special use:

`#classes`	`#consts`	`#defines`	`#edit`
`#erase`	`#externs`	`#help`	`#intrinsics`
`#list`	`#load`	`#namespaces`	`#object`
`#procs`	`#publics`	`#quit`	`#reset`
`#save`	`#vars`
