Yet another scripting language - parts 5, 6, and 7

September 2003


This article is a continuation of the discussion of “yet another scripting language”, a scripting language that I am proposing to create. The ongoing presentation is broken up into sections, each section discussing some aspect of the proposed language.
The articles are not definitive specifications; rather, they are working documents exploring possibilities.
There are five prior threads in comp.lang.misc that are devoted to this projected language. The thread titles are:
Yet Another Scripting Language – Syntax thoughts
Yet another scripting language – arithmetic
Yet another scripting language – formalizing flow control
Yet another scripting language – Variables, function invocation, and some syntax
Yet Another Scripting Language – Reprise

There are three web pages that summarize my thoughts on said topics at the end of said threads. These pages are:
https://richardhartersworld.com/cri/2001/newlang01.html
https://richardhartersworld.com/cri/2001/newlang02.html
https://richardhartersworld.com/cri/2003/newlang03.html
This article has three sections, sections 5, 6, and 7. Section 5 is a potpourri of minor updates. Section 6 is a major rewrite of the material about procedures, functions, and other executable routines. Section 7 has initial material about file structure and some illustrative sample code.
The current interim name for this language is San. Suggestions for a better name are solicited.
5.0 Miscellaneous updates

5.0.1 Indentation

All blocks must begin with the keyword “begin”. The content within the block must be indented. The block must be terminated with an “end” statement that, with the single exception of comment blocks, must also be indented.
Preferably (but not obligatorily) all statements within the block that are not part of an internal block should have the same indentation. However, no statement shall be indented deeper than any begin statement within the block.
The indentation characters within a block may be either spaces or tabs but not a mixture.
The “end” statement need not have an argument; if it does, the one and only argument may be the name of the block type, the name of the entity being defined, or a label supplied in the corresponding “begin” statement. For example the statement
begin loop ; label = foo
may be terminated by any of these
end
end loop
end foo
Similarly the definition block
begin function foo
may be terminated by any of these
end
end function
end foo

5.0.2 Comments, comment function, and comment blocks
Putting the ‘#’ character as the first non-white-space character in a line makes the entire line a comment. The ‘#’ character is also the name of the comment function, which may appear anywhere in the program text. A comment block starts with ‘begin’ with either ‘#’ or ‘comment’ as an argument, contains an indented block of comments, and ends with the corresponding end statement. For example:
begin #
    This is a comment.
    It is only a comment.
    If it were code it would
    Still be a comment.
end #
Note: The terminator for the comment block is NOT indented. This is the only exception to the indented end rule.
5.0.3 Character string representations
San supports four (!) different ways to represent character strings. They are:
(1) Single characters using ': A token consisting of two characters with the first character being a single quote is a literal representing the second character. Thus 'x is a literal representing a string consisting of the character x.
(2) Strings using ": Tokens with a leading double quotation mark are literal strings (no trailing " is needed). Thus "foo is the string “foo”. If the string is not enclosed in braces {} or angle brackets <> the string is terminated by white space. If it is so enclosed, white space is part of the text. Thus {"foo bar} is the string “foo bar”.
(3) Strings using count format: Tokens beginning with an integer followed by the letter H (case insensitive) are strings consisting of the specified number of characters after the H. Thus 3Hfoo is the string “foo”. All special characters including white space within the counted number of characters are treated literally. Thus the string 4H{" < is the four characters {, ", space, and <.
(4) Text blocks: Text blocks are useful for text running over more than one line. The format is:
begin text NAME
    BODY
    end
where NAME is the name of the variable containing the text, and BODY is one or more lines of text. Each line has a leading character (after the indentation) that may either be | or :. Lines beginning with : are taken literally as is – no text substitution is performed. Text substitution is done in lines that begin with |. For example:
begin text blatifu
    |Dear {addressee}
    |
    |   Please find enclosed a check for {amount} lire.
    |
    :Ralph 124c41+ {:-}
    end
Text substitution is done on {addressee} and {amount} but not on {:-}.
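The count format in (3) is the one most easily misread by eye, so here is a minimal Python sketch of how a reader might pull one counted string off the front of a line. The helper name read_counted and its return convention are assumptions made for illustration; nothing here is part of San.

```python
import re

def read_counted(text):
    """Read one count-format string (e.g. 3Hfoo) from the start of text.

    Returns (string, rest_of_text). Hypothetical helper, not part of San.
    """
    m = re.match(r"(\d+)[Hh]", text)      # integer, then H in either case
    if m is None:
        raise ValueError("not a count-format token")
    count = int(m.group(1))
    start = m.end()
    body = text[start:start + count]      # taken literally, spaces and all
    if len(body) < count:
        raise ValueError("fewer characters than the count promises")
    return body, text[start + count:]

print(read_counted("3Hfoo bar"))   # ('foo', ' bar')
print(read_counted('4H{" <'))      # ('{" <', '')
```

Because the count is known before the body is read, no character inside the body needs escaping, which is the point of the format.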
5.0.4 Infix operators
Functions with two arguments can be converted into an infix operation by appending a question mark to the function name. For example
<lt x y> <# Yields true if x is less than y>
is equivalent to
(x lt? y) <# Also yields true if x is less than y>
In general (i.e., I haven’t spelled out rules yet) missing arguments are filled out from the enclosing scope. The two principal cases, perhaps the only cases, are in switch blocks and in sequent operators. In switch blocks incomplete delimited relationship expressions are filled in with switch selectors. Example:
begin switch x
    lt? y => action1 <# when x lt? y perform action1>
    z gt? => action2 <# when z gt? x perform action2>
    end switch
When relationships are used as sequent operators one element will be absent; the operator is a filter that passes elements satisfying the relationship. Example:
x[] <- [1...n | gt? m]

5.0.5 Sequent operators
Sequents now look like this:
[source | op ...]
For example,
[1...100 | step 2 | lt? 10]
produces the sequence [ 1, 3, 5, 7, 9]
Logical operators act as filters; the step operator applies a comb function; arithmetic operators apply the operation to each element passed to them. Example:
[1...5 | * 3 | + 1 ]
produces the sequence
[4, 7, 10, 13, 16]
The resemblance to Unix pipes is not entirely accidental.
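For readers who think more easily in an existing language, the two sequents above can be mimicked with Python generators. This is only a sketch of the filter / comb / map distinction; the helper names step, keep, each, and sequent are invented for the illustration and say nothing about how San would implement sequents.

```python
from functools import reduce
from itertools import islice

def step(n):
    """Comb: pass every n-th element, starting with the first."""
    return lambda seq: islice(seq, 0, None, n)

def keep(pred):
    """Filter: pass only the elements satisfying pred (like 'lt? 10')."""
    return lambda seq: (x for x in seq if pred(x))

def each(fn):
    """Map: apply fn to every element passed in (like '* 3')."""
    return lambda seq: (fn(x) for x in seq)

def sequent(source, *ops):
    """Feed source through ops left to right, as [source | op ...] does."""
    return list(reduce(lambda seq, op: op(seq), ops, iter(source)))

print(sequent(range(1, 101), step(2), keep(lambda x: x < 10)))
# [1, 3, 5, 7, 9]
print(sequent(range(1, 6), each(lambda x: x * 3), each(lambda x: x + 1)))
# [4, 7, 10, 13, 16]
```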
5.0.6 Concatenation
The caret (^) is the concatenation operator. It may either be used as an infix operator or as a function. For example, if
foo := "bagel
bar := "dorf
then
blatifu := foo ^ bar
blatifu := <^ foo bar>
are alternate ways to set the content of blatifu to bageldorf.
5.0.7 Three types of qualified names
There are three types of qualified names with three different separating characters: the dollar sign ($), the period (.), and the exclamation point (!). San variables (morphs) have two types of fields, irregular and regular. The dollar sign is used to qualify irregular names, e.g., foo$nrow is the number of rows in the foo table. Likewise the period (dot) is used to qualify regular fields, e.g., foo[i].a is the table entry for row i, column a.
The exclamation point (bang) is specialized. The second name is taken as a procedure that accepts the first name as an argument. The fields of the procedure that would be set as the result of the call are set in the argument instead. For example, suppose procedure sort sorts the array of its argument and then sets its own array to be the sorted array. We could use the sort procedure as follows:
sort data
data[] <- sort[]
However, we can do the same thing more compactly with
data!sort
Bang separators can be concatenated. Thus
data!alpha!beta!gamma!delta
is equivalent to
alpha data
beta alpha
gamma beta
delta gamma
data[] <- delta[]
Bang expressions take no argument. The regular fields of the final procedure are transferred back to the original morph.
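The chaining rule can be modelled loosely in Python. In this sketch a morph is reduced to a dict with a single "array" entry and a procedure is an ordinary function from list to list, standing in for "the procedure's own array"; both simplifications, and the names bang and drop_duplicates, are assumptions of the sketch.

```python
def bang(morph, *procedures):
    """Model of morph!p1!p2...: chain procedures, store the last result in morph.

    morph is a dict with an "array" entry; each procedure takes a list and
    returns a new list (standing in for the procedure's own array).
    """
    result = list(morph["array"])
    for procedure in procedures:
        result = list(procedure(result))   # each stage reads the previous one
    morph["array"] = result                # final fields go back to the morph
    return morph

def drop_duplicates(items):
    """Hypothetical second procedure: remove repeats, keeping first occurrences."""
    return list(dict.fromkeys(items))

data = {"array": [3, 1, 2, 3, 1]}
bang(data, sorted, drop_duplicates)        # roughly data!sort!drop_duplicates
print(data["array"])                       # [1, 2, 3]
```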
5.0.8 Null function
A pair of angle brackets without enclosed content, e.g., <>, is a null function. Thus the code
x := <>
sets the value of x to be empty.
6.0 Delimited executable code aggregates (routines)
The term, delimited executable code aggregate, is certainly clumsy, but I don’t know of a good alternative. Terms such as “procedure” and “function” have narrow connotations due to their use in mainstream programming languages. For the sake of convenience I will use the term “routine”.
What, then, is a routine?
A routine is a block of code that can be bound to an associated identifier, and that can accept control (can be invoked, activated, or equivalent). Routines include functions, procedures, coroutines, tasks, event handlers, data flow components, and goto acceptance blocks.
Routines can be divided into those that are activated with the call/return scheme (traditional functions and procedures) and those that are event activated (event handlers, exception handlers, data flow elements). Likewise they can be divided into those that can be resumed and those that cannot. Finally, we can distinguish between those with persistent data and those without (non-reentrant and reentrant).
San recognizes four kinds of routines – functions, procedures, sequent (list) operators, and agents.
6.1 Functions and procedures
Functions and procedures are the workhorses of procedural languages. The two forms are closely related, and many languages blur the two together.
In practice the two forms are distinct; functions return values that can appear in expressions whereas procedures do not. An example of a function invocation is
x := <cos theta> <# San syntax>
x = cos(theta); /* C syntax */
An example of a procedure invocation is
sort data <# San syntax >
sort(data); /* C syntax */
In San functions cannot alter variables in their calling environment, i.e., function invocations use call by copy. On the other hand procedures can alter variables in their calling environment, i.e., procedure invocations use call by reference. Having two different modes of argument semantics is a possible source of confusion; however call by reference (or call by address) is a practical necessity for procedures.
6.1.1 Handling of arguments in procedures and functions
When procedures and functions are defined the arguments are specified in an arguments statement. Arguments can have default values specified in init statements. For example:
begin procedure foo
    arguments x, y
    init x = 3.14159
    init y = 180.
    ...
    end foo
There are two distinct ways to supply arguments when invoking procedures and functions. The two methods may not be mixed. The first method is positional correspondence. For example
foo a b
The second method is to use name=value pairs, e.g.,
foo x=a, y=b
When the name=value method is used order need not be preserved and arguments can be omitted, e.g.,
foo y=b
The omitted arguments have the default values specified in the init statements.
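The binding rules above are small enough to sketch in Python. The function bind_arguments and its error messages are invented for the illustration. One point is an outright assumption: the text only says that arguments may be omitted in the name=value form, and this sketch also lets trailing positional arguments fall back to their init values.

```python
def bind_arguments(names, defaults, positional=(), named=None):
    """Bind call arguments to declared names; the two styles may not be mixed."""
    named = dict(named or {})
    if positional and named:
        raise ValueError("positional and name=value arguments may not be mixed")
    if len(positional) > len(names):
        raise ValueError("too many arguments")
    unknown = sorted(set(named) - set(names))
    if unknown:
        raise ValueError(f"unknown argument(s): {unknown}")
    bound = dict(defaults)                 # start from the init values
    bound.update(zip(names, positional))   # positional correspondence
    bound.update(named)                    # name=value pairs, any order
    missing = [n for n in names if n not in bound]
    if missing:
        raise ValueError(f"no value for: {missing}")
    return bound

names = ["x", "y"]
inits = {"x": 3.14159, "y": 180.0}
print(bind_arguments(names, inits, positional=(1, 2)))   # like: foo a b
print(bind_arguments(names, inits, named={"y": 2}))      # like: foo y=b
```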
6.1.2 Creation of procedures and functions
Procedures and functions are defined using proc blocks. For example:
begin proc factorial
    arguments n
    n le? 1 => return 1
    else => return n * <factorial n - 1>
    end proc
A proc can either be used as a procedure or as a function.
6.2 Dataflow programming
In dataflow programming, programs are viewed as being a collection of black boxes (data flow elements) connected by pipes (streams). A data flow element is a routine with 0 or more input ports and 0 or more output ports. Input ports accept data from input streams; output ports emit data to output streams. Data flow elements are activated whenever there is input on an input port. Block markers to delimit blocks can be accepted and emitted.
Data flow elements that have outputs but no inputs are called sources; sources include external devices, external files, and generators such as random number generators.
Data flow elements that have inputs but no outputs are called sinks; sinks include external devices, e.g., printers, and files that are written to.
In San input ports are labelled $0, $1, etc., and output ports are labelled $out0, $out1, etc. The current value in an input port is referenced by the port name, e.g., $0. The content of the remainder of a block is referenced by appending brackets, e.g., $0[].
Emitting the contents of an input port advances the input. The emit0 command sends data to $out0; emit1 sends data to $out1, etc. Likewise emitting the content of a morph clears the content.
In San streams can be blocked. The block separators, called marks, are not special elements in streams; they must be emitted and detected using separate commands distinct from the commands used to read and write streams.
6.3 Sequent operators
Sequent operators are special cases of data flow elements that satisfy these conditions:
There are two input ports, $0 and $1, and one output port, $out0.
Inputs are presumed to be blocked.
When invoked in a sequent, $0 comes from the piped input and $1 comes from the argument list.
$out0 is sent to the output pipe.
Here are examples of sequent operator programming. The first routine is an implementation of quick sort; the second routine is an implementation of merge sort.
begin operator qsort
    pivot := $0
    small[] := [$0[] | lt? pivot | qsort]
    big[] := [$0[] | gt? pivot | qsort]
    same[] := [$0[] | eq? pivot ]
    emit0 [small[], same[], big[]]
    end qsort

begin operator msort
    begin operator merge
        begin switch
            <empty? $0> => emit $1[]
            <empty? $1> => emit $0[]
            $0 lt? $1 => emit $0
            else => emit $1
            end
        end merge
    cut $0[] $0.$nrow/2
    emit [[cut.head[] | msort]| merge [cut.tail[] | msort]]
    end msort
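As a cross-check on the two operators, here is a rough Python rendering with plain lists standing in for blocked streams. The base cases for empty and one-element input are assumptions of the sketch; the San text does not say how an operator behaves when handed an empty block. In this version the pivot itself lands in the "same" list.

```python
def qsort(items):
    """Quick sort as three filters on the pivot, mirroring small/same/big."""
    if not items:                       # assumed behaviour on an empty block
        return []
    pivot = items[0]
    small = qsort([x for x in items if x < pivot])
    same = [x for x in items if x == pivot]
    big = qsort([x for x in items if x > pivot])
    return small + same + big

def merge(left, right):
    """Merge two sorted lists, taking the smaller head each time."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] < right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])                # one side is empty: emit the other
    out.extend(right[j:])
    return out

def msort(items):
    """Merge sort: cut in half, sort each half, merge the results."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    return merge(msort(items[:mid]), msort(items[mid:]))

sample = [5, 3, 8, 3, 1, 9, 2]
print(qsort(sample))   # [1, 2, 3, 3, 5, 8, 9]
print(msort(sample))   # [1, 2, 3, 3, 5, 8, 9]
```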

6.4 Agents
An agent is an executable code element (routine) that is an independent thread of control, i.e., a mini-thread. Procedures (and functions) are invoked. Agents cannot be invoked; they can be sent messages, receive input on input ports, handle events and exceptions, and be activated as programs or tasks.
Agents have two types of code – initialization (setting of parameters) and response blocks (on codes). Response blocks are activated when a specific condition arises. Response block initiators have the form on-TYPE where TYPE is the type of event being handled. The types include:
on-activation: executed when the agent is first activated.
on-$N-MODE: executed when there is input on channel N. MODE specifies whether the channel is being read in char, word, or word list mode.
on-markN: executed when there is a blocking mark received on channel N.
on-exception: executed when an exception is raised within the agent. An on-exception block with arguments handles the exceptions named in the argument list; a block without arguments handles all exceptions.
on-msg: executed for each received message.
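The response-block idea can be illustrated with a toy dispatcher in Python. This is a sketch only: it ignores the "independent thread of control" aspect entirely, and the class Agent, its on decorator, and its deliver method are names made up for the illustration rather than anything San defines.

```python
class Agent:
    """Toy agent: holds state and reacts to events via named response blocks."""

    def __init__(self, name):
        self.name = name
        self.handlers = {}   # event type -> response block
        self.log = []

    def on(self, event_type):
        """Decorator that registers a response block for event_type."""
        def register(fn):
            self.handlers[event_type] = fn
            return fn
        return register

    def deliver(self, event_type, payload=None):
        """Run the matching response block; return False if there is none."""
        handler = self.handlers.get(event_type)
        if handler is None:
            return False
        try:
            handler(self, payload)
        except Exception as exc:
            on_exception = self.handlers.get("on-exception")
            if on_exception is None:
                raise
            on_exception(self, exc)     # no argument list: handles everything
        return True


echo = Agent("echo")

@echo.on("on-msg")
def _on_msg(agent, payload):
    if payload is None:
        raise ValueError("empty message")
    agent.log.append(payload)

@echo.on("on-exception")
def _on_exception(agent, exc):
    agent.log.append(f"error: {exc}")

echo.deliver("on-msg", "hello")
echo.deliver("on-msg")          # raises inside the block, routed to on-exception
print(echo.log)                 # ['hello', 'error: empty message']
```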

7.1 File Structure
A San file is divided into segments, each consisting of a segment statement followed by an indented body. The segment statement has two arguments, the type and the label. There are at least three types of segments: executives, configurators, and modules.
Modules contain actual code. In addition to executable code within a module, there may also be import and export statements. Export statements make the exported items visible outside the module. Import statements specify required external resources. The names in import statements are the names used locally within the module; connections between the module local names and the resources matched to the local name are specified in configurators.
The executive specifies which agent will be used as main program and which configurator will be used to specify the configuration. The program name can be a symbolic name; the selected configurator translates the symbolic name into an actual path, module, and named agent. Symbolic names can be used within configurators, and a configurator can reference another configurator.
7.2 Sample Program: A simple desk calculator
The following code would be the contents of a file that implements a simple command line desk calculator. It illustrates the use of agents. Each line of input updates a running total; the first token in the line is an arithmetic operator, with the remaining tokens being numbers. For each number, the total is updated by applying the operator and the number to the total. A ‘C’ resets the total to 0. For example:
calc: + 2 3 5
10
calc: * 4 9
360
calc: / 15
24
calc: C
0
calc:

begin segment type = executive, label = exec
    program = desk-calculator
    configurator = config
    end exec

begin segment type = configurator, label=config
    desk-calculator = <segpath . deskcalc desk-calculator>
    end config

begin segment type = module, label = deskcalc
    init clear := 'C
    init ops[] := ['+, '-, '*, '/, clear]
    init wsp[] := [t , sp]

    begin agent desk-calculator
        msg to=print type=prompt
        $0 | getline | lex | validate | calc
        end

    begin agent getline
        on-in0-char
            begin switch $0
                ceq? n => mark0
                else => emit0-char $0
                end
        end

    begin agent lex
        on-in0-char
            begin switch $0
                in? wsp[] => emit0-word word
                else => word := {word}{$0}
                end
        on-mark0
            mark0
        end

    begin agent validate
        init okay := true
        init new := true
        on-in0-word
            begin switch $0
                ~ okay => <>
                new => begin
                    begin switch $0
                        nin? ops[] => raise bad-command $0
                        else => command := $0
                        end
                    new := false
                    end
                ~ number? => raise not-a-number $0
                ne? 0 => append $0 to args
                command ceq? '/ => raise divide-by-zero
                else => append $0 to args
                end
        on-mark0
            begin
                okay => emit0-list command, args[]
                okay := true
                new := true
                end
        on-exception
            begin
                begin switch $exception
                    bad-command => begin
                        msg to=print type=string {S.{$arg} is not a command}
                        end
                    divide-by-zero => begin
                        msg to=print type=string {S.Can't divide by zero}
                        end
                    not-a-number => begin
                        msg to=print type=string {S.{$arg} is not a number}
                        end
                    end
                msg to=print type=prompt
                okay := false
                end
        end

    begin agent calc
        init total 0
        on-in0
            begin
                $0[] -> command, args[]
                begin switch command
                    ceq? clear => total := 0
                    else => begin loop value from args[]
                        total := total {command} value
                        end
                    end
                msg to=print type=string {total}
                end
        end

    begin agent print
        on-msg
            begin switch $type
                ceq? string => puts {$arg}{n}
                ceq? prompt => puts {"calc: }
                end
        end
    end deskcalc
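To make the intended behaviour concrete, here is a compact Python sketch of the same calculator. It collapses the getline | lex | validate | calc pipeline into ordinary method calls, so it shows what the program computes rather than how agents would cooperate. The class name DeskCalculator, the use of floating point for all numbers, and the treatment of an empty line as an error are assumptions of the sketch.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}
CLEAR = "C"

class DeskCalculator:
    """Running-total calculator: first token is the command, the rest numbers."""

    def __init__(self):
        self.total = 0

    def validate(self, tokens):
        """Check a whole line before anything is applied to the total."""
        if not tokens:
            raise ValueError("empty line")          # assumed; not in the San text
        command, raw_args = tokens[0], tokens[1:]
        if command != CLEAR and command not in OPS:
            raise ValueError(f"{command} is not a command")
        args = []
        for tok in raw_args:
            try:
                value = float(tok)
            except ValueError:
                raise ValueError(f"{tok} is not a number") from None
            if value == 0 and command == "/":
                raise ValueError("Can't divide by zero")
            args.append(value)
        return command, args

    def feed(self, line):
        """Process one input line; return the new total or an error message."""
        try:
            command, args = self.validate(line.split())
        except ValueError as err:
            return str(err)
        if command == CLEAR:
            self.total = 0
        else:
            for value in args:
                self.total = OPS[command](self.total, value)
        return self.total

calc = DeskCalculator()
for line in ["+ 2 3 5", "* 4 9", "/ 15", "C", "/ 0", "? 1"]:
    print("calc:", line, "->", calc.feed(line))
```

As in the San version, a line that fails validation leaves the running total untouched.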

This page was last updated September 12, 2003.
