|
CHAPTER 12. CONTROL STRUCTURES
Most definitions in the Forth dictionary are defined by the colon ':' word. They are called colon definitions, Forth definitions, or high level definitions. When the text interpreter sees the word ':', it creates a header using the text string following colon as the name and then enters the compiling state. In the compiling state, the text interpreter reads in a text line from the input stream, parses out strings delimited by blanks, and tries to match them with dictionary entries. If a string matches with a dictionary entry, the code field address of the matching word is added to the parameter field of the new definition under construction. This is what we call the compiling process. The compiling process ends when the terminating word ; or ;CODE is encountered. When a colon definition is later executed, the word addresses in its parameter field are executed by the address interpreter in the order as compiled. If it is necessary to alter the sequential execution process at run-time, special word has to be used in the compiling process to set up the mechanism of branching and looping, to build the control structures and the program constructs in the colon definition. These special words are equivalent to compiler directives or assembly directives in conventional computer languages. These words do not become part of the compiled definition, but cause specific actions during compilation to build the control structure into the definition and to ensure its correct execution at run-time. These special words in Forth are characterized by the fact that they all have a precedence bit in the length byte of the name field set to one. Words with precedence bit set are called immediate words because the text interpreter turns these words over to the address interpreter for execution even during compilation. In this Chapter, we shall concern ourselves with the means by which the following control structures are built in a colon definition:
IF -- ELSE -- ENDIF BEGIN -- UNTIL BEGIN -- WHILE -- REPEAT and DO -- I -- LEAVE -- LOOP
However, before discussing the detailed definitions of these words, a few utility words should be presented to make the discussions more intelligible. The word COMPILE and [COMPILE] are used to handle special compiling situations. The words BRANCH and 0BRANCH are the actual words which get compiled into the definition to do the branching and looping. Words in a colon definition are normally compiled into dictionary and their code field address are compiled into the parameter field of the colon definition under compilation. Sometimes the compilation should be delayed to the run-time, i. e., the word is to be compiled not when the colon definition is being compiled, but when the colon definition is later executed. To defer compilation until run-time, the instruction COMPILE must precede the word.
: COMPILE --
Defer compilation until run-time. When the word containing COMPILE is executed, the code field address of the word following COMPILE is compiled into the dictionary at run-time.
?COMP Error if not compiling. R> Top of return stack is pointing to the next word following COMPILE . DUP 2+ >R Increment this pointer by 2 to point to the second word ollowing COMPILE , which will be the next word to be executed. The word immediately following COMPILE should be compiled, not executed. @ , Do the compilation at run-time. ;
Immediate words, because of their precedence bits, are executed during compilation. However, if one wanted to use the word sequence in an immediate word as a regular colon definition, i. e. to compile it in-line, the word [COMPILE] can be used to force the following immediate word to be compiled into a definition. The word [COMPILE] is used in the form : xxxx --- [COMPILE] cccc --- ; in which cccc is the name of an immediate word. ¡@ : [COMPILE] -- ¡@ Force the compilation of the following immediate word. ¡@ -FIND Accept next text string and search dictionary for a match. 0= 0 ?ERROR No matching entry was found. Issue an error message. DROP Discard the length byte of the found name. CFA , Convert the name field address to code field address and compile it into the dictionary. ; IMMEDIATE
The two words changing execution sequence in a colon definition are BRANCH and 0BRANCH, both are primitive code definitions. They are of such importance that I feel they should be treated fully. The codes are from PDP-11 fig-Forth.
CODE BRANCH --
The run-time procedure to branch unconditionally. An in-line offset is added to the interpretive pointer IP to branch forward or backward. BRANCH is compiled by ELSE, AGAIN, and REPEAT.
ADD (IP),IP Add the contents of the next cell pointed to by IP to IP itself. The result is put back to IP which points to the next word to be executed. The next word can be out of the regular execution order. NEXT Return to the word pointed to by IP , completing the unconditional branching. ¡@ CODE 0BRANCH f --
The run-time procedure to branch conditionally. If f on stack is false (zero), the following in-line offset is added to IP to branch forward or backward. Compiled by IF, UNTIL, and WHILE.
TST (S)+ Test the flag f on stack. BNE ZBRA1 Not zero, continue executing next word by skipping the offset. ADD (IP),IP f is zero, do the branching. NEXT
ZBRA1: A common routine shared with LOOP. ADD #2,IP f is true, skip the in-line offset. Pick up the word following the offset and continue execution. NEXT ¡@ Conditional branching in a colon definition uses the forms:
IF (true part) --- ENDIF or IF (true part) --- ELSE (false part) --- ENDIF
At run-time, IF selects to execute the true part of words immediately following it, if the top item on data stack is true (non-zero). If the flag is false (zero), the true part will be skipped to after ELSE to execute the false part. After executing either part, execution resumes after ENDIF . ELSE and the false part are optional. If ELSE part is missing, execution skips to just after ENDIF .
: IF At run-time f -- Compile time -- addr n
It compiles 0BRANCH and reserves one more cell for an offset value at addr . addr will be used later to resolve the offset value for branching. n is set to 2 for error checking when ELSE or ENDIF is later compiled.
COMPILE 0BRANCH Compile the code field address of the run-time routine 0BRANCH into the dictionary when IF is executed. HERE Push dictionary address on stack to be used by ELSE or ENDIF to calculate branching offset. 0 , Compile a dummy zero here, later it is to be replaced by an offset value used by 0BRANCH to compute the next word address. 2 Error checking number. ; IMMEDIATE IF in a colon definition must be executed, not compiled.
: ENDIF Compile time addr n --
Compute the forward branching offset from addr to HERE and store it at addr . Test n to match the previous IF or ELSE in the definition.
?COMP Issue an error message if not compiling. 2 ?PAIRS ENDIF must be paired with IF or ELSE . If n is not 2, the structure was disturbed or improperly nested. Issue an error message. HERE Push the current dictionary address to stack. OVER - HERE-addr is the forward branching offset. SWAP ! Store the offset in addr , thus completing the IF-ENDIF or IF-ELSE-ENDIF construct. ; IMMEDIATE
: ELSE Compile time addr1 n1 -- addr2 n2
Compile BRANCH and reserve a cell for forward branching offset. Resolve the pending forward branching from IF by computing the offset from addr1 to HERE and storing it at addr1 .
2 ?PAIRS Error checking for proper nesting. COMPILE BRANCH Compile BRANCH at run-time when ELSE is executed. HERE Push HERE on stack as addr2 . 0 , Dummy zero reserving a cell for branching to ENDIF . SWAP Move addr1 to top of stack. [COMPILE] ENDIF Call ENDIF to work on the offset for forward branching. ENDIF is an immediate word. To compile it the word [COMPILE] must be used. 2 Leave n2 on stack for error checking. ; IMMEDIATE
Indefinite loops are to be constructed using the following forms:
BEGIN --- UNTIL or BEGIN --- WHILE --- REPEAT
BEGIN simply leaves the current dictionary address on stack for UNTIL or REPEAT to pickup and to compute a backward branching offset at the end of the loop. WHILE is similar to IF in that it skips to just after REPEAT if the flag on stack at that point isfalse, thus terminating the indefinite loop from inside the loop. UNTIL terminates the loop only at the end of the loop.
: BEGIN Compile time -- addr n
At compile time BEGIN leaves the dictionary address on stack with an error checking number n. It does not compile anything to the dictionary.
?COMP Issue an error message if not compiling. HERE Push dictionary pointer on stack to be used to compute backward branching offset. 1 Error checking number. ; IMMEDIATE ¡@ : BACK addr --
A run-time procedure computing the backward branching offset from HERE to addr on stack, and compile this offset value in the next in-line cell in the dictionary.
HERE - , addr-HERE, the backward branching offset. ; ¡@ : UNTIL Compile time addr n --
Compile 0BRANCH and an in-line offset from HERE to addr. Test the error checking code n. If not equal to 1, there is an error in the nesting structure.
1 ?PAIRS If n is not 1, issue an error message. COMPILE 0BRANCH Compile 0BRANCH at run-time. BACK Compute backward branching offset and compile the offset. ; IMMEDIATE
When the colon definition containing the BEGIN-UNTIL structure is executed, the word 0BRANCH compiled by UNTIL at the end of a loop will test the flag on stack at that instant. If the flag is false, 0BRANCH will branch back to the word following BEGIN. The words between BEGIN and UNTIL will be repeatedly executed until the flag is true at UNTIL; at this instant, the interpreter will abort this loop and continue executing the words following UNTIL.
: AGAIN compile time addr n --
Similar to UNTIL but compile BRANCH instead of 0BRANCH in the dictionary to construct an infinite loop. Execution cannot leave this loop unless the words R> DROP are executed in a word inside this loop.
1 ?PAIRS Error checking. COMPILE BRANCH Compile BRANCH and an offset to BEGIN . BACK ; IMMEDIATE
The construct BEGIN-WHILE-REPEAT uses WHILE to abort a loop in the middle of the loop. WHILE will test the flag left on stack at that point. If the flag is true, WHILE continues the execution of following words until REPEAT, which then branches unconditionally back to BEGIN. If the flag is false, WHILE causes execution to skip the words up to REPEAT, thus exiting the loop structure.
: WHILE Compile time addr1 n1 -- addr1 n1 addr2 n2
Compile 0BRANCH and a dummy offset for REPEAT to resolve. addr1 and n1 as left by BEGIN are also passed on to be processed by REPEAT.
[COMPILE] IF Call IF to compile 0BRANCH and the offset. 2+ Leave 4 as n2 to be checked by REPEAT . ; IMMEDIATE
: REPEAT Compile time addr1 n1 addr2 n2 -- Compile BRANCH to jump back to BEGIN. Resolve also the branching offset required by WHILE.
>R >R Get addr2 and n2 out of the way. [COMPILE] AGAIN Let AGAIN do the dirty work of compiling unconditional branch back to BEGIN . R> R> Restore addr2 and n2 . [COMPILE] ENDIF Use ENDIF to resolve the forward branching needed by WHILE . ; IMMEDIATE
The IF-ELSE-ENDIF and the BEGIN-UNTIL types of constructs simply redirect the execution sequence inside of a colon definition. As discussed previously, the definitions of these compiler directives are quite short and simple, involving only branching and conditional branching. The DO-LOOP type of construct is more complicated because additional mechanisms other than branching are needed to keep track of the loop limits and loop counts. The run-time functions of DO are to take the lower and upper loop limits off the data stack, push them on the return stack, and setup the address for LOOP to jump back. LOOP at run-time will then increment the loop count on top of the return stack and compare its value to that of the loop limit just under it on the return stack. If the loop count equals or exceeds the loop limit, the loop is completed and execution goes to the next word after LOOP. Otherwise, LOOP will branch back to DO and continue the looping. +LOOP behaves similarly to LOOP except that it increment the loop count by a number supplied on the data stack. The words DO, LOOP, and +LOOP call on their respective run-time routines to do the work. The detailed codes in these run-time routines will be discussed also. DO-LOOP's are set up in a colon definition in the following forms:
DO --- I --- LOOP or DO --- I --- +LOOP
At run-time, DO begins a sequence of repetitive executionscontrolled by a loop count and a loop limit. The starting value of the loop count and the loop limit are taken off the data stack at run time. Upon reaching the word LOOP, the loop count is incremented by one. Until the new loop count equals or exceeds the loop limit, execution loops back to the word just after DO. Otherwise, the two loop parameters are removed from the return stack and the execution continues ahead at the word after LOOP. Within a loop, the word I will copy the loop count to data stack to be used in computations.
: DO Run-time n1 n2 -- Compile time -- addr n ¡@ COMPILE (DO) Compile the run-time routine address of (DO) into dictionary. HERE Address addr for backward branching from LOOP or +LOOP. 3 Number for error checking. ; IMMEDIATE ¡@ CODE (DO) n1 n2 --
The run-time routine starting a DO-LOOP. n1 and n2 are pushed on the return stack as loop limit and loop index, respectively.
MOV 2(S),-(RP) Push the loop limit n1 on return stack. MOV (S),-(RP) Push the initial loop count n2 on return stack above n1 . ADD #4,S Adjust the stack pointer to drop n1 and n2 off the data stack. NEXT Return. ¡@ CODE I -- n
Return the current loop index inside a DO-LOOP.
MOV (RP),-(S) Copy the loop count on return stack and push it to data stack. NEXT ¡@ CODE LEAVE --
Make the loop limit equal to the loop count and force the loop to terminate at LOOP or +LOOP .
MOV (RP),2(RP) Copy loop count to loop limit on the return stack. NEXT
: LOOP addr n --
Terminate a DO-LOOP structure in a colon definition.
3 ?PAIRS Check the number left by DO . If it is not 3, issue an error message. The loop is not properly nested. COMPLIE (LOOP) Compile (LOOP) at run-time when LOOP is executed. BACK Compute and compile the backward branch offset. ; IMMEDIATE ¡@ CODE (LOOP) -- ¡@ Run-time routine of LOOP . ¡@ INC (RP) Increment the loop count on return stack. CMP (RP),2(RP) Compare loop count with the loop limit. BGE LOOP1 Jump to LOOP1 if the loop count is equal or greater than the loop limit. ADD (IP),IP Add backward branch offset to IP and NEXT branch back to repeat the DO-LOOP. ¡@ LOOP1: ADD #4,RP Exit the loop. Discard the loop parameters off the return stack. ADD #2,IP Advance IP over the in-line offset number and NEXT continue executing the next word after LOOP .
When the loop count must be incremented by an amount other than one, +LOOP should be used to close a DO-LOOP . It is used in the form:
DO --- I --- +LOOP ¡@ : +LOOP Run-time n1 -- Compile time addr n1 -- ¡@ Increment the loop index by n1 on the stack and test for loop completion. Branch back to DO if not yet done.
3 ?PAIRS Check n. If it is not 3 as left by DO , issue an error message. COMPILE (+LOOP) Compile the address of (+LOOP) at run-time when the colon definition is being built. BACK Compile back branch offset. ; IMMEDIATE ¡@ CODE (+LOOP) n -- ¡@ Run-time routine at the end of a DO--+LOOP loop. ¡@ ADD (S),(RP) Add n to the loop count on return stack. TST (S)+ Test and pop data stack BLT LOOP3 If n is negative, jump to LOOP3 for special processing. CMP 2(RP),(RP) n is positive. Compare loop count with loop limit. BLE LOOP2 If the loop is done, jump to LOOP2 to exit. ADD (IP),IP Not yet done, return to DO . NEXT ¡@ LOOP2: ADD #4,RP Clear return stack. ADD #2,IP Advance IP to the next word after +LOOP . NEXT ¡@ LOOP3: CMP (RP),2(RP) Negative increment n . Reverse comparison. BLE LOOP2 ADD (IP),IP Not yet done with the loop. Return to the word after DO . NEXT
¡@ |