易符智慧科技 eforth technology

CHAPTER 4. THE ADDRESS INTERPRETER

The function of the text, or outer, interpreter is to parse text from the input stream, to search the dictionary for each word parsed out, and to handle numeric conversion if the dictionary search fails. When a matching entry is found and the system is in the compilation state, the text interpreter compiles the word's code field address into the dictionary, unless the word is of the immediate type. However, if the system is in the execution state, or the entry is of the immediate type, the text interpreter just leaves the code field address on the data stack and calls on the address interpreter to do the real work. The address interpreter works at the machine level of the host computer; hence it is often referred to as the inner interpreter.
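The decision made by the text interpreter for each parsed word can be summarized in a few lines of Python. This is a hypothetical sketch: the function name action_for and its return strings are illustrative only. It encodes the figForth rule that a found word is compiled only when the system is compiling and the word is not immediate; in every other case the word is executed.

```python
def action_for(found: bool, compiling: bool, immediate: bool) -> str:
    """Hypothetical summary of what the text interpreter does with one parsed word."""
    if not found:
        return "convert number"      # dictionary search failed: try numeric conversion
    if compiling and not immediate:
        return "compile cfa"         # lay the code field address into the dictionary
    return "execute cfa"             # leave the cfa on the data stack and EXECUTE it


print(action_for(True, compiling=True, immediate=False))    # compile cfa
print(action_for(True, compiling=True, immediate=True))     # execute cfa
print(action_for(True, compiling=False, immediate=False))   # execute cfa
```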

If a word to be executed is a high-level Forth definition, i.e., a colon definition, which has a list of code field addresses in its parameter field, the address interpreter interprets these addresses and executes the corresponding words in sequence; hence the name address interpreter. The address interpreter uses the return stack to dig through many levels of nested colon definitions until it finds a code definition in the Forth nucleus. This code definition, consisting of machine code, is then executed by the CPU. At the end of the code definition, a jump to NEXT is executed; NEXT is a run-time routine that returns control to the address interpreter, which then executes the next word in the list of execution addresses. This process continues until every word at every nesting level has been executed. Finally, control is returned to the text interpreter.

The return stack allows colon definitions to be nested indefinitely and to be unnested correctly after the primitive code definitions are executed. The address interpreter, with its independent return stack, thus contributes significantly to the hierarchical structure of the Forth language, which spans from the lowest-level machine code to the highest-level constructs with a uniform and consistent syntax.

To discuss the mechanisms involved in the address interpreter, it is necessary to touch upon the host CPU and the instruction set on which the Forth virtual computer is constructed. Here I have chosen the PDP-11 instruction set as the vehicle. The PDP-11 is a stack-oriented CPU, sharing many characteristics with the Forth virtual machine. All of its general registers support predecrement and postincrement addressing, which is very convenient for implementing the stacks in Forth. PDP-11 assembly code thus allows a concise and precise definition of the functions performed by the address interpreter.
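As a sketch of why predecrement and postincrement addressing suit Forth stacks, the Python class below models a stack addressed through a single pointer register. It assumes 16-bit cells at even byte addresses and a stack that grows toward lower addresses, as a push through -(SP) implies; the class name Stack and the base address are hypothetical.

```python
class Stack:
    """A stack kept in memory and addressed through one pointer register."""

    def __init__(self, memory: dict, base: int):
        self.mem = memory
        self.sp = base              # points at the top item; equals base when empty

    def push(self, value: int) -> None:
        # MOV value,-(SP): predecrement the pointer by one 2-byte cell, then store
        self.sp -= 2
        self.mem[self.sp] = value & 0xFFFF

    def pop(self) -> int:
        # MOV (SP)+,dst: fetch the top item, then postincrement the pointer
        value = self.mem[self.sp]
        self.sp += 2
        return value


memory = {}
data = Stack(memory, 0o1000)        # hypothetical base address
data.push(3)
data.push(4)
print(data.pop(), data.pop(), oct(data.sp))   # 4 3 0o1000
```

Each push or pop is a single PDP-11 instruction, which is why no separate stack-pointer arithmetic appears in the listings of this chapter.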

The Forth virtual machine uses four PDP-11 registers for stacks and address interpretation. These registers are named as follows:

SP Data stack pointer (written S in the assembly code)

RP Return stack pointer

IP Interpretive pointer

W Current word pointer

The data stack pointer and the return stack pointer point to the top of their respective stacks. Familiar stack operators such as DUP, OVER, and DROP, as well as the arithmetic operators, modify both the contents and the number of items on the data stack. However, the user normally has access to neither the interpretive pointer IP nor the word pointer W. IP and W are tools used internally by the address interpreter.

The word NEXT is a run-time routine of the address interpreter. IP usually points to the next word to be executed in a colon definition. After the current word is executed, the cell that IP points to, which holds the code field address of the next word, is moved into W, and IP is incremented to point to the word after it. W now contains the code field address of the word to be executed, and an indirect jump through W, to the address stored in that code field, starts the execution of this word. At the same time, W is incremented to point to the parameter field of the word being executed. All code definitions end with the routine NEXT, which allows the word following the code definition to be pulled in and executed.
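The two steps of NEXT can be modeled in Python as a sketch. Memory is assumed to be a dictionary from even byte addresses to 16-bit cells, the registers are entries in a dictionary, and the function name next_ and all addresses in the example are hypothetical.

```python
def next_(regs: dict, mem: dict) -> int:
    """NEXT = MOV (IP)+,W ; JMP @(W)+. Returns the address jumped to."""
    regs["W"] = mem[regs["IP"]]   # W <- code field address of the next word in the list
    regs["IP"] += 2               # IP <- the word after it
    target = mem[regs["W"]]       # indirect: the contents of that code field
    regs["W"] += 2                # W <- parameter field address
    return target


# Hypothetical layout: an address list at 0o400 whose first entry is the cfa 0o200
# of a word whose code field holds the run-time routine address 0o100.
mem = {0o400: 0o200, 0o200: 0o100}
regs = {"IP": 0o400, "W": 0}
target = next_(regs, mem)
print(oct(target), oct(regs["IP"]), oct(regs["W"]))   # 0o100 0o402 0o202
```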

In PDP-11 figForth, NEXT is defined as a macro rather than an independent routine. This macro is expanded at the end of all code definitions.

MOV (IP)+,W Move the cell pointed to by IP, the code field address of the next word to be executed, into W. Increment IP to point to the following word in the execution sequence.

JMP @(W)+ Jump indirectly through the code field of the next word, i.e., to the address stored in that code field.
Increment W so that it points to the parameter field of this word.
After the jump, the run-time routine pointed to by the code field of this word is executed.

If the word being entered is itself a colon definition, one more level of nesting is entered. If it is a code definition, its code field contains the address of its own parameter field, i.e., the code field address plus 2. In that case, JMP @(W)+ executes the contents of the parameter field as machine instructions. Thus the code field of a word determines how the word is interpreted by the address interpreter.
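A small Python sketch makes the role of the code field concrete. It assumes a hypothetical DOCOL address and two hypothetical words; JMP @(W)+ is modeled as returning both the jump target and the incremented W. For a code definition the target equals the new W, so the CPU runs the machine code in the word's own parameter field.

```python
DOCOL = 0o100                       # hypothetical address of the DOCOL routine


def jmp_indirect_autoinc(w: int, mem: dict) -> tuple:
    """JMP @(W)+: return (jump target, W after the postincrement)."""
    return mem[w], w + 2


mem = {
    0o200: 0o202,   # a code definition: its code field holds its own pfa
    0o300: DOCOL,   # a colon definition: its code field holds DOCOL
}

for cfa in (0o200, 0o300):
    target, w = jmp_indirect_autoinc(cfa, mem)
    if target == w:
        print(oct(cfa), "is a code definition; the CPU runs the code at", oct(target))
    else:
        print(oct(cfa), "jumps to the routine at", oct(target), "with W =", oct(w))
```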

To initiate the address interpreter, the word EXECUTE takes the code field address of the word to be executed from the data stack and jumps indirectly to the routine pointed to by that code field.

CODE EXECUTE cfa --

Execute the definition whose code field address cfa is on the data stack.

MOV (S)+,W Pop the code field address into W, the word pointer.

JMP @(W)+ Jump indirectly through the code field to its run-time routine. Increment W to point to the parameter field.
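EXECUTE differs from NEXT only in where the code field address comes from: the data stack instead of the address list. The Python sketch below assumes the same dictionary-based memory model, a data stack growing downward from a hypothetical address 0o1000, and a hypothetical code field at 0o300 holding the address 0o100.

```python
def execute(regs: dict, mem: dict) -> int:
    """EXECUTE ( cfa -- ): MOV (S)+,W ; JMP @(W)+. Returns the jump target."""
    regs["W"] = mem[regs["S"]]   # pop the code field address into W
    regs["S"] += 2
    target = mem[regs["W"]]      # jump through the code field
    regs["W"] += 2               # W now holds the parameter field address
    return target


# Hypothetical state: cfa 0o300 is the only item on the data stack.
mem = {0o776: 0o300, 0o300: 0o100}
regs = {"S": 0o776, "W": 0}
print(oct(execute(regs, mem)), oct(regs["W"]), oct(regs["S"]))  # 0o100 0o302 0o1000
```

IP is left untouched, so after the executed word finishes, NEXT resumes the address list that was being interpreted when EXECUTE was called.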

In every colon definition, the code field contains the address of a run-time routine called DOCOL, meaning 'DO the COLon routine', which is the 'address interpreter' for colon definitions.

DOCOL: Run-time routine for all colon definitions.

MOV IP,-(RP) Push the address of the next word onto the return stack and enter a lower nesting level.

MOV W,IP Move the parameter field address, left in W by JMP @(W)+, into IP, so that IP points to the first word in this definition.

MOV (IP)+,W

JMP @(W)+ These two instructions are the macro NEXT.
The old IP was saved on the return stack, and the new IP points to the word to be executed. NEXT then starts the execution of that word.
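The effect of DOCOL on the registers can be traced with a Python sketch. It assumes the dictionary-based memory model used here, a return stack growing downward from a hypothetical address 0o2000, and a hypothetical colon definition at cfa 0o300 whose first word is a code definition at cfa 0o200.

```python
def docol(regs: dict, mem: dict) -> int:
    """DOCOL, entered with W = pfa of the colon definition. Returns the next jump target."""
    regs["RP"] -= 2
    mem[regs["RP"]] = regs["IP"]  # MOV IP,-(RP): save the caller's IP
    regs["IP"] = regs["W"]        # MOV W,IP: IP -> first word of this definition
    regs["W"] = mem[regs["IP"]]   # MOV (IP)+,W    (first half of NEXT)
    regs["IP"] += 2
    target = mem[regs["W"]]       # JMP @(W)+      (second half of NEXT)
    regs["W"] += 2
    return target


# Hypothetical: the caller's IP is 0o406; the first word (cfa 0o200) is a code
# definition whose code field holds 0o202.
mem = {0o302: 0o200, 0o200: 0o202}
regs = {"IP": 0o406, "W": 0o302, "RP": 0o2000}
print(oct(docol(regs, mem)), oct(regs["IP"]), oct(mem[regs["RP"]]))  # 0o202 0o304 0o406
```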

Using the interpretive pointer IP alone would allow the processing of an address list at only a single level. The return stack is used as an extension of IP. When a colon definition calls another colon definition, the contents of IP are saved on the return stack so that IP can be reused to interpret the address list of the called definition. DOCOL thus provides the mechanism for nesting colon definitions indefinitely.
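To see nesting at work, the sketch below strings NEXT, DOCOL and ;S together into a tiny indirect-threaded interpreter in Python. Everything in it is hypothetical: the addresses, the words SQUARE and QUAD, the HALT word standing in for the return to the text interpreter, and the Python functions standing in for machine code. It runs the phrase 3 QUAD, where QUAD calls SQUARE twice, so DOCOL nests two levels deep before ;S unwinds both levels.

```python
mem = {}            # byte address -> 16-bit cell
machine_code = {}   # machine-code address -> routine returning the next jump target or None
regs = {"IP": 0, "W": 0, "S": 0o1000, "RP": 0o2000}


def push(reg, value):           # MOV value,-(reg)
    regs[reg] -= 2
    mem[regs[reg]] = value & 0xFFFF


def pop(reg):                   # MOV (reg)+,dst
    value = mem[regs[reg]]
    regs[reg] += 2
    return value


def next_():                    # MOV (IP)+,W ; JMP @(W)+
    regs["W"] = pop("IP")
    return pop("W")             # fetch the code field, leave W at the pfa


def docol():                    # MOV IP,-(RP) ; MOV W,IP ; NEXT
    push("RP", regs["IP"])
    regs["IP"] = regs["W"]
    return next_()


def lay(addr, *cells):          # store consecutive cells starting at addr
    for i, cell in enumerate(cells):
        mem[addr + 2 * i] = cell


def code(cfa, action):          # code definition: its code field points at its pfa
    mem[cfa] = cfa + 2

    def routine():
        action()
        return next_()          # every code definition ends with NEXT
    machine_code[cfa + 2] = routine


def semis():
    regs["IP"] = pop("RP")      # ;S   MOV (RP)+,IP


def lit():
    push("S", pop("IP"))        # LIT  MOV (IP)+,-(S)


def dup():
    push("S", mem[regs["S"]])


def star():
    push("S", pop("S") * pop("S"))   # 16-bit product (push masks it)


DOCOL, SEMIS, LIT, DUP, STAR, HALT = 0o100, 0o200, 0o210, 0o220, 0o230, 0o240
SQUARE, QUAD = 0o300, 0o320
machine_code[DOCOL] = docol
code(SEMIS, semis)
code(LIT, lit)
code(DUP, dup)
code(STAR, star)
mem[HALT] = HALT + 2
machine_code[HALT + 2] = lambda: None    # stop: stands in for returning to the text interpreter

lay(SQUARE, DOCOL, DUP, STAR, SEMIS)     # : SQUARE DUP * ;
lay(QUAD, DOCOL, SQUARE, SQUARE, SEMIS)  # : QUAD SQUARE SQUARE ;
lay(0o400, LIT, 3, QUAD, HALT)           # the phrase 3 QUAD, then stop


def run(ip):
    regs["IP"] = ip
    target = next_()
    while target is not None:
        target = machine_code[target]()


run(0o400)
print(pop("S"), oct(regs["RP"]))   # 81 0o2000
```

The return stack pointer ends where it started, showing that every DOCOL was matched by a ;S.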

At the end of a colon definition, execution must be returned to the calling definition. Just as every code definition ends with NEXT, every colon definition ends with a word named ;S, which does the unnesting.

CODE ;S --

Return execution to the calling definition. Unnest one level.

MOV (RP)+,IP Pop the return stack into IP, which now points to the next word to be executed in the calling definition.

NEXT Go ahead and execute the word pointed to by IP.
We shall not repeat the definition of NEXT, which is MOV (IP)+,W followed by JMP @(W)+.
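The unnesting step can be sketched in Python under the same assumptions: a dictionary-based memory, a return stack growing downward from a hypothetical 0o2000, and a caller whose saved IP (0o406) points at a hypothetical code definition with cfa 0o240.

```python
def semis(regs: dict, mem: dict) -> int:
    """;S: MOV (RP)+,IP then NEXT. Returns the next jump target."""
    regs["IP"] = mem[regs["RP"]]  # pop the caller's saved IP
    regs["RP"] += 2
    regs["W"] = mem[regs["IP"]]   # NEXT: MOV (IP)+,W
    regs["IP"] += 2
    target = mem[regs["W"]]       #       JMP @(W)+
    regs["W"] += 2
    return target


# Hypothetical: DOCOL earlier saved IP = 0o406 on the return stack.
mem = {0o1776: 0o406, 0o406: 0o240, 0o240: 0o242}
regs = {"IP": 0o310, "W": 0, "RP": 0o1776}
print(oct(semis(regs, mem)), oct(regs["IP"]), oct(regs["RP"]))  # 0o242 0o410 0o2000
```

Whatever IP held inside the finished definition (0o310 here) is simply discarded; the saved value on the return stack is all that is needed to resume the caller.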

The interplay among the four registers IP, W, RP, and S allows colon definitions to nest and unnest correctly to an arbitrary depth, limited only by the size of the return stack allocated in the system. This process of nesting and unnesting is a major contributor to the compactness of the Forth language. The overhead of a subroutine call in Forth is only two bytes: the code field address of the called word, compiled into the caller's parameter field.

In figForth implementations for many microprocessors, a few variations of NEXT are defined as alternative endings for code definitions. PDP-11 figForth did not use them because of the versatility of the PDP-11 instruction set. Nevertheless, these endings are presented here in PDP-11 code for completeness and consistency.

PUSH: --

Push the contents of the accumulator onto the data stack and return through NEXT.

MOV 0,-(S) Push register 0, the accumulator, onto the data stack

NEXT Return

POP: --

TST (S)+ Discard the top item of the data stack

NEXT Return

PUT: --

Replace the top of the data stack with the contents of the accumulator, here register 0, and return through NEXT.

MOV 0,(S) Replace the top of the data stack with register 0

NEXT Return
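The three accumulator endings PUSH, POP and PUT can be modeled in Python as a sketch. Register 0 is represented as regs["R0"], the data stack grows downward, and the trailing NEXT of each ending is left out; the function names and the 1+ example below are hypothetical.

```python
def push_end(regs: dict, mem: dict) -> None:
    """PUSH: MOV 0,-(S) -- push the accumulator."""
    regs["S"] -= 2
    mem[regs["S"]] = regs["R0"]


def pop_end(regs: dict, mem: dict) -> None:
    """POP: TST (S)+ -- discard the top item (TST itself only sets condition codes)."""
    regs["S"] += 2


def put_end(regs: dict, mem: dict) -> None:
    """PUT: MOV 0,(S) -- overwrite the top item with the accumulator."""
    mem[regs["S"]] = regs["R0"]


# A hypothetical 1+ primitive: compute into R0, then finish through PUT.
mem = {0o776: 41}
regs = {"S": 0o776, "R0": 0}
regs["R0"] = (mem[regs["S"]] + 1) & 0xFFFF
put_end(regs, mem)
print(mem[regs["S"]], oct(regs["S"]))   # 42 0o776
```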

LIT: --

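Push the cell following LIT in the address list onto the data stack as a literal, and increment IP to skip over it. Return through NEXT. LIT is used to compile numbers into the dictionary: a number in a colon definition is compiled as LIT followed by the number itself. At run time, LIT pushes this in-line literal onto the data stack to be used in computations.

MOV (IP)+,-(S) Push the in-line literal and increment IP past it

NEXT Return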
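Both halves of LIT, compile time and run time, can be sketched in Python. The helper compile_literal and the code field addresses of LIT and DUP are hypothetical; the address list for the phrase 3 DUP is laid out at a hypothetical address 0o400, and IP is set as if NEXT had just fetched LIT.

```python
LIT, DUP = 0o210, 0o220            # hypothetical code field addresses


def compile_literal(body: list, n: int) -> None:
    """Append what a number compiles to: the cfa of LIT, then the 16-bit value."""
    body.extend([LIT, n & 0xFFFF])


def lit(regs: dict, mem: dict) -> None:
    """LIT: MOV (IP)+,-(S). NEXT has already advanced IP past LIT itself."""
    value = mem[regs["IP"]]         # fetch the in-line literal
    regs["IP"] += 2                 # skip over it
    regs["S"] -= 2
    mem[regs["S"]] = value          # push it onto the data stack


body = []
compile_literal(body, 3)
body.append(DUP)                    # address list for the phrase: 3 DUP
mem = {0o400 + 2 * i: cell for i, cell in enumerate(body)}
regs = {"IP": 0o402, "S": 0o1000}   # as if NEXT has just fetched LIT from 0o400
lit(regs, mem)
print(mem[regs["S"]], oct(regs["IP"]))   # 3 0o404
```

After LIT, IP points at DUP, so the literal cell is never mistaken for a code field address.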