CHAPTER III

CHAPTER III. THE cmFORTH OPERATING SYSTEM

This version of Forth, cmFORTH, was developed by Chuck Moore, the chief architect of the Novix-4000 Forth engine and the inventer of the Forth language. It is used in Chuck's Gamma Board and in Software Composers' SC1000C single board computer as their operating system. Chuck Moore and Novix kindly donated this remarkable software package into the public domain to encourage users exploring the phenominal capabilities of the NC4000 chip.

This chapter is intended to be the documentation for cmFORTH. I will attempt to go through the system in minute detail, in order to help those who are unfamiliar with the Forth language as well as those who are not familiar with Chuck Moore's coding style. In any case, the source listing itself is the primary documentation and the descriptions in this chapter are at best commentary to the source listing. The complete source listing of cmFORTH is included in this book as Appendix A. A program to burn target compiled object code into EPROM's is in Appendix B. A short program which must be implemented in a host computer to act as a terminal/disk server for the NC4000 computer is in Appendix C.

The original source listing of cmFORTH is your best source of code examples and programming style, when striving to use the NC4000 chip most effectively. You are encouraged to study this listing very carefully in order to gain maximum benefit from the chip. The source code is only 30 screens long. It would not be very difficult to study it and to be thoroughly familiar with it.

There are many important innovations in cmFORTH worthy of your special attention. First is the optimizing compiler which takes normal Forth source code and compiles very short and efficient NC4000 machine instructions. Next is the dual threaded vocabulary structure in which all the compiler directives and smart compiling/assembling words are linked together, separate from the regular FORTH vocabulary. The interpreter searches only the FORTH vocabulary. The compiler first searches the COMPILER vocabulary and then the FORTH vocabulary. This searching method eliminates the necessity of immediate words. Lastly, a very simple linking method is used to compile words into a target dictionary, which can later be isolated from the main dictionary and burned into EPROM. These innovations in time will make significant impact in the Forth community, leading to better and more efficient Forth systems and programs.

.new

1. The Kernel

The kernel in a Forth system is a collection of low level instructions which drive the computer and are used to construct other high level instructions. In cmFORTH, the kernel contains primarily simple NC4000 machine instructions. However, many of the commonly used Forth words do not have corresponding single word NC4000 instructions and they will have to be synthesized from the primitive NC4000 instructions.

1.1. The primitive Forth Words

Primitive stack operators are the machine instructions of the NC4000 chip. However, each machine instruction must have two versions, one to be executed by the text interpreter and the other for the compiler. The executable version must be a colon defintion with names so that they can be found by the text interpreter. They can be compiled into other colon definitions, but the resulting code will be terribly inefficient because of the overhead in nesting and unnesting. The other version used by the compiler is smart. It generates optimized machine code whenever feasible, taking advantage of the unique property of NC4000 in combining many Forth words into a single machine instructions.

The following primitive Forth words are redefined so that they can be executed by the text interpreter.

: SWAP SWAP ;

: OVER OVER ;

: DUP DUP ;

: DROP DROP ;

: XOR XOR ;

: AND AND ;

: OR OR ;

: + + ;

: - - ;

: 0< 0< ;

: NEGATE NEGATE ;

: @ @ ;

: ! ! ;

Many other commonly used Forth words cannot be reduced to single NC4000 instructions, so they have to be constructed with several NC4000 machine instructions.

: ROT ( n1 n2 n3 -- n2 n3 n1 )

>R SWAP Exchange n1 and n2.

R> SWAP Exchange n1 and n3.

;

: 0= ( n -- f )

IF 0 EXIT THEN Return false if not 0.

EXIT is cheaper and faster.

-1 Can be obtained from a register.

;

: NOT ( n -- f )

0= Logic NOT, not one's complement.

;

: < ( n1 n2 -- f )

- 0<

;

: > ( n1 n2 -- f )

SWAP- A NC4000 primitive instruction.

;

: = ( n1 n2 -- f )

XOR 0=

;

: U< ( u1 u2 -- f )

- 2/ Get the carry of subtraction.

0< Return proper flag.

;

: ?DUP ( n -- n n | 0 )

DUP

IF DUP EXIT THEN EXIT is faster.

;

: WITHIN ( n low high -- f )

OVER - >R high - low

- n - low

R> U< In range?

;

: ABS ( n -- u )

DUP

0< Negative?

IF NEGATE EXIT THEN Invert negative number.

;

: MAX ( n1 n2 -- n1 | n2 )

OVER OVER - n1 - n2

0< Compare.

IF SWAP-DROP EXIT n1 < n2, drop n1.

THEN DROP Otherwise, drop n2.

;

: MIN ( n1 n2 -- n1 | n2 )

OVER OVER - n1 - n2

IF DROP EXIT n1 > n2, drop n2.

THEN SWAP-DROP Otherwise, drop n1.

;

The funny IF-BEGIN... UNTIL-THEN structure in Screen 11, connecting the two definitions MAX and MIN, in effect achieves the above function with a net saving of four instructions. You can do it because cmFORTH does not have compiler security and protection. Not recommended for general programming practice.

: 2DUP ( d -- d d )

OVER OVER

;

: 2DROP ( d -- )

DROP DROP

;

1.2. Memory Accessing Words

: +! ( n a -- )

0 @+ Fetch from a, while keeping a

on the stack.

>R Save a.

+ Add n to contents of a.

R> ! Store the sum back into a.

;

: 2C@+ ( a -- a+1 l h )

1 @+ Fetch a cell and increment a.

SWAP DUP Get the contents just fetched.

127 AND Isolated the low byte.

SWAP 6 TIMES 2/ Right justify the high byte.

;

: 2/MOD ( n -- rem quot )

Equivalent to 2 /MOD but faster.

Needed to convert byte address

to cell address.

DUP 1 AND Get the remainder.

SWAP 2/ Shift left to get quotient.

;

: C! ( b a -- )

a is byte address, which has to

be converted to cell address.

2/MOD DUP >R Save cell address.

@ Cell contents.

SWAP Byte offset.

IF -256 AND Offset=1. Mask off lower byte.

ELSE 255 AND Offset=0. Mask off higher byte.

SWAP Get the byte b.

6 TIMES 2* Shift left by 8 bits.

THEN

XOR Combine two bytes.

R> ! Put the cell back.

;

: C@ ( a -- b )

2/MOD Get the cell address.

@ Contents of the cell.

SWAP 1 - Offset=1 ?

IF 6 TIMES 2/ THEN Yes. Shift right by 8 bits.

255 AND Mask off the high byte.

;

: 2@ ( a -- d )

1 @+ Get the most significant cell.

@ Get the least significant cell.

SWAP Put them in correct order.

;

: 2! ( d a -- )

1 !+ Store the most significant cell.

! Store the next cell.

;

NC4000 is a 16 bit machine and it can access memory only by 16 bit cells. Two bytes are packed into one cell, with the first byte in the higher half (MSB) of the cell. The byte address is twice that of the cell address, with the least significant bit as the byte offset in a cell. To access one byte in the memory, one has to convert the byte address to a cell address by 2/MOD and use the quotient as an offset to find the requested byte. It takes lots of extra work to do byte addressing. Avoid it at all cost.

If you really have to get bytes from the memory, the right word to use is 2C@+ which fetches one cell from the memory and returns both bytes on the stack. The address is incremented by 1 and preserved as the third element on the stack so you can fetch the next two bytes. It is faster than C@ and much more powerful.

: MOVE ( a1 a2 n -- )

a1 is the starting address of the

source, and a2 is the end address

of destination string. Be careful!

>R Save cell count.

MD I! Save a2 in MD register.

I TIMES 1 @+ Fetch n cells to the data stack.

MD I@! Retrieve a2.

R> Retrieve cell count.

TIMES 1 !- Pop cells to destination in

reverse order.

DROP Discard last address.

;

: FILL ( a # n -- )

Fill a memory range with cell value n.

SWAP 1 - >R Push n-1 on return stack as count.

SWAP ( n a -- )

BEGIN

OVER SWAP ( n n a -- )

1 !+ Non-destructive store with a incremented.

2DROP Clear stack.

;

: ERASE ( a # -- )

Zero a range of cells.

0 FILL

;

1.3. Multiply and Divide

NC4000 does not have single instruction multiply or divide, which requires a lot of gates to implement. What is provided are multiply steps, divide steps, and square-root steps, which can be used repetitively to achieve the desired results. Problems in processing the carry bit in the prototype chip cause some restrictions in multiplication. The software fixes to perform correctly the proper function are not implemented. You have to work around these bugs. An example is included in Chapter IV.

OCTAL

: U*+ ( u1 r u2 -- d )

Unsigned integers u1 and u2 are

multiplied and added to r. The

product is an unsigned double

integer on the stack.

Warning: u2 must be even!

MD I! Store u2 in MD register.

16 TIMES *' Repeat multiply step instruction

*' 16 times and the product is

left on the stack.

;

: M/MOD ( ud u -- q r )

Unsigned double integer ud is divided

by unsigned integer u. Both quotient

and remainder are left on the

stack. Note the order of q and

r is not standard.

MD I! Store u in MD register.

D2* Left shift d by 1 bit so that

it is always even.

15 TIMES /' Repeat divide step /' 15 times.

/'' Last divide step.

;

: -M/MOD ( d u -- q r )

Double integer d is divided by

unsigned integer u.

OVER 0< Is d negative?

DUP >R Save u.

+ Add u to the higher half of d

THEN for floored division.

M/MOD Do the divide now.

;

: M/ ( d u -- q )

Mixed mode divide.

-M/MOD DROP Discard remainder.

;

: M*+ ( u1 0 u2 -- d )

Unsigned multiply.

MD I! Copy u2 to MD register.

13 TIMES *' Repeat multiply step.

*- Last signed multiply step.

;

: VNEGATE ( n1 n2 -- n3 n4 )

Negate top two integers on the

stack.

NEGATE Negate top integer.

SWAP NEGATE SWAP Negate next integer.

;

: M* ( n1 n2 -- d )

Mixed mode multiplication of two

signed integers.

DUP 0< Is n2 negative?

IF VNEGATE THEN If so, negate both integers.

0 SWAP initialize accumulator.

M*+ Do the multiplication.

;

: /MOD ( u1 u2 -- r q )

Divide unsigned integers and return

both remainder and quotient.

0 SWAP Insert 0, making dividend a double

integer.

M/MOD Do the mixed mode divide.

SWAP Correct the order of results.

;

: MOD ( u1 u2 -- r )

Find remainder of unsigned integer

division.

/MOD Do the generalized divide.

DROP Discard quotient.

;

: */MOD ( u1 u2 u3 -- r q )

Multiply u1 and u2. Divide the

double integer product by u3.

Return both remainder and quotient.

>R Save divisor u3.

0 SWAP U*+ Multiply u1 and u2.

R> M/MOD Divide by u3.

SWAP Correct the order of r and q.

;

: */ ( n1 n2 u -- r )

Ratio of n1*n2/u.

>R Save u.

M* Signed multiply of n1 and n2.

R> M/ Divide by u.

;

: * ( n1 n2 -- r )

Signed multiply.

0 SWAP U*+ Signed multiply.

DROP Discard high order cell.

;

: / ( n u -- q )

Divide by unsigned integer.

>R Save divisor u.

DUP 0< Extend the sign of n.

R> M/ Divide.

;

.new

2. System Variables

System variables contain vital information needed by the Forth system to function. Most of them are pointers to various areas in the Forth system, such as the top of the dictionary, the disk buffers, the terminal input buffer, the vocabulary threads, etc. System variables in this implementation are kept at the bottom of RAM space, starting from location 16. Thus the first 16 system variables are in the so called local memory, which can be accessed by single cell instructions. These are the most frequently used system variables. Less frequently used variables are kept above location 35.

Following is the list of system variables defined in this implementation, their memory locations, their initial values if initialized, and their function.

PREV Memory 16, not initialized.

Pointer to the most recently

referenced disk buffer.

OLDEST Memory 17, not initialized.

Pointer to the oldest loaded

disk buffer.

BUFFERS Memory 18 and 19, not initialized.

Storing block numbers of blocks

in the disk buffer.

NB A constant of value 1.

Number of disk buffers less 1.

BASE Memory 20, initialized to 10.

Number base for numeric I/O

conversion.

CNT Memory 21, not initialized.

Count of characters received from

the terminal device.

>IN Memory 22, not initialized.

Pointer to the input stream of

characters. Used by WORD to parse

strings.

BLK Memory 23, initialized to 0.

Contains the block number under

interpretation.

?CODE Memory 24, initialized to 0.

Storing the address of the machine

code most recently compiled.

Used by the optimizing compiler

to construction multi-function

instructions.

dA Memory 25, initialized to 0.

Memory address offset to be subtracted

from the current address so that the

dictionary compiled can be relocated

to another part of memory.

MSG Memory 26, initialized to the

end of the system variable area.

Pointer to the terminal input

buffer.

CURSOR Memory 27, initialized to 0.

Pointer to the memory location where

the last input character is stored.

WIDTH Memory 28, initialized to 2.

Cells in the name field of an

entry in the dictionary.

OFFSET Memory 29, initialized to 0.

Block number which became the

logic 0 block during disk access.

H Memory 30, initialized to 64 cells

after terminal input buffer MSG.

Pointer to the top of the current

dictionary.

C/B Memory 31, initialized to 417.

Machine cycles equivalent to the

width of a bit riding the RS-232

terminal port.

Interrupt Vector Memory 32 to 33, not initialized.

Machine instructions are stored

here to handle interrupt requests.

Thread Table Memory 34 and 35, pointers to the

ends of 2 threads in the dictionary.

The dictionary and vocabularies

are hashed into 2 threads. The

name field addresses at the end

of each thread are stored in this

table for dictionary searching.

CONTEXT Memory 36, initialized to 1.

Hash code of the context

vocabulary.