9. P24 CPU Architecture

P24 is a “Minimal Instruction Set Computer” design patterned after Mr. Chuck Moore's MuP21. P24 has a 24-bit CPU core with a dual-stack architecture intended to execute Forth-like instructions efficiently. The processor design is kept simple to allow implementation within field programmable gate arrays. P24 employs a RISC-like instruction set with four 6-bit instructions packed into each 24-bit word. A 6-bit instruction code can accommodate 64 machine instructions. Currently only 26 are implemented. The rest are reserved for users to define their own instructions.
Following is a list of unique features of P24:
* 24-bit address and data buses
* 6-bit RISC-like CPU instructions
* 4-deep instruction cache
* 17-deep data stack
* 33-deep return stack
* Current implementation runs at 10 MHz in FPGA
9.1 Registers and Stacks

P24 has the following registers:
A    Address Register, supplying address for memory read and write
I    Instruction Latch, holding instructions to be executed
P    Program Counter, pointing to the next program word in memory
R    Top of Return Stack
S    Top of Data Stack
T    Accumulator for ALU
All registers are 25 bits wide. The most significant bit in T, T(24), is the carry produced by the 24-bit adder. This carry bit is preserved as data in T when T is transferred to other registers and to the stacks. Preserving the carry bit greatly simplifies the logic processing of data, and allows interrupts to be serviced when the next program word is fetched from memory, without having to save the carry bit and restore it on return.
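The carry-in-T idea can be illustrated with a minimal Python sketch. It assumes a plain unsigned model of the 24-bit adder. The names `MASK24`, `add24` and `carry` are illustrative and do not come from the P24 sources.

```python
MASK24 = 0xFFFFFF  # the 24 data bits T(23..0)

def add24(s, t):
    """Add the 24-bit data fields of S and T.

    The result is at most 25 bits wide: bit 24 plays the role of the
    carry T(24), and it travels with the value like any other data bit.
    """
    return (s & MASK24) + (t & MASK24)

def carry(t):
    """Return the carry bit T(24) of a 25-bit register value."""
    return (t >> 24) & 1

t = add24(0xFFFFFF, 0x000001)   # overflows the 24-bit data field
print(hex(t & MASK24), carry(t))
```

Because the carry is just bit 24 of the value, pushing the value on a stack and popping it later restores the carry together with the data in this model.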
P24 has two stacks:
S_stack    Data stack, 17 levels deep
R_stack    Return stack, 33 levels deep
The return stack is used to preserve return addresses on subroutine calls. The data stack is used to pass parameters among the nested subroutine calls. With these two stacks in the CPU hardware, P24 is optimized to support the Forth programming language.
The 24-bit P24 CPU sports a small, RISC-like instruction set. Four 6-bit instructions are packed into one 24-bit word, and are executed consecutively after a word is fetched from memory. The P24 CPU has a two-stack architecture that is easily programmed in Forth.
The following diagram shows the architecture of the P24 processor. It shows the registers, the stacks, and the data paths among them.
Not shown in the diagram is the connection between the T register and the external data bus. When reading data from memory, the A register supplies the memory address to the address bus, and data is latched from the data bus into the T register. When writing data into memory, the address is supplied by the A register, and data is written to the data bus from the T register.
Figure 1. The architecture of P24

  Data Bus            Address Bus
     |                 ^       ^
     |                 |       |
     v                 |       |
  |-----|           |-----| |-----|
  |  I  |           |  P  | |  A  |
  |-----|           |-----| |-----|
                       ^       ^
                       |       |
                       v       v
|-------------------|-----| |-----| |-----|------------|
|   Return Stack    |  R  |-|  T  |-|  S  | Data Stack |
|-------------------|-----| |-----| |-----|------------|
                             ^  |      |
                             |  v      v
                          |----------------|
                          |      ALU       |
                          |----------------|
9.2 Functional Block Diagram of P24

These data path diagrams should be read with the CPU24.VHD file.
The instruction decoding logic simply applies the proper control signals to the following register loading and multiplexer selecting signals:
Clr      Master reset
Clk      Master clock, 0-40 MHz
t_sel    Select input to T register
tload    If set, load t_in into T register
spop     If set, pop the data stack
spush    If set, push T on the data stack
a_sel    Select input to A register
aload    If set, load a_in into A register
r_sel    Select input to R register
rload    If set, load r_in into R register
rpop     If set, pop the return stack
rpush    If set, push R on the return stack
p_sel    Select input to P register
pload    If set, load p_in into P register
m_sel    Select output to Address bus
iload    If set, load instruction from data bus to I register
reset    Clear the machine instruction counter
slot     Output of machine instruction counter to select instruction
The synchronous program execution unit clocks the slot signal, which selects the proper 6-bit instruction in the I register to produce the above control signals. At the rising clock edge, the selected data are latched into the proper registers and stacks. All data signals must stabilize before the next rising clock edge.
The architecture is very simple and the components are very similar to one another. It should be easy to do a good layout, and the routing should not be difficult.
Figure 2. The block diagrams of P24 components

The T and Data Stack Data Path

not t-------|             |-----------|            |-----------|
s xor t-----|             |           |            |           |
s and t-----|-- t_in------|     T     |-- t--------|  s_stack  |----s
s + t-------|             |           |   spop-----|           |
(s+t)/2-----|   tload-----|           |   spush----|           |
t/2---------|   clk-------|           |   clk------|           |
c&t/2-------|   clr-------|           |   clr------|           |
(s+t)*2-----|             |-----------|            |-----------|
t*2&a-------|
t*2---------|
s-----------|
a-----------|
r-----------|
data--------|
            |
t_sel-------^

The A Register and A-Mux

                          |-----------|
t-----------|-- a_in------|           |---a
a+1---------|             |           |
(s+t)&a/2---|   aload-----|     A     |
a*2+c-------|   clk-------|           |
            |   clr-------|           |
a_sel-------^             |-----------|

The Return Stack Data Path

            |             |-----------|            |-----------|
r_out-------|-- r_in------|     R     |-- r--------|  r_stack  |---r_out
r+1---------|             |           |   rpop-----|           |
p-----------|   rload-----|           |   rpush----|           |
            |   clk-------|           |   clk------|           |
            |   clr-------|           |   clr------|           |
r_sel-------^             |-----------|            |-----------|

The Program Counter Data Path

            |             |-----------|
interrupt---|             |           |            |
p&i(17.0)---|-- p_in------|     P     |-- p--------|---address
p+1---------|             |           |   a--------|
r-----------|   pload-----|           |            |
            |   clk-------|           |            |
            |   clr-------|           |            |
p_sel-------^             |-----------|   m_sel----^

The Instruction Latch and Decoder Data Path

                          |-----------|
                          |           |            |
data----------------------|     I     |-- i(23.0)--|---code(5.0)
                iload-----|           |            |
                clk-------|           |            |
                clr-------|           |            |
                          |-----------|            |
                                                   |
                          |-----------|            |
                          |           |            |
                reset-----|   sync    |-slot(2.0)--^
                          |           |
                clk-------|           |
                clr-------|           |
                          |-----------|
On power-up, all registers and the stacks are cleared to zero while "clr" is held high. When "clr" is lowered to zero, the master clock "clk" starts the CPU from memory location 0, which is where the cleared P register points.
9.3 Input/Output Signals and System Timing

P24 is very flexible in packaging, depending on the memory configuration. These are the signals normally brought to I/O pins. In certain applications, the memory is included on chip and the address bus and data bus do not have to be brought out.
CLK         1-40 MHz master clock
A0-23       Address bus to RAM, SRAM and I/O devices
D0-23       Data bus for RAM, SRAM and I/O devices
CLR         System reset (active low)
Vdd         5V power supply
Vss         Ground
WE          Write enable (active low)
INT0-4      External interrupt inputs
UART_IN     RS232 serial input pin
UART_OUT    RS232 serial output pin
All time periods noted in the following timing diagrams are in periods of the master clock.
Figure 3. Timing of P24 instruction executions

Master Clock
|---|   |---|   |---|   |---|   |---|   |---|   |---|   |---|   |---|   |---|   |---|
|   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |
|   |---|   |---|   |---|   |---|   |---|   |---|   |---|   |---|   |---|   |---|   |---

Slot0 Signal
|-------|                               |-------|                               |-------|
| slot0 | slot1 | slot2 | slot3 | slot4 | slot0 | slot1 | slot2 | slot3 | slot4 | slot0 |
|       |-------------------------------|       |-------------------------------|       |---
  fetch  execute execute execute execute  fetch  execute execute execute execute  fetch

call, jump, jz, jnc
|-------|       |-------|
| slot0 | slot1 | slot0 | slot1 | slot2 | slot3 | slot4
|       |-------|       |-------------------------------
fetch execute execute execute execute execute ...
NOP and RET instructions can be in any of the four slots. When either of these two instructions is executed, slot0 is forced as the next slot, so that the next instruction word is fetched and then executed.
The P24 implementation contains a very simple interrupt controller. If an interrupt is pending on slot0, the program counter is pushed to the return stack and the interrupt vector is placed in the program counter. The interrupt vector is the current state of INT0-INT4. Once an interrupt is serviced via execution of slot4, servicing of interrupts is automatically disabled until the execution of an RET instruction. Immediately after the RET execution, any pending interrupt will be serviced.
When executing a right shift instruction SHR, the sign bit T(23) is preserved. Bits T(23..1) are shifted to the right by one bit. Bit T(0) is latched onto the UART_OUT pin, and the UART_IN pin is latched into the carry bit T(24). This very simple mechanism allows a simple RS232 serial port to be built into the P24 core. As the serial port is the only peripheral device required by eForth, this simple serial port opens a window for the user to access the resources provided by P24, and supports a powerful embedded Forth system to control and to program the P24 system.
9.4 P24 Instruction Set

The P24 instruction set can be best explained using the register and data flow diagrams shown in Figures 1 and 2.
The T register is the center of the ALU, which takes data from the T and S registers and routes the results back to the T register. The contents of T can be moved to the A register, pushed on the data stack S, and pushed on the return stack R.
The T register connects the data stack and the return stack as a large shift register. Data can be shifted towards the return stack by the PUSH instruction, and shifted towards the data stack by the POP instruction.
Register A holds a memory address, which is used to read data from memory into the T register, or write the data in the T register to external memory. The address in A can be auto-incremented, so that P24 can conveniently access data arrays in memory.
P is the program counter and it holds the address of the next instruction to be fetched from memory. After an instruction is fetched, P is auto-incremented and ready to read the next instruction. When a CALL instruction is executed, the address in P is pushed on the return stack. When a return (RET) instruction is executed, the previously saved address in R is popped back into P. The execution sequence interrupted by CALL can now be resumed.
P24 is a microprocessor with 24-bit program words. Each word contains up to four 6-bit machine instructions. The instruction fields in a program word can be shown as follows:

Bits:  23 22 21 20 19 18 | 17 16 15 14 13 12 | 11 10 09 08 07 06 | 05 04 03 02 01 00
       |      Slot1      |       Slot2       |       Slot3       |      Slot4      |
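The field layout above can be sketched in Python as a pair of pack and unpack helpers. This is only an illustration of the bit positions: the function names are hypothetical, and the opcode values are the ones listed in Table 1 of this section.

```python
def pack_word(slot1, slot2, slot3, slot4):
    """Pack four 6-bit instruction codes into one 24-bit program word.

    Slot1 occupies bits 23-18 and Slot4 occupies bits 5-0.
    """
    for code in (slot1, slot2, slot3, slot4):
        if not 0 <= code < 64:
            raise ValueError("each instruction code must fit in 6 bits")
    return (slot1 << 18) | (slot2 << 12) | (slot3 << 6) | slot4

def unpack_word(word):
    """Split a 24-bit program word back into (slot1, slot2, slot3, slot4)."""
    return tuple((word >> shift) & 0x3F for shift in (18, 12, 6, 0))

# Opcodes taken from Table 1: DUP=1A, SHL=11, ADD=17, RET=01 (hex).
DUP, SHL, ADD, RET = 0x1A, 0x11, 0x17, 0x01
word = pack_word(DUP, SHL, ADD, RET)
print(f"{word:06X}", unpack_word(word))
```

The packed word here is the sequence "dup shl add ret", which multiplies T by 3 and returns within a single program word.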
There are 64 possible instructions in a 6-bit field. Half of these are reserved for user applications. Only the lower 32 instructions are specified in P24. These instructions fall into four classes:
0    Transfer Instructions
1    Memory Access Instructions
2    ALU Instructions
3    Register Instructions
JUMP, CALL, JZ and JNC instructions must appear in Slot1 of a program word, i.e. bits 23-18. The remaining 18 bits, 17-0, contain the address inside the current 256K word page, so these instructions can reach code only within the current page. To reach other pages of memory, push a 24-bit address on the return stack and execute the RET instruction.
The transfer instructions thus have the following forms:
JUMP    aaaaaa aaaaaa aaaaaa
CALL    aaaaaa aaaaaa aaaaaa
JZ      aaaaaa aaaaaa aaaaaa
JNC     aaaaaa aaaaaa aaaaaa
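A hedged Python sketch of how such a transfer word could be assembled and resolved is shown here. It assumes the target page is taken from the upper six bits of P, as suggested by the p&i(17.0) input of the program counter data path. The helper names and the parameter `p` are illustrative, not part of any P24 tool.

```python
OPCODES = {"JUMP": 0x00, "JZ": 0x02, "JNC": 0x03, "CALL": 0x04}
PAGE_MASK = 0x3FFFF  # 18 address bits, i.e. a 256K word page

def encode_transfer(name, target, p):
    """Build a transfer word: opcode in bits 23-18, address in bits 17-0.

    `p` is the assumed value of the program counter when the instruction
    executes; the target must lie in the same 256K word page.
    """
    if (target >> 18) != (p >> 18):
        raise ValueError("target is outside the current 256K word page")
    return (OPCODES[name] << 18) | (target & PAGE_MASK)

def transfer_target(word, p):
    """Combine the page bits of P with the 18-bit address field."""
    return (p & ~PAGE_MASK & 0xFFFFFF) | (word & PAGE_MASK)

word = encode_transfer("CALL", 0x00123, p=0x00010)
print(f"{word:06X}", hex(transfer_target(word, 0x00010)))
```

A target in another page is rejected by this sketch, which mirrors the rule that a long jump has to go through the return stack and RET.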
The conditional jump instruction JZ is used to implement the IF, WHILE, and UNTIL words in Forth, because it pops the number being tested in T. The conditional jump instruction JNC causes a jump if the carry bit T(24) is cleared. It is useful in multiple-precision math operations. Unlike JZ, JNC does not pop the T register, so its contents can be tested again.
Table 1. P24 Machine Code

Code  Name  Function
Transfer Instructions
00    JUMP  Jump to 18-bit address. Must be in Slot1.
01    RET   Subroutine return.
02    JZ    Jump if T is 0. Must be in Slot1.
03    JNC   Jump if carry is reset. Must be in Slot1.
04    CALL  Call subroutine. Must be in Slot1.
05          Reserved
06          Reserved
07          Reserved
Memory Access Instructions
08          Reserved
09    LDP   Push memory at A to T. Increment A.
0A    LDI   Push in-line literal to T.
0B    LD    Push memory at A to T.
0C          Reserved
0D    STP   Pop T to memory at A. Increment A.
0E          Reserved
0F    ST    Pop T to memory at A.
ALU Instructions
10    COM   Complement all bits in T.
11    SHL   Shift T left 1 bit.
12    SHR   Shift T right 1 bit.
13    MUL   Multiplication step.
14    XOR   Pop S and Exclusive OR it to T.
15    AND   Pop S and AND it to T.
16    DIV   Division step.
17    ADD   Pop S and add it to T.
Register Instructions
18    POP   Pop R to push T.
19    LDA   Push A to T.
1A    DUP   Duplicate T.
1B    OVER  S to T, push original T.
1C    PUSH  Pop T to push R.
1D    STA   Pop T to A.
1E    NOP   Do nothing.
1F    DROP  Pop T.
Individual instructions and their functions are listed as follows:

JUMP (SKIP, ELSE, AGAIN, REPEAT)
Code: 0
Usage: 000000 aaaaaa aaaaaa aaaaaa
Stack Effects: none
Carry: no change
Function:
Jump to the 18-bit address in bit field 17-0, within the current 256K word page of memory. It must be in Slot1 of a word.
Restriction:
This instruction redirects the program to any location within a 256K word page of memory. It does not cross page boundaries. To jump to a location outside the current memory page, push the target address on the return stack and execute the RET instruction to effect a long jump. This restriction also applies to CALL, JZ and JNC. See also RET.
Coding Example:
   CODE 50us
      2 ldi skip
   CODE 100us
      1 ldi
      then
      sta
      -138 ldi
      begin lda add
      -until
      drop
      ret
SKIP makes an unconditional jump to THEN, so that 50us shares the delay loop with 100us.
RET (;)
Code: 1
Usage: 000001 xxxxxx xxxxxx xxxxxx
       cccccc 000001 xxxxxx xxxxxx
       cccccc cccccc 000001 xxxxxx
       cccccc cccccc cccccc 000001
Stack Effects: ( -- ; R: a -- )
Carry: no change
Function:
Pop the address on the top of the return stack into the program counter P, thus resuming the execution sequence interrupted by the last CALL instruction. Besides terminating a subroutine, this instruction may be used to effect a long jump to a location outside of the current memory page.
This instruction can be placed in any slot of a word. The instructions before RET are executed. The instructions following RET are ignored.
Coding Example:
In the subroutine thread model, RET is used to terminate all code words and colon words. The Forth word ; simply compiles a RET to end a Forth word.
JZ (IF, WHILE, UNTIL)
Code: 2
Usage: 000010 aaaaaa aaaaaa aaaaaa
Stack Effects: ( n -- )
Carry: no change
Function:
Conditionally jump to the 18-bit address in bit field 17-0, within the current 256K word page of memory, if the T register contains a 0. It must be in Slot1 of a word.
The value in the T register is discarded and the data stack is popped back into T. This instruction is different from JNC, which does not pop the data stack and leaves T intact.
Coding Example:
   CODE ?DUP ( w -- w w | 0 )
      dup
      if dup ret then
      ret
JNC (-UNTIL, -IF, -WHILE)
Code: 3
Usage: 000011 aaaaaa aaaaaa aaaaaa
Stack Effects: ( n -- n )
Carry: no change
Function:
Conditionally jump to the 18-bit address in bit field 17-0, within the current 256K word page of memory, if the carry flag (Bit 24 of T) is reset. It must be in Slot1 of a word.
The T register and the data stack are preserved. This instruction is different from JZ, which pops the data stack and discards the tested value in T.
Coding Example:
To test the negative flag T(23), it is shifted into carry T(24) and tested using JNC compiled by -IF.
   CODE ABS ( n -- +n )
      dup shl
      -if drop com 1 ldi add
         ret
      then
      drop ret
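The control flow of ABS can be followed in a small Python model. This is a sketch under the assumption that values are 24-bit two's complement and that -IF branches to THEN when the carry is clear. The function name `abs24` is illustrative.

```python
MASK24 = 0xFFFFFF

def abs24(n):
    """Model of ABS: test the sign via SHL and carry, negate if negative."""
    n &= MASK24
    shifted = n << 1                  # dup shl: T(23) moves into carry T(24)
    carry = (shifted >> 24) & 1
    if carry == 0:                    # -if (JNC): carry clear, jump to THEN
        return n                      # drop ret: positive value unchanged
    return ((n ^ MASK24) + 1) & MASK24   # drop com 1 ldi add ret: negate

print(abs24(5), abs24(0xFFFFFB))      # 0xFFFFFB is -5 in 24 bits
```

Note that the most negative value 0x800000 has no positive counterpart in 24 bits, so this model returns it unchanged.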
CALL
Code: 4
Usage: 000100 aaaaaa aaaaaa aaaaaa
Stack Effects: ( -- ; R: -- a )
Carry: no change
Function:
Call a subroutine whose address is in bit field 17-0, within the current 256K word page of memory. It must be in Slot1 of a word.
The address of the next word is pushed on the return stack. When a return instruction in the subroutine is encountered, this address is popped off the return stack and the next word is executed to resume the interrupted execution sequence.
Restriction:
This instruction allows the program to call any subroutine within the current 256K word page of memory. It does not cross page boundaries.
Coding Example:
All Forth words are compiled as subroutine calls. This is the most efficient way to build lists in Forth.
LDP
Code: 9
Usage: 001001 cccccc cccccc cccccc
       cccccc 001001 cccccc cccccc
       cccccc cccccc 001001 cccccc
       cccccc cccccc cccccc 001001
Stack Effects: ( -- n )
Carry: reset to 0
Function:
Fetch the contents of the memory location whose 24-bit address is in the A register and push that number onto the data stack. The address in the A register is then incremented to facilitate access to the next memory location. It is most useful in reading values from a table in memory.
This fetch instruction is different from the @ instruction in Forth, which uses the address on the top of the data stack.
This instruction also resets the carry flag (Bit 24) in the T register.
Coding Example:
Increment T:
   sta ldp drop lda
Otherwise,
   cccccc cccccc ldi add
   000000 000000 000000 000001
costs 6 slots.
LDI
Code: 0A
Usage: 001010 cccccc cccccc cccccc
       nnnnnn nnnnnn nnnnnn nnnnnn

       cccccc 001010 cccccc cccccc
       nnnnnn nnnnnn nnnnnn nnnnnn

       cccccc cccccc 001010 cccccc
       nnnnnn nnnnnn nnnnnn nnnnnn

       cccccc cccccc cccccc 001010
       nnnnnn nnnnnn nnnnnn nnnnnn
Stack Effects: ( -- n )
Carry: reset to 0
Function:
Fetch the contents of the next word and push that number onto the data stack. The program counter P is incremented past that word. This instruction allows a program to enter numbers onto the data stack for later use.
This instruction also resets the carry flag (Bit 24) in the T register.
Coding Example:
Push 1 2 3 4 on the data stack:
   ldi ldi ldi ldi
   1
   2
   3
   4
LD
Code: 0B
Usage: 001011 cccccc cccccc cccccc
       cccccc 001011 cccccc cccccc
       cccccc cccccc 001011 cccccc
       cccccc cccccc cccccc 001011
Stack Effects: ( -- n )
Carry: reset to 0
Function:
Fetch the contents of the memory location whose 24-bit address is in the A register and push that number onto the data stack. The address in the A register is not modified.
This fetch instruction is different from the @ instruction in Forth, which uses the address on the top of the data stack.
This instruction also resets the carry flag (Bit 24) in the T register.
Coding Example:
STP
Code: 0D
Usage: 001101 cccccc cccccc cccccc
       cccccc 001101 cccccc cccccc
       cccccc cccccc 001101 cccccc
       cccccc cccccc cccccc 001101
Stack Effects: ( n -- )
Carry: restored from data stack
Function:
Pop the number off the data stack and store it into the memory location whose 24-bit address is in Register A. The address in the A register is then incremented to facilitate the next memory access. It is most useful in storing values to a table in memory.
This store instruction is different from the ! instruction in Forth, which uses the address on the top of the data stack.
Coding Example:
See the copying program shown in LDP.
ST
Code: 0F
Usage: 001111 cccccc cccccc cccccc
       cccccc 001111 cccccc cccccc
       cccccc cccccc 001111 cccccc
       cccccc cccccc cccccc 001111
Stack Effects: ( n -- )
Carry: restored from data stack
Function:
Pop the number off the data stack and store it into the memory location whose 24-bit address is in Register A. The address in the A register is not modified.
This store instruction is different from the ! instruction in Forth, which uses the address on the top of the data stack.
Coding Example:
   CODE ! ( n a -- )
      sta st ret
COM
Code: 10
Usage: 010000 cccccc cccccc cccccc
       cccccc 010000 cccccc cccccc
       cccccc cccccc 010000 cccccc
       cccccc cccccc cccccc 010000
Stack Effects: ( n1 -- n1* )
Carry: no change
Function:
Complement all 24 bits in the T register. This is a one's complement operation.
Coding Example:
To generate a -1 in the T register:
   zero com
OR has to be synthesized from COM and AND using:
   A or B = not( not(A) and not(B) )
   CODE OR ( n n -- n )     ( this looks pretty awkward, maybe )
      com push com          ( the last available opcode or NIP )
      pop and com ret       ( should be replaced with OR )
SHL
Code: 11
Usage: 010001 cccccc cccccc cccccc
       cccccc 010001 cccccc cccccc
       cccccc cccccc 010001 cccccc
       cccccc cccccc cccccc 010001
Stack Effects: ( n -- 2n )
Carry: Bit 23 of T is shifted into carry
Function:
Shift all lower 24 bits in the T register to the left by 1 bit. The lowest bit, Bit-0, is cleared.
Coding Example:
Multiply T by 3:   dup shl add
Multiply by 5:     dup shl shl add
Multiply by 6:     dup shl add shl
SHL allows the negative bit T(23) to be tested as carry T(24):
   CODE 0< ( n -- f )
      shl
      -if drop -1 ldi ret
      then
      dup xor ( 0 ldi )
      ret
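The shift-and-add sequences for multiplying by 3, 5 and 6 can be checked with a toy Python interpreter. This is a sketch that models only DUP, SHL and ADD on 24-bit values and ignores the carry bit. The function `run` is a hypothetical helper, not a P24 tool.

```python
MASK24 = 0xFFFFFF

def run(program, t):
    """Run a sequence of dup/shl/add on a data stack whose top (T) is t."""
    stack = [t & MASK24]
    for op in program.split():
        if op == "dup":                       # duplicate T
            stack.append(stack[-1])
        elif op == "shl":                     # shift T left by one bit
            stack[-1] = (stack[-1] << 1) & MASK24
        elif op == "add":                     # pop S and add it to T
            top = stack.pop()
            stack[-1] = (stack[-1] + top) & MASK24
        else:
            raise ValueError(f"unsupported instruction: {op}")
    return stack[-1]

print(run("dup shl add", 7), run("dup shl shl add", 7), run("dup shl add shl", 7))
```

Each sequence fits in one program word, since it uses at most four 6-bit instructions.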
SHR
Code: 12
Usage: 010010 cccccc cccccc cccccc
       cccccc 010010 cccccc cccccc
       cccccc cccccc 010010 cccccc
       cccccc cccccc cccccc 010010
Stack Effects: ( n -- n/2 )
Carry: loaded from serial input
Function:
Shift the contents of the T register right by one bit. Bit-0 is shifted to the bit-banged UART serial output. The sign (Bit-23) is preserved.
Coding Example:
SHR is used to implement a simple UART. The lowest bit in T, T(0), is shifted out to the UART serial output pin, and the UART serial input pin is loaded into carry for testing.
   CODE EMIT ( c -- )
      $7F ldi and
      shl
      $FFFF01 ldi xor
      $0A ldi
      FOR
         shr 100us NEXT
      drop
      ret
   CODE KEY ( -- c )
      $FFFFFF ldi
      begin shr
      -until
      repeat ( wait for start bit )
      50us
      7 ldi
      FOR
         100us shr
         -if $80 ldi xor then
      NEXT
      $FF ldi and
      100us ret
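The SHR behaviour described in this section can be modelled in Python as a single step function. This is a sketch that treats T as a 25-bit integer with the carry in bit 24. The function name `shr` and its return convention are assumptions made for illustration.

```python
def shr(t, uart_in):
    """One SHR step on a 25-bit T value.

    Returns (new_t, uart_out): T(0) goes to the UART output pin, the sign
    bit T(23) is preserved, and the UART input pin lands in carry T(24).
    """
    data = t & 0xFFFFFF
    uart_out = data & 1                 # T(0) is latched onto UART_OUT
    sign = data & 0x800000              # T(23) stays in place
    new_data = sign | (data >> 1)       # T(23..1) move right by one bit
    new_t = ((uart_in & 1) << 24) | new_data
    return new_t, uart_out

# Shift eight bits of 0x55 out, least significant bit first.
t, bits = 0x55, []
for _ in range(8):
    t, out = shr(t, uart_in=1)
    bits.append(out)
print(bits)
```

In this model the serial line is sampled and driven once per SHR, so the bit rate is set entirely by the delay placed between successive SHR instructions.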
MUL
Code: 13
Usage: 010011 cccccc cccccc cccccc
       cccccc 010011 cccccc cccccc
       cccccc cccccc 010011 cccccc
       cccccc cccccc cccccc 010011
Stack Effects: ( n1 n2 -- n1 n3 )
Carry: unchanged
Function:
Conditionally add the S register on the data stack to the T register if Bit-0 in A is set. If Bit-0 in A is reset, the T register is not modified. The T-A register pair is then shifted to the right by one bit.
This MUL instruction is useful as a multiplication step in implementing a fast software multiplication routine. Repeating this instruction 24 times will multiply A and S and produce a 48-bit product in the T-A pair. (T is normally initialized to zero prior to the multiply sequence. However, any non-zero initial value in T adds to the final result in the T-A pair.)
Coding Example:
Multiply two 24-bit unsigned integers. Multiplicand is in S. Multiplier is in A.
   mul mul mul mul
   mul mul mul mul
   mul mul mul mul
   mul mul mul mul
   mul mul mul mul
   mul mul mul mul
The 48-bit product is in the T-A register pair and the multiplicand in S is preserved.
Primitive multiplication routines are thus defined:
   CODE UM* ( u u -- ud )
      sta 0 ldi
      mul mul mul mul
      mul mul mul mul
      mul mul mul mul
      mul mul mul mul
      mul mul mul mul
      mul mul mul mul
      push drop lda pop
      ret
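The 24-step multiplication can be followed in a short Python model. This is a sketch that assumes the 25-bit sum, including the adder carry, takes part in the right shift, as suggested by the (s+t)/2 and (s+t)&a/2 inputs in the data path diagrams. The function names are illustrative.

```python
def mul_step(t, a, s):
    """One MUL step: conditional add, then shift the T-A pair right."""
    total = t + s if a & 1 else t          # add S to T only if A(0) is set
    new_a = ((total & 1) << 23) | (a >> 1) # low bit of the sum enters A(23)
    new_t = total >> 1                     # carry (bit 24) shifts into T(23)
    return new_t, new_a

def um_star(s, a, t=0):
    """Repeat MUL 24 times; return the 48-bit value held in the T-A pair."""
    for _ in range(24):
        t, a = mul_step(t, a, s)
    return (t << 24) | a

print(um_star(3, 5), hex(um_star(0xFFFFFF, 0xFFFFFF)))
```

Each step consumes one multiplier bit from the bottom of A and makes room for one product bit at the top, which is why the multiplier and the low half of the product can share the A register.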
XOR
Code: 14
Usage: 010100 cccccc cccccc cccccc
       cccccc 010100 cccccc cccccc
       cccccc cccccc 010100 cccccc
       cccccc cccccc cccccc 010100
Stack Effects: ( n1 n2 -- n3 )
Carry: unchanged
Function:
Pop S on the data stack and exclusive-OR it to the T register. All 24 bits in T are affected.
Coding Example:
To clear T to zero:
   dup xor ( now use more transparent “drop zero” )
To generate a zero in the T register:
   dup dup xor ( now use faster “zero” )
T is duplicated twice to save its contents. The two duplicated copies of T are XOR'ed together. All the reset bits remain reset. All set bits get reset. Thus a 0 is created in T.
It costs 5 slots to produce a -1:
   ldi cccccc cccccc cccccc
   -1
vs
   dup dup xor com ( now use faster “zero com” )
AND
Code: 15
Usage: 010101 cccccc cccccc cccccc
       cccccc 010101 cccccc cccccc
       cccccc cccccc 010101 cccccc
       cccccc cccccc cccccc 010101
Stack Effects: ( n1 n2 -- n3 )
Carry: unchanged
Function:
Pop S on the data stack and AND it to the T register. All 24 bits in T are affected.
Coding Example:
DIV
Code: 16
Usage: 010110 cccccc cccccc cccccc
       cccccc 010110 cccccc cccccc
       cccccc cccccc 010110 cccccc
       cccccc cccccc cccccc 010110
Stack Effects: ( n1 n2 -- n1 n3 )
Carry: unchanged (I think; need to check.)
Function:
Add the S register on the data stack to the T register. If the addition produces a carry, place the sum in T; otherwise leave T unchanged. The T-A register pair is then shifted to the left by one bit. Carry is shifted into A(0).
This DIV instruction is useful as a division step in implementing a fast software division routine. Repeating this instruction 25 times will divide a 48-bit number originally in the T-A register pair by the negative of the number in S, leaving the result in A and the remainder in T.
Coding Example:
Divide a 48-bit positive integer by a positive divisor. The negated divisor is in S.
   div div div div
   div div div div
   div div div div
   div div div div
   div div div div
   div div div div
   div
   shr
(Note: I think that this last shr undoes the most recent shl that is part of div, aligning the remainder properly in T. Also I think this division actually only works properly for 47-bit unsigned numbers in T-A. -- WRC)
Primitive division routines are thus defined:
   CODE UM/MOD ( ud u -- ur uq )
      com 1 ldi add sta
      push lda push sta
      pop pop
      skip
   CODE /MOD ( n n -- r q )
      com 1 ldi add push
      sta pop 0 ldi
      then
      div div div div
      div div div div
      div div div div
      div div div div
      div div div div
      div div div div
      div 1 ldi xor shr
      push drop pop lda
      ret
ADD
Code: 17
Usage: 010111 cccccc cccccc cccccc
       cccccc 010111 cccccc cccccc
       cccccc cccccc 010111 cccccc
       cccccc cccccc cccccc 010111
Stack Effects: ( n1 n2 -- n1+n2 )
Carry: changes according to n1 and n2
Function:
Pop S on the data stack and add it to the T register.
Coding Example:
The primitive addition in eForth is thus defined:
   CODE UM+ ( n n -- n carry ) ( don't use this if you want speed - WRC )
      add
      -if 1 ldi ret
      then
      dup dup xor ( 0 )
      ret
POP
Code: 18
Usage: 011000 cccccc cccccc cccccc
       cccccc 011000 cccccc cccccc
       cccccc cccccc 011000 cccccc
       cccccc cccccc cccccc 011000
Stack Effects: ( -- n ; R: n -- )
Carry: unchanged
Function:
Pop the R register on the return stack to the T register. The original contents of T are pushed on the data stack.
Coding Example:
Exchanging A and T:   lda push sta pop
Exchanging A and R:   lda pop sta push
Increment T:          sta ldp drop lda ( now use “one add” )
Decrement T:          dup dup xor com add ( now use “zero com add” )
LDA
Code: 19
Usage: 011001 cccccc cccccc cccccc
       cccccc 011001 cccccc cccccc
       cccccc cccccc 011001 cccccc
       cccccc cccccc cccccc 011001
Stack Effects: ( -- a )
Carry: unchanged
Function:
Copy the contents of the A register to the T register. The original contents of the T register are pushed on the data stack. With LDA and STA, the A register can serve as a scratch pad register to save and restore the contents of the T register.
Coding Example: (see example for POP)
DUP
Code: 1A
Usage: 011010 cccccc cccccc cccccc
       cccccc 011010 cccccc cccccc
       cccccc cccccc 011010 cccccc
       cccccc cccccc cccccc 011010
Stack Effects: ( n -- n n )
Carry: unchanged
Function:
Duplicate the T register and push it on the data stack.
Coding Example:
Decrement T:   dup dup xor com add ( now use “zero com add” )
OVER
Code: 1B
Usage: 011011 cccccc cccccc cccccc
       cccccc 011011 cccccc cccccc
       cccccc cccccc 011011 cccccc
       cccccc cccccc cccccc 011011
Stack Effects: ( n1 n2 -- n1 n2 n1 )
Carry: unchanged
Function:
S is transferred into the T register. The original contents of the T register are pushed onto the data stack.
Coding Example:
   CODE 2DUP ( n1 n2 -- n1 n2 n1 n2 )
      over over ret
PUSH
Code: 1C
Usage: 011100 cccccc cccccc cccccc
       cccccc 011100 cccccc cccccc
       cccccc cccccc 011100 cccccc
       cccccc cccccc cccccc 011100
Stack Effects: ( n -- ; R: -- n )
Carry: unchanged
Function:
Pop S on the data stack and store it to the T register. The original contents of the T register are pushed onto the return stack.
Coding Example:
   CODE ROT ( w1 w2 w3 -- w2 w3 w1 )
      push push sta pop
      pop lda ret
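The way ROT threads three values through the return stack and the A register can be traced with a toy Python model. It is a sketch that models only PUSH, POP, STA and LDA, with Python lists standing in for the two hardware stacks. The function `run` is a hypothetical helper.

```python
def run(program, data, a=0):
    """Execute push/pop/sta/lda.

    data[-1] plays the role of T and data[-2] of S; ret[-1] is R.
    Returns the final (data stack, return stack, A register).
    """
    data, ret = list(data), []
    for op in program.split():
        if op == "push":          # T goes to the return stack, S refills T
            ret.append(data.pop())
        elif op == "pop":         # R goes to T, old T is pushed on the data stack
            data.append(ret.pop())
        elif op == "sta":         # T goes to A, S refills T
            a = data.pop()
        elif op == "lda":         # A is copied to T, old T is pushed
            data.append(a)
        else:
            raise ValueError(f"unsupported instruction: {op}")
    return data, ret, a

print(run("push push sta pop pop lda", [1, 2, 3]))
```

With w1 w2 w3 entered as 1 2 3, the data stack ends as 2 3 1 and the return stack is empty again, which matches the stack comment of ROT.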
STA
Code: 1D
Usage: 011101 cccccc cccccc cccccc
       cccccc 011101 cccccc cccccc
       cccccc cccccc 011101 cccccc
       cccccc cccccc cccccc 011101
Stack Effects: ( a -- )
Carry: no change
Function:
Pop S on the data stack and store it to the T register. The original contents of the T register are copied into the A register. This instruction initializes the A register so that it can be used to fetch data from memory or store data into memory.
Coding Example:
   CODE ! ( n a -- )
      sta st ret
NOP
Code: 1E
Usage: 011110 xxxxxx xxxxxx xxxxxx
       cccccc 011110 xxxxxx xxxxxx
       cccccc cccccc 011110 xxxxxx
       cccccc cccccc cccccc 011110
Stack Effects: ( -- )
Carry: no change
Function:
No operation. This instruction forces the execution state to slot0, so that the next word is fetched and executed.
Coding Example: usually inserted by the assembler.
DROP
Code: 1F
Usage: 011111 cccccc cccccc cccccc
       cccccc 011111 cccccc cccccc
       cccccc cccccc 011111 cccccc
       cccccc cccccc cccccc 011111
Stack Effects: ( n -- )
Carry: unchanged
Function:
Pop S on the data stack and store it to the T register. The original contents of the T register are lost.
Coding Example: see example for JUMP.