CHAPTER 11.
SIMULATOR OF A GAPP PROCESSOR ARRAY
1.
THE GAPP CHIP
GAPP (Geometrical Arithmetic Parallel Processor) is a
CMOS chip with 72 processors on a single chip. It was invented by Dr. Wlodzimierz
Holsztynski, a Polish mathematician, when he was with Martin-Marietta in
Florida. The chip is now
manufactured by NCR as NCR45CG72.
The processors are arranged as a 6x12 array. Each processor is connected to its four
nearest neighbors. Processors on
the edges of the chip have their connection brought to I/O pins, so that many
chips can be connected to form a very large processor array. As the processors in a large array form
a 2D array with nearest neighbor connections, it is a very efficient structure
to handle two dimensional problems like image processing.
The internal structure of a processor is very
simple. It is basically a one bit
processor with 4 internal registers and 128 bits of memory. There is a simple ALU which takes the
contents of three registers as input and generates a one bit sum, a carry and a
borrow in every clock cycle. There
are five multiplexers, one in front of each internal register and one in front
of the memory. These multiplexers
can be programmed to route data among the registers, the ALU, and the memory. It thus belongs to the SIMD (Single
Instruction Multiple Datapath) architecture, because the ALU is performing one
and the same function every machine cycle.
It is a full, one bit adder/subtracter. However, by configuring the
registers properly, the ALU can perform all the two bit binary logic functions;
therefore, it serves as the basis of a very powerful computing structure.
Figure 19 shows the contents of a GAPP processor element.
Instructions to the processor array is used to
control the multiplexers to define the datapath in a clock cycle. All processors in the processor array
execute the same instruction at any given cycle. The instruction is broadcast to all
processors and it is executed synchronously. An instruction consists of 20 bits, 13
of them are used to control the multiplexers, and 7 bits are used to select a
memory plane to read or write. To
run this processor array, a specialized controller is required to generate
sequences of the 20 bit instruction patterns at the clock rate of the array.
It is difficult to convince people that this type of
simple computing device is even useful, not to mention the possibility to
compete against the modern powerful processors. Dr. Holsztynski made the following interesting
observation to illustrate the power of SIMD device. In a conventional processor, the CPU
must perform a host of different functions. Each function requires resources in
both silicon area and in machine cycles.
Only one instruction can be executed at any time, consequently the other
logic devices are idle. A processor
with 100 instructions shows an instruction efficiency of only 1%. The GAPP processor, on the other hand,
has an ALU which is active always and thus shows the highest silicon
efficiency. The problem is how to
program the GAPP array so that the ALU will perform useful work all the time.
2. GAPP
SIMULATOR
As the GAPP array requires a high speed, programmable
controller to supply instruction streams, not much one can do at the beginning
of the GAPP project. To demonstrate
realistically the usefulness of GAPP array to solve practical problems, a good
simulator would be of great help.
Here an old image processor made by DeAnza/Gould was available to us for
this simulation. Because GAPP array consists of a large number of processors
connected as a planer matrix, this structure maps very well to an image
processor which stores and processes two dimensional images. The DeAnza/Gould IP5500 has one
megabytes of memory, which can be thought of as 32 planes of 512x512 bits
stacked together. Physically these
bitplanes are grouped into 4 channels, with 8 bitplanes to a channel. Among the 32 bitplanes, we have to use
the top 12 planes to simulator the GAPP registers and ALU. Only 20 bitplanes are available to simulate
GAPP memory planes. For most of the
simulation work, only the lower 16 bitplanes are simulated using channels 0 and
1 in the image processor.
The GAPP chip has four registers: CM, EW, NS, and C. The CM register handles communication in
the north-south direction. The EW and NS registers are used for nearest
neighbor data transfer, and the C register is used to store carry or borrow
from the ALU. The ALU receives
inputs from the EW, NS, and C registers, and generates the sum, carry, and
borrow as the results of the three bit addition/subtraction. The logic function of this ALU is best
described by the following truth table in Figure 20.
This set of input/output relationship can be best
simulated in the image processor by table look-up technique. This truth table is used to generate an
ITT (Intensity Transformation Table) in the image processor. When the image memory simulating the EW,
NS, and C registers are read through this ITT, the plus, carry, and borrow bits
are automatically generated and then stored back to the appropriate memory
planes. Thus in a single TV frame
time, we can produce the ALU results of all 512x512 simulated GAPP
processors. In the next TV frame,
the ALU results as well as other data can be routed back to the proper
destination registers or memory through another table look-up operation.
Although GAPP is basically a one bit full
adder/subtracter, by fetching and storing data in consecutive memory and use
the carry or borrow stored in the C register, it is very straightforward to
implement multiple bit integer arithmetics. A few examples will be show
later. The more interesting
property of the ALU is that by setting or clearing one or more registers among
EW, NS, and C, all the binary logic function can be performed by this ALU,
making it is truly general purpose computing device for large arrays of digital
information. The conditions to
perform logic operations are shown in Figure 21.
3. GAPP
SIMULATOR IN FORTH
The DeAnza/Gould image processor IP5500 is controlled
by a LSI-11 microprocessor which runs a very early version of LSI-11 polyForth
from Forth, Inc. The image
processing software system was described in some detail in my 'Forth Notebook,'
pp. 78-113. Since the GAPP
simulator uses only the image transfer operation through intensity
transformation table, very little knowledge about the image processor is
required to understand the GAPP simulator.
The source code and shadow comments are shown in
Listing 12. Screens 1 through 10 are the source code of the simulator
proper. Screens 11 to 15 are
examples of elementary GAPP functions, such as multiple bit addition,
subtraction, absolute values, and image dilations and erosions. Screen 21 to 25 show the game of life
implemented in this GAPP simulator.
As most source code have fairly detailed comments in the corresponding
shadow screens, we shall only discussed some global features of this simulator.
The syntax of a GAPP instruction is as follows:
dest1: dest2: ... =src1 dest3: ... =src2 ... _$_
One register can serve as the source for many
registers. A destination register
can only receive data from one source. Source registers are prefixed with an
equal sign, and destination registers are appended with a colon. In any one GAPP instruction, only one
memory plane can be referred to for either reading or writing, but not
both. Memory plane used for source
is coded as n =P, and for destination n P:
. The bitplane number n must
be put on the stack before the memory code.
_$_ terminates a GAPP instruction. In a real GAPP machine, all the code
before this terminator up to the last _$_ code are executed in a single
cycle. In this simulator, the destination
code like =EW actually performs the date transfer, because the image processor cannot
process input from many different registers at the same TV frame.
In real GAPP, the 7 bit field specifying the GAPP
memory plane is an integral part of the GAPP instruction. In the simulator, we take advantage of
the Forth system to pass the address as a parameter on the data stack. This short-cut greatly simplifies the
structure of the simulator, but is not quite realistic. Nevertheless, in a real
GAPP system, the address generation has to be handled by rather complicated
logic circuits, which cannot be simulated easily.
Figure 20.
Truth Table of GAPP ALU
Input
Output
NS
EW C plus carry borrow
0 0 0 0 0 0
0 1 0 1 0 1
1 0 0 1 0 0
1 1 0 0 1 0
0 0 1 1 0 1
0 1 1 0 1 1
1 0 1 0 1 0
1 1 1 1 1 1
Figure 21. Logic
Operations of GAPP ALU
Logical Operation Description Condition
INV
SM=/NS
EW=0, C=1
SM=/EW
NS=0, C=1
SM=/C
NS=0,
EW=1
AND
CY=NS*EW C=0
CY=EW*C NS=0
CY=NS*C
EW=0
BW=/NS*EW C=0
OR
CY=NS+EW C=1
BW=/NS+EW C=1
BW=EW+C NS=0
XOR
SM=NSxC
EW=0
SM=NSxEW C=0
SM=EWxC NS=0
XNOR
SM=/(NSxEW) C=1
Note: SM: plus, CY: carry, BW: borrow, /:
negate, x: exclusive or .
Figure 19.
Schematic diagram of a GAPP processor unit .
Listing 12. GAPP simulator using IP5500 image
processor