CHAPTER 11.

SIMULATOR OF A GAPP PROCESSOR ARRAY

1. THE GAPP CHIP

GAPP (Geometrical Arithmetic Parallel Processor) is a CMOS chip with 72 processors on a single chip. It was invented by Dr. Wlodzimierz Holsztynski, a Polish mathematician, when he was with Martin-Marietta in Florida. The chip is now manufactured by NCR as NCR45CG72. The processors are arranged as a 6x12 array. Each processor is connected to its four nearest neighbors. Processors on the edges of the chip have their connection brought to I/O pins, so that many chips can be connected to form a very large processor array. As the processors in a large array form a 2D array with nearest neighbor connections, it is a very efficient structure to handle two dimensional problems like image processing.

The internal structure of a processor is very simple. It is basically a one bit processor with 4 internal registers and 128 bits of memory. There is a simple ALU which takes the contents of three registers as input and generates a one bit sum, a carry and a borrow in every clock cycle. There are five multiplexers, one in front of each internal register and one in front of the memory. These multiplexers can be programmed to route data among the registers, the ALU, and the memory. It thus belongs to the SIMD (Single Instruction Multiple Datapath) architecture, because the ALU is performing one and the same function every machine cycle. It is a full, one bit adder/subtracter. However, by configuring the registers properly, the ALU can perform all the two bit binary logic functions; therefore, it serves as the basis of a very powerful computing structure. Figure 19 shows the contents of a GAPP processor element.

Instructions to the processor array is used to control the multiplexers to define the datapath in a clock cycle. All processors in the processor array execute the same instruction at any given cycle. The instruction is broadcast to all processors and it is executed synchronously. An instruction consists of 20 bits, 13 of them are used to control the multiplexers, and 7 bits are used to select a memory plane to read or write. To run this processor array, a specialized controller is required to generate sequences of the 20 bit instruction patterns at the clock rate of the array.

It is difficult to convince people that this type of simple computing device is even useful, not to mention the possibility to compete against the modern powerful processors. Dr. Holsztynski made the following interesting observation to illustrate the power of SIMD device. In a conventional processor, the CPU must perform a host of different functions. Each function requires resources in both silicon area and in machine cycles. Only one instruction can be executed at any time, consequently the other logic devices are idle. A processor with 100 instructions shows an instruction efficiency of only 1%. The GAPP processor, on the other hand, has an ALU which is active always and thus shows the highest silicon efficiency. The problem is how to program the GAPP array so that the ALU will perform useful work all the time.

2. GAPP SIMULATOR

As the GAPP array requires a high speed, programmable controller to supply instruction streams, not much one can do at the beginning of the GAPP project. To demonstrate realistically the usefulness of GAPP array to solve practical problems, a good simulator would be of great help. Here an old image processor made by DeAnza/Gould was available to us for this simulation. Because GAPP array consists of a large number of processors connected as a planer matrix, this structure maps very well to an image processor which stores and processes two dimensional images. The DeAnza/Gould IP5500 has one megabytes of memory, which can be thought of as 32 planes of 512x512 bits stacked together. Physically these bitplanes are grouped into 4 channels, with 8 bitplanes to a channel. Among the 32 bitplanes, we have to use the top 12 planes to simulator the GAPP registers and ALU. Only 20 bitplanes are available to simulate GAPP memory planes. For most of the simulation work, only the lower 16 bitplanes are simulated using channels 0 and 1 in the image processor.

The GAPP chip has four registers: CM, EW, NS, and C. The CM register handles communication in the north-south direction. The EW and NS registers are used for nearest neighbor data transfer, and the C register is used to store carry or borrow from the ALU. The ALU receives inputs from the EW, NS, and C registers, and generates the sum, carry, and borrow as the results of the three bit addition/subtraction. The logic function of this ALU is best described by the following truth table in Figure 20.

This set of input/output relationship can be best simulated in the image processor by table look-up technique. This truth table is used to generate an ITT (Intensity Transformation Table) in the image processor. When the image memory simulating the EW, NS, and C registers are read through this ITT, the plus, carry, and borrow bits are automatically generated and then stored back to the appropriate memory planes. Thus in a single TV frame time, we can produce the ALU results of all 512x512 simulated GAPP processors. In the next TV frame, the ALU results as well as other data can be routed back to the proper destination registers or memory through another table look-up operation.

Although GAPP is basically a one bit full adder/subtracter, by fetching and storing data in consecutive memory and use the carry or borrow stored in the C register, it is very straightforward to implement multiple bit integer arithmetics. A few examples will be show later. The more interesting property of the ALU is that by setting or clearing one or more registers among EW, NS, and C, all the binary logic function can be performed by this ALU, making it is truly general purpose computing device for large arrays of digital information. The conditions to perform logic operations are shown in Figure 21.

3. GAPP SIMULATOR IN FORTH

The DeAnza/Gould image processor IP5500 is controlled by a LSI-11 microprocessor which runs a very early version of LSI-11 polyForth from Forth, Inc. The image processing software system was described in some detail in my 'Forth Notebook,' pp. 78-113. Since the GAPP simulator uses only the image transfer operation through intensity transformation table, very little knowledge about the image processor is required to understand the GAPP simulator.

The source code and shadow comments are shown in Listing 12. Screens 1 through 10 are the source code of the simulator proper. Screens 11 to 15 are examples of elementary GAPP functions, such as multiple bit addition, subtraction, absolute values, and image dilations and erosions. Screen 21 to 25 show the game of life implemented in this GAPP simulator. As most source code have fairly detailed comments in the corresponding shadow screens, we shall only discussed some global features of this simulator.

The syntax of a GAPP instruction is as follows:

dest1: dest2: ... =src1 dest3: ... =src2 ... _$_

One register can serve as the source for many registers. A destination register can only receive data from one source. Source registers are prefixed with an equal sign, and destination registers are appended with a colon. In any one GAPP instruction, only one memory plane can be referred to for either reading or writing, but not both. Memory plane used for source is coded as n =P, and for destination n P: . The bitplane number n must be put on the stack before the memory code.

_$_ terminates a GAPP instruction. In a real GAPP machine, all the code before this terminator up to the last _$_ code are executed in a single cycle. In this simulator, the destination code like =EW actually performs the date transfer, because the image processor cannot process input from many different registers at the same TV frame.

In real GAPP, the 7 bit field specifying the GAPP memory plane is an integral part of the GAPP instruction. In the simulator, we take advantage of the Forth system to pass the address as a parameter on the data stack. This short-cut greatly simplifies the structure of the simulator, but is not quite realistic. Nevertheless, in a real GAPP system, the address generation has to be handled by rather complicated logic circuits, which cannot be simulated easily.

Figure 20. Truth Table of GAPP ALU

Input Output

NS EW C plus carry borrow

0 0 0 0 0 0

0 1 0 1 0 1

1 0 0 1 0 0

1 1 0 0 1 0

0 0 1 1 0 1

0 1 1 0 1 1

1 0 1 0 1 0

1 1 1 1 1 1

Figure 21. Logic Operations of GAPP ALU

Logical Operation Description Condition

INV SM=/NS EW=0, C=1

SM=/EW NS=0, C=1

SM=/C NS=0, EW=1

AND CY=NS*EW C=0

CY=EW*C NS=0

CY=NS*C EW=0

BW=/NS*EW C=0

OR CY=NS+EW C=1

BW=/NS+EW C=1

BW=EW+C NS=0

XOR SM=NSxC EW=0

SM=NSxEW C=0

SM=EWxC NS=0

XNOR SM=/(NSxEW) C=1

Note: SM: plus, CY: carry, BW: borrow, /: negate, x: exclusive or .

Figure 19. Schematic diagram of a GAPP processor unit .

Listing 12. GAPP simulator using IP5500 image processor