FAST PARALLEL OPTICAL DIGITAL MULTIPLICATION

Yao LI, George EICHMANN and R.R. ALFANO

Department of Electrical Engineering, and The Institute of Ultrafast Spectroscopy and Lasers, City College of New York, New York, NY 10031, USA

Received 19 January 1987; revised manuscript received 13 May 1987

A parallel optical binary multiplication scheme is proposed in which parallel convolution preprocessing is performed using a parallel-input optical outer-product processor together with a one-dimensional space or time integrator. Using a theta-modulation-based optical A/D converter and a carry look-ahead adder array, the resulting mixed-binary partial product can be reduced to the final binary multiplication result.

1. Introduction

One of the fundamental operations in a computer is digital multiplication. The conventional approach to digital multiplication uses a shift-and-add scheme. To multiply two \( N \)-bit numbers, after the partial products are formed, \( N-1 \) parallel adders are used. The additions can be performed in \( \lceil \log_2 N \rceil \) stages, where \( \lceil x \rceil \) denotes the smallest integer that is not less than \( x \). For example, to multiply two 64-bit numbers, 63 adders in 6 parallel stages are required. It is very important to find new algorithms to perform faster multiplications. With the digital multiplication via analog convolution (DMAC) algorithm \([1-4]\), after performing a digital convolution on the two numbers and converting, using an A/D element, the mixed-binary partial product to its binary form, only \( \lceil \log_2 (N+1) \rceil - 1 \) parallel adders in \( \lceil \log_2 (\log_2 (N+1)) \rceil \) stages are needed. For example, in the \( N=64 \) case, after A/D conversion, only 6 adders in 3 stages are sufficient. For the multiplication of large binary numbers, in principle, the DMAC algorithm offers a faster processing speed. Since optics offers ultrafast processing speed and parallelism, optical DMAC processors have been proposed \([1-4]\). With a conventional optical serial DMAC processor, to perform an optical digital convolution, two acousto-optic (AO) deflectors, actuated by electronic pulse trains representing the binary serial inputs, are used. Because of the serial input format, the convolution of two \( N \)-bit numbers requires \( 2N \) temporal cycles, which are limited by the acoustic wave propagation speed and the AO material response time. For example, the convolution of two 16-bit numbers \([4]\) with the currently available AO cells takes approximately 64 ns. After the convolution, to convert the mixed-binary result to binary number strings, parallel A/D converters are needed.
An electro-optic (EO) interferometric A/D converter \([5,6]\) can perform conversions in the order of nanoseconds. However, for each single \( N \)-bit EO converter, \( N \) waveguide interferometers are needed. Furthermore, because the periodic interferometric output is analog, to generate a digital number, an additional electronic comparator array must be used. Moreover, for an \( N \)-bit serial-input digital multiplication, the DMAC algorithm requires an array of \( 2N-3 \) A/D converters, resulting in a large number of active EO waveguide elements.
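The adder counts quoted above can be checked numerically. The following is a minimal sketch, not part of the original scheme: the function names `conventional_adders` and `dmac_adders` are hypothetical, and the code simply evaluates the two pairs of expressions given in the text.

```python
import math

def conventional_adders(n):
    """Shift-and-add: N-1 adders arranged in ceil(log2 N) stages."""
    return n - 1, math.ceil(math.log2(n))

def dmac_adders(n):
    """DMAC after A/D conversion:
    ceil(log2(N+1)) - 1 adders in ceil(log2(log2(N+1))) stages."""
    bits = math.log2(n + 1)
    return math.ceil(bits) - 1, math.ceil(math.log2(bits))

print(conventional_adders(64))  # (63, 6)
print(dmac_adders(64))          # (6, 3)
```

For \( N=64 \) this reproduces the 63 adders in 6 stages of the conventional scheme and the 6 adders in 3 stages of the DMAC scheme.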

In this communication, a new optical parallel DMAC (P-DMAC) processor is proposed. It consists of a parallel ultrafast data convolver, a fast theta-modulation EO A/D converter and an array of fast carry look-ahead adders. To increase the speed of the digital convolution preprocessing, a parallel-input scheme is proposed instead of a serial one. For the optical A/D conversion, a new theta-modulation (T-M)-based \([7,8]\) EO device is described. This \( N \)-bit converter, which can also have a nanosecond or even sub-nanosecond response, requires only one active nonlinear element. Therefore, it is more compact and less power-hungry. To add the A/D converter results, parallel carry look-ahead adder arrays are used. Using the proposed P-DMAC processor with the present technology, the overall multiplication speed is limited by the speed of the \( \lceil \log_2(\log_2(N+1)) \rceil \) carry look-ahead add stages. Using the existing optical ultrafast carry generation method [9] together with optical picosecond switching technology, fast optical adders may be constructed. Thus, this system can lead to a faster optical binary multiplication operation.

2. A parallel optical digital convolution scheme

To perform fast convolution, rapid logic AND gates together with fast scan and sum operators are needed. It will be shown that the first two operations can be performed via a parallel-input vector outer-product processor [10]. The third operation, the summation, can be implemented via either a space- or a time-integrating architecture. Next, several vector outer-product-based parallel-input optical convolution devices are described.

First, a geometric optical shadow-casting-based [11,12] parallel-input optical convolution scheme is described. To optically represent the two multipliers, two superposed, spatially encoded masks (with a logic one (zero) as a transparent (opaque) pixel) are utilized. As an example, consider the multiplication of the two decimal numbers \( A=11 \) and \( B=15 \). Their binary equivalents are \( A=1011 \) and \( B=1111 \). In fig. 1 (a and b), the two spatially encoded masks, representing the numbers \( A \) and \( B \), are shown. Here, between every two consecutive bits, an opaque pixel guard bit is used. To generate the binary outer-product of the two input vectors, these two masks are cross-overlapped. The 2-D pixel array shadowgram formed behind the overlapped masks (see fig. 1(c)) represents the outer-product of the two input vectors. To obtain the convolution, using a cylindrical lens aligned with the shadowgram's diagonal direction, the pixel light intensities are summed. The presence of the guard bit between every two consecutive data bits prevents cross-talk between the adjacent data channels. In fig. 1(d), the light intensity pattern slightly off the lens back focal plane is shown. The number of bars in each of the seven channels signifies the mixed-binary product \( C=1123221 \). In a practical implementation, a 1-D diode detector array is placed in the lens back focal plane. In that case, instead of counting the number of bars, the focal plane intensity levels represent the mixed-binary number (see fig. 1(e)). It is interesting to note that, for coherent illumination, this detected signal is the dc component of the cross-ambiguity function of the two 1-D data sets [13]. The side lobes of the cross-ambiguity function may be used for error detection.
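The outer-product and diagonal-summation steps for the \( A=11 \), \( B=15 \) example can be sketched numerically. This is only an illustration under idealized assumptions (unit-intensity pixels, no guard bits, no cross-talk); the function names are hypothetical and model the logic of the masks and the lens, not the optics.

```python
def outer_product(a_bits, b_bits):
    """Logical AND of every bit pair, as formed by the two crossed masks."""
    return [[a & b for b in b_bits] for a in a_bits]

def diagonal_sums(matrix):
    """Sum the pixels along each anti-diagonal, the role of the cylindrical lens."""
    rows, cols = len(matrix), len(matrix[0])
    sums = [0] * (rows + cols - 1)
    for i in range(rows):
        for j in range(cols):
            sums[i + j] += matrix[i][j]
    return sums

def mixed_binary_value(digits):
    """Value of a mixed-binary string, most significant digit first."""
    value = 0
    for d in digits:
        value = 2 * value + d
    return value

A = [1, 0, 1, 1]  # 11, most significant bit first
B = [1, 1, 1, 1]  # 15
C = diagonal_sums(outer_product(A, B))
print(C)                      # [1, 1, 2, 3, 2, 2, 1]
print(mixed_binary_value(C))  # 165
```

The seven channel sums weighted by powers of two give 165, which equals 11 times 15.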

While with the serial-input convolution scheme the bit-string scanning speed depends upon the speed of the acoustic wave, with the parallel-input convolution both parallel AND and self-scanning operation are performed instantaneously. Also, unlike the serial-input, where to separate the two consecutive numbers a number of idle time slots are used, with the parallel-input convolution method the data can be processed without the need for idle time. It can also be shown that, using orthogonal polarization encoding [14,16], the two parallel digital convolution channels can simultaneously be processed. Here, the previously mentioned guard pixels are used as a second, orthogonal polarization encoded, channel. At the output, using a polarizing beamsplitter, the two parallel channels can be separated. To implement this scheme, either a liquid-crystal EO or a magneto-optic (MO) 2-D spatial light modulator (SLM) is needed. Currently, the processing speed of these devices is limited. To increase the processing speed, next, some possible ultrafast, parallel-input optical digital convolution schemes are proposed.

Fig. 2. Three ultrafast parallel-input optical digital convolvers. A 2-D array of (a) etalon and (b) SHG AND gates, together with two cylindrical input lenses, is used to obtain the vector outer-product of two inputs. Using a 45° oriented output cylindrical lens, the convolution result is obtained. (c) An alternative SHG parallel-input optical convolution approach where no additional lenses are required.

As noted earlier, to generate a parallel-input vector outer-product, each optical input digit needs to be expanded into a light bar so as to overlap the second optical input (see fig. 1 (a)-(b)). The patterns thus generated can then be directed to a 2-D array of ultrafast optical AND gates. For example, in fig. 2(a) and (b), two parallel-input vector outer-product-based optical digital convolvers are illustrated. Here, either a 2-D nonlinear etalon [17] (as per fig. 2(a)) or a non-collinear second harmonic generator (SHG) AND gate array [18] (as per fig. 2(b)) is used. With an etalon, the switching threshold must be set so that an output is generated only when both inputs are present. With the SHG, using the nonlinear three-wave mixing effect, the two off-axis inputs yield an on-axis frequency-doubled AND output. In either case, to convert the digital vector outer-product into a convolution, a 45° oriented 1-D space-integrating cylindrical lens is used. It is also possible to consider a time-integrating architecture. In fig. 2(c), as an example, a parallel time-integrating convolver is sketched. Here, to form a 2-D vector outer-product, the two parallel 3-bit inputs are injected into a thick SHG crystal. As the frequency-doubled outputs emerge, they are automatically aligned into five parallel channels. Using five time-integrating detectors, the optical digital convolution result is generated. Using any of these schemes, picosecond, parallel-input optical convolution operations can be realized [18].
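The threshold condition for the AND gates and the grouping of the gate outputs into \( 2N-1 \) channels can be illustrated with a small sketch. The unit input intensity, the threshold value of 1.5 and the function names are assumptions made for illustration only; they are not device parameters from the references.

```python
def threshold_and(intensity_a, intensity_b, threshold=1.5):
    """Emit 1 only when the summed input intensity exceeds the threshold,
    i.e. only when both unit-intensity inputs are present (assumed values)."""
    return 1 if intensity_a + intensity_b > threshold else 0

def gate_array_convolve(a_bits, b_bits):
    """Route every gate output to channel i + j and integrate per channel."""
    channels = [0] * (len(a_bits) + len(b_bits) - 1)
    for i, a in enumerate(a_bits):
        for j, b in enumerate(b_bits):
            channels[i + j] += threshold_and(a, b)
    return channels

# Two 3-bit inputs give five output channels.
print(gate_array_convolve([1, 0, 1], [1, 1, 1]))  # [1, 1, 2, 1, 1]
```

Whether the per-channel sum is formed by a lens (space integration) or by a detector over time (time integration), the resulting mixed-binary digits are the same in this idealized model.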

3. A theta-modulation-based A/D converter

The key idea of an A/D conversion is the generation of a parallel set of periodic functions with different periods [5,6]. To achieve this goal, the EO interferometric approach uses a parallel set of active EO modulators. However, to convert a large number, many EO modulators and electronic comparators are required. It will be shown that, using a new T-M A/D converter, instead of a large number of EO interferometric modulators and comparators, only one active and \( N \) parallel passive elements are sufficient.

The active element is a voltage controlled beam deflector that deflects a 1-D input beam to different spatial locations. There are a number of devices available to perform this function. For example, a variable grating mode SLM (VGLM) [19] can generate, using different applied voltages, various spatial frequency gratings that deflect the incident beam to different 1-D locations. The EO beam deflector [20] uses a voltage tunable index-gradient to deflect the incident light. A streak-camera [21], commonly used for ultrafast laser pulse measurement, can also be modified to be a fast beam deflector. Recently, other fast, efficient and high resolution beam deflection devices, such as the EO internal reflection deflector [22], the waveguide modulator deflector [23], etc., have also been reported. Some of these devices, because of their small capacitances (order of pF), can operate at a high (nano- or even sub-nanosecond) speed with a low (order of volts) driving voltage [23]. With these devices, the input voltages, corresponding to detected intensity levels, are optically mapped to different spatial locations.

To convert the spatially mapped 1-D light distributions to their binary representations, a parallel set of spatially encoded masks, representing a set of periodic clipping operations with different periods, is used. For example, in fig. 3, for a four-bit A/D conversion, four masks are shown. To illuminate the four parallel A/D conversion masks, the deflected optical beams must be focused (expanded) in the vertical (horizontal) direction (see fig. 4). For each different horizontal-level bar, the light distribution at the mask output side represents its binary number code. Using a second cylindrical lens, the binary codes of the different levels can be shifted to a common horizontal level where a 1-D detector array can be placed. One advantage of this new A/D conversion approach is that only a single active nonlinear element is required. Thus, in comparison with the EO interferometric approach, the power consumption is drastically reduced. Also, electronic comparators are not required. Another advantage is that, by simply changing the masks, other binary output codes, such as Gray codes, can be obtained. Thus, this approach yields a more flexible A/D conversion scheme. With SLM-generated masks, a programmable multipurpose optical A/D converter can be implemented.
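The mask-based conversion can be modelled as a table lookup. This is a minimal sketch under stated assumptions: the deflector is taken to map each intensity level linearly to one of \( 2^N \) discrete positions, each mask is a list of transparent (1) and opaque (0) cells, and the names `binary_mask`, `gray_mask` and `convert` are hypothetical.

```python
def binary_mask(bit, n_bits):
    """Transmission of the mask for one output bit at each deflection position.
    The pattern is periodic with a period of 2**(bit + 1) positions."""
    return [(level >> bit) & 1 for level in range(2 ** n_bits)]

def gray_mask(bit, n_bits):
    """Alternative mask set that yields a Gray code instead of plain binary."""
    return [((level ^ (level >> 1)) >> bit) & 1 for level in range(2 ** n_bits)]

def convert(level, masks):
    """Read all masks at the position selected by the deflector, MSB first."""
    return [mask[level] for mask in masks]

N_BITS = 4
binary_masks = [binary_mask(b, N_BITS) for b in reversed(range(N_BITS))]
gray_masks = [gray_mask(b, N_BITS) for b in reversed(range(N_BITS))]

print(convert(5, binary_masks))  # [0, 1, 0, 1]
print(convert(5, gray_masks))    # [0, 1, 1, 1]
```

Swapping `binary_masks` for `gray_masks` changes the output code without touching the deflection step, which mirrors the flexibility argument made above.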

4. The generation of the digital multiplication

Now that the mixed-binary number is converted to a set of binary bit strings, these results must be directed to a fast carry look-ahead adder array. Recently, a new optical carry look-ahead addition algorithm was proposed [9] in which, using optical multiple reflections, the carries are generated optically at the speed of light propagation. With this algorithm, a complete \( N \)-bit carry look-ahead addition needs only four operational cycles. Thus, using a set of cascadable ultrafast parallel optical logic switches [24], the implementation of a sub-nanosecond optical carry look-ahead adder can be expected.

As a numerical example, in fig. 5, the multiplication of two 7-bit numbers, \( A = 1011011 \) and \( B = 1111111 \), is illustrated. The convolution of the two bit strings yields the mixed-binary partial result \( C = 1123345443221 \). Since the maximum weight is less than or equal to seven, an array of 3-bit optical A/D converters is required. In the middle part of fig. 5, the converted results are shown. When this result is properly grouped, only two parallel addition stages are required. With these stages, the final multiplication result \( C = 10110100100101 \) will be generated.
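The 7-bit example can be traced end to end in a short sketch. The grouping used here is an assumption, since fig. 5 is not reproduced in the text: bit \( k \) of every 3-bit converted digit is gathered into one binary word, and the three words are added with weights \( 2^k \). The function names are hypothetical.

```python
def convolve(a_bits, b_bits):
    """Digital convolution of two bit strings (most significant bit first)."""
    out = [0] * (len(a_bits) + len(b_bits) - 1)
    for i, a in enumerate(a_bits):
        for j, b in enumerate(b_bits):
            out[i + j] += a & b
    return out

def bit_plane_words(digits, n_bits):
    """A/D convert each digit and gather bit k of every digit into word k."""
    words = []
    for k in range(n_bits):
        word = 0
        for d in digits:
            word = (word << 1) | ((d >> k) & 1)
        words.append(word)
    return words

def multiply(a_bits, b_bits, n_bits=3):
    digits = convolve(a_bits, b_bits)
    words = bit_plane_words(digits, n_bits)
    # Three words need two additions, i.e. two adder stages in this grouping.
    total = sum(word << k for k, word in enumerate(words))
    return digits, total

A = [1, 0, 1, 1, 0, 1, 1]  # 91
B = [1, 1, 1, 1, 1, 1, 1]  # 127
digits, product = multiply(A, B)
print(digits)                # [1, 1, 2, 3, 3, 4, 5, 4, 4, 3, 2, 2, 1]
print(product)               # 11557
print(format(product, "b"))  # 10110100100101
```

The mixed-binary digits agree with the partial result quoted above, and the sum equals 91 times 127, that is 11557.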

In fig. 6, a proposed real-time 4-bit optical digital multiplier is shown. Here, an SHG-based 4-bit parallel-input optical convolver is used to perform an ultrafast optical digital convolution. At each convolution output channel, the mixed-binary result is separately detected. The detected voltage signals are then used to modulate a beam deflector array. To convert the deflected beams to their binary representations, an array of A/D conversion masks is used. Finally, to generate the digital multiplication result, the partial results are directed to a fast carry look-ahead adder array.

As mentioned earlier, for the multiplication of two \( N \)-bit binary numbers, \( \lceil \log_2(N+1) \rceil - 1 \) additions in \( \lceil \log_2(\log_2(N+1)) \rceil \) stages are needed. With the DMAC scheme, the overall multiplication time is \( T_C + T_{A/D} + \lceil \log_2(\log_2(N+1)) \rceil T_A \), where the subscripts \( C \), \( A/D \) and \( A \) denote the convolver, the A/D converter, and the adder, respectively. Compared to the conventional multiplication scheme, the time needed for the last part, the addition, is drastically reduced. Compared to the serial-input DMAC scheme, this P-DMAC saves the convolution preprocessing time. Using the proposed parallel vector outer-product-based optical convolver, an EO-based waveguide T-M A/D converter and a fast optical carry look-ahead adder array, the digital multiplication of two 32-bit numbers in the order of nanoseconds should be possible.
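As a worked instance for the 32-bit case mentioned above, the stage count can be evaluated directly. This assumes that the convolution, the A/D conversion and the addition stages run one after another, so that their times simply add; \( T \) is introduced here only as a label for the total.

```latex
\[
\lceil \log_2 33 \rceil - 1 = 6 - 1 = 5 \ \text{adders}, \qquad
\lceil \log_2(\log_2 33) \rceil = \lceil \log_2 5.04 \rceil = 3 \ \text{stages},
\]
\[
T = T_C + T_{A/D} + 3\,T_A .
\]
```

For comparison, the conventional scheme with \( N = 32 \) uses \( N - 1 = 31 \) adders in \( \lceil \log_2 32 \rceil = 5 \) stages.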

5. Summary

In this communication, a parallel-input optical digital multiplication scheme has been described. For the parallel-input digital multiplication preprocessing, various optical vector outer-product processors are utilized. With either a nonlinear etalon- or an SHG-based approach, ultrafast parallel-input digital convolution can also be contemplated. To convert the convolution result from a mixed-binary to a binary form, a new optical T-M A/D converter together with a fast carry look-ahead adder array is described. The optical T-M A/D converter uses only one active element, a fast voltage-controlled beam deflector, and \( N \) passive spatially encoded binary masks. The A/D converted results are then added, using a fast carry look-ahead adder array, to generate the final multiplication result. The major advantages of this optical parallel digital multiplication scheme are: (i) in comparison with its serial-input counterpart, the speed of the parallel-input convolver increases; (ii) the use of the new T-M A/D converter reduces the number of active nonlinear elements, leading to a more compact, less power-hungry and more economical A/D conversion; and (iii) as compared to a direct multiplication, both the number of adders and their required cascading stages are reduced, leading to an overall faster digital multiplication operation. The problem that still exists with this scheme is that, since the DMAC processor uses analog signals, very high accuracy optical systems for generating both the outer-product and the A/D conversion are needed. Recently, it has been indicated that the dynamic range and accuracy play a crucial role in determining the analog processor performance [25]. An optical analog processor is vulnerable to noisy inputs. Using high quality optical elements, uniform input beam illumination...
together with high dynamic range optical detectors, processing accuracy can be enhanced.

Acknowledgement

Constructive comments from the referee are appreciated. This work is supported by a grant from the Air Force Office of Scientific Research #84-0144.

References