# Wideband Digital Correlator Development for AMiBA

Chao-Te Li\*, Homin Jiang, Howard Liu, Derek Kubo, Kim Guzzino and Ming-Tang Chen

*Abstract*—A wideband digital correlator using high-speed analog-to-digital converters (ADCs) and field-programmable gate arrays (FPGAs) was developed for the Array for Microwave Background Anisotropy (AMiBA). The objective is to detect spectral lines for intensity mapping in cosmology. To make use of the 16-GHz intermediate frequency (IF) bandwidth of AMiBA, a 5 giga-sample-per-second (Gsps) ADC was developed. The IF signals were downconverted, digitized, and then streamed through FPGA-based processing units for correlation. A prototype one-baseline correlator using 5-Gsps ADC modules has been deployed. The correlator for a seven-element system is currently under development.

*Index Terms*—AMiBA, correlator, digital signal processing (DSP), FPGA, intensity mapping

#### I. INTRODUCTION

The Array for Microwave Background Anisotropy (AMiBA) [1] detects finite differences in the cosmic microwave background (CMB), the thermal remnant of the Big Bang. AMiBA receivers observe in the W-band, from 86 to 102 GHz, where foreground contamination, such as dust, free-free, and synchrotron emission, is minimized [2], as shown in Fig. 1. The CMB radiation was emitted from the surface of last scattering approximately 380,000 years after the Big Bang, when the universe had cooled sufficiently for protons and electrons to combine and form neutral hydrogen (HI). Because of the expansion of the universe, the temperature of the CMB has decreased considerably to only 2.73 K, so it radiates primarily within the microwave frequency range. Measurements have indicated that the CMB is quite uniform and isotropic. Variations of approximately 100 µK in the temperature reveal density fluctuations in the early universe. Such fluctuations initiated the growth of structures in the universe.

As an interferometric array, AMiBA is equipped with a correlator for measuring the variations in the CMB on different angular scales. Because the frequency spectrum of the CMB resembles thermal (blackbody) radiation, a wideband analog correlator was adopted for processing the entire IF band between 2 and 18 GHz to achieve high sensitivity [3]. An initiative is now underway for AMiBA to capture the CO intensity fluctuations at high redshifts for probing the Epoch of Reionization (EoR), when the protostars formed and reionized the HI in the early universe [4]. For this purpose, a digital correlator is needed to obtain the finer spectral resolution required by molecular line observations. Assuming CO is abundant in galaxies, the observations of CO lines can trace the protostars and galaxies. The results would be complementary to the HI observations. However, AMiBA receivers may have to observe at lower frequencies because the spectral lines have been redshifted.
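The effect of redshift on the observing frequency can be illustrated with a short calculation. The sketch below is only an illustration: the CO rest frequencies are standard laboratory values, while the redshifts and the helper names (`observed_ghz`, `lines_in_band`) are assumptions chosen for the example and do not come from the AMiBA design.

```python
# Rest frequencies of the lowest CO rotational transitions, in GHz.
CO_REST_GHZ = {"CO(1-0)": 115.271, "CO(2-1)": 230.538, "CO(3-2)": 345.796}


def observed_ghz(rest_ghz, z):
    """Observed frequency of a line emitted at rest_ghz from redshift z."""
    return rest_ghz / (1.0 + z)


def lines_in_band(z, band_ghz=(86.0, 102.0)):
    """Names of the CO lines above that land inside band_ghz at redshift z."""
    low, high = band_ghz
    return [name for name, rest in CO_REST_GHZ.items()
            if low <= observed_ghz(rest, z) <= high]


# Illustrative redshifts only; z = 7 is used here as a stand-in for the EoR.
for z in (0.0, 3.0, 7.0):
    shifted = {name: round(observed_ghz(rest, z), 2)
               for name, rest in CO_REST_GHZ.items()}
    print(z, shifted, lines_in_band(z))
```

At the illustrative redshift of 7, all three lines fall well below the 86 to 102 GHz receiver band, which is the situation the paragraph above describes.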



Fig. 1. CMB anisotropy versus foreground emission [2].

#### II. FPGA/ROACH/DESIGN TOOL

Field-programmable gate arrays (FPGAs) were adopted for digital signal processing (DSP) because current FPGAs have adequate speed and capacity. Compared with application-specific integrated circuits (ASICs), FPGAs are better suited to the development stage: because they are reprogrammable, the high non-recurring engineering charges of an ASIC can be avoided. A variety of design tools are available, such as the Xilinx Integrated Software Environment (ISE), System Generator, and a number of intellectual property (IP) cores.

The FPGA platform used is the Re-configurable Open Architecture Computing Hardware (ROACH), developed by the Center for Astronomical Signal Processing and Electronics Research (CASPER) [5] [6] at the University of California, Berkeley and the SKA group in South Africa. The primary goal of CASPER is to facilitate radio astronomical signal processing instrumentation by developing platform-independent, open-source hardware and software for design reuse. CASPER provides an integrated design environment, the Matlab/Simulink/System Generator/EDK (MSSGE) tool flow, and some parameterized design libraries. Matlab/Simulink provides a graphical environment for design and simulation, and for implementation when combined with Xilinx System Generator and the Embedded Development Kit (EDK).

C.-T. Li is with the Institute of Astronomy and Astrophysics, Academia Sinica, Taipei, 10617 Taiwan (e-mail: ctli@asiaa.sinica.edu.tw).

ROACH is a stand-alone computing platform equipped with Xilinx FPGAs and other related electronics. The main feature of ROACH is a Xilinx Virtex FPGA. A separate PowerPC (PPC) running Linux is used to control the board; for example, to program the FPGA and allow access to the FPGA registers and block RAMs (BRAMs). Quad data rate (QDR) SRAMs provide high-speed, medium-capacity memory, specifically for performing corner turns, and one DDR2 DIMM provides low-speed, high-capacity memory for the FPGA. The PPC also has an independent DDR2 DIMM to boot Linux. The Z-DOK connectors allow analog-to-digital converter (ADC) modules to be attached to the FPGA for data acquisition. For further signal processing, 10-GbE (CX4 or SFP+) connectors provide high-speed data links to other ROACHs or commodity CPUs and GPUs.



Fig. 2. ROACH with two 5-Gsps ADCs connected.

#### A. EDK

Because ROACH has a separate PPC processor mounted for managing FPGA tasks and peripherals, EDK is used for designing an embedded processor system. The processor accesses system resources through the on-chip peripheral bus (OPB) [7].

In EDK, Xilinx Platform Studio (XPS) is the development environment used for designing the hardware portion of an embedded processor system. The microprocessor, peripherals, and interconnection of these components, along with their respective detailed configuration, are specified in XPS. XPS includes a top-level project design file, system.xmp, and maintains the hardware platform description in the Microprocessor Hardware Specification file, system.mhs. The system.mhs file contains a text representation of the entire embedded system, including information on the processor, bus interface, peripherals, connectivity, and address space. The XPS directory contains subdirectories for the project, such as *data*, containing the user constraints file (UCF); *etc*, containing files that capture the options used to run various tools, such as fast\_runtime.opt; and *pcores*, containing custom hardware peripherals. XPS synthesizes the MHS source file into netlists (with an .ngc extension) by using Xilinx Synthesis Technology (XST); these netlists feed the subsequent FPGA place-and-route process [8].

#### B. Peripheral Core

The Matlab/Simulink interface for an ADC board is an intellectual property (IP) core or peripheral core (pcore) in the EDK. A few components are needed, as follows [9]:

- <*ip\_name*> is the directory for the pcore, containing at least three subdirectories: *data*, *hdl*, and *netlist*. <*ip\_name*> should be placed in the *pcores* directory of XPS.
- @xps\_<*ip\_name*> is the directory containing all the .m files, and is related to the MSSGE design environment. @xps\_<*ip\_name*> should be put under the *xps\_library* directory.
- <*ip\_name*>\_mask.m is the mask file for the interface block in the MSSGE design environment, and should also be put under the *xps\_library* directory.

It is necessary to edit system.mhs for instantiating the peripheral core.

The *pcores*/<*ip\_name*>/*data* directory holds three files: .MPD, .BBD, and .PAO. These files are read by the platform generator (PlatGen) in XPS. If an IP is integrated as source code, the source files are placed in the *pcores*/<*ip\_name*>/*hdl* directory, and PlatGen requires the .MPD and .PAO files from the *pcores*/<*ip\_name*>/*data* directory. If an IP is integrated as netlists, its netlists are placed in the *pcores*/<*ip\_name*>/*netlist* directory.
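The directory layout described above can be sketched as a small script. This is only an illustration of the layout: the function name `make_pcore_skeleton`, the project paths, and the pcore name `demo_adc` are hypothetical, and the script creates empty placeholders rather than valid MPD, PAO, or mask files.

```python
import tempfile
from pathlib import Path


def make_pcore_skeleton(xps_dir, xps_library_dir, ip_name):
    """Create the empty directories and placeholder mask file for one pcore."""
    xps_dir = Path(xps_dir)
    xps_library_dir = Path(xps_library_dir)

    # pcores/<ip_name>/{data,hdl,netlist} inside the XPS project directory
    pcore = xps_dir / "pcores" / ip_name
    for sub in ("data", "hdl", "netlist"):
        (pcore / sub).mkdir(parents=True, exist_ok=True)

    # @xps_<ip_name> and <ip_name>_mask.m under the xps_library directory
    (xps_library_dir / f"@xps_{ip_name}").mkdir(parents=True, exist_ok=True)
    (xps_library_dir / f"{ip_name}_mask.m").touch()
    return pcore


# Hypothetical usage in a scratch directory.
with tempfile.TemporaryDirectory() as scratch:
    root = Path(scratch)
    make_pcore_skeleton(root / "XPS_base", root / "xps_library", "demo_adc")
    for path in sorted(root.rglob("*")):
        print(path.relative_to(root))
```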

Every IP or pcore in the processing system should have an associated <ip\_name>.mpd file. The Microprocessor Peripheral Description (MPD) file lists the I/O ports, parameters, bus interfaces, and various options used for component and system configuration. It also contains parameters that will be passed as generics to the top-level peripheral file. The MPD file sets each parameter to its default value. If a parameter entry is made in the MHS file, the new value overrides the value in the MPD file. Parameters left at their default values should not be included in the MHS file.

The Peripheral Analyze Order (PAO) file is used by PlatGen for determining libraries, source HDL files, and the correct compilation order for the XST process.

#### III. ADC

In a communication system, the amount of data transferred increases with the bandwidth. Similarly, in radio astronomy, broader bandwidths provide more samples instantaneously, resulting in an improved signal-to-noise ratio (S/N) and sensitivity. However, wideband analog-to-digital converters (ADCs) are then required to convert the continuous signals into discrete samples, and the signal processing bandwidth must increase accordingly.

Currently, digital signal processors run with a system clock on the order of a few hundred MHz, whereas the intermediate frequency (IF) bandwidths that need to be processed are on the order of a few GHz. A processing task is typically performed at the lowest rate commensurate with the signal bandwidth that still satisfies the Nyquist criterion. A common practice involves reducing the bandwidth of a signal through filtering and then reducing the sample rate to match the reduced bandwidth. A further technique interchanges the order of filtering and sample-rate changes so that the processing proceeds at the reduced output sample rate rather than at the high input sample rate. The finite impulse response (FIR) filter or polyphase filter bank (PFB) described in the following section illustrates this technique. Under these circumstances, a multi-rate signal processing technique [10], such as demultiplexing or multiplexing, is adopted for handling the sample-rate changes along the signal flow. Demultiplexing can be considered as serial-to-parallel conversion, similar to downsampling.
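The interchange of filtering and sample-rate reduction can be checked numerically. The following sketch, using NumPy, compares a direct "filter, then decimate" path with a polyphase form in which every branch filter runs at the reduced rate. The function names, the 32-tap low-pass filter, and the decimation factor of 4 are illustrative assumptions, not the filter used in the AMiBA correlator.

```python
import numpy as np


def filter_then_decimate(x, h, m):
    """Direct form: FIR-filter at the input rate, then keep every m-th output."""
    return np.convolve(x, h)[::m]


def polyphase_decimate(x, h, m):
    """Polyphase form: m short branch filters, each running at 1/m of the input rate."""
    n_out = -(-(len(x) + len(h) - 1) // m)  # ceil((len(x) + len(h) - 1) / m)
    y = np.zeros(n_out)
    for p in range(m):
        h_p = h[p::m]  # branch coefficients h[r*m + p]
        if p == 0:
            x_p = x[::m]  # branch input x[n*m]
        else:
            # Branch input x[n*m - p]; the n = 0 term precedes the data, so it is zero.
            x_p = np.concatenate(([0.0], x[m - p::m]))
        if len(h_p) == 0 or len(x_p) == 0:
            continue
        branch = np.convolve(x_p, h_p)
        k = min(n_out, len(branch))
        y[:k] += branch[:k]
    return y


rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
taps = np.arange(32)
h = np.sinc((taps - 15.5) / 4) * np.hamming(32)  # illustrative low-pass filter
print(np.allclose(filter_then_decimate(x, h, 4), polyphase_decimate(x, h, 4)))
```

Both paths produce the same output samples, but the polyphase form never computes the outputs that the direct form would discard.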

#### A. 5 Gsps ADC

Fig. 3 shows the block diagram of the 5 giga-sample-per-second (Gsps) ADC [11], [12], which consists of four 8-bit ADC cores clocked by an external clock signal. Each ADC core can handle a maximum sampling rate of 1.25 Gsps. To maximize the bandwidth, the quad ADC is operated in the one-channel mode, in which all four ADC cores are interleaved. The four ADC cores sample in four phases ($0^\circ$, $90^\circ$, $180^\circ$, and $270^\circ$) of the reduced ADC clock, as shown in Fig. 4. The clock circuit receives an external 2.5-GHz clock (maximum frequency), and the external clock signal is divided by two to generate the internal sampling clock. In the one-channel mode, the in-phase 1.25-GHz clock is sent to ADC core A, while the inverted clock is sent to ADC core B; the in-phase and inverted clocks are each delayed by $90^\circ$ to generate the clocks for ADC cores C and D, respectively, resulting in an interleaved mode with an equivalent sampling rate of 5 Gsps. Several adjustments for the sampling delay and phase are included in the clock circuit to ensure a proper phase relationship between the different clocks generated internally from the external 2.5-GHz clock. Four samples are output simultaneously, and the effective sampling rate is twice the external ADC clock, resulting in a bandwidth equal to the external ADC clock.
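The interleaving scheme described above can be verified with a few lines of arithmetic. The sketch below assigns each core the clock phase given in the text (A at 0°, B at 180°, C at 90°, D at 270°) and checks that the merged sampling instants are uniformly spaced. It is an idealized model that assumes perfectly matched cores; the names are chosen for this example only.

```python
import numpy as np

F_EXT_HZ = 2.5e9            # external clock
F_CORE_HZ = F_EXT_HZ / 2    # internal sampling clock of each core, 1.25 GHz
CORE_PHASE_DEG = {"A": 0.0, "B": 180.0, "C": 90.0, "D": 270.0}


def core_sample_times(core, n_samples):
    """Sampling instants (s) of one core: one per core clock period, offset by its phase."""
    period = 1.0 / F_CORE_HZ
    return (np.arange(n_samples) + CORE_PHASE_DEG[core] / 360.0) * period


def interleaved_times(n_samples):
    """Sampling instants of all four cores merged into time order."""
    times = [core_sample_times(core, n_samples) for core in CORE_PHASE_DEG]
    return np.sort(np.concatenate(times))


spacing = np.diff(interleaved_times(8))
print("sample spacing (ps):", spacing[0] * 1e12)
print("equivalent rate (Gsps):", 1e-9 / spacing.mean())
```

In time order the cores fire as A, C, B, D, with 0.2 ns between consecutive samples.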

The settings of the ADC, such as channel selection, offset, gain, and phase adjustment, can be controlled and programmed through a serial peripheral interface (SPI). If the four ADC cores are not appropriately tuned, a spurious signal is generated at the frequency of the internal sampling clock (half of the external ADC clock), as shown in Fig. 5.
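A simple simulation shows why a core mismatch produces a spur at the internal sampling clock frequency. The sketch below models only an offset mismatch: each of the four interleaved cores adds its own small DC offset, which repeats every four samples and therefore appears at one quarter of the 5-Gsps rate, i.e. 1.25 GHz. The offset values, tone frequency, and function names are assumptions made for the illustration; gain and phase mismatches are not modeled.

```python
import numpy as np

FS_HZ = 5e9
N = 4096


def interleaved_capture(tone_hz, core_offsets, n=N):
    """Sample a unit-amplitude tone at FS_HZ and add a per-core DC offset."""
    t = np.arange(n) / FS_HZ
    x = np.sin(2 * np.pi * tone_hz * t)
    # Offsets are listed in interleave order and repeat every len(core_offsets) samples.
    return x + np.tile(core_offsets, n // len(core_offsets))


def spectrum_db(x):
    """Amplitude spectrum in dB relative to the strongest bin."""
    mag = np.abs(np.fft.rfft(x))
    return 20 * np.log10(mag / mag.max() + 1e-12)


tone_hz = 300 * FS_HZ / N   # tone placed on a bin center, so no window is needed
spur_bin = N // 4           # FS_HZ / 4 = 1.25 GHz

matched = spectrum_db(interleaved_capture(tone_hz, [0.0, 0.0, 0.0, 0.0]))
mismatched = spectrum_db(interleaved_capture(tone_hz, [0.02, 0.0, -0.02, 0.0]))

print("spur frequency (GHz):", spur_bin * FS_HZ / N / 1e9)
print("matched cores, level at spur bin (dB):", matched[spur_bin])
print("mismatched cores, level at spur bin (dB):", mismatched[spur_bin])
```

Because the usable band runs from 0 to 2.5 GHz, the 1.25-GHz spur sits in the middle of the spectrum.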

#### B. ADC Interface

For the 5-Gsps quad (four-core) ADC in the one-channel mode, each ADC core outputs at 1.25 Gsps using double data rate (DDR) signaling. Because data are transferred on both the rising and falling edges of the clock, the DDR clock rate is 1250 MHz/2 = 625 MHz. The ADCs are connected to Virtex-6 FPGAs through serial low-voltage differential signaling (LVDS) interfaces [13]. The maximum speed of the LVDS I/O is around 600 MHz [14], set by the maximum possible speed of the clock toggling the flip-flops in the FPGA logic or the components of the input serializer/deserializer (ISERDES).



Fig. 3. Block diagram of the 5-Gsps ADC.



Fig. 5. The spurious signal due to ADC core mismatch can be seen in the middle of the spectrum.

The Xilinx ISERDES [15] is a dedicated de-serializer or serial-to-parallel converter that can be used to de-multiplex the data stream further to achieve a slower data rate for signal processing. The ISERDES enables high-speed data transfer without requiring the FPGA fabric to match the input data frequency. This converter supports both single data rate (SDR) and DDR modes. In the SDR mode, the serial-to-parallel converter creates a 2-, 3-, 4-, 5-, 6-, 7-, or 8-bit-wide parallel word. In the DDR mode, the serial-to-parallel converter creates a 4-, 6-, 8-, or 10-bit-wide parallel word. For each core of the 5-Gsps quad ADC, the ISERDES is set in the DDR mode with a demultiplex (demux) factor of 2, creating 4-bit-wide parallel outputs and resulting in a total of 16 parallel bit streams for each ADC. When demuxed by 2 with ISERDES, the system clock rate is 312.5 MHz, a timing constraint too fast for a large design to meet. Therefore, further demux is required. With an additional demux by a factor of 2, the system clock rate becomes 156.25 MHz, a modest timing constraint that a larger design can meet. The ISERDES outputs are captured by two flip-flop (FF) components. One FF captures data on the rising edge of the clock, and the other captures data on the falling edge. There are 32-bit-wide parallel outputs from the ADC interface block.
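The clock rates quoted above follow from successive divisions by two, and the demultiplexing itself amounts to regrouping a serial stream into parallel words. The sketch below reproduces that arithmetic and models demux as an array reshape. It is a behavioral illustration in NumPy, not the ISERDES primitive; the function names are hypothetical.

```python
import numpy as np

CORE_RATE_MSPS = 1250.0    # output rate of each ADC core
TOTAL_RATE_MSPS = 5000.0   # four interleaved cores


def clock_chain_mhz(core_rate_msps=CORE_RATE_MSPS):
    """Clock rates after DDR signaling, the ISERDES demux by 2, and the extra demux by 2."""
    ddr_clock = core_rate_msps / 2    # two samples per clock period (both edges)
    after_iserdes = ddr_clock / 2     # ISERDES demux by 2
    fabric_clock = after_iserdes / 2  # additional demux by 2
    return ddr_clock, after_iserdes, fabric_clock


def demux(stream, factor):
    """Serial-to-parallel conversion: each row holds `factor` consecutive samples."""
    stream = np.asarray(stream)
    usable = len(stream) - len(stream) % factor
    return stream[:usable].reshape(-1, factor)


print(clock_chain_mhz())        # (625.0, 312.5, 156.25)
print(demux(np.arange(8), 4))   # two parallel words of four samples each
print(TOTAL_RATE_MSPS / clock_chain_mhz()[2], "samples per fabric clock cycle")
```

At 156.25 MHz the fabric must accept 32 samples per clock cycle to keep up with 5 Gsps, which appears consistent with the 32 parallel outputs quoted for the interface block.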



Fig. 6. Block diagram of ISERDES. The output ports Q1 to Q6 are the registered outputs of the ISERDES\_NODELAY module. The high-speed clock CLK is used to clock in the input serial data stream. The divided clock (CLKDIV) is typically a divided version of CLK (depending on the width of the implemented deserialization). The CLKDIV drives the output of the serial-to-parallel converter, the Bitslip module, and the CE module. The serial input data port (D) is the serial (high-speed) data input port of ISERDES\_NODELAY.



Fig. 7. Configuration of ISERDES\_NODELAY adopted in the 5-Gsps ADC interface. The DATA\_RATE attribute defines whether the incoming data stream is processed as SDR or DDR. The DATA\_WIDTH attribute defines the parallel data output width of the serial-to-parallel converter.

#### IV. CORRELATOR

The two alternative correlator architectures (FX and lag correlator) adopt different routes from receiver sample streams to baseline cross-power spectra, representing opposite sides of the convolution theorem of the Fourier transform, namely, that the transform of the convolution (or correlation) equals the product of the transforms [16]. The FX architecture minimizes the number of operations by exploiting the efficient fast Fourier transform (FFT) algorithm. The FX correlator requires significantly fewer operations when either the number of stations $n_s$ or the number of samples transformed or lags formed $n_t$ is large. Hence, a real-time digital FX correlator architecture employing FPGA processors is adopted in the AMiBA digital correlator development.
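The equivalence of the two routes can be demonstrated numerically for a single baseline. The sketch below computes a cross-power spectrum both ways for one block of samples, using a circular correlation so that the convolution theorem holds exactly. The function names and the block length of 64 are assumptions for the example; a real correlator also accumulates over many blocks.

```python
import numpy as np


def fx_cross_spectrum(a, b):
    """FX route: Fourier transform each stream (F), then multiply (X)."""
    return np.fft.fft(a) * np.conj(np.fft.fft(b))


def lag_cross_spectrum(a, b):
    """Lag route: circular cross-correlation at every lag, then Fourier transform."""
    n = len(a)
    # np.roll(b, lag)[k] equals b[k - lag], so each entry is one correlation lag.
    lags = np.array([np.sum(a * np.conj(np.roll(b, lag))) for lag in range(n)])
    return np.fft.fft(lags)


rng = np.random.default_rng(0)
a = rng.standard_normal(64) + 1j * rng.standard_normal(64)
b = rng.standard_normal(64) + 1j * rng.standard_normal(64)
print(np.allclose(fx_cross_spectrum(a, b), lag_cross_spectrum(a, b)))
```

For a block of $n_t$ samples, the lag route above needs on the order of $n_t^2$ multiplications per baseline, whereas the FX route needs one FFT per station plus $n_t$ multiplications per baseline.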

A discrete Fourier transform converts a finite block of time-domain samples to the frequency domain, and it suffers from two major drawbacks: leakage and scalloping loss [17]. Leakage occurs when an input tone appears in more than one frequency bin. Scalloping loss is due to the non-flat frequency response of each bin. A polyphase filter bank (PFB) is adopted for alleviating these problems because it can produce a flat frequency channel response and excellent suppression of out-of-band signals, as shown in Fig. 8. A PFB de-multiplexes the signal into parallel streams, and then applies FIR filtering and the FFT, as illustrated in Fig. 9. It can be regarded as applying a window function (usually a sinc function, the Fourier transform of a rectangular function) to the time-domain data before the FFT. Multiplication and accumulation are performed afterwards to complete the correlation.
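The leakage suppression of a PFB can be illustrated with a minimal model. The sketch below forms one PFB output spectrum by weighting several FFT lengths of data with a windowed sinc, folding the result into a single FFT-length block, and transforming it. The 4-tap sub-filters, the Hamming taper, the 64-point FFT, and the function names are all assumptions for this example and are not the parameters of the AMiBA design.

```python
import numpy as np


def pfb_spectrum(x, n_fft, n_taps=4):
    """One PFB spectrum: weight n_taps * n_fft samples, fold to n_fft samples, FFT."""
    m = n_fft * n_taps
    n = np.arange(m)
    # Illustrative prototype filter: sinc with one-bin bandwidth, tapered by a Hamming window.
    prototype = np.sinc((n - (m - 1) / 2) / n_fft) * np.hamming(m)
    folded = (x[:m] * prototype).reshape(n_taps, n_fft).sum(axis=0)
    return np.fft.fft(folded)


def fft_spectrum(x, n_fft):
    """Plain FFT of the first n_fft samples, with no weighting."""
    return np.fft.fft(x[:n_fft])


def leakage_db(spectrum, far_bin):
    """Level of a distant bin relative to the strongest bin, in dB."""
    mag = np.abs(spectrum)
    return 20 * np.log10(mag[far_bin] / mag.max())


n_fft = 64
n = np.arange(4 * n_fft)
tone = np.exp(2j * np.pi * 10.5 * n / n_fft)  # halfway between bins 10 and 11

print("FFT leakage into bin 20 (dB):", leakage_db(fft_spectrum(tone, n_fft), 20))
print("PFB leakage into bin 20 (dB):", leakage_db(pfb_spectrum(tone, n_fft), 20))
```

A tone placed between two bins is the worst case for the plain FFT; in this model the PFB keeps its power confined to the neighboring bins much more tightly.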



Fig. 8. Comparison of the single-bin frequency response of a PFB with that of the FFT.



Fig. 9. A polyphase filter bank realized with N 3-tap FIR sub-filters. The input signal is de-multiplexed to pass through these filters. The output y(m) is then input to an N-point FFT.

#### A. Prototype Correlator

The prototype correlator is a pocket correlator [18] using two 5-Gsps ADC boards on a ROACH. It channelizes wideband signals from two antennas and outputs the correlation results. The prototype correlator has the following design parameters: 1.6 GHz bandwidth, 1024 frequency channels, and a time resolution (spectral dump interval) of 0.226 s. An IF downconverter plate was developed to translate the 3.6 to 5.6 GHz IF to baseband for digitization, specifically to observe the 87.6 to 89.6 GHz portion of the sky for detecting HCN and HCO+ in Orion. The prototype correlator was deployed on site, as shown in Fig. 10. The HCO+ and HCN lines were detected from the observations, as shown in Fig. 11.
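The design parameters above imply a channel width and a velocity resolution that can be worked out directly. The sketch below does this arithmetic and also locates the two target lines in the IF band. Two inputs are assumptions rather than values from the text: the line rest frequencies are standard laboratory values for the J = 1-0 transitions, and the 84-GHz local oscillator is inferred from the quoted band edges (87.6 - 3.6 = 89.6 - 5.6 = 84), treating the conversion as a single stage.

```python
BANDWIDTH_HZ = 1.6e9
N_CHANNELS = 1024
C_KM_S = 299792.458

ASSUMED_LO_GHZ = 84.0  # inferred from the quoted sky and IF band edges, not stated in the text
LINES_GHZ = {"HCN(1-0)": 88.632, "HCO+(1-0)": 89.189}  # rest frequencies


def channel_width_mhz():
    """Width of one frequency channel."""
    return BANDWIDTH_HZ / N_CHANNELS / 1e6


def velocity_resolution_km_s(line_ghz):
    """Doppler velocity spanned by one channel at the given line frequency."""
    return C_KM_S * channel_width_mhz() / (line_ghz * 1e3)


def if_frequency_ghz(sky_ghz):
    """IF frequency of a sky frequency under the assumed single-stage conversion."""
    return sky_ghz - ASSUMED_LO_GHZ


print("channel width (MHz):", channel_width_mhz())
for name, sky in LINES_GHZ.items():
    print(name, "IF (GHz):", round(if_frequency_ghz(sky), 3),
          "resolution (km/s):", round(velocity_resolution_km_s(sky), 2))
```

Under these assumptions both lines fall inside the 3.6 to 5.6 GHz IF window, and one channel corresponds to a few km/s in velocity.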



Fig. 10. Prototype digital correlator deployed on the AMiBA site



Fig. 11. Orion-KL observations using the prototype digital correlator

#### REFERENCES

- [1] P. T. P. Ho, et al., "The Yuan-Tseh Lee Array for Microwave Background Anisotropy," ApJ, Vol. 694, pp. 1610-1618 (2009).
- [2] C. L. Bennett, et al., "First-Year Wilkinson Microwave Anisotropy Probe (WMAP) Observations: Foreground Emission," ApJS, Vol. 148, pp. 97-117 (2003).
- [3] C. T. Li, et al., "AMiBA Wideband Analog Correlator," ApJ Vol. 716, pp. 746-757 (2010).
- [4] C. L. Carilli, "Intensity mapping of molecular gas during cosmic reionization," ApJL, Vol. 730 (2011)
- [5] CASPER, "CASPER Collaboration for Astronomy Signal Processing and Electronics Research." [Online]. Available: https://casper.berkeley.edu/
- [6] A. Parsons, et al., "A Scalable Correlator Architecture Based on Modular FPGA Hardware, Reuseable Gateware, and Data Packetization," PASP, Vol. 120, pp. 1207-1221 (2008)
- [7] Xilinx, "CoreConnect Architecture On-chip Peripheral Bus (OPB)." [Online]. Available: http://www.xilinx.com/ipcenter/processor\_central/coreconnect/coreconnect\_opb.htm
- [8] Xilinx, "Embedded System Tools Reference Manual," EDK (v 13.2), UG111, July 6, 2011
- [9] CASPER, "How to make a yellow block," [Online]. Available: https://casper.berkeley.edu/wiki/How\_to\_make\_a\_%22yellow%22\_block
- [10] F. J. Harris, Multirate Signal Processing for Communication Systems, Upper Saddle River, NJ: Prentice Hall PTR, 2004
- [11] e2v semiconductors, "EV8AQ160 Quad ADC," EV8AQ160 datasheet, 2010
- [12] H. Jiang, et al., "A 5 Giga samples per second 8-Bit Analog to Digital Printed Circuit Board for Radio Astronomy," PASP, Vol. 126, No. 942, pp. 761-768 (2014)
- [13] Xilinx, Application note, XAPP1071, "Connecting Virtex-6 FPGAs to ADCs with Serial LVDS interfaces and DACs with Parallel LVDS interfaces", 2010
- [14] Xilinx, Data sheet, DS152, "Virtex-6 FPGA DC and Switching Characteristics", 2014
- [15] Xilinx, User Guide, UG361, "Virtex-6 FPGA SelectIO Resources", 2014
- [16] J. D. Romney, "Cross Correlators," in Synthesis Imaging in Radio Astronomy II, G. B. Taylor, C. L. Carilli, and R. A. Perley, Eds. ASP Conference Series, Vol. 180, 1999, pp. 57-78
- [17] CASPER, "The Polyphase Filter Bank Technique," [Online]. Available: https://casper.berkeley.edu/wiki/The\_Polyphase\_Filter\_Bank\_Technique
- [18] CASPER, "Tutorial Wideband Pocket Correlator," [Online]. Available: https://casper.berkeley.edu/wiki/Tutorial\_Wideband\_Pocket\_Correlator