Sarsen Technology

TigerSHARC, SHARC and FPGA Digital Signal Processing COTS Solutions

BittWare Tiger 6U versus PowerPC G4 6U

The 6U CompactPCI Tiger board from BittWare, based on the 250 MHz ADSP TS101S TigerSHARC, offers substantial benefits over PowerPC G4 based boards. While a 500 MHz G4 may have a higher peak performance than the 250 MHz TigerSHARC TS101S, this article will show that BittWare’s Tiger 6U beats a G4 6U on virtually all measures, including price per processing unit, power per processing unit, and processing units per slot, even when only peak performance is considered. Once the more realistic sustainable performance is taken into account, it is easy to see why the Tiger is a better choice for high performance multiprocessor DSP systems.

Processing Units

In order to make the comparison at a board level, we will use the concept of Processing Units (PU). For an easy comparison, let’s use the 500 MHz G4 as the baseline for a processing unit, so that 1 G4 = 1 PU.

Peak Performance

Under peak conditions, a highly optimized G4 generally runs standard routines at 2 to 2.5 times the speed of the 250 MHz TigerSHARC. While some comparisons have shown much higher ratios for certain routines, it is a safe assumption that once the Tiger routines have been fully optimized, as the benchmark 1K complex FFT has been, the ratios will generally come in at around 2:1. This makes sense given that the clock speed of the G4 is twice that of the TigerSHARC. Comparisons showing huge ratios, like 20:1, are almost certainly attributable to differences in how the routines were implemented rather than to differences in the processors themselves; the TigerSHARC routine could be rewritten to use the same method as the G4 and achieve performance more in line with the 2:1 ratio. Some early TigerSHARC performance numbers assumed that the TigerSHARC requires the same number of cycles as the ADSP 21160, which may be valid for some routines, but other routines will certainly benefit from additional features present in the TigerSHARC.

While we expect that more Tiger optimization will generally show a ratio of 2:1 or better, for now we will be generous to the G4 and use a ratio of 2.5:1 for peak performance.

Realistic Performance

For more realistic performance it is very difficult to come up with hard data, as so much depends on the system application. The Tiger, with three 128 bit wide internal buses cross-connected to three internal memory blocks, 14 DMA engines, a 64 bit multiprocessor external bus, and four 250 MByte/second link ports per processor, is designed to move huge amounts of data around quickly without interfering with the actual core doing the number crunching. One can design system applications on the Tiger to take advantage of this architecture, and can achieve sustainable performance that meets or comes near the TigerSHARC’s peak performance. Note that the ability to sustain peak performance has always been a key advantage of the SHARC architecture, and the Tiger continues this tradition.

The G4 does not lend itself well to turning peak performance into sustainable performance. Its peak numbers are generally based on having data in the right place at the right time and do not take into account the time needed to get the data there. Not only does the G4 lack the tremendous data movement capabilities of the Tiger, it is also very hard to predict exactly what performance levels it will achieve, which forces a system designer to be conservative when specifying how many processing units a given system implementation requires. Realistic system designers understand that managing data movement in a system is critical, and that peak performance numbers, while giving an indication of processing power, do not measure what can be achieved in a real system environment.

Some initial testing using more realistic system application situations has shown a G4 to TigerSHARC ratio of around 1.25:1. The key factor in these comparisons is the amount of data flowing into and out of the processor, since the Tiger will handle this data flow virtually transparently and in a deterministic manner, while the G4’s performance will suffer as the data rates increase.

For the rest of this study, we will use both the 2.5:1 and 1.25:1 ratios for comparing the Tiger and G4.
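As a rough illustration of how the two ratios translate into processing units, the short Python sketch below converts a count of TigerSHARCs into G4-equivalent PUs. The function and constant names are illustrative only; the 2.5:1 and 1.25:1 ratios are the ones chosen above.

```python
# Ratios taken from the article: performance of one 500 MHz G4
# relative to one 250 MHz TS101 TigerSHARC.
PEAK_RATIO = 2.5        # peak performance, G4 : TigerSHARC
REALISTIC_RATIO = 1.25  # realistic system performance, G4 : TigerSHARC


def tigersharcs_to_pu(count, g4_to_tiger_ratio):
    """Convert a number of TigerSHARCs into G4-equivalent processing units.

    One G4 is defined as 1 PU, so one TigerSHARC is worth 1 / ratio PU.
    """
    return count / g4_to_tiger_ratio


for label, ratio in (("realistic", REALISTIC_RATIO), ("peak", PEAK_RATIO)):
    pu = tigersharcs_to_pu(1, ratio)
    print(f"1 TigerSHARC = {pu:.1f} PU ({label}, {ratio}:1)")
```

In other words, a single TigerSHARC counts as 0.8 PU under the realistic ratio and 0.4 PU under the peak ratio.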

Price Per Processing Unit

The Tiger 6U baseboard provides 8 TS101 TigerSHARCs at a cost of $11,000, with options using Tiger PMCs for 12 TigerSHARCs for $15,000 and 16 TigerSHARCs for $18,000 (costs are based on hundreds of pieces). The G4 6U provides 4 PowerPC G4s at a cost estimated at $18,000. Therefore, the cost per processing unit is:

Tiger 6U at 1.25:1 ratio: $1406, $1562, or $1718 per PU (16, 12, or 8 TS101s)

Tiger 6U at 2.5:1 ratio: $2812, $3125, or $3437 per PU (16, 12, or 8 TS101s)

G4 6U: $4500 per PU

As you can see, the Tiger 6U beats the G4 6U in price per processing unit in all configurations, and at the realistic 1.25:1 ratio provides the same processing performance for roughly one third of the cost.
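The price figures can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the board prices and processor counts are those quoted above, while the names are illustrative. Fractional dollars are dropped with `int()` because the quoted figures appear to be truncated rather than rounded.

```python
# Board data as quoted in the article (prices based on hundreds of pieces).
TIGER_CONFIGS = {8: 11_000, 12: 15_000, 16: 18_000}  # TS101 count -> price in USD
G4_COUNT, G4_PRICE = 4, 18_000                       # four G4s, estimated price


def tiger_price_per_pu(count, price, g4_to_tiger_ratio):
    """Price per G4-equivalent PU.

    PUs on the board = count / ratio, so price per PU = price * ratio / count.
    """
    return price * g4_to_tiger_ratio / count


g4_price_per_pu = G4_PRICE / G4_COUNT  # each G4 is 1 PU by definition

for ratio in (1.25, 2.5):
    for count, price in TIGER_CONFIGS.items():
        dollars = tiger_price_per_pu(count, price, ratio)
        share = dollars / g4_price_per_pu
        print(f"{count:2d} TS101s at {ratio}:1 -> ${int(dollars)} per PU "
              f"({share:.0%} of the G4 figure)")

print(f"G4 6U -> ${int(g4_price_per_pu)} per PU")
```

At the 1.25:1 ratio the Tiger figures come out at about 31% to 38% of the G4 price per PU, which is where the "one third" comparison comes from.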

Power Per Processing Unit

The Tiger 6U is estimated at 25 Watts, while the G4 6U is 50 Watts, giving a power per processing unit number of:

Tiger 6U: 3.9 Watts per PU (1.25:1); 7.8 Watts per PU (2.5:1), both for the 8 TS101 baseboard

G4 6U: 12.5 Watts per PU

The Tiger 6U provides processing for substantially less power than the G4 6U.
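The same calculation for power is sketched below in Python. One assumption is made loudly here: the 25 Watt estimate is taken to refer to the 8 TigerSHARC baseboard, since that is the configuration that reproduces the 3.9 and 7.8 Watt figures. The names are illustrative.

```python
TIGER_BOARD_WATTS = 25.0  # estimate quoted in the article
G4_BOARD_WATTS = 50.0     # quoted in the article
TIGER_COUNT = 8           # ASSUMPTION: 25 W refers to the 8-processor baseboard
G4_COUNT = 4              # four G4s per board


def watts_per_pu(board_watts, processor_count, g4_to_tiger_ratio=1.0):
    """Board power divided by G4-equivalent PUs (PUs = count / ratio)."""
    return board_watts * g4_to_tiger_ratio / processor_count


tiger_realistic = watts_per_pu(TIGER_BOARD_WATTS, TIGER_COUNT, 1.25)
tiger_peak = watts_per_pu(TIGER_BOARD_WATTS, TIGER_COUNT, 2.5)
g4 = watts_per_pu(G4_BOARD_WATTS, G4_COUNT)  # ratio 1.0: a G4 is 1 PU

print(f"Tiger 6U: {tiger_realistic:.1f} W per PU (1.25:1), "
      f"{tiger_peak:.1f} W per PU (2.5:1)")
print(f"G4 6U:    {g4:.1f} W per PU")
print(f"G4 draws {g4 / tiger_realistic:.1f}x (realistic) or "
      f"{g4 / tiger_peak:.1f}x (peak) the power per PU")
```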

Processing Units Per Slot

The Tiger 6U provides 8 TigerSHARCs on the baseboard, and can add two Tiger PMCs with 4 more TigerSHARCs each, for a choice of 8, 12, or 16 TigerSHARCs per slot. The G4 6U has 4 G4s.

Tiger 6U at 1.25:1 ratio: 12.8, 9.6, or 6.4 PU per slot (16, 12, or 8 TigerSHARCs)

Tiger 6U at 2.5:1 ratio: 6.4, 4.8, or 3.2 PU per slot (16, 12, or 8 TigerSHARCs)

G4 6U: 4 PU per slot

With Tiger PMCs fitted, the Tiger 6U provides more peak performance per slot than the G4 6U (4.8 or 6.4 PU versus 4), and up to 3 times the realistic processing power (12.8 PU versus 4).
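To show what density means for chassis sizing, the Python sketch below computes PU per slot for each configuration and the number of slots needed to reach a target. The 40 PU target and the helper names are hypothetical, chosen only for illustration; the processor counts and ratios are those used above.

```python
import math

TIGER_CONFIGS = (8, 12, 16)  # TigerSHARCs per slot: baseboard, +1 PMC, +2 PMCs
G4_PU_PER_SLOT = 4           # four G4s per board, 1 PU each
TARGET_PU = 40               # HYPOTHETICAL system requirement, not from the article


def tiger_pu_per_slot(count, g4_to_tiger_ratio):
    """G4-equivalent PUs delivered by one Tiger 6U slot."""
    return count / g4_to_tiger_ratio


def slots_needed(target_pu, pu_per_slot):
    """Whole slots required to reach target_pu."""
    return math.ceil(target_pu / pu_per_slot)


for ratio in (1.25, 2.5):
    for count in TIGER_CONFIGS:
        pu = tiger_pu_per_slot(count, ratio)
        print(f"{count:2d} TigerSHARCs at {ratio}:1 -> {pu:.1f} PU per slot, "
              f"{slots_needed(TARGET_PU, pu)} slots for {TARGET_PU} PU")

print(f"G4 6U -> {G4_PU_PER_SLOT} PU per slot, "
      f"{slots_needed(TARGET_PU, G4_PU_PER_SLOT)} slots for {TARGET_PU} PU")
```

For this hypothetical 40 PU target, the G4 6U needs 10 slots, while a fully populated Tiger 6U needs 4 slots at the realistic ratio and 7 at the peak ratio.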

Other Considerations

The TigerSHARC is designed for large, high bandwidth, multiprocessor systems. The balance of core processing power, data movement engines, high bandwidth internal buses, large internal memories, and high speed link ports combines to provide a deterministic, high performance signal processing engine. The BittWare Tiger 6U takes advantage of the TigerSHARC architecture to provide a board level solution with signal processing and data movement capabilities that can be applied to large signal processing tasks. A large advantage of the Tiger over the G4 is the TigerSHARC’s 250 MByte/second link ports. Combined with the 64 bit 66 MHz PCI buses with bridging for onboard bus segmentation, the Tiger 6U has tremendous data movement capabilities. The link ports allow you to design a system where the real time data flows over the links, leaving the PCI bus and TigerSHARC external buses for command and control and background data movement.
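To put the link port figures in perspective, the Python sketch below compares the peak link bandwidth of one TigerSHARC with the theoretical peak of a 64 bit 66 MHz PCI bus. These are back-of-envelope peak numbers under stated assumptions: all four link ports are driven at their full rate at once, and PCI protocol overhead is ignored. The names are illustrative.

```python
LINK_PORT_MBYTES_PER_S = 250  # per link port, from the article
LINK_PORTS_PER_PROCESSOR = 4  # from the article
PCI_BUS_WIDTH_BITS = 64       # from the article
PCI_CLOCK_MHZ = 66            # from the article


def link_bandwidth_per_processor():
    """Peak link bandwidth of one TigerSHARC in MByte/s.

    ASSUMPTION: all link ports run at full rate simultaneously.
    """
    return LINK_PORT_MBYTES_PER_S * LINK_PORTS_PER_PROCESSOR


def pci_peak_bandwidth():
    """Theoretical PCI peak in MByte/s: bytes per transfer times MHz.

    ASSUMPTION: one transfer per clock, no arbitration or protocol overhead.
    """
    return (PCI_BUS_WIDTH_BITS // 8) * PCI_CLOCK_MHZ


links = link_bandwidth_per_processor()
pci = pci_peak_bandwidth()
print(f"Link ports, one TigerSHARC: {links} MByte/s peak")
print(f"64 bit 66 MHz PCI bus:      {pci} MByte/s peak, shared by all devices")
```

Under these assumptions a single processor's links offer roughly twice the peak bandwidth of a shared PCI segment, which is why routing the real time data over the links leaves the buses free for control traffic.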

Summary

BittWare’s Tiger 6U beats G4 6U boards in price and power, and with Tiger PMCs fitted in density as well, even if one only considers peak performance. Using more realistic performance assessments, the Tiger 6U far outperforms the G4 6U on all three measures. These comparisons do not take into account the large advantages the Tiger offers in terms of its deterministic performance and its ability to separate the system data flows from the signal processing.

Author: Jeff Milrod, President, BittWare Inc.

©2000 - 2006 Sarsen Technology Limited
updated 05 January 2006