BittWare Tiger 6U versus PowerPC G4 6U
The 6U CompactPCI Tiger board from BittWare, based
on the 250 MHz ADSP TS101S TigerSHARC, offers customers substantial benefits
over PowerPC G4 based boards. While a 500 MHz G4 may
have a higher peak performance than the 250 MHz TigerSHARC TS101S,
this article will show that BittWare’s Tiger 6U beats a G4 6U on virtually
all measures, including price per processing unit, power per processing
unit, and processing units per slot, even when only considering peak
performance. When one takes into account the more realistic sustainable
performance achievable, it’s easy to see why the Tiger is a better
choice for high performance multiprocessor DSP systems.
Processing Units
To make the comparison at the board level, we will use the concept of
Processing Units (PU). For simplicity, we use the 500 MHz G4 as the baseline for a processing
unit, so that 1 G4 = 1 PU.
Peak Performance
Under peak conditions, a highly optimized G4 generally runs standard routines
2 to 2.5 times as fast as the 250 MHz TigerSHARC. While some comparisons have shown much higher ratios
for certain routines, it is a safe assumption that once the Tiger routines have been fully
optimized, as the benchmark 1K complex FFT has, the ratios will generally come in at around 2:1. This
is consistent with the G4’s clock speed being twice that of the TigerSHARC.
Comparisons showing huge ratios, such as 20:1, are almost certainly attributable to differences in how
the routines were implemented rather than to differences in the processors themselves, meaning
that the TigerSHARC routine could be rewritten to use the same method as the G4 and achieve
performance more in line with the 2:1 ratio. Some early TigerSHARC performance numbers
assumed that the TigerSHARC requires the same number of cycles as the ADSP 21160,
which may be valid for some routines, but other routines will certainly benefit from additional features
present in the TigerSHARC.
While we expect that more Tiger optimization will generally show a ratio of 2:1 or
better, for now we will be generous to the G4 and use a ratio of 2.5:1 for peak performance.
Realistic Performance
For more realistic performance it is very difficult to come up with hard data, as so
much depends on the system application. The Tiger, with three 128 bit wide internal buses cross-connected
to three internal memory blocks, 14 DMA engines, a 64 bit multiprocessor external bus, and four 250
MByte/second link ports per processor, is designed to move huge amounts of data around quickly without
interfering with the actual core doing the number crunching. One can design system applications on the
Tiger to take advantage of this architecture, and can achieve sustainable performance that meets or
comes near the TigerSHARC’s peak performance. Note that the ability to sustain peak performance has always
been a key advantage of the SHARC architecture, and the Tiger continues this tradition.
The G4 does not lend itself well to turning peak performance into sustainable
performance. These peak numbers are generally based on having data in the right place at the right time
and do not take into account the time to get the data where it needs to be. Not only does the G4 lack
the tremendous data movement capabilities of the Tiger, it is also very hard to predict exactly what
performance levels it will achieve, forcing a system designer to be conservative when
specifying how many processing units a given system implementation requires. Realistic system
designers understand that managing data movement in a system is critical, and that peak performance
numbers, while providing an indication of the processing power, do not provide realistic measures of what
can be achieved in a real system environment.
Some initial testing under more realistic system application conditions has shown a
G4 to TigerSHARC ratio of around 1.25:1. The key variable in these comparisons is the amount of data
flowing into and out of the processor: the Tiger handles this data flow virtually transparently
and in a deterministic manner, while the G4’s performance suffers as the data rates increase.
For the rest of this study, we will use both the 2.5:1 and 1.25:1 ratios for comparing
the Tiger and G4.
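Both ratios convert into processing units by simple division: with 1 G4 = 1 PU, one TigerSHARC counts as 1/2.5 = 0.4 PU at peak and 1/1.25 = 0.8 PU under realistic conditions. The Python sketch below makes that conversion explicit for the three Tiger 6U configurations. The function and constant names are illustrative only and do not come from any BittWare or Analog Devices tool.

```python
# Sketch: convert a TigerSHARC count into G4-equivalent processing units (PU).
# The ratios are the article's G4:TigerSHARC figures; all names are hypothetical.

PEAK_RATIO = 2.5        # G4 : TigerSHARC at peak performance
REALISTIC_RATIO = 1.25  # G4 : TigerSHARC under realistic data flow


def tigersharcs_to_pu(count: int, ratio: float) -> float:
    """Return the PU equivalent of `count` TigerSHARCs, where 1 G4 = 1 PU."""
    return count / ratio


# The three Tiger 6U configurations discussed in the article.
for count in (8, 12, 16):
    peak = tigersharcs_to_pu(count, PEAK_RATIO)
    realistic = tigersharcs_to_pu(count, REALISTIC_RATIO)
    print(f"{count} TS101s: {peak:.1f} PU (peak), {realistic:.1f} PU (realistic)")
```

Every per-PU figure in the sections that follow is a board-level quantity divided by one of these PU counts.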
Price Per Processing Unit
The Tiger 6U baseboard provides 8 TS101 TigerSHARCs at a cost of $11,000, with options
using Tiger PMCs for 12 TigerSHARCs for $15,000 and 16 TigerSHARCs for $18,000 (costs are based on
hundreds of pieces). The G4 6U provides 4 PowerPC G4s at a cost estimated at $18,000. Therefore, the cost
per processing unit is:
Tiger 6U: $1406, $1562, or $1718 per PU (16, 12, or 8 TS101s at 1.25:1 ratio);
          $2812, $3125, or $3437 per PU (16, 12, or 8 TS101s at 2.5:1 ratio)
G4 6U:    $4500 per PU
As you can see, the Tiger 6U beats the G4 6U in price per processing unit in all
configurations, and provides the same realistic processing performance for roughly one third of the cost.
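The price figures can be reproduced from the board prices and the two ratios. The following Python sketch is one way to do the arithmetic; the board prices and processor counts are the article's, while the function and variable names are hypothetical.

```python
# Sketch: price per processing unit (PU), where 1 G4 = 1 PU.
# Prices and processor counts are the article's; all names are hypothetical.

TIGER_CONFIGS = {16: 18_000, 12: 15_000, 8: 11_000}  # TS101 count -> board price (USD)
G4_COUNT, G4_PRICE = 4, 18_000                        # G4 6U: 4 G4s, estimated price


def price_per_pu(price: float, processors: int, ratio: float = 1.0) -> float:
    """Board price divided by its PU count; `ratio` is G4:processor (1.0 for a G4)."""
    processing_units = processors / ratio
    return price / processing_units


for ratio in (1.25, 2.5):
    for count, price in TIGER_CONFIGS.items():
        cost = price_per_pu(price, count, ratio)
        print(f"Tiger 6U, {count} TS101s at {ratio}:1 -> ${cost:,.2f} per PU")

print(f"G4 6U -> ${price_per_pu(G4_PRICE, G4_COUNT):,.2f} per PU")
```

The listed Tiger figures are these results truncated to whole dollars. At the 1.25:1 ratio, the Tiger cost per PU works out to between about 31% and 38% of the G4 figure, depending on configuration.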
Power Per Processing Unit
The Tiger 6U is estimated to consume 25 Watts, while the G4 6U consumes 50 Watts, giving a power
per processing unit of:
Tiger 6U: 3.9 Watts per PU (1.25:1); 7.8 Watts per PU (2.5:1), computed for 8 TS101s
G4 6U: 12.5 Watts per PU
The Tiger 6U provides processing for substantially less power than the G4 6U.
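The power figures follow the same pattern as the price figures. The Python sketch below reproduces them under one assumption that the article does not state explicitly: the 25 Watt estimate is paired with the 8 TigerSHARC baseboard, which is the configuration whose PU counts (6.4 and 3.2) yield the listed 3.9 and 7.8 Watt values. The names are hypothetical.

```python
# Sketch: power per processing unit (PU), where 1 G4 = 1 PU.
# Wattages are the article's; pairing 25 W with 8 TS101s is an inference
# from the listed results, not a stated fact. All names are hypothetical.

TIGER_WATTS = 25.0  # estimated Tiger 6U board power
TIGER_TS101S = 8    # assumed: baseboard configuration
G4_WATTS = 50.0     # G4 6U board power
G4_COUNT = 4        # G4s per board


def watts_per_pu(board_watts: float, processors: int, ratio: float = 1.0) -> float:
    """Board power divided by its PU count; `ratio` is G4:processor (1.0 for a G4)."""
    return board_watts / (processors / ratio)


tiger_realistic = watts_per_pu(TIGER_WATTS, TIGER_TS101S, 1.25)
tiger_peak = watts_per_pu(TIGER_WATTS, TIGER_TS101S, 2.5)
g4 = watts_per_pu(G4_WATTS, G4_COUNT)

print(f"Tiger 6U: {tiger_realistic:.1f} W/PU (1.25:1), {tiger_peak:.1f} W/PU (2.5:1)")
print(f"G4 6U:    {g4:.1f} W/PU")
```

The article gives no power estimates for the 12 and 16 TigerSHARC configurations, so the sketch does not extrapolate to them.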
Processing Units Per Slot
The Tiger 6U provides 8 TigerSHARCs on the baseboard, and can add two Tiger PMCs with
4 more TigerSHARCs each, for a choice of 8, 12, or 16 TigerSHARCs per slot. The G4 6U has 4 G4s.
Tiger 6U: 12.8, 9.6, or 6.4 PU per slot (16, 12, or 8 TS101s at 1.25:1 ratio);
          6.4, 4.8, or 3.2 PU per slot (16, 12, or 8 TS101s at 2.5:1 ratio)
G4 6U:    4 PU per slot
With 12 or 16 TigerSHARCs, the Tiger 6U provides more peak performance per slot than the G4 6U, and
with 16 it provides up to about 3 times the realistic processing power.
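Per-slot density matters most when sizing a chassis. As a rough illustration, the Python sketch below estimates how many slots each board type would need for a given workload. The 40 PU target is a hypothetical figure chosen for this example, and the function names are illustrative. Exact fractions are used so that rounding up to whole slots is not affected by floating-point error.

```python
# Sketch: slots needed to reach a target PU count, where 1 G4 = 1 PU.
# The 40 PU target is hypothetical; board densities are the article's.
from fractions import Fraction
from math import ceil

G4_PU_PER_SLOT = Fraction(4)  # 4 G4s per G4 6U board


def tiger_pu_per_slot(ts101s: int, ratio: str) -> Fraction:
    """PU per Tiger 6U slot; `ratio` is the G4:TigerSHARC ratio as a decimal string."""
    return Fraction(ts101s) / Fraction(ratio)


def slots_needed(target_pu: int, pu_per_slot: Fraction) -> int:
    """Smallest whole number of slots that meets or exceeds the target."""
    return ceil(Fraction(target_pu) / pu_per_slot)


TARGET_PU = 40  # hypothetical system requirement

for label, ratio in (("realistic", "1.25"), ("peak", "2.5")):
    per_slot = tiger_pu_per_slot(16, ratio)
    slots = slots_needed(TARGET_PU, per_slot)
    print(f"Tiger 6U, 16 TS101s ({label}): {float(per_slot):.1f} PU/slot, {slots} slots")

g4_slots = slots_needed(TARGET_PU, G4_PU_PER_SLOT)
print(f"G4 6U: {float(G4_PU_PER_SLOT):.1f} PU/slot, {g4_slots} slots")
```

For this hypothetical 40 PU target, the fully populated Tiger 6U needs 4 slots at the realistic ratio and 7 at the peak ratio, against 10 slots for the G4 6U.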
Other Considerations
The TigerSHARC is designed for large, high bandwidth, multiprocessor systems. The
balance of core processing power, data movement engines, high bandwidth internal buses, large internal
memories, and high speed link ports, combine to provide a deterministic, high performance signal
processing engine. The BittWare Tiger 6U takes advantage of the TigerSHARC architecture to provide a board
level solution with signal processing and data movement capabilities that can be applied to large signal
processing tasks. A large advantage of the Tiger over the G4 is the TigerSHARC’s 250 MByte/second
link ports. Combined with the 64 bit 66 MHz PCI buses with bridging for onboard bus segmentation, the
Tiger 6U has tremendous data movement capabilities. The link ports allow you to design a system where the
real time data flows over the links, leaving the PCI bus and TigerSHARC external buses for command and
control and background data movement.
Summary
BittWare’s Tiger 6U beats G4 6U boards in price, power, and density, even if one only
considers peak performance. Using more realistic performance assessments, the Tiger 6U far outperforms the
G4 6U. These performance measurements do not take into account the large advantages the Tiger offers in
terms of its deterministic performance, and its ability to separate the system data flows from the signal
processing.
Author: Jeff Milrod, President, BittWare Inc.