30 Minute Interview
Intelligent architecture
Eric Demers, Team Leader - Engineering, AMD, spoke to Nivedan
Prakash about the company's teraFLOPS graphics chip, codenamed RV770,
and the chip's architecture

Eric Demers
Shrinking the die
The secret was good engineering. We spent quite a bit of time reviewing
our previous products and put a lot of engineering effort into redesigning
blocks to make them smaller and more efficient, using all we had learned.
In addition, we rebalanced compute and bandwidth to bring the ratio of
capabilities to bandwidth more in line with current applications.
We also changed the memory interface configuration, going for a more tuned
per-channel/client organization for high-bandwidth clients.
There was also a significant amount of layout work done to achieve the small
dies. We started our floor planning and physical design work nearly a year
before we sent the chip out for fabrication. The last months of the
design were spent solely on physical design and achieving our projected area
targets. While there is quite a bit of custom work done for all our chips (for
example, all the I/O), the core design was a standard cell design for the logic
section, but with custom memories to optimize area.
Changing the dispatcher
It took the design team nearly 1.5 years of work to make the dispatcher
more scalable while also offering new features and better performance. It is
an evolution of the previous version and inherited its best parts, such as the
ability to issue to multiple blocks in parallel, including texture, ALU and
others. We tweaked and optimized the command queues to achieve better balance
in the new design.
About GDDR5
The GDDR5 ATI Radeon HD 4870 boards are tuned to operate with higher memory
and core speeds than the ATI Radeon HD 4850 boards to get the highest
performance. As a result, they are currently more limited than the ATI Radeon
HD 4850 GDDR3 boards in their ability to operate at scaled-down clocks
when idle. This is a result of multiple constraints, but nothing inherent in the
GDDR5 protocol. We are working on ways of improving the range of clock
speeds we can support with GDDR5 boards, so we can further reduce idle power
without affecting peak performance. Currently, the ATI Radeon HD 4870 boards
have an idle power in the typical range for a performance board.
Micro-stuttering
Micro stuttering can be caused by multiple things. For example,
for our previous product, the ATI Radeon HD 3870, one of the causes of micro
stuttering was that the graphics clock was being increased and decreased
too frequently during games. The ATI Radeon HD 3870 was one of the first AMD
parts to introduce a programmable micro-controller to monitor and control the
chip power through clocks and voltage. The ATI Radeon HD 3870 was able to detect
times when the application was not using it, and reduce its clock speed to conserve
power. What we found is that within a single frame, when the CPU load was high,
there were times when the GPU was starved of work for long enough to cause the
ATI Radeon HD 3870 to reduce its clock, even though it was running a game. When
the next part of the frame came up, the graphics clock had already been reduced,
so the rendering was slowed down until the chip detected a heavy load and
resumed high clocks. This up/down on the clock saved power, but reduced overall
performance and caused micro stuttering.
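The clock behaviour described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not how the ATI Radeon HD 3870 micro-controller actually works: the function name, the idle threshold, the ramp-up delay and the low-clock ratio are all hypothetical values chosen for illustration. The model assumes the clock drops whenever the GPU is starved for at least the idle threshold, and that the following work runs at the reduced clock until a fixed ramp-up delay has passed.

```python
def frame_gpu_time(work_ms, starve_ms, idle_threshold_ms, ramp_up_ms, low_clock_ratio):
    """Time to finish `work_ms` of full-clock GPU work that follows a starvation gap.

    All parameters are hypothetical; this is a toy model, not a real governor.
    """
    if starve_ms < idle_threshold_ms:
        # The gap was too short for the governor to react: full clock throughout.
        return work_ms
    # The clock was lowered during the gap. Until the governor detects load and
    # ramps up (after ramp_up_ms), work proceeds at the reduced rate.
    done_while_slow = ramp_up_ms * low_clock_ratio
    if done_while_slow >= work_ms:
        # The whole job finishes before the clock comes back up.
        return work_ms / low_clock_ratio
    return ramp_up_ms + (work_ms - done_while_slow)


# Starvation gaps (ms) seen before each frame's GPU work; values are made up.
gaps = [0.5, 3.0, 0.5, 3.0]
times = [frame_gpu_time(10, g, idle_threshold_ms=2, ramp_up_ms=4, low_clock_ratio=0.5)
         for g in gaps]
print(times)
```

In this model, frames that follow a long enough gap take longer than their neighbours even though the amount of work is identical, which is the uneven pacing perceived as micro stuttering. Raising the idle threshold removes the variation at the cost of less time spent at the low clock.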
There are other potential sources of micro stuttering. Some of them, for example,
have to do with moving memory around, which can cause blackouts for either the
CPU or the GPU. Others occur when the CPU and GPU are unbalanced (fast
GPU, slow CPU), for example, where the CPU will not generate any frames for
a while, then generate two frames. In that case, the time for frame 1 could be
the idle time plus the render time, while frame 2 would be render time only.
That could lead to 16ms and 1ms frame times, which would
appear as stuttering (assuming 15ms idle, 1ms render times). Multi-GPU makes
the problem worse, as the GPU consumption rate is even higher. We are investigating
these and others, though it is a tall task to fix all of them while also achieving
peak performance.
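The 16ms and 1ms figures above follow from simple arithmetic, which the short sketch below reproduces. It assumes a single GPU that renders frames in submission order and a CPU that submits frames in pairs after each idle period; the function name and the submission timestamps are hypothetical and chosen only to match the 15ms idle and 1ms render example.

```python
def gpu_frame_intervals(submit_times_ms, render_ms):
    """Return the time between consecutive frame completions on the GPU."""
    intervals = []
    gpu_free_at = 0.0   # time at which the GPU finishes its current frame
    previous_done = 0.0
    for submitted in submit_times_ms:
        # The GPU starts a frame once it is both submitted and the GPU is free.
        start = max(submitted, gpu_free_at)
        gpu_free_at = start + render_ms
        intervals.append(gpu_free_at - previous_done)
        previous_done = gpu_free_at
    return intervals


# The CPU produces nothing for 15 ms, then submits two frames back to back,
# and repeats the pattern (15 ms of idle after the GPU finishes at t = 17).
submits = [15, 15, 32, 32]
print(gpu_frame_intervals(submits, render_ms=1))
```

The average interval in this example is 8.5ms, but the individual frames alternate between 16ms and 1ms, so an average frame rate alone would not reveal the stutter.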