Posted by pico 10-01-2011 22:59
It'll be Sandy Bridge against Bulldozer in 2011
Analysis: The tightest Intel versus AMD performance battle in a long time
By Nebojsa Novakovic
Tue Aug 31 2010, 14:36
OVER THE PAST FEW WEEKS more details about Intel's and AMD's next microarchitectures - Sandy Bridge and Bulldozer - have become clear, including, for the first time, the high end parts slated for well into 2011, likely mid-year.
While the mainstream LGA1155 Sandy Bridge is fairly clear by now, down to the model numbers and expected performance, its top notch brethren in the brand new Socket LGA2011, aptly named to match the release year, have far more impressive specifications.
Eight full cores - no shared FPUs or the like, just eight true full cores - plus 20MB of shared L3 cache and quad channel DDR3 memory, all on a single 32nm die and at clock rates similar to the quad-core Sandy Bridge parts at launch, add up to a potential performance monster. Estimate an average 15 per cent clock for clock performance boost per core - and that is without using the AVX instruction extensions - then add two more cores and at least 5 per cent higher clock speed compared to the current 3.33GHz six-core top end LGA1366 processors like the Core i7 980X and Xeon X5680, and you easily get over 50 per cent extra peak performance right at launch.
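As a rough sanity check, the estimate above can be reproduced by multiplying the three factors together. This is only a sketch: it assumes performance scales perfectly with core count and clock speed, which real workloads rarely manage, and the function name `relative_peak` is made up for illustration.

```python
def relative_peak(per_clock_gain, cores_new, cores_old, clock_gain):
    """Peak throughput of the new part relative to the old one.

    Assumes ideal scaling: the per-clock, core-count and clock-speed
    factors simply multiply.
    """
    return (1 + per_clock_gain) * (cores_new / cores_old) * (1 + clock_gain)


# Figures from the article: 15 per cent per clock, 6 -> 8 cores, 5 per cent clock.
estimate = relative_peak(0.15, 8, 6, 0.05)
print(f"Estimated peak: {estimate:.2f}x the six-core Westmere")
```

Under these assumptions the product comes to about 1.6, which is where the "over 50 per cent" figure comes from.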
Even if we include the 3.46GHz Core i7 990X and Xeon X5690, the speed bin step-up expected at year-end for the current Westmere generation, the new chips will still start out with at least the same clock speeds. And, looking at the 3.4GHz starting speed bin for the initial Core i7 2600 quad-core Sandy Bridge part, I am inclined to expect at least a 3.6GHz launch speed for the octo-core high end parts two quarters later.
As an example, a dual Xeon workstation based on the highest end 3.6GHz Sandy Bridge would, with its 16 total cores and the AVX instruction set, be able to churn out an astonishing 460 GFLOPS of peak double precision floating-point, compared to roughly 160 GFLOPS on a dual Westmere X5680 3.33GHz Xeon without AVX extensions. If the 3.8GHz Turbo mode kicks in across all cores, we'll be quite close to a peak of half a teraflop on a desktop. Not bad at all, and it should give the computational GPU crowd something to think about.
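The peak figures above follow from cores multiplied by clock multiplied by floating-point operations per cycle. The sketch below reproduces them under assumptions the article does not state: 8 double precision operations per cycle per core with AVX (one 4-wide add plus one 4-wide multiply) and 4 with 128-bit SSE on Westmere. The helper name `peak_gflops` is hypothetical.

```python
def peak_gflops(sockets, cores_per_socket, ghz, flops_per_cycle):
    """Theoretical peak double precision GFLOPS for a multi-socket system."""
    return sockets * cores_per_socket * ghz * flops_per_cycle


# Assumed per-core throughput (not from the article):
AVX_DP_FLOPS_PER_CYCLE = 8  # 256-bit add + 256-bit multiply, 4 doubles each
SSE_DP_FLOPS_PER_CYCLE = 4  # 128-bit add + 128-bit multiply, 2 doubles each

sandy_bridge = peak_gflops(2, 8, 3.6, AVX_DP_FLOPS_PER_CYCLE)
sandy_bridge_turbo = peak_gflops(2, 8, 3.8, AVX_DP_FLOPS_PER_CYCLE)
westmere = peak_gflops(2, 6, 3.33, SSE_DP_FLOPS_PER_CYCLE)

print(f"Dual Sandy Bridge, 3.6GHz: {sandy_bridge:.1f} GFLOPS")
print(f"Dual Sandy Bridge, 3.8GHz: {sandy_bridge_turbo:.1f} GFLOPS")
print(f"Dual Westmere, 3.33GHz:    {westmere:.1f} GFLOPS")
```

These are theoretical ceilings only; sustained throughput on real code depends on memory bandwidth and on how well the code vectorises.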
The benefits of extra DRAM bandwidth and capacity via four DDR3 channels, all fed through a humongous 20MB L3 cache, should be felt especially in memory and cache intensive codes, as many more loops will fit within the enlarged cache without much outside traffic. On top of that, the Sandy Bridge designers have optimised the L3 cache latencies, too.
On the other side, AMD also has a new horse to show off. The Bulldozer-based Interlagos replacement for Magny Cours, with a total of eight dual-core blocks, provides for 16 integer cores with eight shared floating-point units.
AMD's intention was to let common thread pairs - normally one thread mixing integer and floating-point work, and another that is integer-only - share such a block without wasting die area, but this could hurt scientific apps where all cores might be loaded with floating-point tasks. Since the single die 4-block 8-core Bulldozer should run at clocks of 3.2GHz and above, similar to the current six-core Phenom or Opteron, the dual die 8-block 16-core Interlagos shouldn't be far behind, probably around 2.6GHz at the start.
Keep in mind the well publicised scheduling and execution path improvements, and the expected performance boost of around 60 per cent when going from the 2.3GHz Magny Cours to the 2.5GHz Interlagos. Extrapolate that to a similar 60 per cent gain from the 3.2GHz Phenom II X6 to a 3.5GHz Bulldozer part, and AMD should be on its way back into the performance race this time.
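To see how much of that 60 per cent might come from the architecture itself rather than from extra cores and clocks, the figure can be split into its factors. This is a back-of-the-envelope sketch: it assumes ideal scaling with cores and clock, and it takes the top Magny Cours as a 12-core part and the Phenom II X6 as six cores, counts the article does not spell out. The name `per_core_per_clock_gain` is made up for illustration.

```python
def per_core_per_clock_gain(total_gain, clock_old, clock_new, cores_old, cores_new):
    """Residual per-core, per-clock factor after removing clock and core scaling.

    Assumes total speedup = clock ratio * core ratio * per-core-per-clock factor.
    """
    clock_ratio = clock_new / clock_old
    core_ratio = cores_new / cores_old
    return (1 + total_gain) / (clock_ratio * core_ratio)


# Server: 2.3GHz Magny Cours (assumed 12 cores) -> 2.5GHz 16-core Interlagos.
server = per_core_per_clock_gain(0.60, 2.3, 2.5, 12, 16)

# Desktop: 3.2GHz Phenom II X6 (6 cores) -> 3.5GHz 8-core Bulldozer.
desktop = per_core_per_clock_gain(0.60, 3.2, 3.5, 6, 8)

print(f"Server residual factor:  {server:.2f}")
print(f"Desktop residual factor: {desktop:.2f}")
```

On these assumptions both residuals land at roughly 1.1, suggesting that most of the projected gain would come from the higher core count and clock rather than from per-core improvements.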
There is another potential boost for AMD here, which might have been forgotten in the mists of time. A few years ago, AMD was toying with a kind of 'reverse multithreading' approach: instead of two threads sharing a single core, as in typical multithreading, one very resource-demanding thread would be allowed to spread across two cores. That otherwise fairly complex problem becomes much simpler now, if the two integer cores within each Bulldozer block share the same instruction fetch logic.
Why is it important? Well, looking at things the usual way, Intel would still hold the per-core and per-thread performance lead. The expected LGA2011 Sandy Bridge simply won't have competition in that realm yet. However, if AMD were somehow to enable reverse multithreading on Bulldozer, allowing a single thread to use all of a dual-core block's resources at once, we might, for the first time in a while, see AMD take the lead in per-thread performance too, especially for integer-heavy threads.
The question is, will that happen at launch? We still have over half a year to figure it out. Either way, the performance competition will be the most interesting in years.