Is the Bulldozer module dual-core, or single-core with Hyper-Threading on steroids?
There are two big differences between Core Parking in Windows 7 and Windows 8 (the publicly released Developer Preview). The first is that Windows 8 is aware of Bulldozer modules and can park them correctly. Windows 7 doesn't know Bulldozer at all and treats all cores as equal, even though they aren't: the two cores of a Bulldozer module share many resources, which penalizes the performance of one core when the other is also working. Paradoxically, this helps the FX processor under the default settings of the most recent production version of Windows (7) and penalizes the Core i7 (as shown on the previous page using WinRAR).
We still don't know which description is closer to reality: calling an AMD FX-8xxx series processor an eight-core, or a quad-core with some variant of Hyper-Threading on steroids. We concluded that it simply depends on the application, and that it is not really possible to strictly call the Bulldozer module either single-core or dual-core. We also don't like calling it a sort of one-and-a-half-core, since a core either is there or it is not. If we called it a one-and-a-half-core, we could not strictly call the good old Core 2 Duo a dual-core either, since its two cores also share something: the L2 cache. It depends on what you consider a core (or the essentials of a core). If it is just the ALUs (the L2 cache excluded: L1 caches can be considered an essential part of a core, but older processors did not have an integrated L2 cache at all), then the AMD FX-8150 really is an eight-core processor. In our opinion, it should be taken the way AMD marketing presents it: if it is sold as an eight-core, it should be evaluated as an eight-core. And judged as an eight-core, the performance of AMD FX-8xxx series processors is poor.
To analyze whether the processor is an eight-core or a quad-core with HT on steroids, we used a modified x264 HD Benchmark (originally created by the Tech ARP team; we only used a different video and a newer build of the x264 encoder). Our script runs the benchmark three times: first with all cores/threads assigned to the application, second with only the even-numbered threads assigned, and third with the first four threads assigned to the encoder. You can imagine the visible processor threads as the digits of the binary representation of 255, which is 11111111, with the leftmost digit standing for the first thread. The first run used all cores, the second used only 10101010 (i.e. four cores without Hyper-Threading on the Core i7, and one core in each of the four Bulldozer modules on the AMD FX), and the third used 11110000 (i.e. two cores including their Hyper-Threading threads on the Core i7, and two complete Bulldozer modules with both of their cores on the AMD FX). The two latter configurations are clearly visible in the screenshot of the performance monitor in Windows 7 (which can easily be started with the command
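The digit notation above reads left to right (first thread first), whereas Windows affinity masks use bit i for logical CPU i. The following is a minimal sketch, not our actual benchmark script, of how such strings could be turned into standard masks. It assumes, as the configurations above imply, that logical CPUs 2k and 2k+1 are the two threads of one Core i7 physical core or the two cores of one Bulldozer module; the function names are hypothetical.

```python
def parse_affinity(digits: str) -> int:
    """Convert e.g. '11110000' (leftmost digit = logical CPU 0) into a mask with bit i = CPU i."""
    mask = 0
    for cpu, digit in enumerate(digits):
        if digit == "1":
            mask |= 1 << cpu
        elif digit != "0":
            raise ValueError(f"unexpected digit {digit!r}")
    return mask


def selected_cpus(mask: int) -> list[int]:
    """List the logical CPU indices whose bit is set in the mask."""
    return [cpu for cpu in range(mask.bit_length()) if mask >> cpu & 1]


def pair_of(cpu: int) -> int:
    """Assumed topology: CPUs 2k and 2k+1 share a physical core (Core i7) or a module (FX)."""
    return cpu // 2


for digits in ("11111111", "10101010", "11110000"):
    mask = parse_affinity(digits)
    cpus = selected_cpus(mask)
    pairs = sorted({pair_of(c) for c in cpus})
    print(f"{digits} -> 0x{mask:02X}, CPUs {cpus}, cores/modules {pairs}")
```

Under these assumptions, 10101010 becomes 0x55 (CPUs 0, 2, 4, 6, one per core/module) and 11110000 becomes 0x0F (CPUs 0 to 3, only two cores/modules). A hex value like this could, for example, be passed to the Windows `start /affinity` command; which mechanism our script actually used is not described here.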
Utilized cores/threads when setting affinity to 10101010 (left) and 11110000 (right)
When Core Parking is off, everything works as expected in both Windows 7 and Windows 8 with both the Core i7 and the AMD FX processor. The results are shown in the chart below (Turbo enabled on both processors). The shorter bars represent the second encoding pass and the longer ones the first pass (which matters less for the total time the encoding takes). This chart comes from our original Czech article analyzing the Bulldozer microarchitecture, with the Core i7 870 (Lynnfield) results removed to make the chart easier to read.
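Why the first pass matters less for the total time can be shown with simple arithmetic. This sketch assumes the bars show average frames per second (a shorter bar means a slower pass), and the numbers in it are purely hypothetical, not our measurements: the time per pass is just frames divided by fps.

```python
FRAMES = 1000  # hypothetical clip length in frames


def pass_seconds(frames: int, fps: float) -> float:
    """Time spent on one encoding pass at a given average speed."""
    return frames / fps


first = pass_seconds(FRAMES, 100.0)  # hypothetical fast 1st pass (longer bar)
second = pass_seconds(FRAMES, 25.0)  # hypothetical slow 2nd pass (shorter bar)
share_second = second / (first + second)
print(f"1st pass {first:.0f} s, 2nd pass {second:.0f} s, 2nd pass share {share_second:.0%}")
```

With these illustrative values the slow pass takes 40 of the 50 seconds, so a speedup in the second pass shortens the encode far more than the same relative speedup in the first.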
As the 1st pass results show, the FX processor behaves more like a quad-core with something like Hyper-Threading that has quite a small effect. In the 2nd pass, on the other hand, it behaves more like a real eight-core processor. There is not much difference between running the task on four single-core or two dual-core Bulldozer modules (about 14–15% in both passes, compared with roughly 50% for the Core i7 going from two cores with HT to four cores without HT). Expressed as percentage improvements over the previous configuration, the results look roughly like this (Windows 7):
1st pass
- AMD FX-8150: 2 dual-core modules: base / 4 single-core modules: +14% / 8 cores: +5% (total +19%)
- Core i7-2600K: 2 cores with HT: base / 4 cores without HT: +48% / 4 cores with HT: −9% (total +35%)
2nd pass
- AMD FX-8150: 2 dual-core modules: base / 4 single-core modules: +15% / 8 cores: +40% (total +61%)
- Core i7-2600K: 2 cores with HT: base / 4 cores without HT: +50% / 4 cores with HT: +22% (total +82%)
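The totals in the list are compounded, not summed: each step is measured against the previous configuration, so the step factors multiply (for the Core i7 1st pass, 1.48 × 0.91 ≈ 1.35, i.e. +35%, whereas simple addition would give +39%). A minimal sketch of that calculation follows; the helper names are hypothetical, and because the step percentages are rounded, the recomputed totals can differ from the listed ones by about one point.

```python
# Per-step improvements from the list above (Windows 7), each relative to the previous configuration.
STEPS = {
    ("AMD FX-8150", "1st pass"): [14, 5],
    ("Core i7-2600K", "1st pass"): [48, -9],
    ("AMD FX-8150", "2nd pass"): [15, 40],
    ("Core i7-2600K", "2nd pass"): [50, 22],
}


def improvement(old: float, new: float) -> float:
    """Percentage improvement of `new` over `old` (e.g. two fps results)."""
    return (new / old - 1.0) * 100.0


def total_improvement(steps_percent: list[float]) -> float:
    """Compound successive step improvements: the factors multiply, they do not add."""
    factor = 1.0
    for step in steps_percent:
        factor *= 1.0 + step / 100.0
    return (factor - 1.0) * 100.0


for (cpu, run), steps in STEPS.items():
    print(f"{cpu} {run}: steps {steps} -> total {total_improvement(steps):+.1f}%")
```

This reproduces +35% and +61% exactly after rounding; the FX 1st pass comes out at +19.7% and the Core i7 2nd pass at +83.0%, versus the listed +19% and +82% that were computed from unrounded measurements.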
Interestingly, on the Core i7 the 1st pass runs faster on four cores without Hyper-Threading than on four cores with it. With the AMD FX, a single rule applies: the more cores/threads, the better.