SceptreCore

AMD conFusion? Forget the hype!


There was also talk in the thread of a cache flogging bug of some kind, which may be the reason for the larger-than-expected performance difference shown.


Well, either way it seems that the architecture is great for properly threaded applications (what it was designed for), but the single-threaded performance NEEDS improvement. I understand that was not necessarily the design philosophy they had in mind, but I think it follows that if they can improve the single-core performance, then the multi-threaded performance will increase dramatically as well.


...Correct me if I'm wrong, but it seems to have proven that for single-threaded applications, using the CPU as a 4-module/4-core works far better than using it as a 2-module/4-core, which I think is pretty intuitive...

I'd agree with that

 

I am surprised there is no patch for this. AMD did have a dual-core optimiser out for XP, and I noticed a decent boost when I installed that years ago.


I didn't post this earlier because there is nothing substantial in it. But I reconsidered; it's a small but interesting read.

 

One of the AMD Linux engineering systems for Trinity is running nicely even on Ubuntu 11.04 with the Linux 2.6.38 kernel. The CPU string is AMD Eng Sample 2M252057C4450_32/25/16_9900_609 and its graphics are the Trinity Devastator Mobile with 512MB of video memory and an AMD Pumori motherboard. The PCI ID on the Trinity Devastator appears to be 0x9900. This Trinity APU is quad-core and running at 2.50GHz. The current quad-core Llano offerings are clocked at 2.6GHz (A6-3650) and 2.9GHz (A8-3850). While this Trinity part is clocked slower, its numbers are nice compared to my A8-3850 Linux system.

 

Taken from http://www.phoronix.com/scan.php?page=arti...early&num=1

 

So if this engineering sample has a 400MHz deficit, is quad-core, and is getting similar numbers to a quad-core Stars chip, then the 8-core version of it should be able to put considerable distance between itself and the Phenom X6, even at a lower clock speed. But we only know these numbers are 'nice'.

 

I tried to find them in a web cache but could not. Maybe someone who has more experience with hunting down cached pages can find the numbers.
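
The clock-for-clock reasoning in the post above can be put into numbers. This is only a rough sketch: it treats "similar numbers" as exactly equal scores and assumes a score scales linearly with clock speed, neither of which the Phoronix piece confirms. The function name is made up for illustration.

```python
def per_clock_ratio(clock_ghz: float, reference_clock_ghz: float) -> float:
    """Per-clock throughput relative to the reference chip.

    Assumes both chips post the same benchmark score and that the score
    scales linearly with clock speed (a simplification).
    """
    if clock_ghz <= 0 or reference_clock_ghz <= 0:
        raise ValueError("clock speeds must be positive")
    return reference_clock_ghz / clock_ghz


trinity_ghz = 2.5  # engineering sample clock quoted from Phoronix
llano_ghz = 2.9    # A8-3850 clock quoted from Phoronix

ratio = per_clock_ratio(trinity_ghz, llano_ghz)
print(f"Equal scores would imply about {(ratio - 1) * 100:.0f}% more work per clock")
```

Under those assumptions, matching the A8-3850 at 2.5GHz would mean roughly 16% more work per clock. Without the actual scores it stays a what-if.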


Ahh, that is nicely found DASA, and TBH something that reviewers should have 'gotten' much earlier.

 

The (dirty Wales beating bastard of a) French review is rather enlightening - if I'm reading it right it says only games are restricted in terms of the number of cores they use (excluding ARMA2 and Starcraft) and there's an average 4.7% improvement (in game performance from CPU, which implies a greater jump in CPU performance) in those games that can't fully utilise 8 threads.

 

The XS.org tests seem to show the 'maximum' performance increase you can get from correct utilisation of Bulldozer (moving from 2 modules fully utilised to 4 modules with only one 'core' per module utilised) and should be broadly applicable to performance improvements from an 'optimiser' in 2- and 4-threaded apps - and critically they average a 23.9% performance improvement. So for 1, 2, and 4 threads, we should be looking at a ~24% increase from correct core utilisation, which iirc brings it broadly up to spec with Thuban? EDIT: probably not, as single core performance was the issue and it doesn't look like that would be affected, as below.

 

Given multithreaded performance isn't too bad, a 'core optimiser' could give Bulldozer a shot in the arm. Whoever released Bulldozer without some kind of OS/BIOS patch or third-party software optimiser likewise deserves a shot in the arm. I agree with DASA that this is still 'not quite enough' to make Bulldozer truly competitive, but it does look like quite a decent improvement...
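
To make the 'core optimiser' idea concrete, here is a minimal sketch of module-aware placement: give each thread its own module first, and only double up once every module has one thread. It assumes logical cores 2k and 2k+1 share module k, which would need checking on a real system, and the function names are hypothetical. It is not how any actual AMD or OS patch works.

```python
from typing import List


def module_of(core: int, cores_per_module: int = 2) -> int:
    """Module index for a logical core, assuming sibling cores are numbered adjacently."""
    return core // cores_per_module


def spread_placement(n_threads: int, n_modules: int = 4,
                     cores_per_module: int = 2) -> List[int]:
    """Pick one logical core per thread, using every module once before doubling up."""
    total = n_modules * cores_per_module
    if not 0 <= n_threads <= total:
        raise ValueError("thread count must fit the available cores")
    # First pass takes slot 0 of each module, second pass takes slot 1, and so on.
    order = [m * cores_per_module + slot
             for slot in range(cores_per_module)
             for m in range(n_modules)]
    return order[:n_threads]


if __name__ == "__main__":
    for n in (1, 2, 4, 5):
        print(n, "threads ->", spread_placement(n))
```

With the defaults, four threads land on cores 0, 2, 4 and 6, i.e. one per module. On Linux the resulting list could be passed to `os.sched_setaffinity` to pin a process, provided the core numbering assumption holds.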

 

 

The big unanswered question is what happens with an odd number of threads (particularly ONE thread) - I'm assuming with one thread in a module, the Bulldozer module runs itself as a single-core module, which would mean no performance gain, or does the scheduling unit not interpret that correctly there too?

Edited by philo-sofa



I'm guessing the scheduler will automatically assign that to the next free module. So think of it as a teller system at a bank, one instruction per module. If odd, the instruction will probably wait until the whole module is free or the partial module is finished with the previous instruction.



Yeah, that's almost certainly the case eh. Looking over the 1-thread benchmarks, Bulldozer actually pulls very close to the performance of Thuban (albeit at slightly higher clocks), rather than lagging way behind it as it does in tests where it's multi-threading a module, but not using all four modules.

Edited by philo-sofa



When I first read the reviews, all of the problems pointed towards the schedulers they had in the CPU. If Piledriver can tweak and optimise these, then it'll pull away from Thuban. That's probably the only weak spot I think Bulldozer desperately needs fixed. Also, increasing the L1 cache would be a nice bonus too. :)

 

EDIT: Also, looking at the various benchmarks, there are several applications, such as Photoshop, that can and do utilise Bulldozer a lot better than games do. I'll be interested as I'm working with Photoshop a lot.

Edited by sora3



If I recall correctly, the L1 isn't as small as reviewers have been saying. I believe the L1 is split into different types. There's 16KB of what they've called L1D (data) per core, 4-way associative. Then there's 64KB of 2-way associative instruction cache shared per module.


Interesting talk on how Bulldozer might work once the scheduling in the OS is more cooperative - it appears they intend to optimise for power efficiency not necessarily speed. Wonder if that's still the plan:

 

According to AMD, Windows 8 will more intelligently align threads so that, when they can benefit from sharing a module, they will. The implication is that when two threads can be consolidated onto one module (despite the fact that they’re forced to share resources), putting an entire module to sleep and potentially enabling a higher p-state (a faster Turbo Core setting) outweighs any performance penalty tied to sharing.


 

 

(Source)
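
The trade-off described in the quote can be sketched by counting how many modules are left idle under each placement. This is only an illustration of the idea: it assumes cores 2k and 2k+1 share module k, uses a hypothetical helper name, and does not model Turbo Core or p-states at all.

```python
from typing import Iterable


def idle_modules(busy_cores: Iterable[int], n_modules: int = 4,
                 cores_per_module: int = 2) -> int:
    """Count modules with no busy core, assuming cores 2k and 2k+1 share module k."""
    busy = {core // cores_per_module for core in busy_cores}
    return n_modules - len(busy)


packed = [0, 1]  # two threads sharing module 0 (the power-oriented choice)
spread = [0, 2]  # two threads on separate modules (the speed-oriented choice)

print("packed:", idle_modules(packed), "modules idle")
print("spread:", idle_modules(spread), "modules idle")
```

Packing two threads leaves three modules free to sleep instead of two. Whether the extra sleeping module buys enough turbo headroom to offset the shared resources is exactly the open question.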


Well, that is interesting, and there was an fps increase with a better scheduler, even though it is only in beta stages. Clearly the single-threaded performance is still not up to scratch, but Piledriver may even the field a bit with its improvements combined with an improved scheduler.

 

EDIT

Did anyone see this review? http://www.hardwareheaven.com/reviews/1285...troduction.html The results seem to contradict every other benchmark I've seen... Better in gaming and worse in media encoding --- cherry-picked tests? And 5.2GHz at 1.5V (I noticed that there were no benchmarks at that speed though).

Edited by UberPenguin


For those that are going to purchase Bulldozer, our supplier at work has just got their first stock of them in, so they aren't very far off now :D



I assume the Aussie (and NZ) tax still applies.



Did anyone see this review? http://www.hardwareheaven.com/reviews/1285...troduction.html The results seem to contradict every other benchmark I've seen... Better in gaming and worse in media encoding --- cherry-picked tests? And 5.2GHz at 1.5V (I noticed that there were no benchmarks at that speed though).

what the heck is going on here?

 

"Overall we were impressed by the new Bulldozer chip, it shares the same external design and build quality as previous AMD processors which of course means we can drop it into many existing motherboards creating an easy upgrade path to the latest technology. In terms of the internal design the decisions made by AMD offer a design which will see our processor perform better as more applications support its many cores. This also includes Windows 8 which AMD have indicated will offer performance enhancements over 7 due to the way the OS handles multi-threaded tasks."

 

Slower than the 2600K, but nowhere near as slow as the other reviews have shown. The power consumption still remains the same. +1 to the cherry-picked theory.

Edited by alkahest


I assume the Aussie (and NZ) tax still applies.

 

No idea. All I know is that they have got stock. I don't even know which specific chips they got in, let alone the price. Will be posting once I get more info :D


but nowhere near as slow as the other reviews have shown.

Don't most sites use the same benchmarks/tests for each review? I don't see how that could be cherry picking.



There are a lot of settings for benches (especially game benches) that one could choose in order to cherry pick results.



It's just another review with a GPU bottleneck, nothing more.

 

I could do a review that shows a P4 to be just as quick as a 2600K if there was enough of a GPU bottleneck in the benchmarks run.
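
The GPU bottleneck point above can be shown with a very simple model where the slower of the two stages sets the frame rate. All the numbers are made up for illustration and are not taken from any review, and real games do not behave this cleanly.

```python
def observed_fps(cpu_limit_fps: float, gpu_limit_fps: float) -> float:
    """Frame rate under a simple bottleneck model: the slower stage sets the pace."""
    return min(cpu_limit_fps, gpu_limit_fps)


# Hypothetical CPU-side limits for a slow and a fast processor.
slow_cpu_fps = 60.0
fast_cpu_fps = 140.0

# A heavy graphics setting (low GPU limit) versus a light one (high GPU limit).
for gpu_limit_fps in (45.0, 200.0):
    slow = observed_fps(slow_cpu_fps, gpu_limit_fps)
    fast = observed_fps(fast_cpu_fps, gpu_limit_fps)
    print(f"GPU limit {gpu_limit_fps:.0f} fps: slow CPU {slow:.0f}, fast CPU {fast:.0f}")
```

With the GPU capped at 45 fps both processors report the same result. Lift the cap and the gap between them shows up.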

Edited by Dasa


At least they used an HD 6950 instead of a GTX 285 like some other review did.

Edited by Jeruselem



Then they made sure they enabled 8xAA when needed to keep it close on F1.



It's just another review with a GPU bottleneck, nothing more.

 

IMO it's a little more than that, as they're also showing the FX-8150 as being ahead in quite a few cases - they've either massaged the results or deliberately cherry-picked settings?


Perhaps, since in multiple runs of a GPU benchmark you can expect to see ~3% variance.

Maybe they ran it till they got one they liked, but it would be easier to just cheat outright.

Maybe AMD just got lucky in those tests.

Or maybe the AMD platform can perform a bit better in some GPU-limited situations.
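
The run-to-run variance point can be turned into a quick sanity check: if two results differ by no more than the noise you expect between repeat runs, the 'win' may not mean anything. This sketch takes the ~3% figure from the post as an assumed noise level, and the function name is made up.

```python
def within_noise(result_a: float, result_b: float, noise: float = 0.03) -> bool:
    """True if the relative gap between two results is no larger than the assumed noise."""
    if result_a <= 0 or result_b <= 0:
        raise ValueError("benchmark results must be positive")
    gap = abs(result_a - result_b) / max(result_a, result_b)
    return gap <= noise


# Hypothetical fps pairs, not taken from any review.
print(within_noise(100.0, 98.0))  # a 2% gap
print(within_noise(100.0, 90.0))  # a 10% gap
```

A 2% gap falls inside the assumed noise while a 10% gap does not. Averaging several runs per chip would be a sturdier comparison than any single pair.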

Edited by Dasa



I'll probably be getting the 8150. If a few more get some BD chips... then us Atomicans can do a little review of our own... and we can make sure of all settings and results.



I'm looking forward to seeing some in-house benchmarks. But it might be a while before more people pick one up. I remember when I got my 2600K I was pretty much the only forum member with one for months. But yes, in-house benchmarks would be a very good idea!
