**Jure Sah[_2_]** · July 20th 08, 07:24 PM posted to alt.comp.hardware.amd.x86-64

Miles Bader wrote:
> The big area where current intel systems really kill amd systems seem
> to be bloated cache-unfriendly apps that can really get a boost from
> intel's huge L2 caches.

I wouldn't think so. L2 cache is just another marketing trick, just like
Hyper-Threading or quad-core or NetBurst's high frequencies. They're
things that look good on the outside and that people think are
important, but they do nothing on the inside.

For example if Intel ever really cared about cache, they wouldn't have
picked L2 which is really slow compared to L1. And AMD has far more L1
cache than Intel.

L2 cache was originally invented to be a big number that can be shown
to the customers. It was originally introduced as a feature with the
Pentium II slot processor, where the L2 cache sat on a card next to the
CPU and was not actually any faster than the EDO RAM on the motherboard
(benchmarked with memtest)... and guess what, it had 2 megs of it. Why
in the world would anyone ever do something like that other than to
fool people with a cheap trick? Intel's current CPUs have lots of cache
for the same reason: it's a big number and people fall for big numbers.
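
To put a number on that argument, here is a rough sketch using the
textbook average memory access time formula (hit time plus miss rate
times miss penalty). Every latency and miss rate below is a made-up
placeholder, not a memtest result, and the function name `amat` is just
something I picked for illustration.

```python
def amat(l1_hit_ns, l1_miss_rate, l2_hit_ns, l2_miss_rate, ram_ns):
    """Average memory access time for a two-level cache, in nanoseconds."""
    # An L1 miss pays for the L2 lookup; an L2 miss additionally pays for RAM.
    l2_cost = l2_hit_ns + l2_miss_rate * ram_ns
    return l1_hit_ns + l1_miss_rate * l2_cost


# All figures are hypothetical placeholders chosen only to show the shape
# of the trade-off.
fast_l2 = amat(1.0, 0.10, 5.0, 0.20, 60.0)   # L2 much faster than RAM
slow_l2 = amat(1.0, 0.10, 60.0, 0.20, 60.0)  # L2 no faster than RAM
no_l2 = 1.0 + 0.10 * 60.0                    # L1 misses go straight to RAM

print(f"fast L2: {fast_l2:.1f} ns, slow L2: {slow_l2:.1f} ns, no L2: {no_l2:.1f} ns")
```

Under these assumed numbers, an L2 that is no faster than RAM comes out
worse than having no L2 at all, while a genuinely fast L2 helps a lot.
Which case applies to a given chip is something only a measurement can
settle.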

For another example, since Apple is even worse at this than Intel,
remember the talk about how much CPU cache the PowerPC chip had? How it
wisely didn't implement any special functions in the CPU and ingeniously
used the remaining space for cache instead? Well, it turns out all of it
was used to cache the CPU microcode which implemented the missing
functions in the Apple computers, so none of the cache was actually used
to cache programs or their data. That didn't stop people from constantly
talking about how large the cache was and how superior that was, did it?

Lesson learned: don't go to computer shop assistants and gamers for your
benchmark data and recommendations. These people are clueless, and their
combined knowledge is nothing more than a bunch of commercial ads that
they have read and believe out of hand.

The reason for AMD's troubles is probably more complex. I have an idea,
not saying this is the case, but it makes a lot of sense.

In CPU design, one of the first things you find in the book is that the
CPU must implement as much functionality in hardware and as little in
microcode as possible, so as to minimize the number of cycles needed to
execute a specific instruction. AMD likes this principle a lot and has
designed all of its CPUs according to it. Check the instruction timings
and you will see that the vast majority of the instructions take 1 cycle
to execute. Combine that with the point that the K7+ chips have 3 CPU
pipelines and additional FPU pipelines, and you realize it can do a
whole lot of stuff within only a handful of cycles, if only the code is
organized about right.
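
The "organized about right" part can be sketched with a toy model. It
assumes every instruction takes 1 cycle and that up to 3 can start per
cycle, as described above. It is an idealised illustration, not how any
real K7 scheduler works, and `cycles_needed` is a hypothetical helper
name.

```python
def cycles_needed(deps, width=3):
    """Count cycles to run 1-cycle instructions on `width` pipelines.

    deps[i] lists the earlier instructions whose results instruction i needs.
    """
    finish = {}     # instruction index -> cycle in which it completes
    issued_in = {}  # cycle -> number of instructions already placed in it
    for i, needs in enumerate(deps):
        # Earliest possible cycle: one after the latest producer finished.
        cycle = max((finish[d] for d in needs), default=0) + 1
        # Move on past cycles whose pipelines are already full.
        while issued_in.get(cycle, 0) >= width:
            cycle += 1
        issued_in[cycle] = issued_in.get(cycle, 0) + 1
        finish[i] = cycle
    return max(finish.values(), default=0)


# Six instructions that do not depend on each other.
independent = [[] for _ in range(6)]
# Six instructions where each one needs the result of the previous one.
chain = [[]] + [[i - 1] for i in range(1, 6)]

print(cycles_needed(independent), cycles_needed(chain))
```

In this model the six independent instructions fit into 2 cycles, while
the dependent chain needs 6 no matter how many pipelines there are, so
the extra pipelines only pay off when the code gives them independent
work.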

Intel, on the other hand, was never much of a fan of this principle and
loaded their CPUs with complex microcode to do everything that they do.
The complex microcode obviously means that most instructions take more
than 1 cycle to complete, but it also means that the microcode can
collect data from the code it's executing to do more intelligent
precaching, branch prediction, pipeline blocking, etc. The point is
proven by the ia64 architecture (Itanium or whatever), where they
wanted to lock out the competition on the compiler market by
constructing an architecture where the microcode part has to be included
in the compiler in order to produce code that runs right (Intel also
sells a compiler, not only CPUs).

Now, since the benchmark of today is games and this means use of the
FPU... let me mention that the FPU is a horrible thing to have in a CPU.
It's a completely different architecture, a completely different
technology: the FPU is a stack processor like the Motorola, its memory
is 24 and 48 bit aligned, not 32 or 64, and it doesn't belong in an x86
arch. As a result, a lot of the processing taking place around FPU
instructions is actually preparing the data, converting it from and into
the FPU format. You could estimate the conversion alone at anything up
to 70% of the code. And here is where Intel's incorrect design has an
advantage: it can predict data transport with its advanced microcode and
feed data into the FPU much faster, thus having an edge despite the fact
that AMD's FPU is actually faster than Intel's.
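
If you take the 70% estimate at face value, Amdahl's law gives a feel
for why the data shuffling would matter more than the FPU itself. This
is only a back-of-the-envelope sketch: the 2x factors are hypothetical,
and I am treating "70% of the code" as 70% of the run time, which is an
assumption.

```python
def overall_speedup(fraction, factor):
    """Amdahl's law: whole-program speed-up when `fraction` of the run
    time becomes `factor` times faster and the rest is unchanged."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)


# 0.70 is the upper estimate from the paragraph above, read as a share of
# run time (an assumption); the 2x factors are hypothetical.
faster_conversion = overall_speedup(0.70, 2.0)  # data preparation 2x faster
faster_fpu = overall_speedup(0.30, 2.0)         # FPU arithmetic 2x faster

print(f"faster conversion: {faster_conversion:.2f}x, faster FPU: {faster_fpu:.2f}x")
```

With those assumed inputs, doubling the speed of the conversion work
helps the whole program noticeably more than doubling the speed of the
FPU arithmetic does.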

So the question of what AMD should do now is a complex one. For them to
be good at what Intel is good at, they may have to adopt a design that
is obviously flawed. I don't think AMD will go there. I think instead
they could entirely replace the FPU with the GPU they are integrating
into the chip and then emulate FPU functionality with it on the
outside. That could work a lot better, since modern GPUs (while they
don't belong in the x86 arch any more than the FPU) can do floating
point operations very quickly and in parallel.

It would be interesting but is pure speculation as of now.