A computer components & hardware forum. HardwareBanter

If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

Go Back   Home » HardwareBanter forum » Processors » AMD x86-64 Processors
Site Map Home Register Authors List Search Today's Posts Mark Forums Read Web Partners

Pretty good explanation of x86-64 by HP



 
 
Thread Tools Display Modes
  #11  
Old December 6th 04, 12:30 AM
external usenet poster
 
Posts: n/a
Default

In comp.arch Tony Hill wrote:

Not that they know something the rest of the world doesn't, just that
they have access to processors that most of us do not. IBM sells them
as well, but for the time being Intel will ONLY sell them for use in
servers. Why? I really don't know.


FWIW, Dell are shipping EM64T-equipped non-Xeon P4 workstations (the
Precision 370).

-a
  #12  
Old December 6th 04, 02:38 AM
David Schwartz
external usenet poster
 
Posts: n/a
Default


"George Macdonald" wrote in message
...

Hmm and the following quote: "However, the latency difference between
local
and remote accesses is actually very small because the memory controller
is
integrated into and operates at the core speed of the processor, and
because of the fast interconnect between processors." is relevant to
another discussion here. I wish we could get a firm answer on this one.


In typical Opteron setups (2-8 CPUs, using the Opteron's build in SMP
hardware), the latency difference between local and remote memory accesses
is so small that the benefits of treating it as NUMA are typically
outweighed by the costs. Generally, you just distribute the memory evenly
and interleaved on the nodes (if you can) to avoid overloading one memory
controller channel.

DS


  #13  
Old December 6th 04, 02:44 AM
keith
external usenet poster
 
Posts: n/a
Default

On Sun, 05 Dec 2004 16:30:15 -0500, Yousuf Khan wrote:

George Macdonald wrote:
Hmm and the following quote: "However, the latency difference between local
and remote accesses is actually very small because the memory controller is
integrated into and operates at the core speed of the processor, and
because of the fast interconnect between processors." is relevant to
another discussion here. I wish we could get a firm answer on this one.


Yeah, but that's why I think AMD insists on calling their multiprocessor
connection scheme as SUMO (Sufficiently Uniform Memory Organization),
rather than NUMA. It's not worth headaching over such small differences
in latency, is basically what they're saying.


I'd say that because in small systems (less than 8 CPUs), Opterons are
coherent in hardware thus sufficiently tightly coupled to be called UMA,
as far as the user is concerned.

--
Keith
  #14  
Old December 6th 04, 02:46 AM
keith
external usenet poster
 
Posts: n/a
Default

On Sun, 05 Dec 2004 19:47:30 +0000, Patrick Schaaf wrote:

Tony Hill writes:

Not that they know something the rest of the world doesn't, just that
they have access to processors that most of us do not. IBM sells them
as well, but for the time being Intel will ONLY sell them for use in
servers. Why? I really don't know. Maybe it's just a bit too much
crow for them to eat after saying (only a bit over a year ago) that
64-bit wouldn't be useful for the desktop until the end of the year?


How much does Intel stockpile? Could it be that they have warehouses
full of already produced non-64-bit processors, and those want to be
sold at the projected prices, not thrown away?


Unsold inventory is a very bad thing indeed. The tax man isn't happy.
Stockholders aren't happy. Executives shiver.

--
Keith

  #15  
Old December 6th 04, 07:04 AM
George Macdonald
external usenet poster
 
Posts: n/a
Default

On Sun, 05 Dec 2004 17:37:16 GMT, Rob Stow wrote:

George Macdonald wrote:
On Sun, 05 Dec 2004 01:02:11 -0500, Yousuf Khan wrote:


I found this whitepaper from HP to be pretty good, it is surprisingly
candid, considering HP was the coinventor of the Itanium. It does a
pretty good job of explaining and summarizing the similarities and
differences between AMD64 and EM64T, and their comparison to the
Itanium's IA64 instruction set. AMD64 and EM64T are "broadly
compatible", but IA64 is a different animal altogether.

Yousuf Khan

http://h200001.www2.hp.com/bc/docs/s.../c00238028.pdf



Hmm and the following quote: "However, the latency difference between local
and remote accesses is actually very small because the memory controller is
integrated into and operates at the core speed of the processor, and
because of the fast interconnect between processors." is relevant to
another discussion here. I wish we could get a firm answer on this one.


Not sure if this is exactly what you are looking for in the
way of a "firm answer", but the latencies in a Opteron system a

0 hops 80 ns uniprocessor (Local access)
100 ns multiprocessor (Local access, with cache snooping on other processors)
1 hop 115 ns
2 hops 150 ns
3 hops 190 ns

I couldn't find my original source for those numbers, and
the two and three hop numbers above are a little higher
than I remembered them as being. This time around I got
them from this thread:
http://www.aceshardware.com/forum?read=80030960

That thread refers to this article:
http://www.digit-life.com/articles2/amd-hammer-family/
which gives slightly different numbers for a 2 GHz Opteron
with DDR333:
Uni-processor system: 45 ns
Dual-processor system: 0-hop - 69 ns, 1-hop - 117 ns.
Four-processor system: 0-hop - 100 ns, 1-hop - 118 ns, 2-hop - 136 ns.


I don't know if any of the numbers above are for cache misses
or if they are averages that include both hits and misses.


Thanks for the data but no I guess I should have highlighted better what I
was getting at: "the memory controller is integrated into and operates at
the core speed of the processor", which is what was being
discussed/disputed in another thread.

I haven't been able to find any hard data from AMD on where the clock
domain boundaries are in the Opteron/Athlon64 but if the memory controller
is not operating at "core speed" it's now at the stage of Internet
Folklore.

Rgds, George Macdonald

"Just because they're paranoid doesn't mean you're not psychotic" - Who, me??
  #16  
Old December 6th 04, 07:04 AM
George Macdonald
external usenet poster
 
Posts: n/a
Default

On Sun, 05 Dec 2004 16:30:15 -0500, Yousuf Khan wrote:

George Macdonald wrote:
Hmm and the following quote: "However, the latency difference between local
and remote accesses is actually very small because the memory controller is
integrated into and operates at the core speed of the processor, and
because of the fast interconnect between processors." is relevant to
another discussion here. I wish we could get a firm answer on this one.


Yeah, but that's why I think AMD insists on calling their multiprocessor
connection scheme as SUMO (Sufficiently Uniform Memory Organization),
rather than NUMA. It's not worth headaching over such small differences
in latency, is basically what they're saying.


See my reply to Rob Stow.

Rgds, George Macdonald

"Just because they're paranoid doesn't mean you're not psychotic" - Who, me??
  #17  
Old December 6th 04, 10:17 AM
Per Ekman
external usenet poster
 
Posts: n/a
Default

Yousuf Khan writes:

George Macdonald wrote:
Hmm and the following quote: "However, the latency difference between local
and remote accesses is actually very small because the memory controller is
integrated into and operates at the core speed of the processor, and
because of the fast interconnect between processors." is relevant to
another discussion here. I wish we could get a firm answer on this one.


Yeah, but that's why I think AMD insists on calling their multiprocessor
connection scheme as SUMO (Sufficiently Uniform Memory Organization),
rather than NUMA. It's not worth headaching over such small differences
in latency, is basically what they're saying.


It's a bit of a crap argument isn't it? Even if the latency is small,
the fact that it's a NUMA system impacts performance (potentially by a
lot) as the available memory bandwidth is coupled to where you place
your data.

Classic example is OpenMP parallelized STREAM. Parallelize all the
loops except the data initialization loop on a system with hard memory
affinity (such as Linux), then parallelize _all_ the loops and explain
how the difference is "not worth headaching over".

Bottom line IMO is that pretending that the system isn't NUMA is doing
customers a disservice. They should know that treating the system as a
UMA one is a bad idea.

*p

  #20  
Old December 6th 04, 11:53 AM
Tony Hill
external usenet poster
 
Posts: n/a
Default

On 06 Dec 2004 11:17:19 +0100, Per Ekman wrote:

Yousuf Khan writes:

Yeah, but that's why I think AMD insists on calling their multiprocessor
connection scheme as SUMO (Sufficiently Uniform Memory Organization),
rather than NUMA. It's not worth headaching over such small differences
in latency, is basically what they're saying.


It's a bit of a crap argument isn't it? Even if the latency is small,
the fact that it's a NUMA system impacts performance (potentially by a
lot) as the available memory bandwidth is coupled to where you place
your data.


It does, but the difference is small, usually less than 10% and often
much closer to 0%. When well over 90% of your memory access is coming
from cache anyway and (assuming a totally random distribution in a
strictly UMA setup) 50% of your memory access is going to be local,
most of the performance difference is lost in the noise.

Besides, remember that even in a classic UMA environment (ie a 2P or
4P Xeon server... or even a single-processor system) you STILL have
differences in latency depending on where in memory your data resides
due to open vs. closed pages, TLB misses, etc.

Classic example is OpenMP parallelized STREAM. Parallelize all the
loops except the data initialization loop on a system with hard memory
affinity (such as Linux), then parallelize _all_ the loops and explain
how the difference is "not worth headaching over".


Most users don't use their computer to run STREAM though. Even in the
HPC community where memory bandwidth is king, STREAM is still a rather
extreme case.

Bottom line IMO is that pretending that the system isn't NUMA is doing
customers a disservice.


I've said it before and I'll say it again: Hardware is cheap,
software is expensive. It would be a true disservice to your
customers to tell them to spend thousands upon thousands of dollars
changing all their software for the small improvement in performance
equal to a few hundred dollars of hardware costs.

They should know that treating the system as a
UMA one is a bad idea.


Spending lots of money to make all your software NUMA is a bad idea
when treating it as UMA and throwing a tiny amount of extra hardware
at the job will do the trick. That's all that AMD is getting at.

Besides, they do recognize that it is NUMA, just that they are saying
you don't NEED to worry about that if you don't want to because for
the vast majority of times the performance difference is lost in the
noise.

-------------
Tony Hill
hilla underscore 20 at yahoo dot ca
 




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Forum Jump

Similar Threads
Thread Thread Starter Forum Replies Last Post
· · · Have You Heard The Good News? · · · [email protected] General 0 January 29th 05 07:59 PM
(",) Good News for Google Groups, Usenet and Other Users [email protected] General 0 January 29th 05 06:30 AM
(",) Good News for Google Groups, Usenet and Other Users [email protected] General 0 January 29th 05 03:14 AM
(",) Good News for Google Groups, Usenet and Other Users [email protected] General 0 January 28th 05 04:13 PM
Anyone know a good deal on a good mp3 player? travel General 1 November 30th 04 10:51 PM


All times are GMT +1. The time now is 10:19 PM.


Powered by vBulletin® Version 3.6.4
Copyright ©2000 - 2024, Jelsoft Enterprises Ltd.
Copyright ©2004-2024 HardwareBanter.
The comments are property of their posters.