X2 vs X4

**Richard P** · September 23rd 08, 06:07 PM posted to alt.comp.hardware.amd.x86-64

Dave Feustel wrote:
DevilsPGD wrote:
In message Dave Feustel
wrote:

The person who told me this is Miles R***, a person who sells computers
for a living.
"Never trust someone trying to sell you something" comes to mind.

If the cores ran at the chip's nominal clock speed, a
four-core chip would perform 4 times faster than a single core chip at
the same clock speed, which they don't.
Depending on your task, a four-core CPU can perform reasonably close to
a single-core CPU running at four times the clock speed. Unfortunately,
few tasks parallelize that well, and even less software takes full
advantage of modern CPUs.

That being said, aside from some shady marketing in the past advertising
dual CPU systems as double the clock speed of one CPU rather than
advertising the actual configuration, each core runs at the full clock
speed advertised.

So the 4-core chip should run 4 independent identical tasks (compute
pi to 1 million digits) in essentially the same time that a single core
runs one instance of that task?

Yes

**DevilsPGD[_2_]** · September 23rd 08, 08:39 PM posted to alt.comp.hardware.amd.x86-64

In message Dave Feustel
wrote:

DevilsPGD wrote:
Depending on your task, a four-core CPU can perform reasonably close to
a single-core CPU running at four times the clock speed. Unfortunately,
few tasks parallelize that well, and even less software takes full
advantage of modern CPUs.

That being said, aside from some shady marketing in the past advertising
dual CPU systems as double the clock speed of one CPU rather than
advertising the actual configuration, each core runs at the full clock
speed advertised.

So the 4-core chip should run 4 independent identical tasks (compute
pi to 1 million digits) in essentially the same time that a single core
runs one instance of that task?

More or less, yes. However, in the real world, not all tasks will scale
quite this well as many tasks require not only CPU resources, but also
other resources which may become starved before you load all four cores.

For something that can be done entirely on-chip, you'll get four times
the performance from all four cores of a quad-core 2.4GHz CPU compared
with a single-core version of the same 2.4GHz CPU.
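
To put rough numbers on "more or less", here is a minimal sketch using Amdahl's law, which is a standard model and not something anyone in this thread measured. The function name `amdahl_speedup` and the parallel fractions in the loop are illustrative assumptions. It only models the split between serial and parallel work, and says nothing about memory or I/O starvation.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup when only part of a task can be spread across cores.

    parallel_fraction is the share of the work (0.0 to 1.0) that can run
    on all cores at once; the rest is assumed to run on a single core.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical workloads: fully parallel, mostly parallel, half serial.
    for p in (1.0, 0.9, 0.5):
        print(f"parallel fraction {p:.1f}: {amdahl_speedup(p, 4):.2f}x on 4 cores")
```

Four independent pi calculations correspond to a parallel fraction of 1.0, which is the only case where the model gives the full 4x on four cores.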

**Zootal** · September 24th 08, 10:06 PM posted to alt.comp.hardware.amd.x86-64

"Scott Lurndal" wrote in message
...
Jim Beard writes:

Specifically with respect to X2 vs X4, the kernel scheduler will do a
fairly good job of using two CPUs, but rarely does well with more
than two unless the applications are specifically tailored for

maybe with respect to windows, but linux schedulers are O(1) over
large numbers of cores.

scheduler overhead is pretty much non-existent.

scott

Are you sure about that? Each cpu has its own set of runqueues. If I have 4
cpus, I have 4 sets of runqueues to manage, and 4 sets of runqueues to
search. The runqueue itself can be searched for the next entry in O(1)
time - this is where the O(1) comes from, because the amount of time it
takes to find the next task in the queue is constant and not dependent on
the number of tasks in the queue.

I would think that the default linux scheduler is O(n) over a large
number of cores, where n = the number of cores.

**Scott Lurndal** · September 24th 08, 11:12 PM posted to alt.comp.hardware.amd.x86-64

"Zootal" writes:

"Scott Lurndal" wrote in message
...
Jim Beard writes:

Specifically with respect to X2 vs X4, the kernel scheduler will do a
fairly good job of using two CPUs, but rarely does well with more
than two unless the applications are specifically tailored for

maybe with respect to windows, but linux schedulers are O(1) over
large numbers of cores.

scheduler overhead is pretty much non-existent.

scott

Are you sure about that? Each cpu has its own set of runqueues. If I have 4
cpus, I have 4 sets of runqueues to manage, and 4 sets of runqueues to
search. The runqueue itself can be searched for the next entry in O(1)
time - this is where the O(1) comes from, because the amount of time it
takes to find the next task in the queue is constant and not dependent on
the number of tasks in the queue.

I would think that the default linux scheduler is O(n) over a large
number of cores, where n = the number of cores.

If you have a runqueue per core, then you simply schedule the next
entry in the queue for each core. O(1). Remember that code is shared by all
processors, and scheduling happens in-context - there is not a
scheduler "thread" or "job" or "task" per se.

scott
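
The per-core argument above can be sketched as a toy model. This is not kernel code: the names `RunQueue`, `pick_next`, and `schedule`, the 140 priority levels, and the four cores are all assumptions made for illustration. Each core consults only its own runqueue, and a bitmap of non-empty priority levels lets it find the next task without walking the queue. Load balancing between cores is deliberately left out.

```python
from collections import deque

NUM_PRIORITIES = 140  # assumption: fixed number of levels, 0 = highest priority


class RunQueue:
    """Toy per-core runqueue: one FIFO per priority plus a bitmap of non-empty levels."""

    def __init__(self):
        self.queues = [deque() for _ in range(NUM_PRIORITIES)]
        self.bitmap = 0  # bit p is set while queues[p] holds at least one task

    def enqueue(self, task, priority):
        self.queues[priority].append(task)
        self.bitmap |= 1 << priority

    def pick_next(self):
        """Return the highest-priority runnable task, or None if the core is idle."""
        if self.bitmap == 0:
            return None
        # Isolate the lowest set bit; its index is the best non-empty priority.
        priority = (self.bitmap & -self.bitmap).bit_length() - 1
        queue = self.queues[priority]
        task = queue.popleft()
        if not queue:
            self.bitmap &= ~(1 << priority)  # level is now empty
        return task


# One runqueue per core (4 cores assumed here).
runqueues = [RunQueue() for _ in range(4)]


def schedule(cpu):
    """Runs in the context of `cpu`; never looks at the other cores' queues."""
    return runqueues[cpu].pick_next()
```

In this model the work done by `schedule` depends on the fixed number of priority levels, not on how many tasks are queued or how many cores exist. Anything that does scan other cores, such as migrating tasks to an idle core, would sit outside this sketch.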
