An idea how to speed up computer programs and avoid waiting. ("event driven memory system")



 
 
#1 - July 22nd 11, 03:23 PM - Skybuck Flying
(posted to alt.comp.lang.borland-delphi, alt.comp.periphs.videocards.nvidia, alt.lang.asm, comp.arch, rec.games.corewar)

Hmm,

One possible problem with this idea is the following:

load
load
load
load * hit
load
load
load

some other instruction
some other instruction
some other instruction
some other instruction * proceed
some other instruction
some other instruction

The * hit indicates that its memory has arrived. Now the problem is all the
other instructions. With this sequential programming pattern it would need
to skip over all the other instructions to finally arrive at * proceed.

That's a lot of wasteful skipping/no-operations.

One possible solution could be to allow the programmer to specify which
instruction to execute next, something like:

load register 4, proceed at some other instruction 4.

But this is probably pushing it a bit.

So maybe it is better to split up the program into separate pieces, so that
the programmer only has to program one piece/kernel, like in CUDA, which
automatically gets duplicated/repeated and so forth.

But then the problem is: even single threads stall.

The idea is to let the load instruction continue even while no data is
present yet, to hopefully execute something else in the meantime.

So all those loads and some other instructions could also be replaced by
some calculations or so...
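
As a software-level illustration of the principle (a sketch only: ConsumeLate, mMemory and the index parameters are made-up names, and an out-of-order core already does something like this on its own), issue the independent loads first, do unrelated work, then consume the loaded values as late as possible:

function ConsumeLate(const mMemory: array of Integer;
                     vIndexA, vIndexB, vIndexC: Integer): Integer;
var
  a, b, c, vOther: Integer;
begin
  // Issue the three independent loads back to back...
  a := mMemory[vIndexA];
  b := mMemory[vIndexB];
  c := mMemory[vIndexC];
  // ...then do work that depends on none of them, giving the
  // processor something to execute while the loads are in flight...
  vOther := (vIndexA + vIndexB) * 3;
  // ...and only consume the loaded values at the very end.
  Result := a + b + c + vOther;
end;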

So perhaps there is some merit in this idea.

Bye,
Skybuck.

#2 - July 22nd 11, 03:25 PM - Skybuck Flying

Perhaps even a new branching instruction, like so (pseudocode idea):

if load then
begin
  perform operation on loaded data
end
else
begin
  do something else while loading
end;

The if branch would execute if the load completed.

The else branch would execute if the load is still pending.
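
A rough sketch of how that might look to the programmer, in Delphi-style code (TryLoad is hypothetical: a non-blocking load that returns True once the data has arrived, False while it is still pending; here it is simulated with a simple counter):

program BranchOnLoad;
{$APPTYPE CONSOLE}

var
  vPolls: Integer = 3; // pretend the load needs three polls to arrive

// Hypothetical non-blocking load: False while the data is pending.
function TryLoad(out Value: Integer): Boolean;
begin
  Value := 0;
  Dec(vPolls);
  Result := vPolls <= 0;
  if Result then
    Value := 42; // the loaded data
end;

var
  vValue: Integer;
begin
  while not TryLoad(vValue) do                     // the else branch
    WriteLn('load pending, doing something else');
  WriteLn('load completed, using data: ', vValue); // the if branch
end.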

Bye,
Skybuck.
#3 - July 22nd 11, 06:03 PM - Skybuck Flying

I am not totally happy with this posting, but I am going to post it anyway,
otherwise it would probably be lost in oblivion.

(At least now it's in my Outlook Express archive. A potential issue could be
with correct program order of operations: in I++; I = I + SomeField; I++ the
later statements may not execute until SomeField is retrieved.)

Hmm,

One possible problem with this idea is the following:

load
load
load
load * hit
load
load
load

some other instruction
some other instruction
some other instruction
some other instruction * proceed
some other instruction
some other instruction

This idea isn't even that bad... instead of letting the programmer specify
the relations between the instructions, the compiler could also figure this
out.

The compiler could, for example, figure out that "some other instruction *"
relies on "load * hit", and therefore the compiler can fill in the "next
instruction pointer" inside the load instruction.

One little problem is introduced with this idea which would be nice to solve
as well: the sequential order of the code should be respected.

Another bit could be used inside the load instruction, or perhaps somewhere
in data, to indicate whether the next instruction has already been executed
or not.

If it has not yet been executed, it would be executed first... the processor
could continue doing this until all instructions are executed and "* proceed"
is encountered naturally. Or it could keep track of the next "jump to on
stall pointer".

The processor would store the "next instruction pointer" located at "load *
hit" into the "jump to on stall pointer register".

Then when the processor is about to stall/wait, it won't let that happen:
instead it checks whether the "jump to on stall pointer register" is filled
and/or a flag is set to indicate that it has not yet been jumped to... then
the processor can jump to that instruction and execute there next...

This does screw up the instruction flow somewhat.

But a whole new kind of compiler, and a whole new kind of instructions and
instruction flow paradigm, could be created where each instruction indicates,
via flags or registers or something, which other instructions it depends on.

It doesn't need to specify all of them; it should only specify the minimum
requirement... this could also include data fields.

Other instructions, like load instructions, could reset these flags to
indicate that these fields are to be reloaded.
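
To make the mechanism concrete, here is a toy Delphi-style model of it (nothing here is a real ISA; every name is invented for illustration): each load carries the index of its consumer, a completed load deposits that index into the "jump to on stall" register, and a core that is about to stall jumps there instead of waiting.

program JumpOnStallSketch;
{$APPTYPE CONSOLE}

type
  TLoad = record
    DataReady: Boolean;  // has this load's memory arrived?
    NextOnHit: Integer;  // consumer index, filled in by the compiler
  end;

var
  vLoads: array[0..6] of TLoad;
  vJumpOnStall: Integer = -1;  // the proposed register
  i: Integer;
begin
  // Seven loads in flight; the compiler pointed each at its consumer.
  for i := 0 to 6 do
  begin
    vLoads[i].DataReady := (i = 3);  // load 3 is the "* hit"
    vLoads[i].NextOnHit := 100 + i;  // e.g. "* proceed" lives at 103
  end;

  // A hit deposits its consumer's index into the stall register.
  for i := 0 to 6 do
    if vLoads[i].DataReady then
      vJumpOnStall := vLoads[i].NextOnHit;

  // About to stall on the pending loads? Jump instead of waiting.
  if vJumpOnStall >= 0 then
    WriteLn('stall avoided: continue at instruction ', vJumpOnStall)
  else
    WriteLn('nothing ready yet: the core has to wait after all');
end.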

Bye,
Skybuck.

#4 - July 23rd 11, 10:09 AM - Tony Harding

On 07/22/11 10:23, Skybuck Flying wrote:
> Hmm,
>
> One possible problem with this idea is the following:
>
> load
> load
> load
> load * hit
> load
> load
> load
>
> some other instruction
> some other instruction
> some other instruction
> some other instruction * proceed
> some other instruction
> some other instruction
>
> The * hit indicates that its memory has arrived. Now the problem is all
> the other instructions. With this sequential programming pattern it
> would need to skip over all other instructions to finally arrive at
> * proceed.
>
> That's a lot of wasteful skipping/no operations.
>
> One possible solution could be to allow the programmer to specify which
> instruction to execute next


IBM's 650 computer used this design, i.e., an instruction contained the
address of the next instruction to be executed, which led to an early
assembler (for want of a better term) called SOAP (Symbolic Optimizing
Assembly Programming). The optimizing here was for the machine's magnetic
drum memory (pre-core days), rather like optimizing disk access.

Such a design does not conform to von Neumann's model, in which the next
instruction to be executed is the next sequential instruction unless
*something* causes it to be otherwise, e.g., a branch or jump instruction.

But who knows, maybe it's an idea whose time has come back?
#5 - July 23rd 11, 02:49 PM - Quadibloc

On Jul 23, 3:09 am, Tony Harding wrote:

> IBM's 650 computer used this design, i.e., an instruction contained the
> address of the next instruction to be executed, which led to an early
> assembler (for want of a better term) called SOAP (Symbolic Optimizing
> Assembly Programming). The optimizing here was for the machine's magnetic
> drum memory (pre-core days), rather like optimizing disk access.
>
> Such a design does not conform to von Neumann's model, in which the next
> instruction to be executed is the next sequential instruction unless
> *something* causes it to be otherwise, e.g., a branch or jump instruction.
>
> But who knows, maybe it's an idea whose time has come back?


The difference between how drum memory computers worked and a von
Neumann computer with random-access memory is generally considered to
be a trivial one which doesn't impact the programming model.

John Savard
#6 - July 26th 11, 05:22 AM - Skybuck Flying

How many "pipes" does the AMD X2 3800+ processor have per core?

And can you write an example which does more than just one load?

Bye,
Skybuck.

#7 - July 26th 11, 05:23 AM - Skybuck Flying

An example which does one load per pipe would be nice!

But perhaps it's not possible?!

Bye,
Skybuck.

#8 - July 30th 11, 11:30 AM - Bernhard Schornak

Skybuck Flying wrote:


Answer for the question in your other post:

The Athlon architecture has three integer as well as three floating point
execution pipes per core. The next generation (Bulldozer/Zambezi) will have
four integer pipes per core. A pair of cores will share two 128 bit floating
point pipes, which can be connected into one 256 bit pipe for YMM/AVX
instructions. (AMD introduced the APU as the new measure of everything: FP
calculations are done by the GPU part of the combined processor, so no
separate FP unit is required any longer.)


> An example which does one load per pipe would be nice!



....
mov eax,[mem]        \
mov ebx,[mem + 0x04]  cycle 1
mov ecx,[mem + 0x08] /
nop                  \
nop                   cycle 2
nop                  /
nop                  \
nop                   cycle 3
nop                  /
eax present          \
nop                   cycle 4
nop                  /
ebx present          \
nop                   cycle 5
nop                  /
ecx present          \
nop                   cycle 6
nop                  /
....


It takes 3 clocks to load EAX - Athlons can fetch 32 bits (or 64 bits in
long mode) per cycle. Hence, the new content of EAX will be available in
cycle four, while the other two loads are still in progress. The same goes
for EBX and ECX. The NOPs should be replaced by instructions that do not
depend on the new content of EAX, EBX or ECX. It is the programmer's job to
schedule instructions wisely. Unfortunately, an overwhelming majority of
coders does not know what is going on inside the machine they write code
for (= "machine independent").


> But perhaps it's not possible?!



It is, but you cannot read more than 32 or 64 bits from memory per clock
cycle. If I remember right, LETNi processors have similar limitations (due
to the design of the underlying memory subsystem and its interface to the
processor).

Writes to memory can gain some speed via a mechanism called "write
combining" - if you write more than a dword (qword for 64 bit mode) to
ascending memory locations, the processor collects them and writes the
corresponding cache line (64 bytes for Athlons) back to memory in one gulp.
This snippet


....
xor eax,eax
xor ebx,ebx
xor ecx,ecx
mov [mem+0x00],eax
mov [mem+0x04],ebx
mov [mem+0x08],ecx
mov [mem+0x0C],eax
mov [mem+0x10],ebx
mov [mem+0x14],ecx
mov [mem+0x18],eax
mov [mem+0x1C],ebx
mov [mem+0x20],ecx
mov [mem+0x24],eax
mov [mem+0x28],ebx
mov [mem+0x2C],ecx
mov [mem+0x30],eax
mov [mem+0x34],ebx
mov [mem+0x38],ecx
mov [mem+0x3C],eax
....


clears 64 bytes of memory in roughly six clock cycles (three stores per
cycle) rather than paying for each store individually, using the burst
capabilities of the attached memory controller. However, write access always
is faster than read access (no dependency: registers can be used immediately
after a write).
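
The same ascending-write pattern at the Pascal level (a sketch; whether the write combiner actually kicks in depends on the alignment of vLine and on the hardware):

procedure ClearLine;
var
  vLine: array[0..15] of Integer;  // 16 dwords = one 64-byte line
  i: Integer;
begin
  // Ascending 4-byte stores into (assuming alignment) a single
  // 64-byte cache line; the write combiner can collect them and
  // write the whole line back to memory in one burst.
  for i := 0 to 15 do
    vLine[i] := 0;
end;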


Greetings from Augsburg

Bernhard Schornak
#9 - August 1st 11, 11:34 AM - Skybuck Flying

Interesting theory; the question is whether it can be put into practice.

A problem might be that the other registers might already be in use.

Here is my Delphi/Pascal test code to test CPU random access memory (CPU
cache) performance.

(I already optimized it to use integers only instead of "dynamic indexes";
this alone makes it twice as fast as the CUDA test program.)

Below I shall post the Delphi/Pascal code, and below that the generated
assembler code, to show what the Delphi compiler makes of it. Perhaps it is
of some help to you to see what's going on and where potential problems
might be, and maybe it can serve as inspiration to do a better job at it. I
shall also try to write an assembler routine myself to see if this trick of
yours can actually work in practice; you are welcome to try as well.

// *** Begin of Delphi/Pascal Code ***

// version 0.02: try multiple pipe trick to see if delphi compiler can use it.
procedure TCPUMemoryTest.ExecuteCPU;
var
  vStart : int64;
  vStop : int64;
  vFrequency : int64;

  vBlockIndex : integer;
  vLoopIndex : integer;

  vElementIndexA : integer;
  vElementIndexB : integer;
  vElementIndexC : integer;

  vElementCount : integer;
begin
  QueryPerformanceCounter( vStart );

  vElementCount := mElementCount;
  for vBlockIndex := 0 to (mBlockCount div 3) do
  begin
    vElementIndexA := 0;
    vElementIndexB := 0;
    vElementIndexC := 0;

    for vLoopIndex := 0 to mLoopCount-1 do
    begin
      vElementIndexA := mMemory[ vElementIndexA + (vBlockIndex*3+0) * vElementCount ];
      vElementIndexB := mMemory[ vElementIndexB + (vBlockIndex*3+1) * vElementCount ];
      vElementIndexC := mMemory[ vElementIndexC + (vBlockIndex*3+2) * vElementCount ];
    end;

    mBlockResult[ vBlockIndex*3+0 ] := vElementIndexA;
    mBlockResult[ vBlockIndex*3+1 ] := vElementIndexB;
    mBlockResult[ vBlockIndex*3+2 ] := vElementIndexC;
  end;

  QueryPerformanceCounter( vStop );
  QueryPerformanceFrequency( vFrequency );

  mCPUExecutionTimeInSeconds := (vStop - vStart) / vFrequency;
end;

// *** End of Delphi/Pascal Code ***

So far, from what I can tell from the assembler output, Delphi does not seem
to apply the "don't use the register until much later" trick.

It seems to introduce "register dependencies" which probably make everything
stall.

This was just an early version/try, so perhaps a hand-written assembler
routine would perform better.
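
In the meantime, one source-level direction might help (a sketch, untested; vBaseA, vBaseB and vBaseC would be three new integer locals): hoist the loop-invariant (vBlockIndex*3+n)*vElementCount products out of the inner loop, so that only the three independent pointer chases remain on the critical path.

  // Hoist the invariant products once per block...
  vBaseA := (vBlockIndex*3+0) * vElementCount;
  vBaseB := (vBlockIndex*3+1) * vElementCount;
  vBaseC := (vBlockIndex*3+2) * vElementCount;
  // ...so the inner loop carries only the three independent chases.
  for vLoopIndex := 0 to mLoopCount-1 do
  begin
    vElementIndexA := mMemory[ vElementIndexA + vBaseA ];
    vElementIndexB := mMemory[ vElementIndexB + vBaseB ];
    vElementIndexC := mMemory[ vElementIndexC + vBaseC ];
  end;

The assembler output below is from the unmodified version above.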

// *** Begin of assembler output ***

unit_TCPUMemoryTest_version_001.pas.184: begin
0040FC5C 53 push ebx
0040FC5D 56 push esi
0040FC5E 57 push edi
0040FC5F 55 push ebp
0040FC60 83C4D0 add esp,-$30
0040FC63 8BD8 mov ebx,eax
unit_TCPUMemoryTest_version_001.pas.185: QueryPerformanceCounter( vStart );
0040FC65 54 push esp
0040FC66 E82195FFFF call QueryPerformanceCounter
unit_TCPUMemoryTest_version_001.pas.187: vElementCount := mElementCount;
0040FC6B 8B4B0C mov ecx,[ebx+$0c]
unit_TCPUMemoryTest_version_001.pas.188: for vBlockIndex := 0 to (mBlockCount div 3) do
0040FC6E 8B4310 mov eax,[ebx+$10]
0040FC71 BE03000000 mov esi,$00000003
0040FC76 99 cdq
0040FC77 F7FE idiv esi
0040FC79 85C0 test eax,eax
0040FC7B 0F8C87000000 jl $0040fd08
0040FC81 40 inc eax
0040FC82 89442420 mov [esp+$20],eax
0040FC86 33C0 xor eax,eax
unit_TCPUMemoryTest_version_001.pas.190: vElementIndexA := 0;
0040FC88 33F6 xor esi,esi
unit_TCPUMemoryTest_version_001.pas.191: vElementIndexB := 0;
0040FC8A 33D2 xor edx,edx
0040FC8C 89542418 mov [esp+$18],edx
unit_TCPUMemoryTest_version_001.pas.192: vElementIndexC := 0;
0040FC90 33D2 xor edx,edx
0040FC92 8954241C mov [esp+$1c],edx
unit_TCPUMemoryTest_version_001.pas.194: for vLoopIndex := 0 to mLoopCount-1 do
0040FC96 8B5314 mov edx,[ebx+$14]
0040FC99 4A dec edx
0040FC9A 85D2 test edx,edx
0040FC9C 7C44 jl $0040fce2
0040FC9E 42 inc edx
0040FC9F 89542424 mov [esp+$24],edx
unit_TCPUMemoryTest_version_001.pas.196: vElementIndexA := mMemory[ vElementIndexA + (vBlockIndex*3+0) * vElementCount ];
0040FCA3 8D1440 lea edx,[eax+eax*2]
0040FCA6 8BFA mov edi,edx
0040FCA8 0FAFF9 imul edi,ecx
0040FCAB 03F7 add esi,edi
0040FCAD 8B7B04 mov edi,[ebx+$04]
0040FCB0 8B34B7 mov esi,[edi+esi*4]
unit_TCPUMemoryTest_version_001.pas.197: vElementIndexB := mMemory[ vElementIndexB + (vBlockIndex*3+1) * vElementCount ];
0040FCB3 8BFA mov edi,edx
0040FCB5 47 inc edi
0040FCB6 0FAFF9 imul edi,ecx
0040FCB9 037C2418 add edi,[esp+$18]
0040FCBD 8B6B04 mov ebp,[ebx+$04]
0040FCC0 8B7CBD00 mov edi,[ebp+edi*4+$00]
0040FCC4 897C2418 mov [esp+$18],edi
unit_TCPUMemoryTest_version_001.pas.198: vElementIndexC := mMemory[ vElementIndexC + (vBlockIndex*3+2) * vElementCount ];
0040FCC8 83C202 add edx,$02
0040FCCB 0FAFD1 imul edx,ecx
0040FCCE 0354241C add edx,[esp+$1c]
0040FCD2 8B7B04 mov edi,[ebx+$04]
0040FCD5 8B1497 mov edx,[edi+edx*4]
0040FCD8 8954241C mov [esp+$1c],edx
unit_TCPUMemoryTest_version_001.pas.194: for vLoopIndex := 0 to mLoopCount-1 do
0040FCDC FF4C2424 dec dword ptr [esp+$24]
0040FCE0 75C1 jnz $0040fca3
unit_TCPUMemoryTest_version_001.pas.201: mBlockResult[ vBlockIndex*3+0 ] := vElementIndexA;
0040FCE2 8D1440 lea edx,[eax+eax*2]
0040FCE5 8B7B08 mov edi,[ebx+$08]
0040FCE8 893497 mov [edi+edx*4],esi
unit_TCPUMemoryTest_version_001.pas.202: mBlockResult[ vBlockIndex*3+1 ] := vElementIndexB;
0040FCEB 8B7308 mov esi,[ebx+$08]
0040FCEE 8B7C2418 mov edi,[esp+$18]
0040FCF2 897C9604 mov [esi+edx*4+$04],edi
unit_TCPUMemoryTest_version_001.pas.203: mBlockResult[ vBlockIndex*3+2 ] := vElementIndexC;
0040FCF6 8B7308 mov esi,[ebx+$08]
0040FCF9 8B7C241C mov edi,[esp+$1c]
0040FCFD 897C9608 mov [esi+edx*4+$08],edi
unit_TCPUMemoryTest_version_001.pas.204: end;
0040FD01 40 inc eax
unit_TCPUMemoryTest_version_001.pas.188: for vBlockIndex := 0 to (mBlockCount div 3) do
0040FD02 FF4C2420 dec dword ptr [esp+$20]
0040FD06 7580 jnz $0040fc88
unit_TCPUMemoryTest_version_001.pas.206: QueryPerformanceCounter( vStop );
0040FD08 8D442408 lea eax,[esp+$08]
0040FD0C 50 push eax
0040FD0D E87A94FFFF call QueryPerformanceCounter
unit_TCPUMemoryTest_version_001.pas.207: QueryPerformanceFrequency( vFrequency );
0040FD12 8D442410 lea eax,[esp+$10]
0040FD16 50 push eax
0040FD17 E87894FFFF call QueryPerformanceFrequency
unit_TCPUMemoryTest_version_001.pas.209: mCPUExecutionTimeInSeconds := (vStop - vStart) / vFrequency;
0040FD1C 8B442408 mov eax,[esp+$08]
0040FD20 8B54240C mov edx,[esp+$0c]
0040FD24 2B0424 sub eax,[esp]
0040FD27 1B542404 sbb edx,[esp+$04]
0040FD2B 89442428 mov [esp+$28],eax
0040FD2F 8954242C mov [esp+$2c],edx
0040FD33 DF6C2428 fild qword ptr [esp+$28]
0040FD37 DF6C2410 fild qword ptr [esp+$10]
0040FD3B DEF9 fdivp st(1)
0040FD3D DD5B28 fstp qword ptr [ebx+$28]
0040FD40 9B wait
unit_TCPUMemoryTest_version_001.pas.210: end;
0040FD41 83C430 add esp,$30
0040FD44 5D pop ebp
0040FD45 5F pop edi
0040FD46 5E pop esi
0040FD47 5B pop ebx
0040FD48 C3 ret

// *** End of assembler output ***

Bye,
Skybuck.

#10 - August 1st 11, 11:41 AM - Skybuck Flying

* little correction included.

The same posting as above, with one correction: the outer block loop ran one
block too many. The corrected loop header:

// for vBlockIndex := 0 to (mBlockCount div 3) do
// * little correction:
for vBlockIndex := 0 to (mBlockCount div 3)-1 do

The rest of the Delphi/Pascal code and the generated assembler output are
unchanged from the previous posting.

Bye,
Skybuck.

 



