A computer components & hardware forum. HardwareBanter

"TLB parity error in virtual array; TLB error 'instruction"?



 
 
  #1  
Old March 7th 10, 05:59 PM posted to alt.comp.hardware.amd.x86-64,alt.comp.hardwre.amd.x86-64,alt.comp.periphs.mainboard.msi-microstar
Ant[_3_]
external usenet poster
 
Posts: 756
Default "TLB parity error in virtual array; TLB error 'instruction"?

Hello.

Lately, I have been getting random and rare kernel panics on my old
Debian/Linux box (tried both kernel versions 2.6.30 and 2.6.32). I couldn't
figure out what it was until I discovered mcelog a couple of days ago, and
it revealed some interesting, scary data in my dmesg/messages and syslog:

# cat /var/log/messages
...
Mar 7 08:25:24 MyLinuxBox kernel: [ 3299.988026] Machine check events logged
Mar 7 08:25:24 MyLinuxBox mcelog: HARDWARE ERROR. This is *NOT* a software problem!
Mar 7 08:25:24 MyLinuxBox mcelog: Please contact your hardware vendor
Mar 7 08:25:24 MyLinuxBox mcelog: MCE 0
Mar 7 08:25:24 MyLinuxBox mcelog: CPU 1 1 instruction cache
Mar 7 08:25:24 MyLinuxBox mcelog: ADDR c11b6ff0
Mar 7 08:25:24 MyLinuxBox mcelog: TIME 1267979124 Sun Mar 7 08:25:24 2010
Mar 7 08:25:24 MyLinuxBox mcelog: TLB parity error in virtual array
Mar 7 08:25:24 MyLinuxBox mcelog: TLB error 'instruction transaction, level 1'
Mar 7 08:25:24 MyLinuxBox mcelog: STATUS 9400000000010011 MCGSTATUS 0
Mar 7 08:25:24 MyLinuxBox mcelog: MCGCAP 105 APICID 1 SOCKETID 0
Mar 7 08:25:24 MyLinuxBox mcelog: CPUID Vendor AMD Family 15 Model 43

I am not familiar with hardware, so I assume this is very bad, but which
part(s) is/are bad? Is my old Athlon 64 X2 CPU dying/damaged? I have had
it and its motherboard since 12/24/2006, so it is not that old yet. I
have the full details on my secondary machine at
http://alpha.zimage.com/~ant/antfarm.../computers.txt ...

This might be related to the PSU's death back in early December 2009,
though. My friend and I believe it also took out my EVGA GeForce 8800 GT
video card and damaged a 512 MB piece of RAM (I tested the 3 GB together
and each piece separately with memtest86+ v4.00 to narrow it down).
http://alpha.zimage.com/~ant/antfarm/about/toys.html has a log of the
details of my systems. I ran memtest86+ again a couple of weeks ago and
this morning for 5-6 hours, and got no errors after five full tests
(passed). I also do not overclock/OC.

Thank you in advance.
--
"Ever watch ants just crawling around? They walk in that single straight
line, a long, a long, long mile of ants. Sometimes they will walk over
and pick up their dead friends and carry those around. I'm pretty sure
it's because they can get in the carpool lane and pass up that line."
--Ellen DeGeneres
/\___/\
/ /\ /\ \ Phil./Ant @ http://antfarm.ma.cx (Personal Web Site)
| |o o| | Ant's Quality Foraged Links: http://aqfl.net
\ _ / Nuke ANT from e-mail address: NT
( ) or

Ant is currently not listening to any songs on his home computer.
  #2  
Old March 7th 10, 08:59 PM posted to alt.comp.hardware.amd.x86-64,alt.comp.hardwre.amd.x86-64,alt.comp.periphs.mainboard.msi-microstar
Ant[_3_]
external usenet poster
 
Posts: 756
Default "TLB parity error in virtual array; TLB error 'instruction"?

I also ran sys_basher (http://www.polybus.com/sys_basher_web/) in my
Debian a few times in the past and just now. No errors or crashes.



--
"Each of us needs to withdraw from the cares which will not withdraw
from us. We need hours of aimless wandering or spates of time sitting on
park benches, observing the mysterious world of ants and the canopy of
treetops." --Maya Angelou (b. 1928) American writer and entertainer
  #3  
Old March 8th 10, 07:22 AM posted to alt.comp.hardware.amd.x86-64,alt.comp.hardwre.amd.x86-64,alt.comp.periphs.mainboard.msi-microstar
Paul
external usenet poster
 
Posts: 13,364
Default "TLB parity error in virtual array; TLB error 'instruction"?



http://en.wikipedia.org/wiki/Transla...okaside_buffer

TLB stands for Translation Lookaside Buffer.
It translates from virtual addresses to physical addresses.
And apparently, according to the AMD documentation, it is protected by parity.
It is part of the processor.

A question would be, if it was a real error, why weren't there crash
symptoms or side effects ? If an incorrect mapping from virtual space
to physical occurred, you'd think there would be consequences. (Maybe
the entry is automatically invalidated and reloaded via page table walk ?)
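
One way to get a handle on that is to decode the STATUS word that mcelog printed. What follows is only a minimal Python sketch, not anything taken from mcelog: the name decode_status and the lookup tables are made up for illustration, and it assumes the usual architectural layout (VAL, OVER, UC, EN, MISCV, ADDRV, PCC in bits 63 down to 57, and a TLB error code of the form 0000 0000 0001 TTLL in the low 16 bits). If that layout applies to this processor, a clear UC bit would mean the event was reported as corrected, which might be why this particular event did not take the machine down by itself.

```python
# Sketch only: decode the architectural part of an x86 machine-check
# status word. Names and tables are illustrative, not taken from mcelog.
STATUS_FLAGS = {63: "VAL", 62: "OVER", 61: "UC", 60: "EN",
                59: "MISCV", 58: "ADDRV", 57: "PCC"}
TRANSACTION = {0: "instruction", 1: "data", 2: "generic"}
LEVEL = {0: "level 0", 1: "level 1", 2: "level 2", 3: "generic"}


def decode_status(status):
    """Return the set flag names and, for TLB errors, the TT/LL fields."""
    flags = [name
             for bit, name in sorted(STATUS_FLAGS.items(), reverse=True)
             if (status >> bit) & 1]
    code = status & 0xFFFF  # low 16 bits hold the error code
    info = {"flags": flags,
            "error_code": code,
            "corrected": "UC" not in flags}
    if (code & 0xFFF0) == 0x0010:  # pattern 0000 0000 0001 TTLL
        info["kind"] = "TLB"
        info["transaction"] = TRANSACTION.get((code >> 2) & 0x3, "reserved")
        info["level"] = LEVEL[code & 0x3]
    return info


decoded = decode_status(0x9400000000010011)  # STATUS value from the log
print(decoded)
```

For the value in the log this reports VAL, EN and ADDRV set with UC clear, and an instruction-side level 1 TLB error, which agrees with the text mcelog printed. It says nothing about whether other, uncorrected events also happened.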

The AMD processor apparently has BIST or built-in self test, for memory
structures inside the processor. This document is not at all clear, on
whether you'd have that implemented on a typical desktop motherboard.
It is an optional operation, that would occur early after powerup.
It would allow bad internal memory inside the processor to be detected,
before a computer boots. There is a bit in a special register, that
contains the test result, if the test was triggered.

(Section 14.1.1 PDF page 395 "Programmers Manual Vol.2")
http://support.amd.com/us/Processor_TechDocs/24593.pdf

Paul
  #4  
Old March 8th 10, 08:47 AM posted to alt.comp.hardware.amd.x86-64,alt.comp.hardwre.amd.x86-64,alt.comp.periphs.mainboard.msi-microstar
Ant[_3_]
external usenet poster
 
Posts: 756
Default "TLB parity error in virtual array; TLB error 'instruction"?

On 3/7/2010 10:22 PM PT, Paul typed:

> http://en.wikipedia.org/wiki/Transla...okaside_buffer
>
> TLB stands for Translation Lookaside Buffer.
> It translates from virtual addresses to physical addresses.
> And apparently, according to the AMD documentation, it is protected by
> parity.
> It is part of the processor.
>
> A question would be, if it was a real error, why weren't there crash
> symptoms or side effects ? If an incorrect mapping from virtual space

Wouldn't kernel panics be one of them?

> to physical occurred, you'd think there would be consequences. (Maybe
> the entry is automatically invalidated and reloaded via page table walk ?)

[shrugs]

> The AMD processor apparently has BIST or built-in self test, for memory
> structures inside the processor. This document is not at all clear, on
> whether you'd have that implemented on a typical desktop motherboard.
> It is an optional operation, that would occur early after powerup.
> It would allow bad internal memory inside the processor to be detected,
> before a computer boots. There is a bit in a special register, that
> contains the test result, if the test was triggered.
>
> (Section 14.1.1 PDF page 395 "Programmers Manual Vol.2")
> http://support.amd.com/us/Processor_TechDocs/24593.pdf


Now, this is over my head. Is there a way to test this with software?
Does memtest86+ v4.00 test for this? I already tried compiling, unraring
10+ GB of data, running sys_basher, and memtest86+ v4.00 (passed a few
weeks ago + this morning = five tests total). It doesn't seem to be
related to stress/overload or temperature, since most of the kernel
panics happened when the system was mostly idle!
--
"Ants can destroy a tree. Therefore this ant can destroy a tree."
--Logic & Fallacy
  #5  
Old March 8th 10, 09:30 AM posted to alt.comp.hardware.amd.x86-64,alt.comp.hardwre.amd.x86-64,alt.comp.periphs.mainboard.msi-microstar
Paul
external usenet poster
 
Posts: 13,364
Default "TLB parity error in virtual array; TLB error 'instruction"?



That entry in the manual means, there is a way to test that section
of the processor. But I'm not aware of any software that does things
like that. And because the 24593 document didn't say what triggered the
test, I can't comment on whether a motivated person could even write
some code to do it. Maybe there are one or more pins on the processor,
that have to be set up for that. I could see a server motherboard maker
perhaps, going the extra mile (doing a basic test on the processor,
before completing POST).

The pinout for AM2 socket isn't publicly available. This site says
the document needed is 31117.pdf, but you can't download that. So
there is no way to look for any pins with "interesting" names.

http://www.sandpile.org/docs/amd/k8.htm

My guess is, that a program like memtest86+, isn't going to specifically
target things like the TLB, while it tests main memory. It's possible
a small number of entries in the TLB were loaded by the BIOS, for perhaps
a linear mapping of some sort, and memtest86+ relies on that for what it
does. You'd have to look at the source for memtest86+, to see what it
does. I read a claim a couple days ago, that memtest86+ uses PAE, and
that should be a mapping trick as well. That is how a 32 bit executable
can be used to test system memory totals of greater than 4GB. It could
test 4GB at a time, and change mappings to access a different 4GB block of
memory.
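
To make the "4GB at a time" idea concrete, here is a small Python sketch of the bookkeeping only. It is not memtest86+ code, the function name windows is made up, and the 4 GiB window size is just the figure from my description above (the real program may well use a different size). It simply splits a physical address range into chunks that would each be mapped and tested in turn.

```python
# Sketch only: split physical memory into fixed-size windows that a
# 32-bit tester could map and test one at a time. Not memtest86+ code.
GIB = 1 << 30


def windows(total_bytes, window_bytes=4 * GIB):
    """Yield (start, end) physical ranges, one per remapping step."""
    start = 0
    while start < total_bytes:
        end = min(start + window_bytes, total_bytes)
        yield start, end
        start = end


plan = list(windows(6 * GIB))  # e.g. a machine with 6 GiB installed
for start, end in plan:
    print(f"map and test {start:#x}..{end:#x}")
```

The point is that each step changes the page mappings, so the TLB gets flushed and refilled a few times, but nothing here checks the TLB contents themselves.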

Paul
  #6  
Old March 8th 10, 05:21 PM posted to alt.comp.hardware.amd.x86-64,alt.comp.hardwre.amd.x86-64,alt.comp.periphs.mainboard.msi-microstar
Ant[_3_]
external usenet poster
 
Posts: 756
Default "TLB parity error in virtual array; TLB error 'instruction"?



Ah, interesting. Thanks.
--
"The ants and termites have renounced the Hobbesian war." --Petr Kropotkin
  #7  
Old March 13th 10, 08:09 AM posted to alt.comp.hardware.amd.x86-64,alt.comp.hardwre.amd.x86-64,alt.comp.periphs.mainboard.msi-microstar
Ant[_3_]
external usenet poster
 
Posts: 756
Default "TLB parity error in virtual array; TLB error 'instruction"?



Last night, I ran memtest86+ v4.00's test #9.
http://www.memtest86.com/tech.html#descri says: "Test 9 [Bit fade test,
90 min, 2 patterns]

The bit fade test initializes all of memory with a pattern and then
sleeps for 90 minutes. Then memory is examined to see if any memory bits
have changed. All ones and all zero patterns are used. This test takes 3
hours to complete. The Bit Fade test is not included in the normal test
sequence and must be run manually via the runtime configuration menu."

I ran it for a little over 3.25 hours and it passed (only one pass).
Shouldn't this test catch that problem? Or is that TLB somewhere else?
Maybe I need to run it longer and more often?

Also, I did a cat /var/log/messages | grep mcelog and posted the long log
at http://pastie.org/867602 ... Check out those mcelog errors.
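
For anyone who wants to summarize a long log like that instead of reading it line by line, here is a rough Python sketch. It is only an illustration under a few assumptions: the helper name mcelog_records is made up, the sample lines are copied from my first post, a record is assumed to start at each "MCE n" line, and wrapped continuation lines that lack the "mcelog:" tag are simply skipped.

```python
# Sketch only: group syslog lines written by mcelog into records and
# count how often each STATUS value appears. Helper names are made up.
import re
from collections import Counter

SAMPLE = """\
Mar 7 08:25:24 MyLinuxBox kernel: [ 3299.988026] Machine check events logged
Mar 7 08:25:24 MyLinuxBox mcelog: HARDWARE ERROR. This is *NOT* a software problem!
Mar 7 08:25:24 MyLinuxBox mcelog: MCE 0
Mar 7 08:25:24 MyLinuxBox mcelog: CPU 1 1 instruction cache
Mar 7 08:25:24 MyLinuxBox mcelog: ADDR c11b6ff0
Mar 7 08:25:24 MyLinuxBox mcelog: STATUS 9400000000010011 MCGSTATUS 0
"""


def mcelog_records(lines):
    """Return a list of records; each record is a list of mcelog texts."""
    records, current = [], None
    for line in lines:
        match = re.search(r"mcelog: (.*)$", line)
        if not match:
            continue  # kernel lines and wrapped continuations are ignored
        text = match.group(1).strip()
        if re.fullmatch(r"MCE \d+", text):
            current = []  # a new event starts here
            records.append(current)
        elif current is not None:
            current.append(text)
    return records


records = mcelog_records(SAMPLE.splitlines())
status_counts = Counter()
for record in records:
    for text in record:
        if text.startswith("STATUS "):
            status_counts[text.split()[1]] += 1
print(status_counts)
```

To run it on a real log, the SAMPLE text could be replaced with the lines read from /var/log/messages.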

The author of cpuburn told me to try seven and 37 "nice -19 ./burnMMX P
&" separately. I ran them for many hours with no problems. I am starting
to notice that the errors and kernel panics seem to occur only when my
system is idle (again, not using Cool'n'Quiet).
--
"There's an ant crawling up your back in the nighttime." --TMBG
  #8  
Old March 13th 10, 10:21 AM posted to alt.comp.hardware.amd.x86-64,alt.comp.hardwre.amd.x86-64,alt.comp.periphs.mainboard.msi-microstar
Paul
external usenet poster
 
Posts: 13,364
Default "TLB parity error in virtual array; TLB error 'instruction"?



The TLB is part of your processor. It converts virtual addresses into
physical addresses. And it involves a small memory to store the
entries.

To test it, you'd need a test software specifically designed to verify
that it can hold entries, and the entries are pointed to the right
physical locations. I haven't read of any programs that do that
specifically.

The processor memory BIST function, is an example of a "structural" test,
which is used to verify that a chunk of hardware works. When we talk
about running test programs on the computer, those are "functional" tests.
It can be much harder, to get good test coverage, using nothing
but functional tests.

If you want a test case, that can make the situation worse, you'd need
a test program with a random access characteristic, something which causes
so many TLB entries to be used, that there are lots of page table walks,
swapping out of least recently used entries in the TLB and so on.
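
As a rough illustration of that kind of access pattern, here is a minimal Python sketch. It is not an existing test tool, and the name touch_pages_randomly is made up. It touches one byte in each page of a large anonymous buffer in shuffled order, so nearly every access lands on a different page. Two caveats: the Python interpreter adds a lot of memory traffic of its own, so a serious stressor would be written in C or assembler, and this mostly exercises the data TLB, whereas the logged error was on the instruction side.

```python
# Sketch only: touch one byte per page in random order so that most
# accesses need a different TLB entry. Not an existing test program.
import mmap
import random

PAGE = mmap.PAGESIZE


def touch_pages_randomly(num_pages, rounds=1, seed=0):
    """Read-modify-write one byte in every page, in shuffled order."""
    buf = mmap.mmap(-1, num_pages * PAGE)  # anonymous mapping
    order = list(range(num_pages))
    rng = random.Random(seed)
    touched = 0
    try:
        for _ in range(rounds):
            rng.shuffle(order)
            for page in order:
                offset = page * PAGE
                buf[offset] = (buf[offset] + 1) & 0xFF
                touched += 1
    finally:
        buf.close()
    return touched


print(touch_pages_randomly(4096, rounds=2))
```

The number of pages needs to be well above the number of TLB entries for this to cause many misses; 4096 is just a placeholder.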

You can see someone characterizing a TLB here. I think the program
they're using, runs under Windows.

http://ixbtlabs.com/articles2/rmma/rmma-dothan.html

RMMA

http://cpu.rightmark.org/download.shtml

I have no idea how a TLB parity error would show up under Windows.
A machine check exception? Or something in the Event Viewer?

Paul
  #9  
Old March 13th 10, 05:32 PM posted to alt.comp.hardware.amd.x86-64,alt.comp.hardwre.amd.x86-64,alt.comp.periphs.mainboard.msi-microstar
Ant[_3_]
external usenet poster
 
Posts: 756
Default "TLB parity error in virtual array; TLB error 'instruction"?



Thanks. I wonder if I can run it off WinPE CDs.

FYI, I still got errors overnight (no crashes), so I guess uninstalling
the two packages and rebooting didn't help, according to dmesg:

[ 3299.988650] Machine check events logged
[ 8549.989026] Machine check events logged


HOWEVER, I did notice this:
[ 19.939430] powernow-k8: Found 1 AMD Athlon(tm) 64 X2 Dual Core Processor 4600+ processors (2 cpu cores) (version 2.20.00)
[ 19.939453] [Firmware Bug]: powernow-k8: No compatible ACPI _PSS objects found.
[ 19.939454] [Firmware Bug]: powernow-k8: Try again with latest BIOS.

I think that error shows up only because I have it disabled in BIOS. I
wonder if enabling and using Cool'n'Quiet will fix the problems?

In the past I used these lines in my /etc/rc.local file (I had been using
them since my single-core Athlon 64 Socket 754 system too):
modprobe powernow-k8
echo ondemand > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
echo 1 > /sys/devices/system/cpu/cpu0/cpufreq/ondemand/ignore_nice_load
I added "#" to the beginning of each line so that they do not run while
AMD's Cool'n'Quiet is disabled.
--
"Still we live meanly, like ants;... like pygmies we fight with
cranes;... Our life is frittered away by detail. Simplify...
simplify..." --Henry Thoreau
 



