March 8th 10, 06:22 AM, posted to alt.comp.hardware.amd.x86-64,alt.comp.hardwre.amd.x86-64,alt.comp.periphs.mainboard.msi-microstar
Paul
"TLB parity error in virtual array; TLB error 'instruction"?

Ant wrote:
I also ran sys_basher (http://www.polybus.com/sys_basher_web/) in my
Debian a few times in the past and just now. No errors or crashes.


On 3/7/2010 8:59 AM PT, Ant typed:

Hello.

Lately, I have been getting random and rare kernel panics on my old
Debian/Linux box (tried both kernel versions 2.6.30 and 2.6.32). I couldn't
figure out what it was until I discovered mcelog a couple of days ago, and it
revealed some interesting, scary data in my dmesg/messages and syslog:

# cat /var/log/messages
...
Mar 7 08:25:24 MyLinuxBox kernel: [ 3299.988026] Machine check events logged
Mar 7 08:25:24 MyLinuxBox mcelog: HARDWARE ERROR. This is *NOT* a software problem!
Mar 7 08:25:24 MyLinuxBox mcelog: Please contact your hardware vendor
Mar 7 08:25:24 MyLinuxBox mcelog: MCE 0
Mar 7 08:25:24 MyLinuxBox mcelog: CPU 1 1 instruction cache
Mar 7 08:25:24 MyLinuxBox mcelog: ADDR c11b6ff0
Mar 7 08:25:24 MyLinuxBox mcelog: TIME 1267979124 Sun Mar 7 08:25:24 2010
Mar 7 08:25:24 MyLinuxBox mcelog: TLB parity error in virtual array
Mar 7 08:25:24 MyLinuxBox mcelog: TLB error 'instruction transaction, level 1'
Mar 7 08:25:24 MyLinuxBox mcelog: STATUS 9400000000010011 MCGSTATUS 0
Mar 7 08:25:24 MyLinuxBox mcelog: MCGCAP 105 APICID 1 SOCKETID 0
Mar 7 08:25:24 MyLinuxBox mcelog: CPUID Vendor AMD Family 15 Model 43

I am not familiar with hardware, so I assume this is very bad, but what
part(s) is/are bad? Is my old Athlon 64 X2 CPU dying/damaged? I have had
it and its motherboard since 12/24/2006, so it is not that old yet. I
have the full details on my secondary machine at
http://alpha.zimage.com/~ant/antfarm.../computers.txt ...

This might, however, be related to the PSU's death back in early
December 2009. My friend and I believe it also took out my EVGA GeForce
8800 GT video card and damaged a 512 MB stick of RAM (I tested all 3 GB
together and each piece separately with memtest86+ v4.00 to narrow it down).
http://alpha.zimage.com/~ant/antfarm/about/toys.html has a log of the
details of my systems. I did run memtest86+ again a couple of weeks ago and
this morning for 5-6 hours, and got no errors after five full tests
(all passed). I also do not overclock/OC.

Thank you in advance.


http://en.wikipedia.org/wiki/Transla...okaside_buffer

TLB stands for Translation Lookaside Buffer.
It translates from virtual addresses to physical addresses.
And apparently, according to the AMD documentation, it is protected by parity.
It is part of the processor.
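
As a rough illustration of those two ideas (translation plus a parity check), here is a toy model in Python. Everything in it is an assumption made for the sketch: the `ToyTLB` class, the 4 KiB page size, the single even-parity bit per entry, and above all the recovery policy of dropping a bad entry and re-walking the page table. It is not a description of how the Athlon 64 actually handles a parity mismatch.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages


def parity(value: int) -> int:
    """Return 1 if value has an odd number of set bits, else 0."""
    return bin(value).count("1") & 1


class ToyTLB:
    """Hypothetical software model of a parity-protected TLB."""

    def __init__(self, page_table):
        self.page_table = page_table  # virtual page number -> physical frame number
        self.entries = {}             # cached: vpn -> (pfn, stored parity bit)
        self.parity_errors = 0

    def _walk(self, vpn):
        # Stand-in for a hardware page table walk: look up and cache the mapping.
        pfn = self.page_table[vpn]
        self.entries[vpn] = (pfn, parity(pfn))
        return pfn

    def translate(self, vaddr):
        vpn = vaddr >> PAGE_SHIFT
        offset = vaddr & ((1 << PAGE_SHIFT) - 1)
        entry = self.entries.get(vpn)
        if entry is None:
            pfn = self._walk(vpn)          # TLB miss
        else:
            pfn, stored = entry
            if parity(pfn) != stored:      # entry no longer matches its parity bit
                self.parity_errors += 1
                del self.entries[vpn]      # assumed policy: invalidate and reload
                pfn = self._walk(vpn)
        return (pfn << PAGE_SHIFT) | offset
```

For example, with `tlb = ToyTLB({0xC11B6: 0x1234})`, calling `tlb.translate(0xC11B6FF0)` gives `0x1234FF0`. If the cached frame number is then corrupted by one bit, the next lookup notices the mismatch, counts it, and still returns the right address. Note that a single parity bit can only detect an odd number of flipped bits; it cannot say which bit is wrong, which is why the sketch reloads rather than repairs.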

A question would be, if it was a real error, why weren't there crash
symptoms or side effects? If an incorrect mapping from virtual space
to physical occurred, you'd think there would be consequences. (Maybe
the entry is automatically invalidated and reloaded via a page table walk?)
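
One way to look at that question is to pick apart the STATUS value from the log by hand. The sketch below is a minimal decoder, assuming the usual x86 machine check status layout (flag bits 63 down to 57, a 16-bit error code in the low bits, and TLB errors encoded as `0000 0000 0001 TTLL`) and assuming the extended error code sits in bits 19:16 on this processor family. The function and table names are made up for the example; check the field positions against the AMD manual before relying on them.

```python
STATUS = 0x9400000000010011  # STATUS value copied from the mcelog output

# Assumed flag positions in the machine check status register.
FLAG_BITS = {
    63: "VAL",    # a valid error is logged
    62: "OVER",   # error overflow (an earlier error was overwritten)
    61: "UC",     # error was NOT corrected by hardware
    60: "EN",     # reporting for this error was enabled
    59: "MISCV",  # misc register holds valid data
    58: "ADDRV",  # address register holds valid data
    57: "PCC",    # processor context corrupt
}

TRANSACTION = ["instruction", "data", "generic", "reserved"]  # TT field
LEVEL = ["0", "1", "2", "generic"]                            # LL field


def decode_status(status: int) -> dict:
    """Split a machine check status value into flags and error code fields."""
    flags = {name: bool((status >> bit) & 1) for bit, name in FLAG_BITS.items()}
    code = status & 0xFFFF
    result = {
        "flags": flags,
        "error_code": code,
        "extended_code": (status >> 16) & 0xF,  # assumption: bits 19:16
    }
    if code & 0xFFF0 == 0x0010:  # TLB error pattern 0000 0000 0001 TTLL
        result["kind"] = "TLB"
        result["transaction"] = TRANSACTION[(code >> 2) & 0x3]
        result["level"] = LEVEL[code & 0x3]
    return result


if __name__ == "__main__":
    for key, value in decode_status(STATUS).items():
        print(key, value)
```

Under those assumptions the value decodes as a TLB error, instruction transaction, level 1, which agrees with what mcelog printed. VAL, EN and ADDRV are set, while UC and PCC are clear. If that reading is right, the hardware did not flag this event as uncorrected, which might be why this particular event was only logged; it does not by itself rule out a marginal CPU.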

The AMD processor apparently has BIST, or built-in self test, for memory
structures inside the processor. The AMD document cited below is not at all
clear on whether you'd have that implemented on a typical desktop motherboard.
It is an optional operation that would occur early after powerup.
It would allow bad internal memory inside the processor to be detected
before a computer boots. There is a bit in a special register that
contains the test result, if the test was triggered.

(Section 14.1.1 PDF page 395 "Programmers Manual Vol.2")
http://support.amd.com/us/Processor_TechDocs/24593.pdf

Paul