March 13th 10, 09:21 AM, posted to alt.comp.hardware.amd.x86-64,alt.comp.hardwre.amd.x86-64,alt.comp.periphs.mainboard.msi-microstar
Paul
"TLB parity error in virtual array; TLB error 'instruction"?

Ant wrote:
On 3/8/2010 8:21 AM PT, Ant typed:

The AMD processor apparently has a BIST (built-in self test) for the
memory structures inside the processor. The document below is not at all
clear on whether that is implemented on a typical desktop motherboard.
It is an optional operation that would occur early after powerup,
allowing bad internal memory inside the processor to be detected
before the computer boots. A bit in a special register holds the
test result, if the test was triggered.

(Section 14.1.1 PDF page 395 "Programmers Manual Vol.2")
http://support.amd.com/us/Processor_TechDocs/24593.pdf

Now, this is over my head. Is there a way to test this with software?
Does memtest86+ v4.00 test for this? I already tried compiling,
unraring 10+ GB of data, running sys_basher, and memtest86+ v4.00
(passed a few weeks ago + this morning = five tests total). It doesn't
seem to be stress/overload or temperature related, since most kernel
panics happened when the system was mostly idle!

That entry in the manual means there is a way to test that section
of the processor, but I'm not aware of any software that does things
like that. And because the 24593 document didn't say what triggers the
test, I can't comment on whether a motivated person could even write
some code to do it. Maybe there are one or more pins on the processor
that have to be set up for that. I could see a server motherboard maker,
perhaps, going the extra mile (doing a basic test on the processor
before completing POST).

The pinout for AM2 socket isn't publicly available. This site says
the document needed is 31117.pdf, but you can't download that. So
there is no way to look for any pins with "interesting" names.

http://www.sandpile.org/docs/amd/k8.htm

My guess is that a program like memtest86+ isn't going to specifically
target things like the TLB while it tests main memory. It's possible
a small number of entries in the TLB were loaded by the BIOS, perhaps for
a linear mapping of some sort, and memtest86+ relies on that for what it
does. You'd have to look at the source for memtest86+ to see what it
does. I read a claim a couple of days ago that memtest86+ uses PAE, and
that should be a mapping trick as well. That is how a 32-bit executable
can be used to test system memory totals greater than 4GB: it could
test 4GB at a time, and change mappings to access a different 4GB
block of memory.
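
To make the "4GB at a time" idea concrete, here is a minimal sketch of how a tester could step through a larger memory in windows. The 4 GiB window size and the windows() helper are assumptions for illustration only; I have not checked what window size memtest86+ really uses, and the remapping itself is just a print here.

```python
GIB = 1 << 30
WINDOW = 4 * GIB  # assumed window size: what a 32-bit program can address at once


def windows(total_bytes, window=WINDOW):
    """Yield (physical_start, length) for each block to map and test in turn."""
    start = 0
    while start < total_bytes:
        length = min(window, total_bytes - start)
        yield start, length
        start += length


# Hypothetical 10 GiB machine: each step stands in for "change the mapping,
# then run the test patterns over the newly visible block".
for start, length in windows(10 * GIB):
    print(f"map physical {start:#013x}, test {length // GIB} GiB")
```

The last window is simply shorter when the total is not a multiple of the window size.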


Ah, interesting. Thanks.


Last night, I ran memtest86+ v4.00's test #9.
http://www.memtest86.com/tech.html#descri says: "Test 9 [Bit fade test,
90 min, 2 patterns]

The bit fade test initializes all of memory with a pattern and then
sleeps for 90 minutes. Then memory is examined to see if any memory bits
have changed. All ones and all zero patterns are used. This test takes 3
hours to complete. The Bit Fade test is not included in the normal test
sequence and must be run manually via the runtime configuration menu."

I only ran it once, for a bit over 3.25 hours, and it passed.
Shouldn't this test catch that problem? Or is the TLB somewhere else? Maybe I
need to run it longer, or more times?
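
For reference, the bit fade procedure quoted above boils down to something like the following. This is only a rough sketch on an ordinary buffer, not the memtest86+ code; the bit_fade() name and the short hold time are made up for illustration. It only ever looks at the contents of the buffer, i.e. main memory.

```python
import time


def bit_fade(buf, hold_seconds):
    """Fill buf with each pattern, wait, then return (pattern, offset) pairs that changed."""
    bad = []
    for pattern in (0xFF, 0x00):  # all ones, then all zeros
        buf[:] = bytes([pattern]) * len(buf)
        time.sleep(hold_seconds)  # the real test holds each pattern for 90 minutes
        bad.extend((pattern, i) for i, b in enumerate(buf) if b != pattern)
    return bad


# Tiny buffer and hold time so the sketch finishes immediately.
errors = bit_fade(bytearray(4096), hold_seconds=0.01)
print("changed bytes:", errors)
```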

Also, I did a cat /var/log/messages |grep mcelog and posted the long log
at http://pastie.org/867602 ... Check out those mcelog errors.

The author of cpuburn told me to try seven and 37 "nice -19 ./burnMMX P
&" separately. I ran them for many hours, and no problems. I am starting
to notice that the errors and kernel panics seem to occur only when my
system is idle (again, not using cool'n'quiet).


The TLB is part of your processor. It converts virtual addresses into
physical addresses, using a small memory to store the translation entries.
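
As a rough picture of what that small memory does, here is a toy model. Everything in it (the ToyTLB class, four entries, 4 KiB pages, least-recently-used eviction, the page table contents) is an assumption for illustration and not how any particular AMD part is built.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed 4 KiB pages


class ToyTLB:
    def __init__(self, page_table, entries=4):
        self.page_table = page_table  # virtual page number -> physical frame number
        self.entries = entries        # capacity of the small memory
        self.cache = OrderedDict()    # stored translations, least recently used first
        self.hits = self.misses = 0

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.cache:
            self.hits += 1
            self.cache.move_to_end(vpn)  # mark as most recently used
        else:
            self.misses += 1             # a real CPU would walk the page tables here
            self.cache[vpn] = self.page_table[vpn]
            if len(self.cache) > self.entries:
                self.cache.popitem(last=False)  # evict the least recently used entry
        return self.cache[vpn] * PAGE_SIZE + offset


tlb = ToyTLB({0: 7, 1: 3})          # hypothetical page table
print(hex(tlb.translate(0x1234)))   # virtual page 1, offset 0x234
```

A bad bit in one of the stored entries would send an access to the wrong physical location, which is why the hardware protects them.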

To test it, you'd need test software specifically designed to verify
that it can hold entries, and that the entries point to the right
physical locations. I haven't read of any programs that do that
specifically.

The processor memory BIST function is an example of a "structural" test,
which is used to verify that a chunk of hardware works. When we talk
about running test programs on the computer, those are "functional" tests.
It can be much harder to get good test coverage using nothing
but functional tests.

If you want a test case that can aggravate the situation, you'd need
a test program with a random access characteristic, something which causes
so many TLB entries to be used that there are lots of page table walks,
swapping out of least recently used entries in the TLB, and so on.
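
A minimal sketch of that kind of access pattern, assuming 4 KiB pages and a 64 MiB buffer (both numbers are my guesses, picked so the buffer spans far more pages than I'd expect a TLB to hold). The touch_pages() name is made up. An interpreted language adds a lot of overhead per access, so a compiled version of the same loop would hit the TLB much harder.

```python
import random

PAGE_SIZE = 4096  # assumed page size


def touch_pages(buf, rounds, seed=0):
    """Increment the first byte of a randomly chosen page each round.

    Returns the number of distinct pages that were touched.
    """
    rng = random.Random(seed)
    pages = len(buf) // PAGE_SIZE
    seen = set()
    for _ in range(rounds):
        page = rng.randrange(pages)
        offset = page * PAGE_SIZE
        buf[offset] = (buf[offset] + 1) & 0xFF  # one access per page, in random order
        seen.add(page)
    return len(seen)


buf = bytearray(64 * 1024 * 1024)  # 64 MiB = 16384 pages of 4 KiB
print("distinct pages touched:", touch_pages(buf, rounds=100_000))
```

Because each step lands on an unrelated page, almost every access needs a translation that is unlikely to still be in the TLB.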

You can see someone characterizing a TLB here. I think the program
they're using, runs under Windows.

http://ixbtlabs.com/articles2/rmma/rmma-dothan.html

RMMA

http://cpu.rightmark.org/download.shtml

I have no idea how a TLB ECC error would show up under Windows.
As a machine check exception? Or as something in the Event Viewer?

Paul