#1
MCE - Non fatal, correctible incident occurred on CPU 0
Hello,
My Gentoo box has recently started spewing out Machine Check Exception errors to my log files. They're correctable, and the machine appears to be running OK, but I'm just wondering if this is a foreshadowing of impending doom. I get four repeating MCE errors, from the moment the system starts up. I've run memtest86 for hours and it shows no error. I'm having a hard time figuring out what exactly the error is. Nothing is overclocked and the system is not overheating as far as I can tell. It's a 2.0 GHz Celeron in an Asus Pundit with latest BIOS.

Here are the errors, followed by the parsemce output. Any ideas?

----
MCE: The hardware reports a non fatal, correctable incident occurred on CPU 0.
Bank 0: cc00003820040189

../parsemce -e 1 -b 0 -s cc00003820040189 -a 0
Status: (1) Restart IP valid.
parsebank(0): cc00003820040189 @ 0
External tag parity error
Address in addr register valid
MISC register information valid
Error overflow
Memory heirarchy error
Request: Generic error
Transaction type : Generic
Memory/IO : Reserved

MCE: The hardware reports a non fatal, correctable incident occurred on CPU 0.
Bank 1: c000000000000135

../parsemce -e 1 -b 1 -s c000000000000135 -a 0
Status: (1) Restart IP valid.
parsebank(1): c000000000000135 @ 0
External tag parity error
Error overflow
Memory heirarchy error
Request: Generic error
Transaction type : Data
Memory/IO : Reserved

MCE: The hardware reports a non fatal, correctable incident occurred on CPU 0.
Bank 2: 9000000000000153

../parsemce -e 1 -b 2 -s 9000000000000153 -a 0
Status: (1) Restart IP valid.
parsebank(2): 9000000000000153 @ 0
External tag parity error
Error enabled in control register
Memory heirarchy error
Request: Generic error
Transaction type : Instruction
Memory/IO : Other

MCE: The hardware reports a non fatal, correctable incident occurred on CPU 0.
Bank 2: d000000000000153

../parsemce -e 1 -b 2 -s d000000000000153 -a 0
Status: (1) Restart IP valid.
parsebank(2): d000000000000153 @ 0
External tag parity error
Error enabled in control register
Error overflow
Memory heirarchy error
Request: Generic error
Transaction type : Instruction
Memory/IO : Other
----

Thanks!
-WD
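For anyone who wants to check the flag lines in the parsemce output by hand, here is a minimal Python sketch that pulls the upper flag bits out of each bank status value quoted above. It assumes the bit positions of Intel's architectural machine-check status register as I understand them (bit 63 valid, 62 overflow, 61 uncorrected, 60 enabled, 59 MISC valid, 58 address valid, 57 processor context corrupt). The names `STATUS_FLAGS` and `decode_status` are made up for this illustration and are not part of parsemce, and the sketch does not try to interpret the low 16-bit error code beyond returning it.

```python
# Flag bits in the upper part of a machine-check bank status value.
# Assumption: positions follow Intel's architectural IA32_MCi_STATUS layout.
STATUS_FLAGS = [
    (63, "VAL"),    # status register contents are valid
    (62, "OVER"),   # error overflow (an earlier error was overwritten)
    (61, "UC"),     # uncorrected error
    (60, "EN"),     # error enabled in control register
    (59, "MISCV"),  # MISC register information valid
    (58, "ADDRV"),  # address in addr register valid
    (57, "PCC"),    # processor context corrupt
]


def decode_status(status_hex):
    """Return (list of set flag names, low 16-bit error code) for a hex string."""
    value = int(status_hex, 16)
    flags = [name for bit, name in STATUS_FLAGS if (value >> bit) & 1]
    return flags, value & 0xFFFF


if __name__ == "__main__":
    # Bank/status pairs copied from the log in this post.
    samples = [
        (0, "cc00003820040189"),
        (1, "c000000000000135"),
        (2, "9000000000000153"),
        (2, "d000000000000153"),
    ]
    for bank, status in samples:
        flags, code = decode_status(status)
        print(f"Bank {bank}: {status} flags={','.join(flags)} code=0x{code:04x}")
```

For bank 0 this yields VAL, OVER, MISCV and ADDRV, which lines up with the "Error overflow", "MISC register information valid" and "Address in addr register valid" lines parsemce printed. None of the four values has the UC bit set, which fits the "correctable" wording of the kernel message.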
#2
Alex Buell wrote:
> Try replacing your processor. That could be it.

I guess I'll have to see if this is something that's covered by the warranty, as it's within the 3-year period. Is it true that all MCE codes indicate an error that's internal to the CPU? Or could something external to the CPU trigger an MCE?

Thanks
-WD
#3
On Sat, 08 Oct 2005 16:39:20 -0400, Will Dormann wrote:
> Alex Buell wrote:
>> Try replacing your processor. That could be it.
>
> I guess I'll have to see if this is something that's covered by the warranty, as it's within the 3-year period. Is it true that all MCE codes indicate an error that's internal to the CPU? Or could something external to the CPU trigger an MCE?
>
> Thanks
> -WD

Normally I find them to be errors from the cache memory on the CPU.
#4
Will Dormann wrote:
> Alex Buell wrote:
>> Try replacing your processor. That could be it.
>
> I guess I'll have to see if this is something that's covered by the warranty, as it's within the 3-year period. Is it true that all MCE codes indicate an error that's internal to the CPU? Or could something external to the CPU trigger an MCE?

A badly seated CPU, or overheating motherboard components on the FSB, or a few other things, but these errors would *usually* mean a CPU problem. In your case, it *may* be the memory controller, since you see:

External tag parity error
...
Memory heirarchy error

The tag cache is part of the internal cache that's set aside to index external banks of memory. If the memory controller has a problem, you could presumably get this error. Of course, you might also see this if the cache on the CPU is bad (or overheated).

I'd first try reseating the CPU and RAM and blow away any dust that might have accumulated on the motherboard or in the CPU HSF assembly. It might not help, but it's free and worth a try.

Oh, and send an email to the maintainer of parsemce and tell him he means hierarchy and not heirarchy. The latter would be a society ruled by the eldest child of still living parents... :-)

Regards,
--
*Art
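On the "parity error" wording in the output quoted above: as a rough illustration only, here is a small Python sketch of how a single parity bit lets a reader of a stored word notice that one bit has flipped. Real cache tags are checked in hardware and the actual scheme used by this CPU is not stated anywhere in the thread; the tag value and the helper names `parity_bit` and `is_consistent` are hypothetical.

```python
def parity_bit(word):
    """Even-parity bit: 1 when the word has an odd number of set bits."""
    return bin(word).count("1") % 2


def is_consistent(word, stored_parity):
    """True when the word still matches the parity bit stored alongside it."""
    return parity_bit(word) == stored_parity


tag = 0b10110010              # hypothetical tag value as originally written
stored = parity_bit(tag)      # parity recorded when the tag was stored

one_flip = tag ^ (1 << 3)     # a single bit gets corrupted
two_flips = one_flip ^ (1 << 5)  # a second bit gets corrupted as well

print(is_consistent(tag, stored))        # untouched word passes the check
print(is_consistent(one_flip, stored))   # single-bit error is detected
print(is_consistent(two_flips, stored))  # double-bit error slips through
```

Parity on its own only detects an odd number of flipped bits; it cannot say which bit changed, so any correction has to come from somewhere else, such as ECC or refetching the data.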
#5
Arthur Hagen wrote:
> I'd first try reseating the CPU and RAM and blow away any dust that might have accumulated on the motherboard or in the CPU HSF assembly. It might not help, but it's free and worth a try.

Thanks for the follow-up. Earlier today I did exactly the above, but it didn't have any effect on the MCE errors.

I tried running Prime95 for a few hours, and it ran without error. Although I feel like I'm doing the equivalent of ignoring the "check engine" light on my car, I might just live with it until I actually see symptoms other than the MCE.

-WD
#6
In comp.sys.ibm.pc.hardware.chips Will Dormann wrote:
> I tried running Prime95 for a few hours, and it ran without error. Although I feel like I'm doing the equivalent of ignoring the "check engine" light on my car, I might just live with it until I actually see symptoms other than the MCE.

You can try running my `burnMMX` with a fairly low memory parameter like `E` or `H` to exercise your cache ECC.

-- Robert
author `cpuburn`
http://pages.sbcglobal.net/redelm
#7
MCE - Non fatal, correctible incident occurred on CPU 0
.... or "Intel warranty fun"
> I tried running Prime95 for a few hours, and it ran without error. Although I feel like I'm doing the equivalent of ignoring the "check engine" light on my car, I might just live with it until I actually see symptoms other than the MCE.

Well, I'm finally seeing symptoms of instability now. The MCE errors have been continuing, but with increased frequency now. But now I can't compile MythTV anymore. The compilation itself crashes at various stages. (Never at the same spot) Prime95 fails within a few minutes with a math error.

Now I get to deal with the Intel warranty process...

I call the number, and am transferred to an offshore call center with a bad connection. I explain the above and why I would like a replacement processor. Then I get disconnected.

I call again, go through the same steps explaining the problem to a different person. I explain the Machine Check Exception errors, the failed compilation, the Prime95 failure. The processor temp is under 50C and Memtest86 passes without error. His answer: I must take the CPU to a "local computer store" and have them test the processor before I can get a replacement.

(( ASIDE: What's so special about a "local computer store" that allows them to determine if I can get an RMA or not? Do they possess some magical trait that lets them see if a processor is bad or not, which a mere mortal such as myself couldn't dream of having? Would a tech at a "local computer store" hook up the CPU to a system that can verify processor MCE codes? Or would they plug in the chip, turn it on, and say "it's OK" when they see it POST? ))

Then I get disconnected again. I call back for the third time, and I get a recording saying that customer service is closed.

It's great that this chip has a 3-year warranty and all, but who knows if I'll actually be able to take advantage of it! I guess by Monday I might know, assuming I don't have an aneurysm by then.

-WD
#8
MCE - Non fatal, correctible incident occurred on CPU 0
On Saturday 29 October 2005 02:35, Will Dormann stood up and spoke the following words to the masses in /alt.os.linux.gentoo...:/

> ... or "Intel warranty fun"
>
>> I tried running Prime95 for a few hours, and it ran without error. Although I feel like I'm doing the equivalent of ignoring the "check engine" light on my car, I might just live with it until I actually see symptoms other than the MCE.
>
> Well, I'm finally seeing symptoms of instability now. The MCE errors have been continuing, but with increased frequency now. But now I can't compile MythTV anymore. The compilation itself crashes at various stages. (Never at the same spot) Prime95 fails within a few minutes with a math error.
>
> Now I get to deal with the Intel warranty process...
>
> I call the number, and am transferred to an offshore call center with a bad connection. I explain the above and why I would like a replacement processor. Then I get disconnected.
>
> I call again, go through the same steps explaining the problem to a different person. I explain the Machine Check Exception errors, the failed compilation, the Prime95 failure. The processor temp is under 50C and Memtest86 passes without error. His answer: I must take the CPU to a "local computer store" and have them test the processor before I can get a replacement.
>
> (( ASIDE: What's so special about a "local computer store" that allows them to determine if I can get an RMA or not? Do they possess some magical trait that lets them see if a processor is bad or not, which a mere mortal such as myself couldn't dream of having? Would a tech at a "local computer store" hook up the CPU to a system that can verify processor MCE codes? Or would they plug in the chip, turn it on, and say "it's OK" when they see it POST? ))

My guess is that they need to work through an authorized reseller for the RMA procedure. This is more of an administrative matter, as an authorized reseller is supposed to be qualified to unmount a CPU from the motherboard and package it in such a way that the CPU arrives back at the tech department without any additional damage, which would void your warranty.

A second possibility is that some - but not all - resellers have specialized hardware test cards that analyze every component in your system.

> Then I get disconnected again. I call back for the third time, and I get a recording saying that customer service is closed.
>
> It's great that this chip has a 3-year warranty and all, but who knows if I'll actually be able to take advantage of it! I guess by Monday I might know, assuming I don't have an aneurysm by then.

I'd be surprised actually... Intel is a reputable company. I myself have however had to deal with Chaintech - and this was _through_ an authorized reseller - and they stonewalled the whole procedure for so long that the warranty had eventually expired. I never got that new motherboard they promised, nor did I get a refund... :-/

--
With kind regards,
*Aragorn*
(Registered GNU/Linux user # 223157)