A computer components & hardware forum. HardwareBanter


MCE - Non fatal, correctible incident occurred on CPU 0



 
 
#1
October 8th 05, 07:05 PM
Will Dormann
MCE - Non fatal, correctible incident occurred on CPU 0

Hello,

My Gentoo box has recently started spewing out Machine Check Exception
errors to my log files. They're correctable, and the machine appears to
be running OK, but I'm just wondering if this is a foreshadowing of
impending doom.

I get four repeating MCE errors, from the moment the system starts up.
I've run memtest86 for hours and it shows no errors. I'm having a hard
time figuring out what exactly the error is. Nothing is overclocked
and the system is not overheating as far as I can tell. It's a 2.0 GHz
Celeron in an Asus Pundit with latest BIOS.

Here are the errors, followed by the parsemce output. Any ideas?

----

MCE: The hardware reports a non fatal, correctable incident occurred on
CPU 0.
Bank 0: cc00003820040189

../parsemce -e 1 -b 0 -s cc00003820040189 -a 0
Status: (1) Restart IP valid.
parsebank(0): cc00003820040189 @ 0
External tag parity error
Address in addr register valid
MISC register information valid
Error overflow
Memory heirarchy error
Request: Generic error
Transaction type : Generic
Memory/IO : Reserved


MCE: The hardware reports a non fatal, correctable incident occurred on
CPU 0.
Bank 1: c000000000000135

../parsemce -e 1 -b 1 -s c000000000000135 -a 0
Status: (1) Restart IP valid.
parsebank(1): c000000000000135 @ 0
External tag parity error
Error overflow
Memory heirarchy error
Request: Generic error
Transaction type : Data
Memory/IO : Reserved


MCE: The hardware reports a non fatal, correctable incident occurred on
CPU 0.
Bank 2: 9000000000000153

../parsemce -e 1 -b 2 -s 9000000000000153 -a 0
Status: (1) Restart IP valid.
parsebank(2): 9000000000000153 @ 0
External tag parity error
Error enabled in control register
Memory heirarchy error
Request: Generic error
Transaction type : Instruction
Memory/IO : Other


MCE: The hardware reports a non fatal, correctable incident occurred on
CPU 0.
Bank 2: d000000000000153

../parsemce -e 1 -b 2 -s d000000000000153 -a 0
Status: (1) Restart IP valid.
parsebank(2): d000000000000153 @ 0
External tag parity error
Error enabled in control register
Error overflow
Memory heirarchy error
Request: Generic error
Transaction type : Instruction
Memory/IO : Other




----


Thanks!
-WD
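
The raw bank values in the log can be cross-checked against the architectural MCi_STATUS layout described in the Machine-Check Architecture chapter of Intel's Software Developer's Manual, Vol. 3. Below is a minimal Python sketch (it is not parsemce, and the function and table names are made up for illustration) that decodes only the architectural fields: the flag bits 63 through 57 and, when bits 15:8 read 0000 0001, the cache-hierarchy compound code with its request (RRRR), transaction type (TT) and level (LL) sub-fields. It does not interpret bits 56 through 16, so it does not try to reproduce every line parsemce prints, and its request labels may be worded differently from parsemce's.

```python
"""Decode the architectural fields of an IA-32 MCi_STATUS value (sketch)."""

# Flag bits 63..57, as laid out in Intel's SDM Vol. 3 (Machine-Check Architecture).
FLAG_BITS = [
    (63, "VAL"),    # status register contents are valid
    (62, "OVER"),   # error overflow: another error arrived while one was logged
    (61, "UC"),     # uncorrected error (clear = corrected)
    (60, "EN"),     # error reporting was enabled in MCi_CTL
    (59, "MISCV"),  # MCi_MISC holds valid information
    (58, "ADDRV"),  # MCi_ADDR holds a valid address
    (57, "PCC"),    # processor context corrupt
]

# Sub-fields of the cache-hierarchy compound error code 0000 0001 RRRR TTLL.
REQUEST = {0x0: "ERR", 0x1: "RD", 0x2: "WR", 0x3: "DRD", 0x4: "DWR",
           0x5: "IRD", 0x6: "PREFETCH", 0x7: "EVICT", 0x8: "SNOOP"}
TRANSACTION = {0b00: "Instruction", 0b01: "Data", 0b10: "Generic"}
LEVEL = {0b00: "L0", 0b01: "L1", 0b10: "L2", 0b11: "Generic"}


def decode_status(status: int) -> dict:
    """Return the set flags and, for cache-hierarchy errors, the RRRR/TT/LL fields."""
    flags = [name for bit, name in FLAG_BITS if (status >> bit) & 1]
    mca_code = status & 0xFFFF          # bits 15:0, the architectural MCA error code
    result = {"flags": flags, "mca_code": mca_code}
    if mca_code >> 8 == 0b0000_0001:    # bits 15:8 mark a cache-hierarchy error
        result["request"] = REQUEST.get((mca_code >> 4) & 0xF, "reserved")
        result["transaction"] = TRANSACTION.get((mca_code >> 2) & 0x3, "reserved")
        result["level"] = LEVEL[mca_code & 0x3]
    return result


if __name__ == "__main__":
    # The four status values from the log above.
    for value in (0xCC00003820040189, 0xC000000000000135,
                  0x9000000000000153, 0xD000000000000153):
        print(f"{value:016x}", decode_status(value))
```

Run against the four values from the log, it reports UC clear in every case, consistent with the kernel calling them correctable, and OVER set for every value except 9000000000000153, which lines up with the "Error overflow" lines in the parsemce output.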

#2
October 8th 05, 09:39 PM
Will Dormann

Alex Buell wrote:

> Try replacing your processor.

That could be it. I guess I'll have to see if this is something that's
covered by the warranty, as it's within the 3-year period.

Is it true that all MCE codes indicate an error that's internal to the
CPU? Or could something external to the CPU trigger an MCE?


Thanks
-WD

#3
October 9th 05, 01:21 AM
Mr Toad

On Sat, 08 Oct 2005 16:39:20 -0400, Will Dormann wrote:

> Alex Buell wrote:
>
>> Try replacing your processor.
>
> That could be it. I guess I'll have to see if this is something that's
> covered by the warranty, as it's within the 3-year period.
>
> Is it true that all MCE codes indicate an error that's internal to the
> CPU? Or could something external to the CPU trigger an MCE?
>
> Thanks
> -WD

Normally I find them to be errors from the cache memory on the CPU.



#4
October 9th 05, 04:38 AM
Arthur Hagen

Will Dormann wrote:
> Alex Buell wrote:
>
>> Try replacing your processor.
>
> That could be it. I guess I'll have to see if this is something
> that's covered by the warranty, as it's within the 3-year period.
>
> Is it true that all MCE codes indicate an error that's internal to the
> CPU? Or could something external to the CPU trigger an MCE?

A badly seated CPU, or overheating motherboard components on the FSB, or
a few other things, but these errors would *usually* mean a CPU problem.
In your case, it *may* be the memory controller, since you see:

External tag parity error
...
Memory heirarchy error


The tag cache is part of the internal cache that's set aside to index
external banks of memory. If the memory controller has a problem, you
could presumably get this error. Of course, you might also see this if
the cache on the CPU is bad (or overheated).

I'd first try reseating the CPU and RAM and blow away any dust that
might have accumulated on the motherboard or in the CPU HSF assembly.
It might not help, but it's free and worth a try.

Oh, and send an email to the maintainer of parsemce and tell him he
means hierarchy and not heirarchy. The latter would be a society ruled
by the eldest child of still living parents... :-)

Regards,
--
*Art
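
Arthur's point about tags can be illustrated with the generic textbook model of a cache: each line stores a tag, the upper address bits that say which block of memory the line currently holds, and a parity bit over the tag lets the hardware notice when a stored tag bit has flipped. The Python sketch below uses a made-up geometry (64-byte lines, 512 sets, one even-parity bit per tag) and hypothetical names; these are assumptions for illustration, not the Celeron's real cache layout, and real hardware does this check in logic rather than software.

```python
"""Toy model of cache-tag storage with a parity bit (illustrative only)."""

# Hypothetical cache geometry, chosen only to make the arithmetic concrete.
LINE_BYTES = 64                              # 64-byte lines -> 6 offset bits
NUM_SETS = 512                               # 512 sets      -> 9 index bits
OFFSET_BITS = LINE_BYTES.bit_length() - 1
INDEX_BITS = NUM_SETS.bit_length() - 1


def split_address(addr: int) -> tuple[int, int, int]:
    """Split a physical address into (tag, set index, byte offset)."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


def parity_bit(value: int) -> int:
    """Even-parity bit: 1 if value has an odd number of 1 bits, else 0."""
    return bin(value).count("1") & 1


class TagEntry:
    """One stored tag plus the parity bit written alongside it."""

    def __init__(self, tag: int) -> None:
        self.tag = tag
        self.parity = parity_bit(tag)

    def parity_ok(self) -> bool:
        """Recompute parity on lookup; a mismatch is a tag parity error."""
        return parity_bit(self.tag) == self.parity


if __name__ == "__main__":
    tag, index, offset = split_address(0x12345678)
    entry = TagEntry(tag)
    print(tag, index, offset, entry.parity_ok())   # 9320 345 56 True
    entry.tag ^= 1 << 3                             # simulate one flipped tag bit
    print(entry.parity_ok())                        # False
```

A single parity bit detects any odd number of flipped bits but cannot say which bit flipped, so by itself it can flag a bad tag without locating or correcting the fault.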

#5
October 9th 05, 06:22 AM
Will Dormann

Arthur Hagen wrote:
> I'd first try reseating the CPU and RAM and blow away any dust that
> might have accumulated on the motherboard or in the CPU HSF assembly.
> It might not help, but it's free and worth a try.

Thanks for the follow-up. Earlier today I did exactly the above, but
it didn't have any effect on the MCE errors.

I tried running Prime95 for a few hours, and it ran without error.
Although I feel like I'm doing the equivalent of ignoring the "check
engine" light on my car, I might just live with it until I actually see
symptoms other than the MCE.



-WD

#6
October 9th 05, 08:25 PM
Robert Redelmeier

In comp.sys.ibm.pc.hardware.chips Will Dormann wrote:
> I tried running Prime95 for a few hours, and it ran without
> error. Although I feel like I'm doing the equivalent of ignoring
> the "check engine" light on my car, I might just live with it
> until I actually see symptoms other than the MCE.

You can try running my `burnMMX` with a fairly low memory
parameter like `E` or `H` to exercise your cache ECC.

-- Robert author `cpuburn` http://pages.sbcglobal.net/redelm

#7
October 29th 05, 01:35 AM
Will Dormann
MCE - Non fatal, correctible incident occurred on CPU 0

... or "Intel warranty fun"

> I tried running Prime95 for a few hours, and it ran without error.
> Although I feel like I'm doing the equivalent of ignoring the "check
> engine" light on my car, I might just live with it until I actually see
> symptoms other than the MCE.

Well, I'm finally seeing symptoms of instability. The MCE errors have
continued, and with increasing frequency. Worse, I can no longer
compile MythTV: the compilation crashes at various stages, never at
the same spot.

Prime95 fails within a few minutes with a math error.

Now I get to deal with the Intel warranty process...

I call the number, and am transferred to an offshore call center with a
bad connection. I explain the above and why I would like a replacement
processor. Then I get disconnected.

I call again, go through the same steps explaining the problem to a
different person. I explain the Machine Check Exception errors, the
failed compilation, the Prime95 failure. The processor temp is under
50C and Memtest86 passes without error.

His answer: I must take the CPU to a "local computer store" and have
them test the processor before I can get a replacement.

(( ASIDE: What's so special about a "local computer store" that allows
them to determine if I can get an RMA or not? Do they possess some
magical trait that lets them see if a processor is bad or not, which a
mere mortal such as myself couldn't dream of having? Would a tech at a
"local computer store" hook up the CPU to a system that can verify
processor MCE codes? Or would they plug in the chip, turn it on, and
say "it's OK" when they see it POST? ))

Then I get disconnected again.

I call back for the third time, and I get a recording saying that
customer service is closed.

It's great that this chip has a 3-year warranty and all, but who knows
if I'll actually be able to take advantage of it! I guess by Monday I
might know, assuming I don't have an aneurysm by then.


-WD

#8
October 29th 05, 03:25 AM
Aragorn
MCE - Non fatal, correctible incident occurred on CPU 0

On Saturday 29 October 2005 02:35, Will Dormann stood up and spoke the
following words to the masses in /alt.os.linux.gentoo...:/

> ... or "Intel warranty fun"
>
>> I tried running Prime95 for a few hours, and it ran without error.
>> Although I feel like I'm doing the equivalent of ignoring the "check
>> engine" light on my car, I might just live with it until I actually
>> see symptoms other than the MCE.
>
> Well, I'm finally seeing symptoms of instability. The MCE errors have
> continued, and with increasing frequency. Worse, I can no longer
> compile MythTV: the compilation crashes at various stages, never at
> the same spot.
>
> Prime95 fails within a few minutes with a math error.
>
> Now I get to deal with the Intel warranty process...
>
> I call the number, and am transferred to an offshore call center with
> a bad connection. I explain the above and why I would like a
> replacement processor. Then I get disconnected.
>
> I call again, go through the same steps explaining the problem to a
> different person. I explain the Machine Check Exception errors, the
> failed compilation, the Prime95 failure. The processor temp is under
> 50C and Memtest86 passes without error.
>
> His answer: I must take the CPU to a "local computer store" and have
> them test the processor before I can get a replacement.
>
> (( ASIDE: What's so special about a "local computer store" that allows
> them to determine if I can get an RMA or not? Do they possess some
> magical trait that lets them see if a processor is bad or not, which a
> mere mortal such as myself couldn't dream of having? Would a tech at
> a "local computer store" hook up the CPU to a system that can verify
> processor MCE codes? Or would they plug in the chip, turn it on, and
> say "it's OK" when they see it POST? ))


My guess is that they need to work through an authorized reseller for
the RMA procedure. This is more of an administrative matter: an
authorized reseller is supposed to be qualified to remove a CPU from
the motherboard and package it so that it arrives back at the tech
department without any additional damage, which would void your
warranty.

A second possibility is that some (but not all) resellers have
specialized hardware test cards that analyze every component in your
system.

> Then I get disconnected again.
>
> I call back for the third time, and I get a recording saying that
> customer service is closed.
>
> It's great that this chip has a 3-year warranty and all, but who knows
> if I'll actually be able to take advantage of it! I guess by Monday I
> might know, assuming I don't have an aneurysm by then.


I'd be surprised, actually; Intel is a reputable company. I myself,
however, have had to deal with Chaintech (and this was _through_ an
authorized reseller), and they stonewalled the whole procedure for so
long that the warranty eventually expired.

I never got that new motherboard they promised, nor did I get a
refund... :-/

--
With kind regards,

*Aragorn*
(Registered GNU/Linux user # 223157)
 




All times are GMT +1.
Copyright ©2004-2024 HardwareBanter.
The comments are property of their posters.