#11
In comp.sys.ibm.pc.hardware.storage CBFalconer wrote:
Colin Painter wrote:

If I can add a bit to JT's reply... If you are overclocking your memory you risk getting more errors than the guys who built the memory planned on. If the memory is not ECC memory then you may get more single bit errors which will cause your machine to stop when they occur. ECC memory can correct single bit errors but non-ECC memory can only detect them and when that happens Windows will blue screen. Most home PCs have non-ECC memory because it's cheaper.

Correction here - non ECC memory won't even detect any errors, it will just use the wrong value. Sometimes that MAY cause the OS to crash. Unfortunately the rest of the thread is lost due to top-posting.

Crashes are not your worst enemy. Undetected data corruption is. I once debugged a fileserver that flipped one bit on average per 2GB read or written. This thing had been used in this condition for several months by several people on a daily basis. Then one person noted that he sometimes got a corrupted archive (it was a large file) when reading it, and sometimes not. There were likely quite a few changed files on disk at that time. If you have files that react badly to changed bits, that is a disaster.

The solution was just to set the memory timing more conservatively. I made it two steps slower, without noticeable impact on performance.

Note on ECC: If you get very few single-bit errors without ECC active, ECC will likely solve your problem. If you get a lot of single-bit errors, or even only very few multiple-bit errors, then ECC will not really help and will let errors through. For my scenario (single, random bit every 2GB), ECC would have done fine.

Arno
--
For email address: lastname AT tik DOT ee DOT ethz DOT ch
GnuPG: ID:1E25338F FP:0C30 5782 9D93 F785 E79C 0296 797F 6B50 1E25 338F
"The more corrupt the state, the more numerous the laws" - Tacitus
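To put the rate quoted in the post above into numbers, here is a rough sketch of how often a file of a given size would come through a transfer damaged at one flipped bit per 2 GB. It assumes the flips are independent and spread evenly over the data, which is an assumption made for illustration only; the thread gives just the average rate. The function name and the sizes in the loop are likewise made up.

```python
import math

BITS_PER_BYTE = 8
GIB = 2**30


def corruption_probability(file_bytes, bytes_per_flip=2 * GIB):
    """Chance that at least one bit flips while transferring file_bytes.

    Assumes independent flips averaging one per bytes_per_flip bytes
    (a hypothetical model; only the average comes from the thread).
    """
    p_bit = 1.0 / (bytes_per_flip * BITS_PER_BYTE)
    n_bits = file_bytes * BITS_PER_BYTE
    # 1 - (1 - p)^n, written so it stays accurate for very small p
    return -math.expm1(n_bits * math.log1p(-p_bit))


for label, size in [("1 MiB", 2**20), ("100 MiB", 100 * 2**20),
                    ("1 GiB", GIB), ("4 GiB", 4 * GIB)]:
    print(f"{label:>8}: {corruption_probability(size):.4%}")
```

Under this model a 1 GiB archive is hit on roughly four reads out of ten, which would fit an archive that is corrupt on some reads and fine on others, while small files almost always pass and hide the problem.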
#12
Arno Wagner wrote:
In comp.sys.ibm.pc.hardware.storage CBFalconer wrote:
Colin Painter wrote:

If I can add a bit to JT's reply... If you are overclocking your memory you risk getting more errors than the guys who built the memory planned on. If the memory is not ECC memory then you may get more single bit errors which will cause your machine to stop when they occur. ECC memory can correct single bit errors but non-ECC memory can only detect them and when that happens Windows will blue screen. Most home PCs have non-ECC memory because it's cheaper.

Correction here - non ECC memory won't even detect any errors, it will just use the wrong value. Sometimes that MAY cause the OS to crash. Unfortunately the rest of the thread is lost due to top-posting.

Crashes are not your worst enemy. Undetected data corruption is. I once debugged a fileserver that flipped one bit on average per 2GB read or written. This thing had been used in this condition for several months by several people on a daily basis. Then one person noted that he sometimes got a corrupted archive (it was a large file) when reading it, and sometimes not. There were likely quite a few changed files on disk at that time. If you have files that react badly to changed bits, that is a disaster.

The solution was just to set the memory timing more conservatively. I made it two steps slower, without noticeable impact on performance.

Note on ECC: If you get very few single-bit errors without ECC active, ECC will likely solve your problem. If you get a lot of single-bit errors, or even only very few multiple-bit errors, then ECC will not really help and will let errors through. For my scenario (single, random bit every 2GB), ECC would have done fine.

The ECC implemented on PCs can typically correct 1-bit errors and detect 2-bit errors.

One machine I worked with came up with a parity error one day. It was about a week old at the time so I sent it back to the distributor, who, being one of these little hole in the wall places and not Tech Data or the like, instead of swapping the machine or the board, had one of his high-school dropout techs "fix" it. The machine came back sans parity error. Ran fine for a while, then started getting complaints of data corruption. Tracked it down finally to a bad bit in the memory. Sure enough the guy had "fixed" it by disabling parity. Should have sued.

This is one of the pernicious notions surrounding the testing of PCs--the notion that the only possible failure mode is a hang, totally ignoring the possibility that there will be data corruption that does not cause a hang, at least not of the machine, although it may cause the tech to be hung by the users.

But if you're getting regular errors then regardless of the kind of memory you're using something is broken. Even with ECC, if you're getting errors reported in the log you should find out why and fix the problem rather than just trusting the ECC--ECC is like RAID--it lets you run a busted machine without losing data--doesn't mean that the machine isn't busted and doesn't need fixing.

--
--John
Reply to jclarke at ae tee tee global dot net
(was jclarke at eye bee em dot net)
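The "correct 1-bit, detect 2-bit" behaviour described above can be seen in a toy code. The sketch below uses an extended Hamming code on 4 data bits; it is an illustration only, not the code any particular chipset uses (ECC DIMMs protect 64 data bits with 8 check bits, but the idea is the same). The names `encode` and `decode` are made up for this example.

```python
def encode(nibble):
    """Encode 4 data bits as an 8-bit extended Hamming codeword (list of bits)."""
    code = [0] * 8  # index 0: overall parity; 1..7: Hamming(7,4) positions
    code[3], code[5], code[6], code[7] = ((nibble >> i) & 1 for i in range(4))
    for p in (1, 2, 4):
        # check bit p covers every position whose index has bit p set
        code[p] = sum(code[i] for i in range(1, 8) if i & p) % 2
    code[0] = sum(code) % 2  # make the parity of all 8 bits even
    return code


def decode(code):
    """Return (status, nibble); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for p in (1, 2, 4):
        if sum(code[i] for i in range(1, 8) if i & p) % 2:
            syndrome |= p
    overall_odd = sum(code) % 2 == 1
    if syndrome == 0 and not overall_odd:
        status = "ok"
    elif overall_odd:
        # one bit flipped: the syndrome is its position (0 = the parity bit)
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # checks disagree but overall parity is even: two bits flipped
        return "uncorrectable", None
    nibble = code[3] | code[5] << 1 | code[6] << 2 | code[7] << 3
    return status, nibble


word = encode(0b1011)
one_flip = list(word)
one_flip[5] ^= 1
two_flips = list(one_flip)
two_flips[2] ^= 1
print(decode(word), decode(one_flip), decode(two_flips))
```

Three or more flipped bits in one word can look like a single-bit error and get "corrected" into the wrong value, which is the case where ECC lets errors through.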
#13
I've had an MB which occasionally corrupted bit 0x80000000, but only during disk I/O! And the corrupted bit position was unrelated to I/O buffers! Of course, a standalone memory test didn't find anything. I had to modify the test to make it run under Windows and also run parallel disk I/O threads. In that mode, the failure was detected in a minute.

Had to dump the MB. Replacing memory and CPU didn't help.

"Arno Wagner" wrote in message ...

Crashes are not your worst enemy. Undetected data corruption is. I once debugged a fileserver that flipped one bit on average per 2GB read or written. This thing had been used in this condition for several months by several people on a daily basis. Then one person noted that he sometimes got a corrupted archive (it was a large file) when reading it, and sometimes not. There were likely quite a few changed files on disk at that time. If you have files that react badly to changed bits, that is a disaster.

The solution was just to set the memory timing more conservatively. I made it two steps slower, without noticeable impact on performance.
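The approach described above, checking memory while disk I/O runs in parallel, could be sketched roughly as follows. This is not the poster's test program. It is a user-space illustration with made-up names (`disk_io_worker`, `pattern_test`), and a Python process cannot choose which physical RAM it touches or bypass CPU caches, so a real memory tester works at a much lower level.

```python
import os
import tempfile
import threading


def disk_io_worker(stop, path, chunk=1 << 20):
    """Keep the disk busy: rewrite and re-read a scratch file until stopped."""
    data = os.urandom(chunk)
    while not stop.is_set():
        with open(path, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the write out to the device
        with open(path, "rb") as f:
            f.read()


def pattern_test(size=8 << 20, passes=4, io_threads=2):
    """Fill a buffer with alternating bit patterns and verify it during disk I/O.

    Returns a list of (pass_number, offset, expected, actual) mismatches.
    """
    stop = threading.Event()
    errors = []
    with tempfile.TemporaryDirectory() as tmp:
        workers = [
            threading.Thread(
                target=disk_io_worker,
                args=(stop, os.path.join(tmp, f"scratch{i}.bin")),
            )
            for i in range(io_threads)
        ]
        for w in workers:
            w.start()
        try:
            for n in range(passes):
                pattern = 0x55 if n % 2 == 0 else 0xAA
                buf = bytearray([pattern]) * size
                if buf.count(pattern) != size:  # cheap whole-buffer check
                    errors.extend(
                        (n, off, pattern, b)
                        for off, b in enumerate(buf)
                        if b != pattern
                    )
        finally:
            stop.set()
            for w in workers:
                w.join()
    return errors


print(pattern_test(size=1 << 20, passes=2) or "no mismatches seen")
```

The point of the design is only that the verify loop and the I/O load overlap in time; an empty result on a short run proves very little.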
#14
"J. Clarke" wrote:
Arno Wagner wrote:
CBFalconer wrote:
Colin Painter wrote:

If I can add a bit to JT's reply... If you are overclocking your memory you risk getting more errors than the guys who built the memory planned on. If the memory is not ECC memory then you may get more single bit errors which will cause your machine to stop when they occur. ECC memory can correct single bit errors but non-ECC memory can only detect them and when that happens Windows will blue screen. Most home PCs have non-ECC memory because it's cheaper.

Correction here - non ECC memory won't even detect any errors, it will just use the wrong value. Sometimes that MAY cause the OS to crash. Unfortunately the rest of the thread is lost due to top-posting.

Crashes are not your worst enemy. Undetected data corruption is. I once debugged a fileserver that flipped one bit on average per 2GB read or written. This thing had been used in this condition for several months by several people on a daily basis. Then one person noted that he sometimes got a corrupted archive (it was a large file) when reading it, and sometimes not. There were likely quite a few changed files on disk at that time. If you have files that react badly to changed bits, that is a disaster.

The solution was just to set the memory timing more conservatively. I made it two steps slower, without noticeable impact on performance.

Note on ECC: If you get very few single-bit errors without ECC active, ECC will likely solve your problem. If you get a lot of single-bit errors, or even only very few multiple-bit errors, then ECC will not really help and will let errors through. For my scenario (single, random bit every 2GB), ECC would have done fine.

The ECC implemented on PCs can typically correct 1-bit errors and detect 2-bit errors.

One machine I worked with came up with a parity error one day. It was about a week old at the time so I sent it back to the distributor, who, being one of these little hole in the wall places and not Tech Data or the like, instead of swapping the machine or the board, had one of his high-school dropout techs "fix" it. The machine came back sans parity error. Ran fine for a while, then started getting complaints of data corruption. Tracked it down finally to a bad bit in the memory. Sure enough the guy had "fixed" it by disabling parity. Should have sued.

This is one of the pernicious notions surrounding the testing of PCs--the notion that the only possible failure mode is a hang, totally ignoring the possibility that there will be data corruption that does not cause a hang, at least not of the machine, although it may cause the tech to be hung by the users.

But if you're getting regular errors then regardless of the kind of memory you're using something is broken. Even with ECC, if you're getting errors reported in the log you should find out why and fix the problem rather than just trusting the ECC--ECC is like RAID--it lets you run a busted machine without losing data--doesn't mean that the machine isn't busted and doesn't need fixing.

Well, this is somewhat refreshing. Usually when I get on my horse about having ECC memory I am greeted with a chorus of pooh-poohs, and denials about sneaky soft failures, cosmic rays, useless backups, etc. etc. In fact, walk into most computer stores and start talking about ECC and you will be greeted with blank stares.

--
Chuck F
Available for consulting/temporary embedded and systems.
http://cbfalconer.home.att.net USE worldnet address!
#15
"CBFalconer" wrote in message
Colin Painter wrote:

If I can add a bit to JT's reply... If you are overclocking your memory you risk getting more errors than the guys who built the memory planned on. If the memory is not ECC memory then you may get more single bit errors which will cause your machine to stop when they occur. ECC memory can correct single bit errors but non-ECC memory can only detect them and when that happens windows will blue screen. Most home PCs have non-ECC memory because it's cheaper.

Correction here - non ECC memory won't even detect any errors, it will just use the wrong value. Sometimes that MAY cause the OS to crash. Unfortunately the rest of the thread is lost due to top-posting.

Bash topposters for topposting, not for your bad choice of News client or your failure to set it up properly.
#16
In comp.sys.ibm.pc.hardware.storage Alexander Grigoriev wrote:
I've had an MB, which occasionally corrupted bit 0x80000000, but only during disk I/O! And the corrupted bit position was unrelated to I/O buffers! Of course, standalone memory test didn't find anything. I've had to modify the test to make it run under Windows and also run parallel disk I/O threads. In that mode, the failure was detected in a minute.

Had to dump the MB. Replacing memory and CPU didn't help.

Really nasty. Shows that these things have gotten far too complex...

Arno
--
For email address: lastname AT tik DOT ee DOT ethz DOT ch
GnuPG: ID:1E25338F FP:0C30 5782 9D93 F785 E79C 0296 797F 6B50 1E25 338F
"The more corrupt the state, the more numerous the laws" - Tacitus
#17
"Alexander Grigoriev" wrote in message hlink.net...

I've had an MB, which occasionally corrupted bit 0x80000000, but only during disk I/O! And the corrupted bit position was unrelated to I/O buffers!

Meaning?

Of course, standalone memory test didn't find anything. I've had to modify the test to make it run under Windows and also run parallel disk I/O threads.

What happened to that memory test? Last time I heard about it was when c't complained about you not supporting it anymore.

In that mode, the failure was detected in a minute.

Had to dump the MB. Replacing memory and CPU didn't help.

"Arno Wagner" wrote in message ...

Crashes are not your worst enemy. Undetected data corruption is. I once debugged a fileserver that flipped one bit on average per 2GB read or written. This thing had been used in this condition for several months by several people on a daily basis. Then one person noted that he sometimes got a corrupted archive (it was a large file) when reading it, and sometimes not. There were likely quite a few changed files on disk at that time. If you have files that react badly to changed bits, that is a disaster.

The solution was just to set the memory timing more conservatively. I made it two steps slower, without noticeable impact on performance.
#18
If you've overclocked the RAM, yes, there is a chance of getting timing errors in the data which will lead to data corruption.

--
DaveW

"Mark M" wrote in message ...

I use a partition copier which boots off a floppy disk before any other OS is launched.

If I copy a partition from one hard drive to another, then is there any risk of data corruption if the BIOS has been changed to aggressively speed up the memory settings?

For example the BIOS might set the memory to CAS=2 rather than CAS=3. Or other memory timing intervals might also be set to be shorter than is normal.

I am thinking that maybe the IDE cable and drive controllers handle data fairly independently of the memory on the motherboard. So maybe data just flows up and down the IDE cable and maybe the motherboard is not involved except for sync pulses.

There are three scenarios I am thinking about:

(1) Copying a partition from one hard drive on one IDE cable to another hard drive on a different IDE cable.
(2) Copying a partition from one hard drive to another which is on the same IDE cable.
(3) Copying one partition to another on the same hard drive.

How much effect would "over-set" memory have on these situations?

Do the answers to any of the above three scenarios change if the copying of large amounts of data files is done from within WinXP? Personally, I would guess that it is more likely that motherboard memory comes into play if Windows is involved.
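Whichever of the three scenarios applies, one way to find out whether a copy came through intact is to hash the source and the destination and compare. The sketch below is a generic illustration with made-up helper names (`sha256_of`, `copies_match`), not a feature of any partition copier mentioned in the thread. It hashes each side more than once because, with marginal memory, two reads of the same data may themselves disagree. Note that the operating system may serve a repeated read from its cache rather than from the disk.

```python
import hashlib


def sha256_of(path, chunk=1 << 20):
    """Hash a file (or an image of a partition) in fixed-size chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def copies_match(src, dst, rounds=2):
    """True only if every hash of src and dst, over all rounds, is identical."""
    digests = {sha256_of(p) for p in (src, dst) for _ in range(rounds)}
    return len(digests) == 1
```

A call would look like `copies_match("source.img", "copy.img")`, where both file names are placeholders. A mismatch says something went wrong somewhere between the two disks, but not whether memory, cable or controller was at fault.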
#19
On Thu, 11 Mar 2004 00:40:47 GMT, Mark M wrote:

I use a partition copier which boots off a floppy disk before any other OS is launched. If I copy a partition from one hard drive to another, then is there any risk of data corruption if the BIOS has been changed to aggressively speed up the memory settings?

Yes, if the machine was OC-ed too much to be really 100% rock solid & tested to be TL stable at those settings ...

--
Regards, SPAJKY ®
& visit my site @ http://www.spajky.vze.com
"Tualatin OC-ed / BX-Slot1 / inaudible setup!"
E-mail AntiSpam: remove ##
#20
"Folkert Rienstra" wrote in message ...
"Alexander Grigoriev" wrote in message hlink.net...

I've had an MB, which occasionally corrupted bit 0x80000000, but only during disk I/O! And the corrupted bit position was unrelated to I/O buffers!

Meaning?

That it was not corrupted in transit to or from disk. For example, the corruption hit memory allocated from the kernel non-paged pool. One time I caught the crash with a debugger, and it was a corrupt KEVENT structure.

Of course, standalone memory test didn't find anything. I've had to modify the test to make it run under Windows and also run parallel disk I/O threads.

What happened to that memory test. Last time I heard about it was when c't complained about you not supporting it anymore.

Version 2 is available on http://home.earthlink.net/~alegr/download/memtest.htm
Previous location at www.aha.ru is offline, AFAIK.