#41
In article ,
Brian Inglis wrote:

> And if you have big controller caches, better have chipkill memory to detect and correct multiple-bit memory errors, and allow the controller to keep running even if a memory chip fails. SECDED ECC was okay for low MBs of memory, but it's not good enough for current controller *cache* sizes of 100s MBs. I've seen SECDED ECC memory fail, but at least the system *knew* the memory failed, instead of us finding corrupt data later!

I honestly don't see the point to protecting the data in the RAID controller's cache better than the data in the system's main memory; and even servers with multi-gigabyte memories often do only single-bit correct, double-bit detect these days. Specifying Chipkill for the RAID controller's cache has the effect of instantly eliminating a lot of otherwise reasonable hardware from the market, and given that any data that's going to be written has to pass through main memory anyway, the benefit seems somewhat dubious.

--
Thor Lancelot Simon
But as he knew no bad language, he had called him all the names of common objects that he could think of, and had screamed: "You lamp! You towel! You plate!" and so on. --Sigmund Freud
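For readers unfamiliar with the "single-bit correct, double-bit detect" (SECDED) behaviour discussed above, here is a minimal sketch using a toy extended Hamming (8,4) code. It is an illustration only: real ECC memory typically protects 64 data bits with 8 check bits, and the function names `encode`, `decode` and `flip` are made up for this example. It shows the three cases at stake in the thread: one flipped bit is corrected, two are detected, and three (as might happen if a whole chip misbehaves) can be silently "corrected" into wrong data.

```python
def encode(nibble):
    """Encode 4 data bits as an 8-bit extended Hamming codeword (list of bits).

    Index 0 holds the overall parity bit; indexes 1..7 are the classic
    Hamming(7,4) positions, with parity bits at positions 1, 2 and 4.
    """
    code = [0] * 8
    code[3] = (nibble >> 0) & 1
    code[5] = (nibble >> 1) & 1
    code[6] = (nibble >> 2) & 1
    code[7] = (nibble >> 3) & 1
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2  # make the total number of 1 bits even
    return code


def decode(code):
    """Return (status, nibble); nibble is None when the error is uncorrectable."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos  # XOR of set positions is 0 for a valid codeword
    odd_parity = sum(code) % 2 == 1

    if syndrome == 0 and not odd_parity:
        status = "ok"
    elif odd_parity:
        # Odd number of flips: the decoder ASSUMES exactly one and repairs it.
        # A syndrome of 0 here means the overall parity bit itself flipped.
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even parity: two bits flipped.
        return "uncorrectable", None

    nibble = code[3] | code[5] << 1 | code[6] << 2 | code[7] << 3
    return status, nibble


def flip(code, *positions):
    """Return a copy of the codeword with the given bit positions inverted."""
    out = list(code)
    for p in positions:
        out[p] ^= 1
    return out


word = encode(0b1011)
print(decode(flip(word, 5)))        # one bad bit: corrected, data intact
print(decode(flip(word, 2, 6)))     # two bad bits: detected, not corrected
print(decode(flip(word, 3, 5, 6)))  # three bad bits: "corrected" to wrong data
```

The third case is the one the chipkill argument turns on: a multi-bit failure can exceed what SECDED is able to flag. Whether that risk is greater in a controller cache than in host memory is exactly what is being disputed in this thread.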
#42
#43
In article 3,
Jesper Monsted wrote:

> (Thor Lancelot Simon) wrote in news:cj97nd$joj$1 :
>
>> I honestly don't see the point to protecting the data in the RAID controller's cache better than the data in the system's main memory; and even servers with multi-gigabyte memories often do only single-bit correct, double-bit detect these days. Specifying Chipkill for the RAID controller's cache has the effect of instantly eliminating a lot of otherwise reasonable hardware from the market, and given that any data that's going to be written has to pass through main memory anyway, the benefit seems somewhat dubious.
>
> 1) Database software gets a commit command and starts writing changes to the main tablespace.
> 2) Database software finishes writing to write cache and marks database as "clean".
> 3) RAID controller starts destaging to disk, starting somewhere random in the cache and starts out by destaging the "clean" bit.
> 4) Power fails.

Yes, yes, I understand _that_. What I don't understand is why you seem to think that memory corruption in the controller's cache is somehow more likely than memory corruption in the host's main memory. *Either* will produce a corrupt on-disk database, and potentially one that looks "clean". So why does the controller's memory need more ECC protection than the database server's main memory?

--
Thor Lancelot Simon
But as he knew no bad language, he had called him all the names of common objects that he could think of, and had screamed: "You lamp! You towel! You plate!" and so on. --Sigmund Freud
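The four-step power-failure scenario quoted above can be sketched as a tiny simulation. This is a toy model under stated assumptions, not a description of any real controller: the cache is assumed to have no battery backup (whatever has not been destaged is lost), and the names `destage`, `write_cache`, `table_page` and `clean_flag` are hypothetical.

```python
def destage(cache, disk, order, writes_before_power_loss):
    """Copy cached blocks to disk in the given order, stopping when power fails."""
    result = dict(disk)  # leave the caller's "disk" untouched
    for block in order[:writes_before_power_loss]:
        result[block] = cache[block]
    return result


def looks_clean_but_stale(disk):
    """True when the disk claims a clean database but still holds old data."""
    return disk["clean_flag"] and disk["table_page"] != "new rows"


# Hypothetical on-disk state before the commit, and the write cache after it.
disk_before = {"table_page": "old rows", "clean_flag": False}
write_cache = {"table_page": "new rows", "clean_flag": True}

# The controller happens to destage the "clean" flag first,
# and power fails after a single write reaches the disk.
unlucky = destage(write_cache, disk_before, ["clean_flag", "table_page"], 1)

# Same failure point, but the data page is destaged before the flag.
ordered = destage(write_cache, disk_before, ["table_page", "clean_flag"], 1)

print(looks_clean_but_stale(unlucky))  # True
print(looks_clean_but_stale(ordered))  # False
```

The sketch only covers the ordering hazard from the quoted list. It says nothing about the separate question raised in the reply, namely whether bit errors are more likely in controller cache than in host memory.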
#44
"Malcolm Weir" wrote in message ... On Mon, 27 Sep 2004 00:38:59 +0100, Meurig Freeman wrote: I am beginning to spot a bit of a pattern with the replies :-) Thanks to everyone who has replied, a lot of useful information in there - powersupply failures, accidental switch pressing, leads coming lose and all the other suggestions are indeed all real possibilites. Ultimately it comes down to price/risk trade-off, but at least now I can make an informed decision. Bingo! Identify problems, identify probability of that problem occurring, identify cost and consequences of the problem occuring. It works *so* much better than "Don't worry about it"! After all these years you finally caught on. |
#45
"Malcolm Weir" wrote in message news On Mon, 27 Sep 2004 02:00:02 GMT, "Ron Reaugh" wrote: You mean a UPS failure that is coincident with a lost cache case where the cache had something important in it. That is well down the probability string. It may be (although in fact it isn't), but Ronnie, how do you know the susceptibility to loss any given application may have? Grown-ups know the proper way to offer advice is to illustrate the risks, and let the people with the most knowledge choose whether to accept a risk or not. Thankyou , anyone who reads this thread can see who has been doing that and who hasn't. |
#46
On Mon, 27 Sep 2004 21:54:59 GMT, "Ron Reaugh" wrote:

> "Malcolm Weir" wrote in message news
>
>> On Mon, 27 Sep 2004 02:00:02 GMT, "Ron Reaugh" wrote:
>>
>>> You mean a UPS failure that is coincident with a lost cache case where the cache had something important in it. That is well down the probability string.
>>
>> It may be (although in fact it isn't), but Ronnie, how do you know the susceptibility to loss any given application may have? Grown-ups know the proper way to offer advice is to illustrate the risks, and let the people with the most knowledge choose whether to accept a risk or not.
>
> Thank you, anyone who reads this thread can see who has been doing that and who hasn't.

Yup. Those who refuse to talk about failure modes, because they don't understand them, for example. Do you understand "metadata" now, Ronnie?

Malc.
#47
"Brian Inglis" wrote in message ... fOn Sun, 26 Sep 2004 15:08:20 -0400 in comp.arch.storage, "Bill Todd" wrote: "Meurig Freeman" wrote in message .. . ... Without meaning to take sides, would someone mind explaining a situation whereby a general UPS would not be sufficient to protect data integrity in the event of a power failure? (and I don't mean things like the UPS catching fire, though I do understand this is a very real possibility). Ever seen your PC reboot on a power glitch that your more-than-adequately-sized home UPS should have ridden through without notice? I have. And this doesn't happen only with home units: it has been known to happen with server-room UPSs (though of course should be far less common there). The bottom line is that if you're going to use RAID-style redundancy, any non-disk elements of the system that are trusted to hold supposedly 'stable' data should be just as reliable as the redundant disks themselves. This means that you'd need to mirror cached data in double ECC caches, using separate power supplies and UPSs (or cache-level batteries), to obtain similar levels of availability. And if you have big controller caches, better have chipkill memory to detect and correct multiple-bit memory errors, and allow the controller to keep running even if a memory chip fails. SECDED ECC was okay for low MBs of memory, but it's not good enough for current controller *cache* sizes of 100s MBs. I've seen SECDED ECC memory fail, but at least the system *knew* the memory failed, instead of us finding corrupt data later! Large amounts of RAM on a RAID controller contribute little except in larger enterprise configurations. |