Adaptec 2410SA SATA RAID 5: is it a safe choice?

#41 September 27th 04, 03:22 PM

In article ,
Brian Inglis wrote:

And if you have big controller caches, better have chipkill memory to
detect and correct multiple-bit memory errors, and allow the
controller to keep running even if a memory chip fails.
SECDED ECC was okay for low MBs of memory, but it's not good enough
for current controller *cache* sizes of 100s MBs.
I've seen SECDED ECC memory fail, but at least the system *knew* the
memory failed, instead of us finding corrupt data later!
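
To make the SECDED limits mentioned above concrete, here is a toy sketch in Python using an extended Hamming (8,4) code. It is only an illustration: real ECC memory uses a wider code (commonly 64 data bits plus 8 check bits), and the names `encode`, `decode` and `flip` are invented for this example. It shows one flipped bit being corrected, two flipped bits being detected, and three flipped bits (the kind of multi-bit damage a failing memory chip could cause) being "corrected" into wrong data with no warning.

```python
def encode(nibble):
    """Encode 4 data bits as an 8-bit extended Hamming (SECDED) word.

    Index 0 holds the overall parity bit; indexes 1..7 form a Hamming(7,4)
    word with check bits at positions 1, 2 and 4.
    """
    d = [(nibble >> i) & 1 for i in range(4)]
    word = [0] * 8
    word[3], word[5], word[6], word[7] = d
    word[1] = word[3] ^ word[5] ^ word[7]
    word[2] = word[3] ^ word[6] ^ word[7]
    word[4] = word[5] ^ word[6] ^ word[7]
    word[0] = sum(word[1:]) % 2
    return word


def decode(word):
    """Return (status, nibble); status is 'ok', 'corrected' or 'uncorrectable'."""
    word = list(word)
    syndrome = 0
    for pos in range(1, 8):
        if word[pos]:
            syndrome ^= pos
    parity_error = sum(word) % 2  # 1 means an odd number of bits flipped

    if syndrome == 0 and parity_error == 0:
        status = "ok"
    elif parity_error == 1:
        # The decoder assumes exactly one bit flipped and repairs it.
        # A syndrome of 0 here points at the overall parity bit itself.
        word[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even parity: a double error, detected only.
        return "uncorrectable", None

    nibble = word[3] | (word[5] << 1) | (word[6] << 2) | (word[7] << 3)
    return status, nibble


def flip(word, *positions):
    """Return a copy of word with the given bit positions inverted."""
    damaged = list(word)
    for pos in positions:
        damaged[pos] ^= 1
    return damaged


if __name__ == "__main__":
    original = 0b1011
    word = encode(original)
    for label, damaged in (
        ("no error", word),
        ("1 bit flipped", flip(word, 5)),
        ("2 bits flipped", flip(word, 3, 6)),
        ("3 bits flipped", flip(word, 3, 5, 6)),
    ):
        status, value = decode(damaged)
        print(label, status, value, "(original was %d)" % original)
```

In the three-bit case the decoder reports a successful correction yet returns a different value from the one stored, which is the silent-corruption case as opposed to the "system knew the memory failed" case.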

I honestly don't see the point to protecting the data in the RAID
controller's cache better than the data in the system's main memory;
and even servers with multi-gigabyte memories often do only single-bit
correct, double-bit detect these days. Specifying Chipkill for the
RAID controller's cache has the effect of instantly eliminating a lot
of otherwise reasonable hardware from the market, and given that any
data that's going to be written has to pass through main memory anyway,
the benefit seems somewhat dubious.

--
Thor Lancelot Simon
But as he knew no bad language, he had called him all the names of common
objects that he could think of, and had screamed: "You lamp! You towel! You
plate!" and so on. --Sigmund Freud

#42 September 27th 04, 05:00 PM

(Thor Lancelot Simon) wrote in news:cj97nd$joj$1@panix5.panix.com:

I honestly don't see the point to protecting the data in the RAID
controller's cache better than the data in the system's main memory;
and even servers with multi-gigabyte memories often do only single-bit
correct, double-bit detect these days. Specifying Chipkill for the
RAID controller's cache has the effect of instantly eliminating a lot
of otherwise reasonable hardware from the market, and given that any
data that's going to be written has to pass through main memory anyway,
the benefit seems somewhat dubious.

1) Database software gets a commit command and starts writing changes to
the main tablespace.
2) Database software finishes writing to write cache and marks database as
"clean"
3) RAID controller starts destaging to disk, starting somewhere random in
the cache and starts out by destaging the "clean" bit.
4) Power fails.

You've now got a database that looks clean, but you have no idea what it
contains. You will be able to recover all but that last transaction and
that would've been good enough for most people. Unfortunately, the
application has already told the customer that his 10000 copies of
FurryPr0n amateur orgy 13 have been dispatched (it did get a successful
return from the commit in the database) and in a few weeks you're being
blamed for them not arriving.
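
The four steps above can be sketched as a small Python simulation. This is a hypothetical model, not how any particular controller or database works: the block names, the `"clean_flag"` entry and both function names are made up for illustration. The cache holds the committed blocks plus the clean marker, the controller flushes them in whatever order it likes, and power fails after only some of them have reached disk.

```python
def destage_until_power_loss(cache, flush_order, blocks_flushed):
    """Return the on-disk state when only `blocks_flushed` blocks got written.

    cache: dict of block name -> contents sitting in the write cache.
    flush_order: the order in which the controller chooses to write blocks.
    """
    disk = {}
    for name in flush_order[:blocks_flushed]:
        disk[name] = cache[name]
    return disk


def is_silently_inconsistent(disk, data_blocks):
    """True if the clean marker is on disk but some committed block is not."""
    flag_on_disk = disk.get("clean_flag") == "clean"
    missing = [name for name in data_blocks if name not in disk]
    return flag_on_disk and bool(missing)


if __name__ == "__main__":
    data_blocks = ["row_1", "row_2", "row_3"]  # hypothetical committed blocks
    cache = {name: "new contents" for name in data_blocks}
    cache["clean_flag"] = "clean"

    # Order in which the host issued the writes: data first, marker last.
    host_order = data_blocks + ["clean_flag"]
    # Order from the scenario above: the marker happens to go out first.
    flag_first = ["clean_flag"] + data_blocks

    for label, order in (("host order", host_order), ("flag first", flag_first)):
        disk = destage_until_power_loss(cache, order, blocks_flushed=2)
        print(label, sorted(disk), is_silently_inconsistent(disk, data_blocks))
```

When the flush follows the host's write order, an interrupted destage can lose data but never leaves the marker on disk without the data behind it. Once the order is arbitrary and the cache contents do not survive the power loss, the "looks clean but isn't" state becomes possible.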

Of course, some RDBMSs are intelligent enough to not allow write cache, but
that's a different story.

--
/Jesper Monsted

#43 September 27th 04, 05:51 PM

In article 3,
Jesper Monsted wrote:
(Thor Lancelot Simon) wrote in news:cj97nd$joj$1
:

I honestly don't see the point to protecting the data in the RAID
controller's cache better than the data in the system's main memory;
and even servers with multi-gigabyte memories often do only single-bit
correct, double-bit detect these days. Specifying Chipkill for the
RAID controller's cache has the effect of instantly eliminating a lot
of otherwise reasonable hardware from the market, and given that any
data that's going to be written has to pass through main memory anyway,
the benefit seems somewhat dubious.

1) Database software gets a commit command and starts writing changes to
the main tablespace.
2) Database software finishes writing to write cache and marks database as
"clean"
3) RAID controller starts destaging to disk, starting somewhere random in
the cache and starts out by destaging the "clean" bit.
4) Power fails.

Yes, yes, I understand _that_. What I don't understand is why you seem
to think that memory corruption in the controller's cache is somehow
more likely than memory corruption in the host's main memory.

*Either* will produce a corrupt on-disk database, and potentially one
that looks "clean". So why does the controller's memory need more ECC
protection than the database server's main memory?

--
Thor Lancelot Simon
But as he knew no bad language, he had called him all the names of common
objects that he could think of, and had screamed: "You lamp! You towel! You
plate!" and so on. --Sigmund Freud

#44 September 27th 04, 10:52 PM

"Malcolm Weir" wrote in message
...
On Mon, 27 Sep 2004 00:38:59 +0100, Meurig Freeman
wrote:

I am beginning to spot a bit of a pattern with the replies :-)

Thanks to everyone who has replied, a lot of useful information in there
- power supply failures, accidental switch pressing, leads coming loose
and all the other suggestions are indeed all real possibilities.

Ultimately it comes down to price/risk trade-off, but at least now I can
make an informed decision.

Bingo!

Identify problems, identify probability of that problem occurring,
identify cost and consequences of the problem occurring.

It works *so* much better than "Don't worry about it"!

After all these years you finally caught on.

#45 September 27th 04, 10:54 PM

"Malcolm Weir" wrote in message
news

On Mon, 27 Sep 2004 02:00:02 GMT, "Ron Reaugh"
wrote:

You mean a UPS failure that is coincident with a lost cache case where the
cache had something important in it. That is well down the probability
string.

It may be (although in fact it isn't), but Ronnie, how do you know the
susceptibility to loss any given application may have?

Grown-ups know the proper way to offer advice is to illustrate the
risks, and let the people with the most knowledge choose whether to
accept a risk or not.

Thank you, anyone who reads this thread can see who has been doing that and
who hasn't.

#46 September 27th 04, 11:17 PM

On Mon, 27 Sep 2004 21:54:59 GMT, "Ron Reaugh"
wrote:

"Malcolm Weir" wrote in message
news
On Mon, 27 Sep 2004 02:00:02 GMT, "Ron Reaugh"
wrote:

You mean a UPS failure that is coincident with a lost cache case where the
cache had something important in it. That is well down the probability
string.

It may be (although in fact it isn't), but Ronnie, how do you know the
susceptibility to loss any given application may have?

Grown-ups know the proper way to offer advice is to illustrate the
risks, and let the people with the most knowledge choose whether to
accept a risk or not.

Thank you, anyone who reads this thread can see who has been doing that and
who hasn't.

Yup. Those who refuse to talk about failure modes, because they don't
understand them, for example.

Do you understand "metadata" now, Ronnie?

Malc.

#47 September 27th 04, 11:19 PM

"Brian Inglis" wrote in message
...
On Sun, 26 Sep 2004 15:08:20 -0400 in comp.arch.storage, "Bill Todd"
wrote:

"Meurig Freeman" wrote in message
.. .

...

Without meaning to take sides, would someone mind explaining a situation
whereby a general UPS would not be sufficient to protect data integrity
in the event of a power failure? (and I don't mean things like the UPS
catching fire, though I do understand this is a very real possibility).

Ever seen your PC reboot on a power glitch that your
more-than-adequately-sized home UPS should have ridden through without
notice? I have. And this doesn't happen only with home units: it has been
known to happen with server-room UPSs (though of course should be far less
common there).

The bottom line is that if you're going to use RAID-style redundancy, any
non-disk elements of the system that are trusted to hold supposedly 'stable'
data should be just as reliable as the redundant disks themselves. This
means that you'd need to mirror cached data in double ECC caches, using
separate power supplies and UPSs (or cache-level batteries), to obtain
similar levels of availability.

And if you have big controller caches, better have chipkill memory to
detect and correct multiple-bit memory errors, and allow the
controller to keep running even if a memory chip fails.
SECDED ECC was okay for low MBs of memory, but it's not good enough
for current controller *cache* sizes of 100s MBs.
I've seen SECDED ECC memory fail, but at least the system *knew* the
memory failed, instead of us finding corrupt data later!

Large amounts of RAM on a RAID controller contribute little except in larger
enterprise configurations.
