#21
On Tue, 30 Nov 2004 17:42:03 +0100, "Joris Dobbelsteen"
wrote: snip What management, just install the array and you are done. It works just like a normal disk (except for setting up the array once). One of the main points of raid is to be proactive about failures & uptime. So I'm talking about things like being able see SMART/PFA status and have the array move to a hot spare when SMART fails. Providing notification about such an event (like SNMP traps, email, pages, popup). Also initiating a process to validate the data against the ecc data & check the media surface. Being able to reconfigure or upgrade or provide info about the array without taking the whole machine down. Being able to automate & schedule these upgrades or validation checks for a more convenient time (so there is no discernable performance hit). All this may seem like overkill, but it really isn't if you want to get the full benefits of non-zero raid. I don't see RAID0 as a viable choice for most scenarios. I also assume that if you are looking to firmware raid (even ROMB, etc) you have higher expectations than the quick and dirty OS software striped set. With some controllers you might get in trouble when you use different disks, so use the same brand AND model. AND model revisions That's all more of an issue with ATA raid for a number or reasons esp not being as configurable as scsi drives as well as firmware limitations. Recovery: RAID1: turn of the system, remove the defective drive and replace it. Turn on, repair the array, wait a few seconds for the disk copy and done. Well that's fine for the ideal and simplest of scenarios - its not the only one. Todays disks are capable of relocating damaged sectors, Yes but AFAIK not always automatically or at least not always on a strictly hardware/low level. 
Also it seems to me the ideal is to have a raid controller initiate & manage a continuous background scan of the media and restore data from bad sectors using the redundant data residing on good sectors rather than trying to recover/read from weak sectors. (better ata & scsi controllers do this) they do it all (same reason your 128MB USB drive/memory stick only has 120 MB storage capacity). I don't know much about memory stick architecture. It seems to me this is more a result of file system overhead and perhaps conflicting measurements of raw capacity (like it usually is w' storage) rather than say "reserved space" for some low-level harware-implemented automatic recovery process. Please educate me if wrong. Now the why: Fast access to files, short response times, fast copying - just some luxury issues. Short response times. Cheetah 15K leads, but short response times for what? Hit the nail on the head. The OP has to identify the kinds of tasks which are choking his disk subsystem and use one or a combination of suggestions already mentioned in the thread to open the bottleneck. If you throw too much disk intensive stuff at any storage it will choke regardless of whether it is raptor raid0 on a dedicated bus, large 15k array, or whatever. Dividing the load can often be more important than having the fastest disk or logical disk. Simply call it resource contention. For a single-user system the Raptor will handle the resource contention better than the SCSI system. Of course this is subject to the opinion expressed by a third party, who may resonabily be expected to have sufficient knowledge of the system to provide such an 'opinion'. Could you direct us to this resource? AFAIK SATA has the scsi protocol on top of ata - so it has greater overhead. If there is 1 disk there isn't really any contention for that resource in either case. 
Because SATA is point to point there is never drive arbitration per bus, but in multi-disk sata the overall efficiency depends more on details of the controller(s) than the point to point design per se. Educate me if I'm wrong. Usually response times, throughput and storage capacity requires a trade-off. My trade-off would favor storage capacity over throughput over response times. Then I would expect you to favor the larger 7200 rpm sata drives marketed as "Personal Storage" devices (some are quite good). The "enterprise" raptors are similar to current 10k scsi in multiple regards including price, capacity, & performance (& theoretical MTBF). AFAIK the price "advantage" of raptors has more to do with a comparison with the "need" for a "deluxe" retail boxed Adaptec controller - not really capacity per disk or $/MB (from what I've seen). Throwing ATA raid0 into the mix (as you first suggested it) the trade off variables have to be expanded to include complexity, cost & to some extent reliability. snip Indeed, but if you want luxery, you are (or someone else is if you are lucky) going to pay for it anyway. Its just considering how much you are willing to spend for you luxery. However for the same luxery (or even the same essential product that you simply need) there is a large variation of prices that you can pay. True. But often things seem to be luxury simply because of sticker shock. In some cases when time is very valuable products that bring even small increases in productivity or assurance of quality can bring real value despite this initial sticker shock. This is all really relative though & must be taken case by case. snip Failures? Make backups. You will need them anyways, no matter what you are doing. If this is a mayor concern, two RAID1 raptors have equal costs to a single Cheetah 15K.3 and a much better MTBF (theoratically). Please explain. 
For arrays, basically: Array MTBF = Drive MTBF / Number of Drives (well, actually you're supposed to include the MTBF of the controller, etc., which lowers MTBF further). Let's assume most chip manufacturers (NOT designers, there are only a few manufacturers) are equally capable of making the same quality product. Besides, the mechanical parts are more likely to fail than the electrical ones. A very hot CPU would last for 10 years (it's designed for it anyway). I expect the same for chipsets. I only saw electronics fail because of ESD, lightning storms and some chemicals (e.g. from batteries). I wouldn't consider the controller to be a major problem with disk subsystems. Outright and complete failure is one thing. Erratic behavior due to poor design, overheating, damage & imminent failure is another. Yes, mechanical devices don't last as long as ICs, but I think there is a bit more to reliability than time to total failure. Frankly I'm not that worried about an ATA raid controller dying prematurely or before a scsi hba. I'm more concerned about a low-end controller having limitations which interfere with the ability of the raid to reliably deliver on its core features/promises, or conflicts or poorly written code which waste the user's/administrator's time & eat away at the assumed cost savings. Array MTBF is significantly lower than a single disk. Raid is supposed to make up for that by providing storage service continuity and enhanced data integrity (in most cases) and other features. When using 2-disk RAID 1 (NOT RAID 0): when 1 disk fails the system continues to operate correctly, leaving you time to replace the defective material with no loss of continuity. One-disk systems will stop working when the disk fails. Yes, but the array MTBF calculation characterizes arrays as more complex than a single drive, with more potential points of failure. Yes, when one disk drops off the other keeps going, but there is more to it than that. 1. 
If both disks are the same age with the same wear they might die at similar times, so you might not have as much time as you think to get the replacement. Failure rates follow a "U" pattern and are not linear across time. 2. Not all failures are neat and tidy. Either of these not-so-uncommon cases is a real potential time/productivity waster which can invalidate the expected benefits of non-zero raid. Besides recovery times for RAID1 are probably lower than for one-disk systems. I don't understand. You mean rebuilding the data to a new disk? Then not "probably" but "definitely", because the process is supposed to be seamless, as opposed to backup file recovery & bare metal restore & redoing the work since the last backup. The minor performance hit and short time it takes to rebuild in the background is hardly worth even trying to compare or consider (unless you are really concerned about power failure and UPS runtime). Both the cheetah and raptor are rated 1,200,000-hour MTBF (theoretical), so a raid1 or 2-disk raid0 array of either yields 600,000 hours (actually lower when including the other non-drive storage components). Of course manufacturers provide theoretical MTBF, not operational MTBF, and MTBF never actually characterizes a particular disk and should be taken with a grain of salt... Basically under normal operations the system will continue to work without failing once. But that's a pretty crude comparison of reliability. That assertion also depends on a lot of things. Also you can't really say: 1. both are equally reliable because both are reliable enough to usually work during a normal service life, and at the same time say: 2. "two RAID1 raptors have equal costs to a single Cheetah 15K.3 and a much better MTBF (theoretically)" without the two statements being either contradictory or virtually valueless. Certainly I don't yet see the second statement as being proven, explained, or correct. 
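To make the arithmetic above concrete, here is a minimal Python sketch. The first function is just the Drive MTBF / N rule quoted in this post, which describes the time until the first drive in the array fails. The second function is not from this thread: it is the commonly used textbook approximation for the mean time to data loss of a 2-disk mirror, MTBF^2 / (2 * MTTR), and it assumes independent, constant-rate failures. The 24-hour repair time is purely a hypothetical value chosen for illustration, and the function names are made up for this example.

```python
def first_failure_mtbf(drive_mtbf_hours: float, n_drives: int) -> float:
    """Mean time until the FIRST drive in an array fails: Drive MTBF / N.

    This is the rule of thumb quoted in the post. It ignores the
    controller and other non-drive components, which would lower it.
    """
    if n_drives < 1:
        raise ValueError("an array needs at least one drive")
    return drive_mtbf_hours / n_drives


def mirror_mttdl(drive_mtbf_hours: float, mttr_hours: float) -> float:
    """Approximate mean time to DATA LOSS for a 2-disk mirror.

    Assumption (not from the thread): failures are independent and occur
    at a constant rate, so data is lost only if the second drive fails
    during the repair window of the first: MTBF^2 / (2 * MTTR).
    """
    if mttr_hours <= 0:
        raise ValueError("repair time must be positive")
    return drive_mtbf_hours ** 2 / (2 * mttr_hours)


DRIVE_MTBF_HOURS = 1_200_000   # rated (theoretical) figure cited in the post
ASSUMED_MTTR_HOURS = 24        # hypothetical time to replace and rebuild

if __name__ == "__main__":
    print("first failure, 2 drives:", first_failure_mtbf(DRIVE_MTBF_HOURS, 2))
    print("mirror data loss (approx):", mirror_mttdl(DRIVE_MTBF_HOURS, ASSUMED_MTTR_HOURS))
```

The two numbers answer different questions, which may be part of why the two statements above seem to collide: the first says how soon you should expect to service the array, the second how soon a mirror would actually lose data if failures really were independent. Same-age drives failing together and untidy partial failures break that independence assumption, so the second figure should be read as an upper bound at best.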
The MTBF calculation I cited highlights the added complexity and potential points of failure that raid brings, and that is normally interpreted as an array being "theoretically less reliable" than a single disk. That being said, a properly implemented non-zero RAID "should" yield "more reliable storage", but that has more to do with doing the work and making the investment to "dot your i's" and "cross your t's" than any theoretical calculation or characterization of _all_ or _any_ non-zero raid. It's very easy to botch a raid implementation and end up with storage that is more expensive, more work, and less reliable than normal disks. Storage & technical discussion groups regularly have "help I didn't do a backup and I can't get my ultra-cheap raid back online" posts. Once in a while you also see "Boy! I just found out the hard way that most raid 5's are susceptible to transient write errors" posts. These users were not 100% protected just because they got disks to work together and generate ecc data. You will probably have more problems with software than you will have with hardware. Most down-time is either human or software related, not hardware. Yes. However, "minor" hardware errors on "working" devices are not so uncommon. When you don't invest enough time to really scientifically dissect & troubleshoot these issues, they appear as software problems when they are not, or are simply unsolved & forgotten about, or ignored because you were lucky that it didn't affect anything that important. The issue is that when it's hardware related, recovery costs much more time and you have a bigger risk of losing valuable data. When the disk will start to fail, it will probably be obsolete anyways, Often yes. My preoccupation with robust data integrity features in raid (here and elsewhere) has to do with transient errors and failing but still spinning media or power failure, which just shouldn't ever get the chance to crap on anything. 
If you have non-zero raid something is very wrong if you _ever_ have to go to a backup or "reinstall" or "rollback" or troubleshoot due to any kind of storage HW issue. Without significant gains it's hard to justify the additional expense, effort, or system complexity. unless your system lasts for more than 6 years. Of course, when you want this, you should rather prepare for the worst and have a 4-computer clustered installed with fail-over capability. Well if you are going to exceed the service life so dramatically you are not exactly "wearing a belt & suspenders" no matter how much the $$$ investment. All that redundancy is good to ensure uptime for a "normal period" but is not necessarily a great tool for trying to drain the last drop of blood out of antiquated and worn out HW because of the overhead and expense. Asuming its for luxery and I have here a system that is in operation for already 5 years and is subject to frequent transports and some very disk-intensive work at times, it never left me alone due to a hardware failure (the normal minor stuff because I forgot some cables or didn't attach them too well provided). All the products I used where the cheapest compared to competitors, however some trades where made between brands when I think for only a very small difference I could get something I expect to be more reliable or better. acerbic comment A 5-year-old machine is hardly a "luxury." /acerbic comment I have a few machines like that and nearly twice as old still running and in service with mostly original parts (now doing very limited tasks of course). It's exactly those machines that impressed upon me some time ago that just because its "up" and "seems OK" doesn't necessarily mean you can really depend on it 100%. Also timesavings and confidence in HW really go a long way and greatly offset some "sticker shock" expenses or at least regular upgrades/decommissions. 
I've been finding it MUCH cheaper to replace these "working," "capable" machines than to try to continue to plug along with "old" or "cheap" HW for a variety of reasons including reliability. Of course I may just be too picky and fortunate.
#22
On Mon, 06 Dec 2004 07:14:17 GMT, Curious George
wrote: On Tue, 30 Nov 2004 17:42:03 +0100, "Joris Dobbelsteen" wrote: snip What management, just install the array and you are done. It works just like a normal disk (except for setting up the array once). One of the main points of raid is to be proactive about failures & uptime. So I'm talking about things like being able see SMART/PFA status and have the array move to a hot spare when SMART fails. Providing notification about such an event (like SNMP traps, email, pages, popup). Also initiating a process to validate the data against the ecc data & check the media surface. Being able to reconfigure or upgrade or provide info about the array without taking the whole machine down. Being able to automate & schedule these upgrades or validation checks for a more convenient time (so there is no discernable performance hit). All this may seem like overkill, but it really isn't if you want to get the full benefits of non-zero raid. I don't see RAID0 as a viable choice for most scenarios. I also assume that if you are looking to firmware raid (even ROMB, etc) you have higher expectations than the quick and dirty OS software striped set. With some controllers you might get in trouble when you use different disks, so use the same brand AND model. AND model revisions That's all more of an issue with ATA raid for a number or reasons esp not being as configurable as scsi drives as well as firmware limitations. ??? Where do you get this stuff? It is very rare for ATA raid to need same size, model, make, OR model revisions of drives. You can use any of the most common ATA RAID controllers and plug in (just about anything), keeping in mind that the smallest drive capacity will be a limit, but not an "issue" per se. snip But often things seem to be luxury simply because of sticker shock. In some cases when time is very valuable products that bring even small increases in productivity or assurance of quality can bring real value despite this initial sticker shock. 
This is all really relative though & must be taken case by case. This is absolutely backwards. You write "cases when time is very valuable", but everything you'd mentioned is the most time consuming, largest burden on administrator, and largest wear on the drives doing that background scanning. It is not "small increases in productivity or assurance of quality". You're still confused about that. It is not more productive to toy around instead of just setting it up and being done. There is no "assurance of quality", rather so much more than can go wrong, and if you actually feel those features you mentioned are needed, then I suggest that your proposed solution should be AVOIDED LIKE THE PLAGUE, because it seems the most problematic thing a company could ever dump in a trash bin. Outright and complete failure is one thing. Erratic behavior due to poor design, overheating, damage & imminent failure is another. Yes mechanical devices don't last as long as IC's but I think there is a bit more to reliability than time to total failure. So your proposed solution is going to fail first then. It runs hotter, is much more complex and grafted together with more to go wrong, and the expense makes it less likely that there will be spare controllers and/or secondary systems online already. Frankly I'm not that worried about an ATA raid controller dying prematurely or before a scsi hba. I'm more concerned about a low-end controller having limitations which interfere with the ability for the raid to reliability deliver on its core features/promises or conflicts or poorly written code which waste the user's/administrator's time & eats away at the assumed cost savings. ROFLOL You've gone on and on about features of your proposed solution that eat away time, apparently it not only costs a lot more to purchase but to administer as well. In the end I suspect it'll cost about 8X as much including labor. Array MTBF is significantly lower than a single disk. 
Raid is supposed to make up for that by providing storage service continuity and enhanced data integrity (in most cases) and other features. When using 2-disk RAID 1 (NOT RAID 0): when 1 disks fails the system continues to operate correctly, leaving you time to replace the defective material with no loss of continuity. One disk systems will stop working when the disk failes. Yes but the array MTBF calculation characterizes arrays as more complex than a single drive with more potential points of failure. Yes when one disk drops off the other keeps going, but there is more to it than that. 1. if both disks are the same age with the same wear they might die at similar times so you might not have as much time as you think to get the replacement. Failure rates occur in a "U" pattern are not linear across time. 2. not all failures are neat and tidy Either not so uncommon cases are real potential time/productivity wasters which can invalidate the expected benefits of non-zero raid. So then the ideal solution is to minimize unnecessary costs on single-points such that there is more redundancy and more frequent disk replacement, not to pour as much $$$$ into it as possible and claim features will save you. snip I have a few machines like that and nearly twice as old still running and in service with mostly original parts (now doing very limited tasks of course). It's exactly those machines that impressed upon me some time ago that just because its "up" and "seems OK" doesn't necessarily mean you can really depend on it 100%. Also timesavings and confidence in HW really go a long way and greatly offset some "sticker shock" expenses or at least regular upgrades/decommissions. I've been finding it MUCH cheaper to replace these "working," "capable" machines than to try to continue to plug-along with "old," or "cheap" HW for a variety or reasons including reliability. THATS JUST IT. 
You keep using unreliable boxes because you're deluded into thinking their replacements need to cost multiple times as much as they really do. It is ludicrous to talk about reliability and 5-10 year old hardware and "sticker shock" as they relate to replacement. All you ever had to do was stop wasting $$$ and replace the boxes more often, replace the disks as often, and make the backups.
#23
On Mon, 06 Dec 2004 08:39:24 GMT, kony wrote:
THATS JUST IT. No that isn't just it. You're deluding yourself into thinking that because you are not critical of your hardware it is good enough for everybody and in all environments with all tolerances for risk. You're also missing the opportunity & value of being proactive about storage health raid provides. Making assumptions about hardware is not generally best practice and not at all appropriate in some circumstances. You're still not reading critically what I'm writing and drawing incorrect conclusions based on your own uncritical assumptions. |
#24
On Mon, 06 Dec 2004 19:10:08 GMT, Curious George
wrote: On Mon, 06 Dec 2004 08:39:24 GMT, kony wrote: THATS JUST IT. No that isn't just it. You're deluding yourself into thinking that because you are not critical of your hardware it is good enough for everybody and in all environments with all tolerances for risk. You're also missing the opportunity & value of being proactive about storage health raid provides. Making assumptions about hardware is not generally best practice and not at all appropriate in some circumstances. You're still not reading critically what I'm writing and drawing incorrect conclusions based on your own uncritical assumptions. I'm more critical than you are apparently, because given any particular budget I'd put more into redundancy, more discs, more regular rotations, and if the budget allowed it, an entire 2nd redundant system. Your delusions about the band-aid you promote as a feel-good solution won't save your bacon when there's a failure... it'll just cost more after the failure as it did beforehand. |
#25
On Mon, 06 Dec 2004 21:19:18 GMT, kony wrote:
On Mon, 06 Dec 2004 19:10:08 GMT, Curious George wrote: On Mon, 06 Dec 2004 08:39:24 GMT, kony wrote: THATS JUST IT. No that isn't just it. You're deluding yourself into thinking that because you are not critical of your hardware it is good enough for everybody and in all environments with all tolerances for risk. You're also missing the opportunity & value of being proactive about storage health raid provides. Making assumptions about hardware is not generally best practice and not at all appropriate in some circumstances. You're still not reading critically what I'm writing and drawing incorrect conclusions based on your own uncritical assumptions. I'm more critical than you are apparently, ROTFLOL :0 because given any particular budget I'd put more into redundancy, more discs, more regular rotations, and if the budget allowed it, an entire 2nd redundant system. I love how you always interject "oranges" to argue against "apples." That's an availability solution not really a "data integrity" protection one per se. You argue that: - More competent & intelligent & automatic raid is too much work & cost so the answer is setting up & administering a cluster! - Robust automated self monitoring & correction with automated notification is too much work & cost so instead you should buy lots or hardware and rotate it manually! - Budget should not be spent on technical staff or good hardware so if you have money buy more cheap systems that need to be set up and maintained and bring more points of failure, lower MTBF, etc. to the setup as a whole! Kinda sad really if you don't know the difference between a machine doing work and a person. I don't know how you can feel you are a raid expert with such a backwards approach to controlling downtime. Your delusions about the band-aid you promote as a feel-good solution won't save your bacon when there's a failure... it'll just cost more after the failure as it did beforehand. No you're still talking out of your arse. 
For example, the management features & data integrity features I'm talking about are all fully automatic. They are not band-aids - they are the real deal. A band-aid is when neither you nor the system have any idea what the disks are doing and so you just replace them or rotate them often (manually). All I'm proposing involves a decent controller and an extra minute or two after array creation to configure. Then you never touch it unless there is an already automatically corrected problem or a warning about a projected potential problem. The idea that that involves significantly greater labor or time or administration or cost down the line than flying blind with a system you don't know how well it may deal with abberation is ludicrous. It's one of many examples of how you just can't stop yammering about things you have no experience with, no curiosity to look it up, or to listen when someone talks. If you're really concerned about "saving your bacon" than data integrity is just as important as availability. Actually more because media can degrade & writes fail more than drives & frankly you shouldn't be punished because of UPS runtime/powerfail, etc. Data integrity is the real paydirt in ROI on raid protection in most installations. Also if you are really worried about administration costs & time then you have to go easy with the manual rotations, complex redundant systems & frequent upgrades/replacements - & instead need to increase the simpler, more thorough, more intelligent, more durable automated solutions. Also the suggestion that clustered computers or clustered arrays don't use/need controllers with the features I'm discussing is flat wrong. I don't even think you are clear on what clustering accomplishes & how it works. & idea that things like "Patrol Read" or equivalent is going to make the drives die prematurely is just laughable. 
To claim that making sure something delivers exactly what it's supposed to is some how bad or a band-aid or strictly feel good effort is totally absurd. It's how smart ppl make purchase & integration decisions for solid systems. To claim that you know how _my_ machines run or when or how I learned what I did is just silly invention. That is one of many other points you distort through wrong assumptions. The older machines I mentioned were cited simply as an example to counter a relaxed attitude towards older & cheaper machines. In fact they _always_ displayed minor stability problems (not related to age) but have been kept around to understand non-standard or suboptimal behavior. They are simply several of many long term experiments which have revealed some surprising results I'm not going to share with you. If you knew these results or the experiments or were not overly eager to make assumptions you would interpret that last paragraph I wrote totally differently. You may think I'm overstating the importance of these protection features & indeed they are not a requirement for every machine/context. But their need _is_ relative to load & tolerance of risk. If you took a strict "risk management" or ROI approach to redundant technologies you would have a different perspective then your "get by with base minimum" or if you have the money "throw lots of stuff at a problem" philosophies. Your most basic needs/wants/understanding are not everybody else's. You act like I'm claiming robust raid is the ONLY way to go in ALL systems. That's just not true. Just look at my other posts in this thread. But for heavens sake, if you're going to do raid, and you are looking for something more than a simple software volume - then don't piddle around half-assed - do it & understand _all_ the benefits _&_ tradeoffs & squeeze all the protection & performance you can out of it! 
I don't see what's so bad about that - at least you haven't said anything compelling in all these threads against what seems to me as a reasonable perspective/philosophy. If you're just used to coasting through life and getting by with the base minimum half-assed - well then I can see how you have a problem with this. You keep claiming extreme cost but this is a delusion. Elsewhere the comparison was ata vs scsi hw raid. Really if you compare apples to apples i.e. Raptors w' good LSI or 3ware card(s) vs 10k scsi w' similar spec/capacity card & hot-swap to hot-swap or cold-swap to cold-swap, there is either not much or any price difference. You also don't understand how these advanced features don't require you to hold the machine's hand through constant human administration and how well they accomplish risk avoidance which is, after all, the main point & benefit of non-zero raid. I know you know everything. But then its easy to know everything when its based on invention, misunderstanding, and false assumptions about things you have no experience with or haven't examined critically. I don't have the time or inclination to continue this stupid back & forth with you about storage you have no experience with, knowledge about, or of issues you have no appreciation of. Your fantasy drivel/troll is really too absurd to continue to acknowledge. Humoring you has really been a total waste of time. Your raid posts are little more than nuisance & troll. I don't know why or how you feel qualified to yammer on like you do and how you aren't embarrassed by many of the things you say. - Hey, more power to ya. Arrogance goes a long way in this world & ignorance is indeed bliss. |
#26
On Tue, 07 Dec 2004 03:22:06 GMT, Curious George
wrote: I'm more critical than you are apparently, ROTFLOL :0 snip How else could one put it? You'd have us pay a premium and then STILL be susceptible to many single-point failures, anything not covered under your sales pitch for the feature set. Thank me for snipping out the ludicrous argument you were starting to make for redundancy being bad because of more failure points! With that kind of flawed thinking who would ever make backups of ANYTHING... since the backup IS a redundancy, yet another failure point? Basically you're just a shill.
#27
Kony, George, take my advice.
This thread isn't going to stop until only one of you is left in the discussion. It's not reaching any conclusion. Just let it die... We are simply getting nowhere... - Joris