#1
SCSI RAID problems
Hello SCSI friends,
I am having some strange/worrying problems with my SCSI RAID setup. Perhaps someone can help?

I have a Gigabyte GA-KNXP Ultra m/b with an embedded Adaptec AIC-7902 U320 SCSI controller. I have a RAID 10 array made up of 4 x Maxtor Atlas IV 10k 36GB disks (SCSI IDs 3, 4, 5 & 6) on Channel A. The 72GB of usable space is partitioned into a 4GB user partition + 4 others for paging file, apps & data (+ some free space left over). I run WinXP Pro SP1.

A little while ago I started to get signs of hardware problems:

a. Errors in the Windows Event Log (ID 9) saying "The device, \Device\Scsi\a320raid1, did not respond within the timeout period." A few times the system crashed at about the same time;
b. Nasty "scrunching" sounds coming from the disks;
c. In the SCSI BIOS the array is tagged "Degraded". On drilling down, the disk on ID 4 is tagged "Degraded"; all the others are "Optimal".

After a short while the errors & noise go away, so I assume the disk has finally died & prepare to RMA it.

In the interim I get a different error in the event log (ID 7) saying "The device, \Device\Harddisk0\D, has a bad block." whenever (and only when) I try to back up drive C: to the Tandberg SLR7 tape drive using the WinXP Backup utility. (I get error ID 14 saying "The shadow copy of volume C: was aborted because of an IO failure." at the same time.) I can't find out much about these errors, other than that the first one usually means disk failure is imminent; however, as the O/S can only see the logical disk, not the individual physical devices, I put this down to a glitch from the disk failure. Apart from this, all apps appear to work normally as far as I can tell.

When I get the replacement disk I replace the "Degraded" one using the same SCSI ID & attempt a rebuild. However this fails "Due to read error on device ID 3". When I run Verify Media on disk ID 3 it finds 3 bad blocks; it says it has remapped these, but on re-running, both the array rebuild & the verify media come up with the same errors.

I put the "old" disk on Channel B, expecting it to be dead; however it appears in the BIOS, and when I run verify media it comes up with 1 bad block. Again, remapping fails to fix it, but after a low-level format verify media finds no errors.

So the disk I originally thought was dead now appears to be as good as new (so too, presumably, is the replacement). However the *real* problem seems to lie with disk ID 3, even though the BIOS says it is optimal - and I can't rebuild the array. As the array is now running on only 3 disks out of 4, I am reluctant to do anything with ID 3 that might corrupt the array.

Do I have any alternative other than to rebuild the array & reinstall everything? Fortunately I can still make backups of all partitions *except* C:, and I guess I can save anything I need from there on CD-R, but it is still a real pain...

Anyone have any ideas what is really going on, & how I can recover it? And how many, if any, dud disks do I have? The noises when the problem originally manifested definitely sounded very physical!

Thanks in advance for all help & advice.

Andrew
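One way to read the failed rebuild: restoring a mirror member means copying every block from the surviving copy, so a single unreadable block on the source can stop the whole job. The toy model below illustrates that idea in Python. It is a sketch under assumptions, not the Adaptec firmware's actual logic: the names `ReadError` and `rebuild_mirror` and the eight-block "disk" are invented for illustration.

```python
class ReadError(Exception):
    """Raised when the surviving (source) disk cannot return a block."""


def rebuild_mirror(source, bad_blocks):
    """Copy every block of `source` onto a fresh replacement disk.

    `source` is a list of block contents; `bad_blocks` is a set of block
    numbers on the source that cannot be read. Both are hypothetical inputs.
    """
    replacement = []
    for lba, data in enumerate(source):
        if lba in bad_blocks:
            # No other copy exists while the array is degraded, so stop.
            raise ReadError(f"read error on source disk at block {lba}")
        replacement.append(data)
    return replacement


source = [f"data{n}" for n in range(8)]

# A healthy source rebuilds cleanly.
print(rebuild_mirror(source, bad_blocks=set()) == source)

# One bad block on the source aborts the rebuild.
try:
    rebuild_mirror(source, bad_blocks={5})
except ReadError as err:
    print("rebuild aborted:", err)
```

If the array behaves anything like this, it would also fit the bad-block error appearing only when a backup reads the whole of C:, since with one mirror member missing there is no second copy to fall back on.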
#2
Are the disks hot?
I was getting SCSI disk errors because the 4 disks I had configured were getting too hot and started to play up. I have since added additional fans and now the disks are easy to touch and handle, but before I added the fans I could hardly touch them.

"Andrew Wasielewski" wrote in message ...
[snip]
#3
When you say you have a RAID 10 array setup ... do you mean a RAID 1/0? The only RAID setups I've come across are RAID 0, RAID 1, RAID 1/0 or RAID 5. It sounds like you have a RAID 0/1 array, going by the drives you have and the space available (RAID 5 would give you 108GB -- 3x36GB + 4th drive online spare).

How is the mirroring set up? If the drives on SCSI IDs 5 & 6 are a mirror of drives 3 & 4, then you should be able to restore from them. If that's the case, you could try rebuilding the array using your original drive 4 and the replacement while keeping drive 3 intact.

Let us know how you get on.

Mal
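The capacity figures in this post can be checked with a little arithmetic. The sketch below is illustrative only: the function name `usable_gb` is made up here, it assumes identical disks, and it ignores any space the controller reserves for its own metadata. Note that RAID 5 spreads parity across all members, so the "4th drive" is better thought of as one disk's worth of parity than as a spare.

```python
def usable_gb(level: str, disks: int, size_gb: int) -> int:
    """Usable capacity in GB for an array of identical disks.

    Hypothetical helper for illustration; controller overhead is ignored.
    """
    if level == "0":
        # Pure striping: every disk holds unique data.
        return disks * size_gb
    if level in ("10", "0+1"):
        # Striping plus mirroring: half the disks hold copies.
        if disks < 4 or disks % 2:
            raise ValueError("striped mirrors need an even number of disks, at least 4")
        return (disks // 2) * size_gb
    if level == "5":
        # One disk's worth of space goes to (distributed) parity.
        if disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return (disks - 1) * size_gb
    raise ValueError(f"unsupported RAID level: {level}")


for level in ("0", "10", "5"):
    print(f"RAID {level}: {usable_gb(level, disks=4, size_gb=36)} GB usable")
```

With four 36GB disks this gives 72GB for RAID 10 and 108GB for RAID 5, matching the numbers quoted in the thread.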
#4
Disks don't feel noticeably warm, let alone hot. Case is a Lian-Li PC-71 with plenty of fans, so I don't think that can be it...

"Anthony Preston" wrote in message ...
[snip]
#5
RAID 0+1 is another name for my setup, i.e. striping + mirroring. Since the 4 disks are identical I presume there is a 1-to-1 correspondence between the blocks on one side of the mirror and the other, as the mirroring logic doesn't care whether & how they are striped.

However, in that case I don't know how the disks are paired off. There wasn't anywhere to specify it in the array setup in the SCSI BIOS, & I can't see anything that displays it. I am wary about finding out the hard way by disconnecting disk ID 3: if that turns out to be the currently non-redundant disk, I don't want to risk corrupting the array irretrievably. Or will it simply fail to recognise the array at all in that case, until I put back the missing disk?

"Mal" wrote in message ...
[snip]
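The worry about how the disks are paired off can be made concrete by enumerating the possibilities. The sketch below is a hedged illustration in Python, not a description of how this controller actually lays out its arrays: it lists the three ways IDs 3, 4, 5 & 6 can be split into two groups of two, treats each split either as mirror pairs that are striped (1+0) or as stripe sets that are mirrored (0+1), and asks whether the data would still be complete with disk 4 missing, and with disks 3 and 4 both missing. All function names are invented for this example.

```python
DISKS = (3, 4, 5, 6)


def pairings(disks):
    """Yield every way to split four disks into two groups of two."""
    first = disks[0]
    for partner in disks[1:]:
        rest = tuple(d for d in disks if d not in (first, partner))
        yield ((first, partner), rest)


def survives_raid10(groups, failed):
    """Groups are mirror pairs: each pair needs at least one live disk."""
    return all(any(d not in failed for d in pair) for pair in groups)


def survives_raid01(groups, failed):
    """Groups are stripe sets: at least one set must be fully intact."""
    return any(all(d not in failed for d in stripe) for stripe in groups)


for groups in pairings(DISKS):
    for name, survives in (("1+0", survives_raid10), ("0+1", survives_raid01)):
        without_4 = survives(groups, {4})
        without_3_and_4 = survives(groups, {3, 4})
        print(f"{name} {groups}: ok without 4 = {without_4}, "
              f"ok without 3 and 4 = {without_3_and_4}")
```

In this model every layout tolerates the loss of disk 4 alone, but whether disk 3 can also be removed depends entirely on the layout, which is why pulling it as an experiment is risky while the layout is unknown.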
#6
Hi,
I can't see the original post for this due to my ISP clobbering the newsgroup....

What type of RAID controller is it?

- Tim

"Andrew Wasielewski" wrote in message ...
[snip]
#7
In article , Andrew Wasielewski wrote:
I am having some strange/worrying problems with my SCSI RAID setup. Perhaps someone can help? I have got a Gigabyte GA-KNXP Ultra m/b with embedded Adaptec AIC-7902 U320 SCSI controller. I have got a RAID 10 array made up of 4 x Maxtor Atlas IV 10k 36GB disks (SCSI IDs 3, 4, 5 & 6) on Channel A. The 72GB useable space is partitioned into a 4GB user partition + 4 others for paging file, apps & data (+ some free space left over). I run WinXP Pro SP1.

When you're having mystifying problems, try looking at heat and power... That's a lot of drives, and if you have a lot of other stuff, perhaps your power supply can't handle it? That can cause all manner of weirdness.

Are they too hot? What does the temp sensor on the drives say?

--
  _   _   _   _   _   _   _   _   _   _   _   _   _
 / \ / \ / \ / \ / \ / \ / \ / \ / \ / \ / \ / \ / \
( t | i | m | @ | i | t | . | k | p | t | . | c | c )
 \_/ \_/ \_/ \_/ \_/ \_/ \_/ \_/ \_/ \_/ \_/ \_/ \_/
GPG key fingerprint = 1DEE CD9B 4808 F608 FBBF DC21 2807 D7D3 09CA 85BF
#8
"Tim" wrote in message
Hi, I can't see the original post for this due to ISP clobbering the newsgroup....

Some ISPs filter HTML posts. You can find the message-ID in the header. Cut and paste it into your address bar and type "news:" in front of it. Or go to the topmost message and click the attribution line.

What type of raid controller is it? - Tim

"Andrew Wasielewski" wrote in message ...
[snip]