#11
In article , Nut Cracker wrote:
> memtest86 ... you boot from the floppy, fire up the program, and come back in 2 or 3 days when it's finished its testing loops.

Egads.. I guess my mail will be out of commission for a while (I use this box as my mail server+db+apache/php, etc) while this test runs.. As a side note, this is a great test to run on my parents' new Dell PC we just got them that's sitting in my garage waiting to be configured.. (8- This way I'll be more comfortable knowing it's in good working order before sending it off to them in the mail..

Cool.. I'll see about starting it tonight and let you know what happens..

-- Rick
#12
In article , Rick F. wrote:
> In article , Nut Cracker wrote:
>> memtest86 ... you boot from the floppy, fire up the program, and come back in 2 or 3 days when its finished it testing loops.
>
> Cool.. I'll see about starting it tonight and let you know what happens..

Ok.. I let it run for about 2 1/2 days and it got 19 passes in the utility with NO errors detected. None.. Zero.. Nada.. Oh well..

Now, one thing that I'm wondering about that I really hadn't given much thought to at the time.. I had done a bunch of e2fsck's on my root partition (/dev/ida/c0d0p4) and a number of problems were found at the time and it didn't seem to matter how many times I rebooted, ran e2fsck, rebooted again, etc. I always seemed to get the same errors.. I *was* thinking that I might have some bad blocks on the root partition and this evening after rebooting from the memory tester, I booted into emergency mode and ran e2fsck -fc /dev/ida/c0d0p4 and it didn't find anything.. Go figure..

Anyway, I've got it back up and running and did an fsck on all partitions but nothing turned up.. I'll see how long it runs before it reboots again..

-- Rick
#13
Rick F. wrote:
> Now, one thing that I'm wondering about that I really hadn't given much thought to at the time.. I had done a bunch of e2fsck's on my root partition (/dev/ida/c0d0p4) and a number of problems were found at the time and it didn't seem to matter how many times I rebooted, ran e2fsck, rebooted again, etc. I always seemed to get the same errors.. I *was* thinking that I might have some bad blocks on the root partition and this evening after rebooting from the memory tester, I booted into emergency mode and ran e2fsck -fc /dev/ida/c0d0p4 and it didn't find anything.. Go figure.. Anyway, I've got it back up and running and did an fsck on all partitions but nothing turned up.. I'll see how long it runs before it reboots again..

I don't believe that Linux will ever do a spontaneous reboot with no error message in response to a disk error. We are looking for something that kicks the legs out from under the OS before it can react - which tends to point to CPU/memory/motherboard/OS.

I would try the following next:

Get the latest Knoppix CD, boot it up, open up as many programs as you can, and let it sit. If you get no reboots, you know that it isn't a CPU/memory/motherboard problem - that would hit Knoppix. If you still get reboots, you know that it isn't the hard disk or Fedora that is causing the reboots.

Try running on just the left power supply for a few days, then just the right. If you still get reboots in both cases, it isn't a power supply problem.

Try running on half of your RAM for a few days, then on the other half. If you still get reboots in both cases, it isn't a RAM problem.

Try running on half of your processors for a few days, then on the other half. If you still get reboots in both cases, it isn't a CPU problem.

As a last resort, you can look for a really cheap 5500 on eBay (no memory, no disks, single low-MHz CPU units are really quite cheap) - ideally with local pickup - and start transferring parts over one-by-one with testing at each stage.

Don't forget to post full details when you solve this so the next fellow will know what to look for.
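The elimination steps in the post above amount to crossing suspects off a list after each test. Here is a minimal Python sketch of that bookkeeping. The function names and suspect labels are invented for illustration, and the rule for "reboots on only one half" is an inference rather than something the post spells out.

```python
# Illustrative only: names and labels are hypothetical, not from the thread.
SUSPECTS = {"cpu", "memory", "motherboard", "power supply", "hard disk", "fedora"}


def after_knoppix(suspects: set, rebooted: bool) -> set:
    """Apply the live-CD test: boot Knoppix and see whether it still reboots."""
    if rebooted:
        # Knoppix ran without the hard disk or Fedora, so those are cleared.
        return suspects - {"hard disk", "fedora"}
    # No reboots under Knoppix clears the hardware that Knoppix exercises.
    return suspects - {"cpu", "memory", "motherboard"}


def after_split_test(suspects: set, component: str,
                     rebooted_on_first: bool, rebooted_on_second: bool) -> set:
    """Apply a 'run on each half' test for a redundant component."""
    if rebooted_on_first and rebooted_on_second:
        # Reboots with either half installed: the component is cleared.
        return suspects - {component}
    if rebooted_on_first != rebooted_on_second:
        # Assumption: reboots with only one half installed point at that half.
        return {component}
    # No reboots with either half: the test is inconclusive.
    return set(suspects)


remaining = after_knoppix(SUSPECTS, rebooted=True)
remaining = after_split_test(remaining, "power supply", True, True)
print(sorted(remaining))
```

With reboots under Knoppix and on both power supplies, the remaining suspects are the CPU, memory, and motherboard.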
#14
In article , Guy Macon wrote:
> I don't believe that Linux will ever do a spontaneous reboot with no error message in response to a disk error. We are looking for something that kicks the legs out from under the OS before it can react - which tends to point to CPU/memory/motherboard/OS.

Hmm.. An interesting tidbit of info.. On Saturday I shut down any unnecessary daemons so the OS itself is pretty much the only thing running and the machine has now been up for :

  uptime
  16:43:14 up 2 days, 13:17, 3 users, load average: 0.08, 0.02, 0.01

IF this continues (no reboots) for the next few days, that will tell me one (or more) of the following may be the case :

  o Little system activity = no stress to tickle the underlying problem
  o One of the daemons I killed is one of the culprits
  o all of the above

For the record, the daemons I disabled are :

  1) Dovecot (imap daemon)
  2) Apache/PHP setup for virtual hosting
  3) MySQL 4.1.11
  4) exim 4.51 modified to feed all incoming mail to sa-exim-dspam for realtime spam filtering/rejection -- uses dspam as the backend
  5) ncftpd
  6) Samba

*MOST* of the time when the machine takes a hit, I'm not using it (I'm the only user of it aside from hackers) and it's in the wee hours of the morning. I guess I'd be tempted to point to Apache IF it's determined that one of the daemons is at fault.. The only daemon I've not shut down/disabled is my sshd, which is the only way into the box remotely.

Feel free to tell me I'm blowing smoke here or not.. I'd rather not re-install another version of OS if at all possible (e.g. Knoppix as suggested) as it's a pain in the $##$# to get things customized again.. I'll let this server sit around like it is now for a few more days and if no reboots yet then I may decide to re-enable a few of the daemons just to see what happens.

-- Rick
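For anyone who wants to log uptime over the coming days rather than eyeball it, the line quoted in the post above can be turned into a duration. This is a rough Python sketch; `parse_uptime` and `UPTIME_PATTERN` are made-up names, and the pattern only covers the "up N days, HH:MM" shape shown in the post, not every format `uptime` can print.

```python
import re
from datetime import timedelta

# Handles only "up N days, HH:MM" and "up HH:MM"; other shapes such as
# "up 5 min" are deliberately out of scope for this sketch.
UPTIME_PATTERN = re.compile(r"\bup\s+(?:(\d+)\s+days?,\s+)?(\d+):(\d+)")


def parse_uptime(line: str) -> timedelta:
    """Extract the time since boot from one line of uptime output."""
    match = UPTIME_PATTERN.search(line)
    if match is None:
        raise ValueError(f"unrecognised uptime line: {line!r}")
    days, hours, minutes = match.groups()
    return timedelta(days=int(days or 0), hours=int(hours), minutes=int(minutes))


line = "16:43:14 up 2 days, 13:17, 3 users, load average: 0.08, 0.02, 0.01"
elapsed = parse_uptime(line)
print(f"{elapsed.total_seconds() / 3600:.1f} hours since the last boot")  # 61.3 hours
```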
#15
Rick F. wrote:
> IF this continues (no reboots) for the next few days, that will tell me one (or more) of the following may be the case :
> o Little system activity = no stress to tickle the underlying problem

I am guessing no, but it's easy to test. Run a program that exercises the CPU a lot (SETI@home), one that uses up lots of memory (just open a bunch of copies of any convenient file) and one that exercises the disk a lot (are there any Linux defragmenters that are still around?)

> o One of the daemons I killed is one of the culprits

That would be my guess.

> I'd rather not re-install another version of OS if at all possible (e.g. Knoppix as suggested)

(Yoda voice) Learn you must - re-install you must not! Powerful is the slack^h^h^h^h^h^h force in Knoppix - boot it does from CD without changes to hard disk. Unplug the hard disk you can! Trust the force, young Rick; boot Knoppix and see if crashes it does. Safe your Fedora installation will be.

-- Guy Macon http://www.guymacon.com/
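The three kinds of load suggested in the post above (CPU, memory, disk) can also be generated with a few lines of Python instead of hunting for suitable programs. This is a rough sketch, not a substitute for a real burn-in tool: the function names are invented, the 4096-byte page size is an assumption, and the disk read-back will often be served from the page cache rather than the platters.

```python
import os
import tempfile
import time


def burn_cpu(seconds: float) -> int:
    """Spin in a tight loop for roughly `seconds`; return the iteration count."""
    deadline = time.monotonic() + seconds
    iterations = 0
    while time.monotonic() < deadline:
        iterations += 1
    return iterations


def fill_memory(megabytes: int) -> bytearray:
    """Allocate `megabytes` MiB and write one byte per assumed 4 KiB page."""
    block = bytearray(megabytes * 1024 * 1024)
    for offset in range(0, len(block), 4096):
        block[offset] = 0xAA
    return block


def churn_disk(megabytes: int, directory=None) -> int:
    """Write `megabytes` MiB to a temp file, sync it, read it back."""
    chunk = os.urandom(1024 * 1024)
    total_read = 0
    with tempfile.TemporaryFile(dir=directory) as handle:
        for _ in range(megabytes):
            handle.write(chunk)
        handle.flush()
        os.fsync(handle.fileno())  # push the data out of Python and OS buffers
        handle.seek(0)
        while True:
            data = handle.read(1024 * 1024)
            if not data:
                break
            total_read += len(data)
    return total_read


if __name__ == "__main__":
    # Small defaults so the sketch is safe to try; scale up for a real soak.
    print("cpu iterations:", burn_cpu(1.0))
    print("memory bytes held:", len(fill_memory(64)))
    print("disk bytes read back:", churn_disk(16))
```

Running the three functions in a loop for a day or so, with the daemons still off, would separate "no load" from "no daemons" as the reason the box has stayed up.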
#16
"Guy Macon" http://www.guymacon.com/ wrote in message
...
> Rick F. wrote:
>> Now, one thing that I'm wondering about that I really hadn't given much thought to at the time.. I had done a bunch of e2fsck's on my root partition (/dev/ida/c0d0p4) and a number of problems were found at the time and it didn't seem to matter how many times I rebooted, ran e2fsck, rebooted again, etc. I always seemed to get the same errors.. I *was* thinking that I might have some bad blocks on the root partition and this evening after rebooting from the memory tester, I booted into emergency mode and ran e2fsck -fc /dev/ida/c0d0p4 and it didn't find anything.. Go figure.. Anyway, I've got it back up and running and did an fsck on all partitions but nothing turned up.. I'll see how long it runs before it reboots again..
>
> I don't believe that Linux will ever do a spontaneous reboot with no error message in response to a disk error. We are looking for something that kicks the legs out from under the OS before it can react - which tends to point to CPU/memory/motherboard/OS.
>
> I would try the following next:
>
> Get the latest Knoppix CD, boot it up, open up as many programs as you can, and let it sit. If you get no reboots, you know that it isn't a CPU/memory/motherboard problem - that would hit Knoppix. If you still get reboots, you know that it isn't the hard disk or Fedora that is causing the reboots.
>
> Try running on just the left power supply for a few days, then just the right. If you still get reboots in both cases, it isn't a power supply problem.
>
> Try running on half of your RAM for a few days, then on the other half. If you still get reboots in both cases, it isn't a RAM problem.
>
> Try running on half of your processors for a few days, then on the other half. If you still get reboots in both cases, it isn't a CPU problem.
>
> As a last resort, you can look for a really cheap 5500 on eBay (no memory, no disks, single low-MHz CPU units are really quite cheap) - ideally with local pickup - and start transferring parts over one-by-one with testing at each stage.
>
> Don't forget to post full details when you solve this so the next fellow will know what to look for.

Played part of this hand once on a server that would reboot every few hours. Day crew went nuts over it. Sys admin at first thought the original setup was fubar. They checked logs and events, everything, and all they found was that load didn't matter, nor did apps, users, OS, etc. They played with it for several weeks without an answer, swore it was a software problem. So they took it off line and passed it to us night guys and told us to play with it and figure out what the problem was.

Sr. tech had us strip it to bare bones and start over. By the third night we had the answer: video card had some glitch QA had missed and at random intervals it would kill the system. Only found it by doing it the stupid way: swap a component and run it until it failed or ran 24 hours without a hiccup. If all else fails that might be the route to take.

KC
#17
In article , Kevin Childers wrote:
> Played part of this hand once on a server that would reboot every few hours. Day crew went nuts over it. Sys admin at first thought the original setup was fubar. They checked logs and events, everything, and all they found was that load didn't matter, nor did apps, users, OS, etc. They played with it for several weeks without an answer, swore it was a software problem. So they took it off line and passed it to us night guys and told us to play with it and figure out what the problem was. Sr. tech had us strip it to bare bones and start over. By the third night we had the answer: video card had some glitch QA had missed and at random intervals it would kill the system. Only found it by doing it the stupid way: swap a component and run it until it failed or ran 24 hours without a hiccup. If all else fails that might be the route to take.

Hmm.. Latest saga on the server.. I went to try to ssh to it this afternoon only to get a timeout from my office.. Doh.. I just fired up the screen only to find the machine had rebooted and got stuck at the following message :

  1777-Slot 6 Drive Array - Proliant Storage Enclosure Problem Detected SCSI Port 1 : Interrupt Signal Inoperative - Check SCSI Cables.

Ok.. So I rip apart the machine and remove the SCSI enclosure looking for any signs of anything.. Nothing looks amiss. I relocated the SmartArray 3200 card from slot 6 to slot 7, reseat, inspect, etc. No change.. Perhaps this is part of my initial problem that just started to get a hard-manifestation? Comments now?

Interestingly enough, it prompts for whether I want to continue or run the setup diagnostics.. I tell it to continue and it boots up fine (except it initially complained about the disks not being shut down properly, solved by a nice e2fsck). I'm writing this on it as I type..

Any ideas on how to proceed? Could this mean my SmartArray 3200 is on its way out? Perhaps it's getting to be time to shop over at E*bay for some spares..

-- Rick
#18
Rick F. wrote:
> Hmm.. Latest saga on the server.. I went to try to ssh to it this afternoon only to get a timeout from my office.. Doh.. I just fired up the screen only to find the machine had rebooted and got stuck at the following message :
>
>   1777-Slot 6 Drive Array - Proliant Storage Enclosure Problem Detected SCSI Port 1 : Interrupt Signal Inoperative - Check SCSI Cables.
>
> Ok.. So I rip apart the machine and remove the SCSI enclosure looking for any signs of anything.. Nothing looks amiss. I relocated the SmartArray 3200 card from slot 6 to slot 7, reseat, inspect, etc. No change.. Perhaps this is part of my initial problem that just started to get a hard-manifestation? Comments now? Interestingly enough, it prompts for whether I want to continue or run the setup diagnostics.. I tell it to continue and it boots up fine (except it initially complained about the disks not being shut down properly, solved by a nice e2fsck). I'm writing this on it as I type.. Any ideas on how to proceed? Could this mean my SmartArray 3200 is on its way out? Perhaps it's getting to be time to shop over at E*bay for some spares..

That's a ribbon cable, right? I have seen them go bad from people pulling on the cable to unplug the connector - plus it will be cheap and light (low shipping cost) to get another.

Keep a lookout for a server with no RAM, few/slow CPUs and no drives that is available for local pickup. They often go for less than a card in them would sell for...
#19
In article , Guy Macon wrote:
> That's a ribbon cable, right? I have seen them go bad from people pulling on the cable to unplug the connector - plus it will be cheap and light (low shipping cost) to get another.

Hmm.. Is this something I can p/u at my local Fry's or equivalent sort of store -- I *think* it's one of those 68 pin ribbon cables or something around there (haven't counted the contacts).. I just wonder if it was marginally bad and whether or not it's my culprit for the reboots -- particularly if I'm getting occasional odd behavior with one or more signals being messed up on the cable..

> Keep a lookout for a server with no RAM, few/slow CPUs and no drives that is available for local pickup. They often go for less than a card in them would sell for...

Yeah.. I'll keep my eyes open.. I picked up this server from a guy on my local Craigslist..

Thanks!

-- Rick
#20
In article , Rick F. wrote:
> In article , Guy Macon wrote:
>> That's a ribbon cable, right? I have seen them go bad from people pulling on the cable to unplug the connector - plus it will be cheap and light (low shipping cost) to get another.
>
> Hmm.. Is this something I can p/u at my local Fry's or equivalent sort of store -- I *think* it's one of those 68 pin ribbon cables or something around there (haven't counted the contacts).. I just wonder if it was marginally bad and whether or not it's my culprit for the reboots -- particularly if I'm getting occasional odd behavior with one or more signals being messed up on the cable..

Ok.. I picked up a new SCSI cable down at my local Fry's this morning and just plugged it in and no go -- same problem as before.. I guess I'll be keeping my eyes open for a used SmartArray 3200 on E*bay or whatnot.. Is it safe to assume that the cages don't go bad ever or very infrequently? About the only thing I've yet to try is to unplug ALL of my SCSI drives and reseat them in the cage.. I'm leaning towards a bad controller..

-- Rick