A Whole Hornet’s Nest of Crap

or Tim Learns Four Lessons

Holy crap was THAT a huge pain in the ass.

I had a couple of hours this morning, so I thought I’d do some preliminary testing before doing the big hardware swap I had planned for this weekend.

Boy did that turn into a FUM (fucked up mess.)

My main server was an IBM Intellistation with dual Pentium III 550MHz CPUs and a gig of RAM. This is a fine box, but I had gotten a 3RU rack mount case and Tyan S1832DL motherboard from Nate and I wanted to move my system to that platform. This is the same motherboard that I toasted the BIOS on not long ago.

The motherboard came with a pair of 400MHz Pentium IIs on it, but the manual from Tyan said it would support the 550MHz PIIIs, so I figured I was golden.

Lesson number one: Never trust the manual.

The only thing I was unsure of was whether the RAM from the IBM would work on this motherboard.

So this morning I powered down the IBM, dragged it upstairs and started swapping hardware.

First I swapped in the RAM and powered it up. All good! It counted up the gig and booted a floppy.

I figured I’d test the CPUs, so I swapped them in and set the jumpers. Boot it up and it reports 550MHz PIIIs and a gig of RAM.

I figured I was good to go.

Lesson number two: Don’t get cocky.

The heatsinks on the PIIIs were too tall, and had no fans, so I figured I’d have to back out the change until I could get some new ones for them. I even found some heatsinks on eBay and bought them (around $18 for the pair, not too bad.)

So I put the RAM and PIII CPUs back into the IBM case and turned it on.

I got nothing. No video. No beeps. Nada. Zero. Zilch.

Whiskey-Tango-Foxtrot?

I swapped the RAM and PIII CPUs back into the new motherboard. It booted.

I grabbed the hard drive out of the IBM and plugged it in to the new motherboard. I booted it up and I got nothing.

So I unplugged the hard drive and turned it back on. Still nothing.

Uh-oh.

Oh shit. Now I’m panicking. The drive is a WD800, just like the drives that I’ve been having trouble with (although it is a lot newer – and under warranty – small consolation if it’s dead.) I have no recent backup; if this drive has gone tits-up I’m totally screwed. I have to admit I started flailing a bit at this point.

Finally I got smart and reset the CMOS on the new motherboard and it booted. But when I plugged in the hard drive, it stopped working again. Unplug the hard drive, reset the CMOS, and it boots. Argh.

So I swapped it all back into the IBM again and reset the CMOS on that motherboard. Reboot it and NOTHING.

Ahhh! WTF?!? This was a working system! All I did was take out the components and put them back in!

Lesson number three: Don’t mess with a working system.

Man I really HATE COMPUTERS.

At this point it was 11:30 and I had to get some lunch into the kids so that we could go to the Guthrie for the matinee of A Christmas Carol. It’s probably a good thing that I had to stop, as I was really frustrated, but it really annoyed me to have to go away and leave my server down.

So we went and watched Scrooge and Tiny Tim. It’s a good show this year.

When I got back I started working through it again.

Finally I discovered that on the new machine I had been using an old-style IDE cable. Apparently the motherboard didn’t like this. As soon as I swapped in a newer IDE cable (ATA100? What are the IDE cables with more, thinner wires called?) it would boot with the PIIs and the hard drive installed.

But no matter what I did, I could not get it to boot with the PIIIs and the hard drive in it.

And no matter what I did, I could not get the IBM to boot again.

So apparently I was committed to running on the new platform with slower CPUs.

I finally got it to boot up properly with the 400MHz PIIs, the gig of RAM and the hard drive. It booted to the GRUB loader screen and I was happy.

I booted it to the OS and let kudzu discover all the hardware changes – there were quite a few. The IBM had two onboard SCSI controllers and I had added a third. There were two NICs in the IBM, one in the new machine, etc.

Then it came all the way up and I was happy.

Lesson number four: It ain’t over until it’s over.

I thought that maybe I should add the SCSI card to the new machine so I could do some backups to my DLT drive.

I stuck in the card and booted it up. Kudzu detected the new card and configured it. Then it continued to boot and hung with a message about the new SCSI card.

Crap.

I powered it off and booted it again. I answered yes to let it run the consistency checks on the file systems and then it hung again with the same message.

Well, maybe it’s looking for the tape drive that’s not plugged in. So I took it downstairs to where it belongs, figuring that I’m close to done anyway.

Hah.

I plugged it all in, including the DLT drive and powered it up. It hung at the kudzu process. Crap. So I power cycled it and told it to boot in interactive mode.

I skipped kudzu and it then spat out the SCSI message and hung.

Argh. That’s it! Out comes the SCSI card. I’ll make a partition on my Openfiler box and do backups to that. Damn it.

So I booted it again. Kudzu detected the missing SCSI card and removed the config and all is well.

I headed upstairs to my laptop and tried to ssh into the box.

No route to host.

No. Fucking. Way.

I tried to ping it.

No route to host.

So I trekked down to the basement once again. I made sure the network cable was plugged in. It was. I made sure the link light was lit. It was. I logged into the console and tried to ping the firewall.

No route to host.

Argh.

I figured it was a bad NIC, so I went upstairs and pulled the NIC out of the IBM. I went back to the basement and swapped it in.

Have I mentioned yet what a pain it is to swap PCI cards into this box? Since it’s a 3RU case, it’s got a riser board and the cards mount sideways. But in order to get the cards into the riser board you have to pull 5 screws and lift out the motherboard assembly. Now that I’ve done it six or seven times, it’s easy. Not.

Anyway, I swapped in the new NIC, buttoned it up and booted it.

I pinged the firewall.

No route to host.

Okay, I swapped the cable.

No route to host.

This is starting to get really annoying.

So, remembering my old-school PC building tricks, I moved the NIC to a different PCI slot.

Hallelujah! It works!
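For next time, here is a rough Python sketch of a check that would have saved me a few trips up and down the stairs. It tries a single TCP connection and reports whether the failure is "no route" (the error I kept seeing), a refusal, or a timeout. The function name `probe` and the default port 22 are my own assumptions for illustration; this is not a real diagnostic tool, just a quick way to tell a dead network path from a host that is up but not listening.

```python
import errno
import socket


def probe(host, port=22, timeout=3.0):
    """Try one TCP connection and return a short label for what happened.

    `probe` is a hypothetical helper; port 22 (ssh) is only a default.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except socket.timeout:
        # No answer at all within the timeout.
        return "timeout"
    except ConnectionRefusedError:
        # The host answered, so the network path works; nothing is listening.
        return "refused"
    except OSError as exc:
        # EHOSTUNREACH is what shows up as "No route to host".
        if exc.errno in (errno.EHOSTUNREACH, errno.ENETUNREACH):
            return "no route"
        return f"error: {exc}"
```

A result of "refused" would actually have been good news here, since it means the NIC, the cable and the slot are all passing traffic.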

So what should have been at most a couple-hour job turned into an all-day ordeal. Why does it always seem to be this way with me and hardware?

The final outcome was not really what I wanted.

We are using the new case and motherboard, but we are running on dual 400MHz PIIs, not 550MHz PIIIs. I have no idea if I’ll even notice any performance difference, but it really annoys me.

This Tyan motherboard has something weird going on with it. If you go into the BIOS and then save and exit, instead of rebooting, it powers off. Then you have to pull the power cord for 20 seconds before it will turn back on. It also does this if you enter the config for the network card, the SCSI card, or if you give it the three fingered salute (ctrl-alt-delete.) But, if you use the reset switch on the front, or if you tell RedHat to do an init 6, it reboots correctly. Something is going on with the NMIs that I haven’t been able to figure out. There is nothing at Tyan about it and no Google search that I have come up with has gotten any hits that were relevant. There don’t appear to be any relevant jumpers either.

I don’t have a tape drive plugged in anymore. Not that I was really doing backups anyway. But I should be. Maybe I’ll plug the tape drive into the Openfiler box? I’ll have to think about this.

Anyway, I can back up the server to an Openfiler share. That way I’ll at least have the data in two places. Assuming the Openfiler box stays up, that is.
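Something like the following Python sketch is roughly what I have in mind for the backup to the share. It writes a timestamped tar.gz of one directory into another directory, and it assumes the Openfiler share is already mounted somewhere on the server. The function name `backup_to_share` and the paths in the comment are made up for illustration; I haven’t settled on the real ones.

```python
import tarfile
import time
from pathlib import Path


def backup_to_share(source_dir, share_dir):
    """Write a timestamped .tar.gz of source_dir into share_dir; return its path.

    Hypothetical helper: share_dir is assumed to be an already-mounted share.
    """
    source = Path(source_dir)
    share = Path(share_dir)
    if not source.is_dir():
        raise NotADirectoryError(f"nothing to back up at {source}")
    if not share.is_dir():
        # A missing directory usually means the share is not mounted.
        raise FileNotFoundError(f"is the share mounted at {share}?")
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = share / f"{source.name}-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the source dir.
        tar.add(source, arcname=source.name)
    return archive


# Example call with made-up paths:
# backup_to_share("/home", "/mnt/openfiler")
```

Each run makes a new archive rather than overwriting the last one, so old copies pile up until something cleans them out. That is a trade-off I can live with while the data only exists in one other place.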

I did manage to get a second 80GB hard drive mounted and cabled in the new server, so some other time when I’m not so frustrated with computers I’ll be able to mirror the drives and then I’ll have some more redundancy there.

I don’t know why the IBM will no longer boot. It’s possible that I zapped it, but I don’t think so. It’s also possible that I zapped one of the PIIIs in such a way that it only works some of the time. I doubt it though.

I’m going to have to play with the PIIIs and the IBM chassis to see if I can make it go. I may also play with the PIIIs in the new box some other time.

But man I am all done with computer hardware for a while.

I think tomorrow will be a hardware-free day. (Oh great, now I’ve jinxed myself.)