Configurations from Hell.
A few weeks ago, I got a call from one of our account managers, asking whether I would be available to perform a migration at a customer. The situation seemed a bit tricky, as the customer had a cluster configuration that, apparently, 'wasn't very stable'. That, it turned out, was a euphemism...
Upon arrival, I logged onto the servers and examined the situation. It seemed that everything that God and Bill Gates had dictated had been completely put aside in the name of clustering. Let me draw you a verbal picture:
Both cluster nodes were configured as domain controllers, with all the little bits and extras you could imagine. Exchange 2003 was also placed in the cluster configuration. But wait, that's just the beginning! On top of this already hideous setup, whoever had built it had also decided that it would be an excellent idea to place BackupExec, SQL Server 2000 and file and print services on the nodes. And, to finish it off, there were at least 10 little tools and tweaks in place on every node; in short, it was a complete technical nightmare. Oh, and did I mention that the two nodes had been moved out of the Domain Controllers OU, most likely because the Default Domain Controllers policy made the machines too restricted? I probably didn't; well, the DCs had been moved out of the Domain Controllers OU. Oh yay. Oh, and did I mention that SQL Server 2000 was only installed on the first node and that the SQL resource was configured to remain on that node? Yeah, that's what I call a benefit of clustering your resources...
We started by putting a shiny new DC in place, made all the servers GCs (naturally, only the first node was a GC _and_ held all the FSMO roles...) and then moved the FSMO roles to the new machine. Guess what? The cluster crashed. Completely. Surprised? I wasn't. A quick scan of the Event Log and the Services MMC pointed me to the Network account, which lacked access to the DTC; KB article 923977 very swiftly yielded the solution, and cluster failovers were once again possible.
Then, a temporary Exchange 2003 SP2 server was put in place. This, thankfully, didn't pose any problems... yet.
On Friday, the mailbox migration process was started. As quite a few people had mailboxes well over 3 GB in size, this took quite a while. After that, System Folders and Public Folders were rehomed, routing connectors were adjusted, the RUS was rehomed, and things seemed fine.
Uninstalling Exchange from the cluster didn't go flawlessly; quite a few errors about missing installation files made it a rather hellish process (people who make temporary drive mappings for installation that point to completely illogical and nonexistent data shares should be shot on sight), but with some fiddling, we managed to remove the cluster resource and software.
And then, the moment of truth. Time to demote the cluster nodes and see what happens. Can you guess? Yes, the cluster went berserk. Services, failover, everything completely crumbled. Once again, the DTC fix mentioned earlier saved the day; it makes you wonder why exactly Microsoft declared this to be an unsupported configuration. I have _no_ idea...
Thankfully, we managed to tackle the situation and very soon, we had two stable cluster nodes. Exchange was reinstalled, public folders and system folders rehomed, and as I write this, the mailboxes are moving back from the temporary server to the cluster.
As a temporary fix, I've had to place secondary DNS zones on the cluster... wanna guess why? Come on. Guess! Yes, because the Nokia Checkpoint that acts as firewall and VPN server is a complete and utter mess. Of course, a company with 40 users definitely needs a Nokia Checkpoint. It wouldn't do, for example, to put in a PIX 515E, or an ISA Server for that matter. That'd be too cheap and too easy!
Service accounts, freeware tools, completely modified ACLs, a cluster that should, by all common sense, have collapsed years ago, and general incompetence... just the thing to have to clean up on a Saturday.