Sorry Paul, I had to do this....

I'm talking about geoclustering: building clusters of servers that are in physically separate locations. For example, a company in a metropolitan area such as Detroit could run Exchange Server on a cluster, placing one cluster server at headquarters in Dearborn and another at a factory in Melvindale. The company could connect both servers to a storage unit at a facility somewhere else in the area.
The article goes on to describe the technologies required in order to implement such a physically disconnected clustering model for Windows 2003 and Exchange.  Summary: It ain't pretty, folks.
The bottom line is optic fiber.
OK, so let me get this straight.  In order to have a true shared-nothing, "hot site" disaster recovery architecture for Exchange, we're talking about running fiber?  I'll bet it would be a whole lot cheaper to just migrate that Exchange environment to Domino Enterprise servers, use the software-layer clustering, which can even run across different versions of the product and different operating platforms, and call it a day.  One customer I've worked with in New England measures their Domino availability at 99.9999% (I am sure there are more such stories).  There's no fiber involved, just solid software architecture.
Link: Exchange & Outlook update: Geoclustering Exchange >

  1. Philip Storry

    Microsoft's clustering solution is pretty poor, IMO.

    Microsoft has two tactics they often implement with their first stabs at architectures - the first is to hope that hardware catches up with their exorbitant requirements, and the second is to simply ignore the limitations until they're able to do a rip-and-replace with a version 2 which has a better but incompatible architecture.

    Their clustering architecture - which is distinctly aimed at shared storage models more than anything else (Why? Shared storage is a dumb idea, folks!) - is overdue for the latter at some point. Mostly because the hardware is just not going to catch up quickly enough to handle both the clustering AND the products on top of it. Not without great costs, anyway...

    There. I said it, and I feel much better now. :-)

    (Personally, I'm just glad I always managed to talk customers round to using Domino's native clustering and avoiding putting anything Domino on shared storage.)

  2. Henry

    How can any organization measure Domino availability at 99.9999%? I'm not aware of any operating system or hardware that can meet those exacting statistics over the long run. This seems to be an impossibility. Can you please share your thoughts? Even with clustering, my company has run into numerous hardware, network, and storage issues. My actual experience in five Fortune 100 companies does not seem to support this kind of high availability claim. What kind of uptime has IBM measured in the past few years? Would you care to share that information with us?

    I do agree that Lotus/IBM high availability and clustering is better than anything Micro$oft has to offer.

  3. Warren Elsmore www.elsmore.net

    In that it supports seamless failover (well, if you ignore the 30 seconds to a minute when it is failing over...). Every time we deploy a cluster I have to caveat it with 'users won't notice in the following circumstances'. I wonder when we are likely to see that in Domino clusters without having to even consider multi-layer clustering/partitioning etc. (I pray for the day!!!)

    Also - the shared IP model doesn't require an IP sprayer either (not that it would work with geographically distant clusters though)

  4. Philip Storry

    Uptime, when measured, is usually measured as "uptime excluding scheduled downtime for maintenance etc." - so you can get 99.999% or higher.

    Yes, no server is ever going to give you 100% uptime. (A mainframe might come close, with all that redundancy, though...)

    But when you measure the uptime excluding scheduled downtime, you get a good figure for business purposes. After all, you're not scheduling downtime for business hours, so it has no effect on the business - right?

    Microsoft's failover system is an odd one. Yes, it fails over EVERYTHING - but it sometimes takes so long that the clients can time out on their SQL or Exchange connection and have to reconnect. It's not a very fast failover. Domino has chosen speed of failover - instant, once the client realises something's gone wrong - in preference to being able to fail over all operations (including writes). I guess you pays your money and takes your choice - to my mind, neither option is perfect. :-(

    (Active/Active clustering can give you instant failover on an MS cluster, but the cost of Active/Active clustering, in hardware and software as well as expertise in its configuration and running, is so high you may as well just buy a Domino license and a mainframe to run it on... *grins*)

  5. Oliver Regelmann http://www.n-komm.de/blog.nsf

    I don't really get it. If I imagine a Domino geocluster with a certain number of users, the cluster replication alone would already need a fast connection. If I further imagine users falling back to the server on the other side of the country/continent/world, I would need a connection as fast as my LAN. Ergo optic fiber.

    Where's the difference?

    And Domino clustering is surely not the best thing we could have. IIRC it still isn't able to fail over users who have a document or database open without closing it. And I can show you databases with documents that simply refuse to replicate, for whatever reason.

  6. Ed Brill www.edbrill.com

    Even if no single server/OS can maintain 99.9999% uptime, a cluster certainly can -- the user may never see downtime with two+ effective clustered servers.
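
As a rough illustration of how clustering multiplies out, here is a minimal Python sketch. It assumes the cluster counts as "up" whenever at least one server is up, that server failures are independent, and that failover is instantaneous; all three are simplifying assumptions (a shared network or storage fault breaks the independence one), and the function name is made up for this example.

```python
def cluster_availability(node_availability: float, nodes: int) -> float:
    """Availability of a cluster that is up when at least one node is up.

    Assumes independent node failures and instant failover, which is a
    simplification for illustration only.
    """
    if not 0.0 <= node_availability <= 1.0:
        raise ValueError("node_availability must be between 0 and 1")
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    # The cluster is down only when every node is down at the same time.
    return 1.0 - (1.0 - node_availability) ** nodes


if __name__ == "__main__":
    # Two servers that each manage "three nines" on their own.
    print(f"{cluster_availability(0.999, 2):.6%}")
```

Under those assumptions, two servers that are each available 99.9% of the time give a combined figure of about 99.9999%, which is why a cluster can report numbers that no single box could.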

  7. Nathan T. Freeman

    The point of a geocluster is not to load-balance users to a remote server. It's to fail over users to a remote server. It's possible, and even quite reasonable, to deliver a significantly lower service level during outage times. If you have a WAN connected by, say, 10Mbps, you can support about 80-100 concurrent users for regular usage and up to 400 for emergency situations.
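
The user counts above imply a per-user bandwidth budget. The Python sketch below simply divides the link evenly among concurrent users; that even split, and ignoring protocol overhead and cluster replication traffic on the same link, are assumptions of this example rather than anything measured, and the function name is hypothetical.

```python
def per_user_kbps(link_mbps: float, concurrent_users: int) -> float:
    """Even share of a WAN link per concurrent user, in kbit/s.

    Ignores protocol overhead and replication traffic (an assumption).
    """
    if concurrent_users < 1:
        raise ValueError("concurrent_users must be at least 1")
    return link_mbps * 1000 / concurrent_users


if __name__ == "__main__":
    for users in (80, 100, 400):
        print(users, "users:", per_user_kbps(10, users), "kbit/s each")
```

On a 10Mbps link that works out to roughly 100-125 kbit/s per user in normal operation and about 25 kbit/s per user in the emergency case, which is the "significantly lower service level" being traded for staying up.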

    99.999% basically means about five minutes of unplanned user downtime in a year (99.99% is a little under an hour). That's pretty good, but I really think that it's overrated as an achievement. It's not that hard to keep a Domino mail cluster offering continuous planned service for a year straight. Not if you have knowledgeable administrators and a solid network architecture. The places where I've been for the last few years have seen far more network breakdowns than actual server failures.
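
To put numbers on the "nines" thrown around in this thread, here is a minimal Python sketch that converts an availability percentage into unplanned downtime per year. It assumes a 365-day year and counts every minute of the year as service time (no scheduled maintenance windows excluded); the names are made up for this example.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, assuming a 365-day year


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Minutes of downtime per year allowed by an availability percentage."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability_percent must be between 0 and 100")
    return (1.0 - availability_percent / 100.0) * MINUTES_PER_YEAR


if __name__ == "__main__":
    for nines in (99.9, 99.99, 99.999, 99.9999):
        print(f"{nines}%: {downtime_minutes_per_year(nines):.2f} minutes/year")
```

On those assumptions, 99.99% allows about 52.6 minutes a year, 99.999% about 5.3 minutes, and 99.9999% only around 32 seconds.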

  8. Chris Miller http://www.IdoNotes.com

    We obtain that sort of performance for a set of hosted/managed servers we have. Maintenance downtime never falls on both servers at the same time, so the users see that statistical uptime. Not hard at all. And the servers are in two states: one in Wisconsin and one in Missouri.

  9. Henry

    Ed,

    Can you comment specifically on IBM's uptime experience?

  10. Ed Brill www.edbrill.com

    Check www.lotus.com/redbooks. With the amount of "dogfood" beta code we run, it's never going to be that same 99.9999% number for all IBMers, but it is pretty close.

  11. Alan Lepofsky

    http://www-306.ibm.com/software/success/cssdb.nsf/CS/NAVO-5BGNXK?OpenDocument&Site=default