Exchange, Load balancers and recommendations
This is a follow-up post to Differences in Exchange Load Balancing recommendations by Microsoft and vendors.
This post refers to issues that I discovered and discussed in that post. I suggest reading the previous article before reading this one. I also assume some experience with load balancers in combination with Exchange Server 2010.
In the previous post I mentioned several discrepancies in the Exchange deployment guides from several load balancing companies. I contacted them and pointed out the apparent discrepancies. Their responses were very cooperative, kudos to them!
Due to time restrictions, I have not checked all the deployment guides mentioned. Possibly later on.
I have had intensive contact with Kemp Technologies about the discrepancies I mentioned and about other issues. This was during an implementation of two multi-role Exchange Server 2010 boxes in a DAG behind their virtual VLM-100, with clients using Outlook 2007 and Outlook 2010.
I mentioned that they do not recommend SNAT as the setup for Exchange. Unfortunately, this was a misunderstanding of the terminology used. What Microsoft calls Source NAT is not the same as what Kemp currently calls SNAT; it is, however, the same as L7 non-transparency. Kemp SNAT is when the source IP is changed to the VIP of the rule, not the real IP.
The wildcard rule, as mentioned in the Kemp Exchange deployment guide, was missing some clarification. Kemp agreed that it is good practice to use static ports on the Exchange server. However, using a rule per port has the disadvantage that a client can end up connected to different Client Access Servers for the various services that Outlook MAPI RPC needs. A wildcard rule resolves this.
It is also important to note that a specific port rule for the same VIP always takes precedence over the wildcard rule.
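To make the precedence concrete, here is a minimal Python sketch of how such a lookup could work. It is not Kemp's implementation; the Rule class, the select_rule function, the rule names and the addresses are all hypothetical and only illustrate "specific port first, wildcard as fallback".

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Rule:
    name: str
    vip: str
    port: Optional[int]  # None stands for the wildcard ("any port") rule


def select_rule(rules: Sequence[Rule], vip: str, port: int) -> Optional[Rule]:
    """Pick the rule for a connection: exact port match first, wildcard second."""
    candidates = [r for r in rules if r.vip == vip]
    for rule in candidates:
        if rule.port == port:
            return rule
    for rule in candidates:
        if rule.port is None:
            return rule
    return None


# Hypothetical configuration: one wildcard rule and one HTTPS rule on the same VIP.
RULES = [
    Rule("mapi-wildcard", "10.0.0.10", None),
    Rule("https", "10.0.0.10", 443),
]
```

With this configuration, a connection to port 443 is handled by the "https" rule even though the wildcard rule is listed first, while a connection to any other port on that VIP falls through to "mapi-wildcard".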
Idle Connection Timeout
Now, this wasn’t something I mentioned in my previous post, but it kept me busy for quite some time. It is also why it has taken a while to prepare this post.
In their deployment guide Kemp states that the Idle Session Timeout value should be around 20 seconds. Kemp explains that they want the failover time to be as close as possible to the DAG failover time, which would be within 30 seconds. If the value is high, the user has to wait for the idle timeout to expire. But if it is too low, the client reconnects multiple times. So that value is a trade-off.
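The "too low" side of this trade-off can be shown with a small Python sketch. It assumes a simplified model in which the load balancer drops a connection as soon as the gap between two client requests exceeds the idle timeout. The forced_reconnects function and the activity timestamps are made up for illustration and do not come from any real trace.

```python
from typing import Sequence


def forced_reconnects(activity_times: Sequence[float], idle_timeout: float) -> int:
    """Count how often a client has to reconnect because the connection
    sat idle for longer than idle_timeout seconds between two requests."""
    reconnects = 0
    for previous, current in zip(activity_times, activity_times[1:]):
        if current - previous > idle_timeout:
            reconnects += 1
    return reconnects


# Hypothetical client activity, in seconds since the connection was opened.
activity = [0, 30, 65, 70, 130]

low = forced_reconnects(activity, idle_timeout=20)     # gaps of 30, 35 and 60 s exceed 20 s
high = forced_reconnects(activity, idle_timeout=3600)  # no gap comes near an hour
```

In this model the 20-second timeout forces three reconnects in just over two minutes, while a timeout measured in hours forces none.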
Now, when I deployed the load balancer with low values, users got warnings that Outlook had re-established its connection to Exchange, several times per minute. On top of that, the CPU load on both Client Access Servers went through the roof and made the Exchange environment perform poorly. Increasing the value immediately improved performance.
Other issues with low values presented themselves when opening the Global Address Book in Outlook in Online mode. The first attempt failed with a message that there was no connection to Exchange, while the user was otherwise working without (many) problems. A second attempt worked perfectly.
This is probably due to the different port used by Outlook. Although a wildcard rule was used for Outlook MAPI RPC on the Kemp load balancer, the timeout value applies per port and is not based on source IP.
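A small Python sketch of that suspected behaviour, assuming the idle timer is tracked per (client IP, port) pair rather than per client. The expired_connections function, the client address and the port numbers are hypothetical; this models the explanation above, not the actual Kemp internals.

```python
from typing import Dict, List, Tuple

Conn = Tuple[str, int]  # (client IP, server port)


def expired_connections(
    last_seen: Dict[Conn, float], now: float, idle_timeout: float
) -> List[Conn]:
    """Return the connections whose own idle time exceeds the timeout.
    Activity of the same client on another port does not reset the timer."""
    return sorted(c for c, t in last_seen.items() if now - t > idle_timeout)


# Hypothetical state for one Outlook client at t = 100 s.
last_seen = {
    ("192.168.1.50", 59532): 98.0,  # mailbox traffic, active 2 s ago
    ("192.168.1.50", 59533): 60.0,  # address book, idle for 40 s
}

dropped = expired_connections(last_seen, now=100.0, idle_timeout=20.0)
```

Here the address book connection is dropped even though the same client is actively working in its mailbox, which would match a first address book lookup failing while everything else keeps working.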
Kemp will address this in an upcoming update: an option will be available to kill connections that have not yet timed out whenever the Real Server has failed the Real Server check.
Having said that, I must confess that I didn’t have any real troubles with Outlook clients (in online mode) having to wait for the timeout to reconnect. When I disabled one Real Server CAS, after a while the clients just reconnected and worked perfectly within seconds to a few minutes.
So, the trade-off Kemp mentions could be a non-issue, at least in the case of Outlook 2010 RTM in Online mode with a single VLM-100. But I can see no drawback in killing idle connections to a failed Real Server.
Note (edit 20/07/2011): Disabling a Real Server is not the same as a failure. New connections to the Real Server will not be set up, and existing connections will drain and end after (by default) 300 seconds.
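The behaviour described in this note can be summarised in a few lines of Python. This is only a sketch of the description above, with hypothetical names (ServerState, accepts_new_connections, existing_connection_kept); it does not model a failed health check, only an administratively disabled Real Server.

```python
from enum import Enum


class ServerState(Enum):
    ENABLED = "enabled"
    DISABLED = "disabled"  # disabled by an administrator, not failed


DEFAULT_DRAIN_SECONDS = 300  # default drain time mentioned in the note


def accepts_new_connections(state: ServerState) -> bool:
    """A disabled Real Server gets no new connections."""
    return state is ServerState.ENABLED


def existing_connection_kept(
    state: ServerState,
    seconds_since_disable: float,
    drain_seconds: float = DEFAULT_DRAIN_SECONDS,
) -> bool:
    """Existing connections survive until the drain time has passed."""
    if state is ServerState.ENABLED:
        return True
    return seconds_since_disable < drain_seconds
```

So a test that disables a Real Server shows clients moving over gradually within the drain window, which is different from what happens when the server actually fails.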
In any case, Timeout values for MAPI connections (i.e. Outlook) should be high (as in hours).
In my last post I said I would contact Loadbalancer.org regarding their deployment guide. Their response came quickly, and their updated deployment guide is already available.
They agreed with the Microsoft recommendation to use round robin instead of weighted least connection. They have already corrected their Exchange deployment guide, with a correct note that it can take some time before the load balancer distributes the sessions evenly.
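The difference between the two policies is easiest to see when one Client Access Server has just come back online. The Python sketch below is a simplified model with hypothetical server names and connection counts. It ignores weights and connection lifetimes and only shows where the next ten new connections would land.

```python
from itertools import cycle
from typing import Dict, List


def assign_round_robin(servers: List[str], new_connections: int) -> Dict[str, int]:
    """Hand out new connections in turn, regardless of current load."""
    counts = {s: 0 for s in servers}
    rotation = cycle(servers)
    for _ in range(new_connections):
        counts[next(rotation)] += 1
    return counts


def assign_least_connections(existing: Dict[str, int], new_connections: int) -> Dict[str, int]:
    """Send each new connection to the server with the fewest connections."""
    load = dict(existing)
    counts = {s: 0 for s in existing}
    for _ in range(new_connections):
        target = min(load, key=load.get)
        load[target] += 1
        counts[target] += 1
    return counts


# Hypothetical situation: CAS2 has just come back and holds no connections yet.
existing = {"CAS1": 100, "CAS2": 0}

rr = assign_round_robin(["CAS1", "CAS2"], 10)  # {"CAS1": 5, "CAS2": 5}
lc = assign_least_connections(existing, 10)    # {"CAS1": 0, "CAS2": 10}
```

In this model least connections sends every new connection to the server that has just returned, while round robin keeps alternating. That is also why, with round robin, it takes a while before the total load is evenly spread again.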
Another interesting note: during my contact with them, they let me know that they are working with Microsoft to get their product qualified as an Exchange load balancer. I do not have any experience with them, but more certified choices are always a good thing.
In between this and the previous post I was made aware of the Coyote Point deployment Guide. Unfortunately they also had some discrepancies.
In their Exchange Deployment Guide v1.3 they mention on page 9 that round robin actually isn't recommended. They actually note that Microsoft recommends least connections as balancing policy. Obviously, Microsoft has since changed its stance on this.
On the subject of Source NAT, default gateway or Direct Server Return, they advise default gateway or static routes (page 10). Although Microsoft recommends Source NAT, Coyote Point does not. Their reason is that Exchange needs the client IP addresses. That, however, is false. You do lose client IP information, but in some or even most cases this is preferable to a major change in your network architecture, IMHO.
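What "losing client IP information" means can be sketched in Python. This uses Source NAT in the Microsoft sense: the load balancer replaces the source address with one of its own. The Packet class, the forward function and all addresses are hypothetical and only illustrate the address rewriting, not any vendor's implementation.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str


def forward(packet: Packet, real_server_ip: str, lb_ip: str, source_nat: bool) -> Packet:
    """Forward a client packet to a real server, optionally rewriting the source."""
    forwarded = replace(packet, dst=real_server_ip)
    if source_nat:
        # The real server now sees the load balancer as the client and
        # replies to it directly; no routing change is needed on the server.
        forwarded = replace(forwarded, src=lb_ip)
    return forwarded


# Hypothetical addresses: client, VIP, real server and load balancer.
client_packet = Packet(src="192.168.1.50", dst="10.0.0.10")
with_snat = forward(client_packet, "10.0.0.21", "10.0.0.2", source_nat=True)
without_snat = forward(client_packet, "10.0.0.21", "10.0.0.2", source_nat=False)
```

Without Source NAT the Exchange server still sees 192.168.1.50, but its replies must then be routed back through the load balancer, which is where the default gateway or static routes come in. With Source NAT the server only ever sees 10.0.0.2.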
Pages 20/21 discuss persistence. They do not recommend specific values but mention examples. These are, however, in the range of several hours, whereas Microsoft almost always recommends 1 hour as a rule of thumb. But the explanation is correct.
They have responded to the issues I have raised and they are already in the process of changing their recommendations to be in accordance with those from Microsoft. So if you use or are planning to use their products, keep this in mind.
Terminology, terminology, terminology... Almost all companies have different definitions for the same concepts and technology. This is quite unfortunate, as it tends to prolong troubleshooting and support. I would still like the Microsoft Load Balancing Qualification for Exchange to include a requirement to explicitly define the terminology as used by Microsoft and by the load balancing companies. Kemp Technologies has stated that they are working to get their terminology in sync with Microsoft.
Also, I cannot find a complete TechNet article describing the same recommendations that were made in the TechEd session EXL307.
I would also like to see some sort of requalification. I would not be surprised if some of the deployment guides were based on older recommendations from Microsoft, which have since changed due to customer responses and support calls. Requalification would make this qualification much more valuable and would lead to a better understanding of load balancers by admins and technical specialists, and to higher-quality highly available Exchange 2010 deployments.
And as a last recommendation: don’t always believe what vendors say. Check, test and supply information to your load balancing vendor. I have had very good responses from all vendors I’ve contacted, and they were very appreciative of my feedback.
I have no legal or financial connection with the load balancing companies mentioned in this or the previous post. Kemp and possibly other vendors make unlimited trial load balancers available for personal testing purposes. These offers came with no strings attached.