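A while ago I looked into the general approaches to OpenStack high availability. One setup, for example, uses HAProxy + Keepalived, which reminded me of Heartbeat, a tool I had used when configuring LVS. I have never configured Keepalived, but I wanted to know why Keepalived is used here rather than Heartbeat. So I searched and searched, with queries such as "Heartbeat vs Keepalived" and "Comparison of Heartbeat and Keepalived", and finally found an answer that I think explains it very clearly. It happens to come from Willy, the author of HAProxy, so I am simply reposting it here; see the original post if you want to read it at the source.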
Well, I’m not sure whether you’ll find a response here as this is purely a heartbeat question.
Anyway, I’d like to say that I’m amazed by the number of people who use heartbeat to get a redundant haproxy setup. It is not the best tool for *this* job: it was designed to build clusters, which is a lot different from having two redundant stateless network devices. Network-oriented tools such as keepalived or ucarp are the best suited for that task.
The difference between those two families is simple:
- a cluster-oriented product such as heartbeat will ensure that a shared resource will be present at *at most* one place. This is very important for shared filesystems, disks, etc… It is designed to take a service down on one node and up on another one during a switch over. That way, the shared resource may never be concurrently accessed. This is a very hard task to accomplish and it does it well.
- a network-oriented product such as keepalived will ensure that a shared IP address will be present at *at least* one place. Please note that I’m not talking about a service or resource any more, it just plays with IP addresses.
It will not try to down or up any service, it will just consider a certain number of criteria to decide which node is the most suited to offer the service. But the service must already be up on both nodes. As such, it is very well suited for redundant routers, firewalls and proxies, but not at all for disk arrays nor filesystems.
The difference is very visible in case of a dirty failure such as a split brain. A cluster-based product may very well end up with none of the nodes offering the service, to ensure that the shared resource is never corrupted by concurrent accesses. A network-oriented product may end up with the IP present on both nodes, resulting in the service being available on both of them. This is the reason why you don’t want to serve file-systems from shared arrays with ucarp or keepalived.
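To make the two guarantees concrete, here is a toy model of a two-node split brain. It is only a sketch: the `Node` class, the policy names `at_most_one` and `at_least_one`, and the decision rule are assumptions made for illustration, not how heartbeat or keepalived are actually implemented (real clusters also rely on quorum and fencing).

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    hears_peer: bool    # does this node still receive the peer's heartbeats?
    is_preferred: bool  # is this the node that should own the resource normally?


def holds_resource(node, policy):
    """Decide, from one node's local view only, whether it takes the resource."""
    if node.hears_peer:
        # Healthy link: both families agree, the preferred node owns it.
        return node.is_preferred
    if policy == "at_most_one":
        # Cluster-oriented: cannot prove the peer released it, so stay down.
        return False
    if policy == "at_least_one":
        # Network-oriented: nobody else is heard announcing the IP, so claim it.
        return True
    raise ValueError(f"unknown policy: {policy}")


def owners(nodes, policy):
    """Names of all nodes that end up holding the resource."""
    return sorted(n.name for n in nodes if holds_resource(n, policy))


# Split brain: both nodes are alive, but neither hears the other.
split = [Node("a", False, True), Node("b", False, False)]
print(owners(split, "at_most_one"))   # []
print(owners(split, "at_least_one"))  # ['a', 'b']
```

With the cluster-style rule nobody serves, so a shared disk is never written from two places; with the network-style rule both nodes hold the IP, which is harmless for a stateless proxy and dangerous for a shared filesystem.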
The nature of the controls and changes also has an impact on the switch time and the ability to test the service offline. For instance, with keepalived, you can switch the IP from one node to another one in just one second in case of a dirty failure, or with zero delay in case of a voluntary switch, because there is no need to start/stop anything. That also means that even if you’re experiencing flapping, it’s not a problem because even if the IP constantly moves, it moves between places where the service is offered. And since the service is permanently available on the backup nodes, you can test your configs there without impacting the master node.
So in short, I would not like to have my router/firewall/load balancer running on heartbeat, just as I would not like to have my fileserver/disk storage/database run on keepalived.
With keepalived, your setup above is trivial. Just configure two interfaces with their shared IP addresses, enumerate the interfaces you want to track, declare scripts to check the services if you want and that’s all. If any interface fails or if haproxy dies, the IP immediately switches to the other node. If both nodes lose the same interface (e.g. a shared switch failure), you still have part of the service running on one of the nodes on the other interface.
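As a rough idea of what such a setup might look like, here is a minimal `keepalived.conf` sketch for one of the two interfaces. Everything specific in it is an assumption for illustration: the interface name `eth0`, the address `192.0.2.10/24`, the router id, the priorities and the `killall -0 haproxy` check. The second interface would get its own `vrrp_instance` with a different `virtual_router_id` and address, and the peer node would use `state BACKUP` with a lower priority.

```
# Health check: exits non-zero when no haproxy process exists.
vrrp_script chk_haproxy {
    script "/usr/bin/killall -0 haproxy"
    interval 2              # run the check every 2 seconds
}

vrrp_instance VI_FRONT {
    state MASTER            # BACKUP on the other node
    interface eth0          # interface carrying the VRRP adverts
    virtual_router_id 51    # must match on both nodes
    priority 101            # lower value (e.g. 100) on the other node
    advert_int 1            # advertisement interval in seconds

    virtual_ipaddress {
        192.0.2.10/24       # the shared IP
    }

    track_interface {
        eth0                # instance goes to FAULT if this link fails
    }

    track_script {
        chk_haproxy         # instance goes to FAULT if the check fails
    }
}
```

Note that haproxy itself is never started or stopped by this configuration; it is expected to be running on both nodes already, and keepalived only decides where the IP lives.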
Hoping this helps understanding the different types of architectures one might encounter,