Avoiding splitbrain in a heartbeat/drbd setup

August 26, 2009

What follows describes a hack I made to avoid splitbrain in a two-node Linux cluster running heartbeat, with drbd for disk replication.

I am not going to detail how to set up heartbeat and drbd, and will assume that you are already familiar enough with both.

To the heart of the matter: even in a standard heartbeat/drbd setup, some situations can still result in a splitbrain.

Let's take an example: two nodes, N0 and N1. N0 is primary, N1 is secondary. Both have redundant heartbeat links and at least one dedicated drbd replication link. Let's consider the (highly) hypothetical case where the drbd link goes down, soon followed by a power outage on N0. In a standard heartbeat/drbd setup, when the drbd link goes down, drbd puts the local resources on both nodes in state 'cs:WFConnection' (Waiting For Connection) and marks the peer data as outdated. Then, when N0 disappears due to the power outage, heartbeat on N1 takes over the resources and N1 becomes the primary node.

Here is the glitch: N0 may have made changes on its local disk between the time the drbd link went down and the power outage. These changes were not replicated on N1. And now N1 is running as primary with outdated data.

Not Good.

What we may want is to forbid a node from becoming primary unless its drbd resources are in a connected and up-to-date state. This would avoid most cases of data corruption, at the cost of longer downtime.

As far as I can tell, there is no configuration parameter to do that in heartbeat/drbd. But we can work around that.

Upon trying to become primary, heartbeat starts the resources listed in the file /etc/ha.d/haresources. In a heartbeat/drbd setup, one of those resources is drbddisk, which is just a script located in /etc/ha.d/resource.d/. If this script exits with an error code, heartbeat will give up trying to take over resources in the cluster. Beware that this can lead to situations where both nodes are secondary.

Here are a few lines to add to drbddisk in order to block takeover when the local resource is not in a safe state:

case "$CMD" in
   start)
     # refuse to become primary if the resource is not clean:
     # count the lines of /proc/drbd that are both Connected and locally UpToDate
     DRBDSTATEOK=`grep ' cs:Connected ' /proc/drbd | grep -c ' ds:UpToDate/'`
     if [ "$DRBDSTATEOK" -ne 1 ]; then
       echo >&2 "drbd is not in Connected/UpToDate state. refusing to start resource"
       exit 1
     fi
     # (the rest of the original start) branch follows here)

NOTE: this patch works only if you have one and only one drbd resource.
WARNING: do not modify those scripts unless you know exactly what you are doing...
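
One possible way around the one-resource limitation noted above is to ask drbdadm for the state of the resource being started instead of counting lines in /proc/drbd. The following is only a sketch: the function name drbd_resource_is_safe is made up for illustration, and it assumes that 'drbdadm cstate' and 'drbdadm dstate' are available in your drbd version and print one state per resource (dstate in the form local/peer).

```shell
#!/bin/sh
# Hypothetical helper (not part of the stock drbddisk script).
# Succeeds only if every resource matched by "$1" is Connected
# and has a local disk in UpToDate state.
drbd_resource_is_safe() {
    # ask drbdadm for connection and disk states; fail if it errors out
    cstates=`drbdadm cstate "$1" 2>/dev/null` || return 1
    dstates=`drbdadm dstate "$1" 2>/dev/null` || return 1
    # no output at all means we know nothing: treat as unsafe
    [ -n "$cstates" ] || return 1
    [ -n "$dstates" ] || return 1
    # every connection state must be Connected
    for cs in $cstates; do
        [ "$cs" = "Connected" ] || return 1
    done
    # dstate prints "local/peer"; only the local part must be UpToDate
    for ds in $dstates; do
        [ "${ds%%/*}" = "UpToDate" ] || return 1
    done
    return 0
}
```

In the start) branch this could be called as: drbd_resource_is_safe "$RES" || exit 1, assuming your copy of drbddisk keeps the resource name in a variable such as $RES. Check the script before relying on that name.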

This done, there are a few more things you may need:
- if you are using heartbeat in combination with ipfail, you might want drbddisk to prevent the node from becoming primary if it can't ping a given host (for example the gateway, stored here in $PINGHOST). That could look like:

PINGCOUNT=`ping -c 4 "$PINGHOST" | grep -ci "destination host unreachable"`
if [ "$PINGCOUNT" -eq 4 ]; then
   # we lost all 4 packets. the network is down and the other node
   # might still be primary. don't come up.
   echo >&2 "cannot ping $PINGHOST. refusing to start resource"
   exit 1
fi

- if you are using a stonith device, you may want to modify the stonith script to forbid stonithing the peer if the local resources are not in connected/up-to-date state. After all, there is a chance that the peer node is still functional while the local node is the one that is broken.
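
A note on the ping test in the first item: it only counts explicit "destination host unreachable" answers, so packets that silently time out are not counted as lost. A variant that relies on ping's exit status may be more robust. This is a sketch under the assumption that your ping (iputils on Linux, as far as I know) exits with a non-zero status when it receives no reply at all; the function names host_reachable and check_gateway are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if ping reports success for the host
# given as $1 after sending 4 echo requests.
# Assumes ping exits non-zero when no reply at all is received.
host_reachable() {
    ping -c 4 "$1" >/dev/null 2>&1
}

# Hypothetical guard: returns 1 (and complains on stderr) when the
# host cannot be reached, meaning the network may be down and the
# other node might still be primary.
check_gateway() {
    if ! host_reachable "$1"; then
        echo >&2 "cannot ping $1. refusing to start resource"
        return 1
    fi
    return 0
}
```

In drbddisk, the start) branch could then use: check_gateway "$PINGHOST" || exit 1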
