Timestamps in bash

February 16th, 2010

This is common knowledge but I found it so useful that I have to make sure it spreads even more :)

You can make bash register timestamps in its history:


export HISTTIMEFORMAT='[%F %T] '


And you can even alter the bash prompt to show timestamps as well, using the variable PROMPT_COMMAND:


export PROMPT_COMMAND="echo -n \[\$(date +%H:%M:%S)\]\ "


Voilà! No more trouble figuring out when things happened and how long they took.

Of course, those 2 lines should be added to your ~/.bash_profile or equivalent for persistence.
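
If you want to see what a format string expands to before committing it to your profile, here is a minimal sketch. The function names preview_format and print_prompt_time are my own inventions for illustration; the only assumption is that your date command understands the same strftime-style codes (%F, %T) as bash does.

```shell
#!/bin/bash
# Print the current time using a strftime-style format, to preview
# what HISTTIMEFORMAT would put in front of each history entry.
# (preview_format is a hypothetical helper name, not a bash builtin.)
preview_format() {
    date +"$1"
}

# Function form of the prompt hook: prints [HH:MM:SS] followed by a
# space, without a trailing newline, so the regular prompt follows it.
print_prompt_time() {
    printf '[%s] ' "$(date +%H:%M:%S)"
}

# Assigning a function name avoids the nested quoting and escaping.
PROMPT_COMMAND=print_prompt_time

# Usage: preview_format '[%F %T] '
```

Using a function for PROMPT_COMMAND is just a matter of taste; it should behave like the one-liner above while being easier to extend later.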

Avoiding splitbrain in a heartbeat/drbd setup

August 26th, 2009

What comes now is the description of a hack I did to avoid the occurrence of splitbrain in a 2 node linux cluster running heartbeat and drbd for disk replication.

I am not going to detail how to set up heartbeat and drbd and will assume that you are already familiar enough with this stuff.

To the matter at heart: even in a standard heartbeat/drbd setup, there remain some situations that will result in a splitbrain.

Let's take an example: two nodes, N0 and N1. N0 is primary, N1 is secondary. Both have redundant heartbeat links and at least one dedicated drbd replication link. Let's consider the (highly) hypothetical case where the drbd link goes down, soon followed by a power outage for N0. What will happen in a standard heartbeat/drbd setup is that when the drbd link goes down, the drbd daemon will set the local resources on both nodes to state 'cs:WFConnection' (Waiting For Connection) and mark the peer data as outdated. Then when N0 disappears due to the power outage, heartbeat on N1 will take over the resources and become the primary node.

Here is the glitch: N0 may have made changes on its local disk between the time the drbd link went down and the power outage. These changes were not replicated on N1. And now N1 is running as primary with outdated data.

Not Good.

What we may want is to forbid a node from becoming primary when its drbd resources are not in a connected and up-to-date state. This would avoid most cases of data corruption but also implies longer downtime.

As far as I can tell, there is no configuration parameter to do that in heartbeat/drbd. But we can work around that.

Upon trying to become primary, heartbeat starts its resources listed in the file /etc/ha.d/haresources. In a heartbeat/drbd setup, one of those resources is drbddisk which is just a script located in /etc/ha.d/resources.d/. If this script exits with an error code, heartbeat will give up trying to takeover resources in the cluster. Beware that this might lead you to situations where both nodes are secondary.

Here are a few lines to add to drbddisk in order to block takeover when the local resource is not in a safe state:

case "$CMD" in
   start)
     # forbid to become primary if resource is not clean
     DRBDSTATEOK=`grep ' cs:Connected ' /proc/drbd | grep -c ' ds:UpToDate/'`
     if [ "$DRBDSTATEOK" -ne 1 ]; then
       echo >&2 "drbd is not in Connected/UpToDate state. refusing to start resource"
       exit 1
     fi

NOTE: this patch works only if you have one and only one drbd resource.
WARNING: do not modify those scripts if you don't exactly know what you are doing...
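
To try the check outside of drbddisk before patching anything, it can be wrapped in a small function. This is only a sketch: drbd_state_ok is a name I made up, and the optional file argument exists purely so the logic can be exercised against a saved copy of /proc/drbd instead of the live one. Like the patch, it assumes one and only one drbd resource.

```shell
#!/bin/bash
# Return 0 only if exactly one line of the status file shows a resource
# that is both connected and locally up to date.
# (drbd_state_ok is a hypothetical helper, not part of drbd or heartbeat.)
drbd_state_ok() {
    local status_file="${1:-/proc/drbd}" count
    # Same two patterns as in the patch: connection state first,
    # then the local disk state.
    count=$(grep ' cs:Connected ' "$status_file" | grep -c ' ds:UpToDate/')
    [ "$count" -eq 1 ]
}

# Usage, for instance against a copy saved during an incident:
#   drbd_state_ok /tmp/proc-drbd.copy && echo "safe to take over"
```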

This done, there are a few more things you may need:
- if you are using heartbeat in combination with ipfail, you might want drbddisk to forbid the node to become primary if it can't ping a given host (for example the gateway). That could look like:

PINGCOUNT=`ping -c 4 "$PINGHOST" | grep -ci "destination host unreachable"`
if [ "$PINGCOUNT" -eq 4 ]; then
   # we lost all 4 packets. the network is down and the other node
   # might still be primary. don't come up.
   echo >&2 "cannot ping $PINGHOST. refusing to start resource"
   exit 1
fi

- if you are using a stonith device, you may want to modify the stonith script to forbid stonithing the peer if the local resources are not in connected/up-to-date state. There might indeed be a chance that the peer node still is functional while the local node definitely is not.
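
Regarding the ping check in the first item: counting "destination host unreachable" lines misses the case where packets are silently dropped. A possible variation, sketched below, relies on the exit status of ping instead, which is non-zero when no reply at all was received. The function name host_reachable is hypothetical, and note that this version is stricter than the one above, since it also refuses to start on timeouts or name resolution errors.

```shell
#!/bin/bash
# Return 0 if at least one of 4 echo requests to the given host got a
# reply. Any other outcome (no reply, unknown host, ...) returns non-zero.
# (host_reachable is a hypothetical helper name.)
host_reachable() {
    ping -c 4 "$1" > /dev/null 2>&1
}

# Usage inside the start) branch of drbddisk, with PINGHOST set as before:
#   if ! host_reachable "$PINGHOST"; then
#     echo >&2 "cannot ping $PINGHOST. refusing to start resource"
#     exit 1
#   fi
```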

sshfs + encfs + rsync = encrypted remote backups

March 8th, 2009

I am a backup freak. My home setup encompasses a file server that has 2 disks mirrored against each other using RAID1 and a second server that rsyncs its content against the main file server every night. But having 3 local copies of my data is not enough. What if a nuke fell over Stockholm, destroyed my home but left me alive? Where would I find a backup of my cvs repository then? (assuming I would care)

So I needed one more backup. Abroad.

A friend provided me with an ssh account on a server with a fat disk. But what if thieves grabbed my friend's drive? I would not want them to play with my cvs repository! So I needed the remote backup to be encrypted. Another friend came up with a brilliant suggestion: use an encrypted filesystem located on the remote host but mounted on my local server. That sounds complicated at first but turned out to be really trivial thanks to two things: sshfs and encfs.

Sshfs lets you mount a remote file system using just a regular ssh connection, making it look like a part of the local file system. Encfs creates an encrypted filesystem on top of another filesystem. Both run in userspace under linux.

The trick is to run encfs locally on top of the remote filesystem mounted locally with sshfs.

The following commands show how to do that on Debian:

// let's assume that I am user foo
// with uid=1000 and gid=1000
$ id
uid=1000(foo) gid=1000(foo) groups=...

// start by installing sshfs and encfs (as root)
# apt-get install sshfs
# apt-get install encfs

// configure the fuse module needed by encfs and sshfs
# modprobe -v fuse
# echo fuse >> /etc/modules
# usermod -a -G fuse foo

// as user foo, mount your remote home using sshfs:
$ mkdir ~/remotefs-encrypted
$ sshfs -o workaround=rename,uid=1000,gid=1000 \
    foo@some.other.place.net:/home/foo/ ~/remotefs-encrypted
(enter password here or use shared keys)

// now initialize an encrypted filesystem located
// in foo@some.other.place.net:/home/foo/
$ mkdir ~/remotefs-clear
$ encfs ~/remotefs-encrypted ~/remotefs-clear
(the first time you will have to answer a few setup questions and provide a password)

// rsync whatever you want
$ rsync -avz --del ~/important-stuff ~/remotefs-clear

// then unmount
$ fusermount -u ~/remotefs-clear
$ fusermount -u ~/remotefs-encrypted

// note that you can script all this, even mounting
// the encrypted filesystem:
$ echo "SECRETPASSWORD" | encfs -S ~/remotefs-encrypted ~/remotefs-clear
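
Put together, the steps above could look like the following function. Take it as a rough sketch rather than the script I actually run: the name remote_backup and the three variables are made up for the example, the remote location is the same placeholder as above, and error handling is limited to unmounting whatever got mounted.

```shell
#!/bin/bash
# Hypothetical settings; adjust to your own account and directories.
REMOTE="foo@some.other.place.net:/home/foo/"
ENCRYPTED="$HOME/remotefs-encrypted"
CLEAR="$HOME/remotefs-clear"

# Mount the remote directory with sshfs, layer encfs on top of it,
# rsync the given directory into the clear view, then unmount both
# layers in reverse order. Returns the exit status of rsync.
remote_backup() {
    local source_dir="$1" status
    mkdir -p "$ENCRYPTED" "$CLEAR" || return 1
    sshfs -o workaround=rename,uid="$(id -u)",gid="$(id -g)" \
        "$REMOTE" "$ENCRYPTED" || return 1
    if ! encfs "$ENCRYPTED" "$CLEAR"; then
        # encfs failed: release the sshfs mount before giving up.
        fusermount -u "$ENCRYPTED"
        return 1
    fi
    rsync -avz --del "$source_dir" "$CLEAR"
    status=$?
    fusermount -u "$CLEAR"
    fusermount -u "$ENCRYPTED"
    return "$status"
}

# Usage: remote_backup ~/important-stuff
```

Unmounting in reverse order matters, since the encfs mount sits on top of the sshfs one.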



UPDATE:

Well, after a few weeks of real-life trial it turns out that this method is not stable enough. I keep stumbling on various bugs with both sshfs and encfs. At first, I had to use 'rsync --checksum' because encfs seemed to mess up timestamps. Later, it appeared sshfs causes IO errors when under heavy load from rsync (see http://osdir.com/ml/file-systems.fuse.sshfs/2006-10/msg00017.html). Conclusion: I am giving up on this method for the time being. Hopefully it will get stable enough in the near future.

pushd and popd

April 25th, 2008

It's amazing how one can keep on learning new bash tricks year after year. Today's candy is a pair of bash builtins: pushd and popd. pushd pushes a path on a stack, and popd pops the latest stacked path from the stack and does a cd to it.

Extremely useful!

Still, writing "pushd `pwd`" each time I want to save my location is a bit too much work, so I added 2 aliases to my bash config:

alias po='popd'
alias pu='pushd `pwd`'

Example:

[HEAD] ~/test/Module> !301$ pu
~/test/Module ~/test/Module
[HEAD] ~/test/Module> !302$ cd
[HEAD] ~> !303$ po
~/test/Module
[HEAD] ~/test/Module> !304$
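
The same pair is handy in scripts too. Here is a small sketch of a helper that runs a command in another directory and then comes back to where it started; run_in is a name I made up for the example.

```shell
#!/bin/bash
# Run a command inside the given directory, then return to the
# directory we started from. The exit status is that of the command.
# (run_in is a hypothetical helper name.)
run_in() {
    local dir="$1" status
    shift
    # pushd saves the current directory on the stack and cds to $dir.
    pushd "$dir" > /dev/null || return 1
    "$@"
    status=$?
    # popd pops the saved directory and cds back to it.
    popd > /dev/null
    return "$status"
}

# Usage: run_in ~/test/Module ls
```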


backdiff

December 4th, 2007

Every morning, when I come to work and log in to the development server, I have a script that shows me all the files that have changed during the previous day.

Since I want to keep myself updated with what's happening in other corners of our system, I end up repeatedly typing 'cvs status' followed by 'cvs diff -r ' to check the latest changes in files.

A few days ago I got tired of this extra work, so I wrote the following script: backdiff.

It runs like this:

backdiff <file>

which shows what changed in that file upon the last cvs commit. Or:

backdiff -c X <file>

showing the changes across the last X cvs commits.
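
To give an idea of the principle, here is a simplified sketch. It is not the actual backdiff script: the function names are invented for the example, and it assumes that 'cvs status' reports the current revision on a "Working revision:" line and that going back X commits simply means subtracting X from the last number of the revision (so 1.7 becomes 1.5 for X=2), which only holds while staying on the same branch.

```shell
#!/bin/bash
# Print the revision that lies COUNT commits before REV on the same
# branch, e.g. "1.7" and 2 give "1.5". Fails if there is no such revision.
# (previous_revision and backdiff_sketch are hypothetical names.)
previous_revision() {
    local rev="$1" count="${2:-1}"
    local prefix="${rev%.*}" last="${rev##*.}"
    local target=$(( last - count ))
    [ "$target" -ge 1 ] || return 1
    echo "$prefix.$target"
}

# Usage: backdiff_sketch [-c X] <file>
backdiff_sketch() {
    local count=1 file current previous
    if [ "$1" = "-c" ]; then
        count="$2"
        shift 2
    fi
    file="$1"
    # Third field of the "Working revision:" line is the revision number.
    current=$(cvs status "$file" | awk '/Working revision:/ { print $3 }')
    previous=$(previous_revision "$current" "$count") || {
        echo >&2 "no revision $count commit(s) before $current"
        return 1
    }
    cvs diff -r "$previous" -r "$current" "$file"
}
```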

Manipulating CVS/Root files

November 24th, 2007

To see which CVS repository is being pointed at by the various subdirectories of a local CVS project checkout:


find . -path "*CVS/Root" -exec cat {} \;


And to change the target CVS repository from 192.168.1.4 to, say, 192.168.1.2 (after moving the repository to a new server) without having to check out your project once more, just edit all CVS/Root files as follows:


find . -path "*CVS/Root" -exec sed -i.tmp -e "s/192\.168\.1\.4/192.168.1.2/" {} \;
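
If you move repositories more than once, the replacement can be wrapped in a function. This is a sketch under a few assumptions: retarget_cvs_roots is a made-up name, the new address must not contain '/' or '&' (they would confuse sed), and a backup copy of each file is left next to it as Root.tmp.

```shell
#!/bin/bash
# Replace OLD by NEW in every CVS/Root file below the current directory.
# Dots in OLD are escaped so that they only match literal dots.
# (retarget_cvs_roots is a hypothetical helper name.)
retarget_cvs_roots() {
    local old="$1" new="$2" old_re
    old_re=$(printf '%s' "$old" | sed 's/\./\\./g')
    # -i.tmp edits in place and keeps the original as <file>.tmp
    find . -path "*CVS/Root" -exec sed -i.tmp -e "s/$old_re/$new/" {} \;
}

# Usage, from the top of the checkout:
#   retarget_cvs_roots 192.168.1.4 192.168.1.2
```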


If you are moving your cvs repository to a new host, it could be a good idea to check that all the CVS/Root files of your local project point to the same repository. You don't want to commit some changes against the old repository and some against the new one... In a .bash_profile or similar, you could just write something like:


cd "$PROJECT"
COUNT_ROOTS=`find . -path "*CVS/Root" -exec cat {} \; | sort -u | wc -l`
if [ "$COUNT_ROOTS" -ne 1 ]; then
     echo "ERROR: project refers to more than 1 cvs repository!";
     # exit swiftly
fi
# all green
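
The same check can also live in a pair of functions, which makes it easy to run against any checkout. Again just a sketch: the function names are hypothetical, and a checkout without any CVS/Root file at all is reported as an error too.

```shell
#!/bin/bash
# Print the number of distinct repositories referenced by the CVS/Root
# files below the given directory (default: current directory).
# (count_cvs_roots and check_single_cvs_root are hypothetical names.)
count_cvs_roots() {
    find "${1:-.}" -path "*CVS/Root" -exec cat {} \; | sort -u | wc -l | tr -d ' '
}

# Return 0 if the checkout refers to exactly one repository.
check_single_cvs_root() {
    local count
    count=$(count_cvs_roots "$1")
    if [ "$count" -ne 1 ]; then
        echo >&2 "ERROR: expected exactly 1 cvs repository, found $count"
        return 1
    fi
}

# Usage: check_single_cvs_root "$PROJECT" || echo "fix your CVS/Root files"
```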