Unix flight recorder

Unix flight recorder (UFR) is a simple shell script that collects system information using traditional system administration tools.

It was created because "sar" does not collect enough information. UFR also records process listings, logged-in users, and network connections.

It lets a Sys Admin "go back in time" and see what was happening on the system: who was logged in, what processes were running, what network connections existed, etc.
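
To "go back in time" you first need the directory that covers the moment you care about. Here is a minimal sketch, assuming the 5-minute cron schedule and the /var/log/ufr/MonYYYY/DD/HH:MM layout seen in the example output. The function name ufr_snapshot_dir is made up for illustration and is not part of ufr.sh.

```shell
# Hypothetical helper (not part of ufr.sh): print the snapshot directory
# that should cover a given time, assuming cron starts UFR every 5 minutes
# and data lives under /var/log/ufr/<MonYYYY>/<DD>/<HH:MM>.
ufr_snapshot_dir() {
    monthyear=$1    # e.g. Sep2007
    day=$2          # e.g. 16
    hhmm=$3         # e.g. 19:33

    hour=${hhmm%%:*}
    minute=${hhmm##*:}

    # Strip one leading zero so that "08" is not read as an octal number.
    minute=${minute#0}

    # Round down to the start of the 5-minute window.
    minute=$(( minute / 5 * 5 ))

    printf '/var/log/ufr/%s/%s/%s:%02d\n' "$monthyear" "$day" "$hour" "$minute"
}

# Which directory covers 19:33 on 16 Sep 2007?
ufr_snapshot_dir Sep2007 16 19:33
```

If cron was delayed, or TTL was changed from the default, the directory name may not fall on a 5-minute boundary, so treat the result as a starting point and check with ls.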

Download source code - ufr.sh
Example output, snapshot taken 19:30-19:35 on 16 Sep 2007:

aleksey@linux:/var/log/ufr/Sep2007/16/19:30% ls 
iostat.1.300  mpstat.1.300  netstat.-an  netstat.-i  netstat.-rn  ps  top  vmstat.1.300
aleksey@linux:/var/log/ufr/Sep2007/16/19:30%

In the file names, the "1" is the sampling interval and the "300" is the total sampling time, both in seconds.
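
The naming convention can be decoded mechanically. The sketch below is only an illustration: ufr_describe is a hypothetical name, not something ufr.sh provides. It splits a file name of the form tool.interval.duration and reports names that carry no such suffix.

```shell
# Hypothetical helper (not part of ufr.sh): explain a UFR output file name
# of the form <tool>.<interval>.<duration>, e.g. iostat.1.300.
ufr_describe() {
    name=$1

    # Names with fewer than two dots (top, netstat.-an) carry no suffix.
    case $name in
        *.*.*) ;;
        *) echo "$name: no interval/duration suffix"; return 0 ;;
    esac

    duration=${name##*.}    # text after the last dot
    rest=${name%.*}         # everything before the last dot
    interval=${rest##*.}    # text after the second-to-last dot
    tool=${rest%.*}         # whatever precedes both numbers

    # Both fields must be plain digits to count as a sampling suffix.
    case $interval$duration in
        ''|*[!0-9]*) echo "$name: no interval/duration suffix"; return 0 ;;
    esac

    echo "$tool: sampled every $interval s for $duration s"
}

ufr_describe iostat.1.300
ufr_describe netstat.-an
```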

Source code listing:


#!/bin/sh

# Unix flight recorder

# Purpose: provide sys administration data about subsystem
# status/performance and running processes to supplement sar
# in monitoring performance
#
# by Aleksey Tsalolikhin, http://www.lifesurvives.com/tech/ufr.html
# I've used this on Linux - please let me know if you make changes to it
# to add other systems or gather more data.   Gmail (Aleksey.Tsalolikhin)

# with TTL 300 seconds, add a cron job like this to
# collect 5 minutes of data:
#
# 0,5,10,15,20,25,30,35,40,45,50,55 * * * * ufr.sh
#
# However you can change TTL to whatever interval you want
#
# UFR starts from cron, gathers TTL worth of data, and exits

TTL=300 	# how many seconds we should run
INTERVAL=1	# how often to sample, used as argument
		# to vmstat, etc.

DATADEPOT=/var/log/ufr
MONTHYEAR=`date +%h%Y`	# e.g. Sep2007
DATE=`date +%d` 	# day of the month
TIMESTAMP=`date +%H:%M`

DATADIR=$DATADEPOT/$MONTHYEAR/$DATE/$TIMESTAMP

# without a data directory there is nowhere to write, so give up
mkdir -p $DATADIR || { echo "failed to mkdir $DATADIR" >&2; exit 1; }



OPERATING_SYSTEM=`uname -s`

# set up the arguments for "top", which is different
# from OS to OS

# we'll run top every 10 seconds instead of every second to
# reduce load on the OS
TTL_OVER_10=`echo $TTL/10|bc`
if [ $OPERATING_SYSTEM = "Linux" ] ;
then TOP_ARGS=" -b -d 10 -n $TTL_OVER_10"
elif [ $OPERATING_SYSTEM = "HP-UX" ]
then TOP_ARGS=" -s 10 -d $TTL_OVER_10"
fi


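# the second argument to vmstat, iostat and mpstat is a sample count,
# so these cover TTL seconds only while INTERVAL is 1
vmstat $INTERVAL $TTL > $DATADIR/vmstat.$INTERVAL.$TTL &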
iostat -x $INTERVAL $TTL > $DATADIR/iostat.$INTERVAL.$TTL &
mpstat $INTERVAL $TTL > $DATADIR/mpstat.$INTERVAL.$TTL &
top $TOP_ARGS > $DATADIR/top &

netstat -i > $DATADIR/netstat.-i
netstat -rn > $DATADIR/netstat.-rn
netstat -an > $DATADIR/netstat.-an
w > $DATADIR/w

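# run free in the background like the other samplers; in the foreground
# it would hold up the ps loop below for the whole TTL
if [ $OPERATING_SYSTEM = "Linux" ] ;
then free -s 10 -c $TTL_OVER_10 > $DATADIR/free &
fi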

# take a process listing once a second until TTL seconds have passed
CNT=1
while sleep 1
do
        ps auwx > $DATADIR/ps.`date | sed -e 's: :_:g'`
        CNT=`echo $CNT + 1 | bc`

        if [ $CNT -gt $TTL ]
        then
                exit
        fi
done


exit
