How do you quickly transfer data from one machine to another? Ian Bruntlett shows us the bash script he uses.
A complex system that works is invariably found to have evolved from a simple system that worked. A complex system designed from scratch never works and cannot be patched up to make it work. You have to start over, beginning with a working simple system.
~ John Gall
Some time ago Frances Buontempo was looking for articles for Overload. I mentioned in an e-mail that I had a backup script called stufftar that I could write about. Frances kindly provided the questions that this article was built on.
What inspired it? Were you trying to solve a specific problem?
I use both Ubuntu and Lubuntu Linux, and I am a volunteer tester of both. I want to keep my personal machines up to date, so I needed a way to create backups and to transport key files and folders as flexibly as possible.
Initially I was backing up key files (all of ~/stuff ) to a .tar.gz file. Then I decided I wanted to automate it a bit so I worked out how to automatically create its destination filename:
DESTINATION_FILENAME=$1`date "+_%d_%B_%Y.tar.gz"`
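The same idea can be written with the $(...) form of command substitution, which is easier to nest and quote than backticks. This is a minimal sketch, not the code stufftar uses: the function name make_backup_name is hypothetical.

```shell
#!/bin/bash
# make_backup_name is a hypothetical helper, not part of stufftar.
# $1 - stub of the archive name, e.g. "Desktop"
# Prints a name of the form Desktop_28_December_2014.tar.gz
make_backup_name() {
    local stub
    stub=$1
    # %d = day of month, %B = full month name, %Y = four-digit year
    printf '%s%s\n' "$stub" "$(date "+_%d_%B_%Y.tar.gz")"
}

DESTINATION_FILENAME=$(make_backup_name "Desktop")
echo "$DESTINATION_FILENAME"
```

The month name comes from the current locale, so the exact text depends on the machine the script runs on.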
Then I added commands to display the time taken to do the backups using the line:
/usr/bin/time -f "%E mins:secs " tar -czf $DESTINATION_FILENAME $4 $5 $6 $7 $8 $9 ${10} ${11} ${12} ${13} ${14} ${15} ${16} ${17} ${18} ${19} ${20}
I had to call time from /usr/bin because bash was ‘hiding’ it with its own, less flexible, time implementation, which doesn’t support the options I wanted.
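A quick way to see this hiding in action is to ask bash what it thinks time is. The sketch below assumes it is run by bash and that, if /usr/bin/time exists, it is GNU time (the -f option is a GNU feature).

```shell
#!/bin/bash
# In bash, 'time' is a reserved word, so a bare 'time' never reaches the
# external program and options such as -f are not understood.
type -t time            # prints "keyword" when run by bash

# Giving the full path bypasses the keyword. %E is the elapsed
# wall-clock time; this assumes GNU time is installed.
if [ -x /usr/bin/time ]; then
    /usr/bin/time -f "%E mins:secs" true
fi
```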
Because of my inexperience with bash, testing was very important to me. As new features were developed, I would put a ‘test’ framework in place, thoroughly test each new feature and then remove the ‘test’ framework. Having functions ensure they had been given sufficient parameters was particularly important.
As time went by, I added more options. In particular I wanted to back up individual folders, leading to the options (desktop, extra, rpg, home and localhost) being implemented. For convenience I wanted to be able to specify ‘backup everything’ so I implemented the ‘all’ option. To keep things transparent, I implemented the ‘verbose’ option for stufftar to output more information about what it is currently doing.
I typically have a refurbished Lenovo ThinkPad T420 (from www.tier1online.com) as my main computer (hostname newton) running Ubuntu Linux. I have an old 32 bit Samsung NC10 that I take with me when visiting family (running Lubuntu Linux). I back up my .tar.gz files to USB flash drive. I also back them up to external hard drives. I’ve got three external 500GB hard drives and, once a month, I copy that month’s latest .tar.gz files to one of them. I rotate my use of the external hard drives so that I’m always backing up to the one containing the oldest backup of the set. About once a year I archive everything to dual-layer DVD-R – mainly as individual folders but I put the ‘home’ and ‘localhost’ .tar.gz files as is onto the DVD-R.
What does it do?
I’ve got a bunch of important digital files I carry around and back up to USB flash drives and hard drives. My ~/Desktop files are my key files for general use. Other folders are ~/stuff (the original folder I put my ‘stuff’ in), ~/extra (the folder I moved out of ~/stuff when it became too large) and ~/RPG (the folder I use to keep RPG PDFs etc. in). The ‘home’ option backs up key shell scripts. The ‘localhost’ option backs up key files (/var/www/) from LAMP studies.
How do you use it?
For help type in ./stufftar and it gives you a list of command line options (see Figure 1).
ian@newton:~$ ./stufftar
./stufftar
Usage : stufftar followed by one or more commands:
desktop, extra, rpg, localhost, stuff, home and all
All data files are:-
1. Named after the relevant command name, followed by day number, month, year.
   For example: Desktop_01_March_2014.tar.gz
2. Are created using the tar command with file compression switched on.
Explanation of stufftar commands:-
desktop   - copy desktop files to a desktop tar file
extra     - copy extra to a tar file – Linux Voice, Overload, QL Today
stuff     - copy stuff – anything I want to keep (main files are here)
rpg       - copy ~/RPG to a tar file
home      - copy refurb, stufftar, coder, removefiles.c to a home tar file
localhost - copy the whole /var/www/html subtree to a tar file
all       - execute stuff, extra, desktop and home commands in one go.
            Use when you want a full backup.
verbose   - display more details about the work being done.
status    - display status info about the stufftarred files on this system.
Also consider backing up Firefox bookmarks, and .emacs config file.
Figure 1
What future improvements do you envisage?
As a side-effect of writing this article, I’ve started a ‘TO DO’ comment section. Currently the script does everything I need it to do. Also, despite having written a bash shell script helped by my copy of the Linux Pocket Guide (O’Reilly), I’ve never really studied bash. I’ve just muddled through. I have copies of How Linux Works and The Linux Command Line to make my way through sometime next year – I am concentrating on Ruby this year. The command line options are non-standard but OK for my purposes; they could be modified to handle arguments like -f some_kind_of_parameter. There is a UNIX tool called TripWire, used to report changes to folders of files. I think I’ll be looking at tackling that – in the future.
A walk through the code (edited highlights)
The line #!/bin/bash tells Linux that this is a bash shell script. Some people prefer to specify sh instead of bash but as this script is for personal use, I’m using bash.
#!/bin/bash
# stufftar backup script by Ian Bruntlett,
# 2012 - December 2015, expanded and desktar
# merged in on August 11th 2012,
# March 2013 added "coder" file to Desktop tar,
# added BACKUP_HOME
As is usual, information about the script is stored at the start of the script (summarised for brevity).
# echo_and_log(logfilename, text to put in log
# file and echo to screen)
function echo_and_log()
This is a ‘helper’ function that echoes its parameters both to the screen and to a specified log file. Useful to avoid repeated identical echo statements.
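The body of echo_and_log is not reproduced in this article. A minimal sketch of how such a helper might be written, assuming the first parameter is the log file and everything after it is the message, is:

```shell
#!/bin/bash
# Sketch only: the real body of echo_and_log is not shown in the article.
# $1 - log filename
# $2 onwards - text to echo to the screen and append to the log
echo_and_log() {
    if [ $# -lt 2 ]; then
        echo "Error echo_and_log() insufficient no of parameters" >&2
        return 1
    fi
    local logfile
    logfile=$1
    shift                          # the remaining parameters are the message
    echo "$*" | tee -a "$logfile"  # tee -a appends rather than overwrites
}
```

Using tee -a means one command does both jobs, so the screen and the log can never drift apart.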
# if error code set ($1), display error messages
# and exit programme
function exit_if_failed()
This is another ‘helper’ function. It gets passed an error code by its caller. Normally it is 0 so this function does nothing. If it is non-zero then diagnostic information is echoed and it exits/aborts the script with a return code of 1.
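The body of exit_if_failed is also omitted from the article. Under the behaviour described above (do nothing on 0, otherwise report and exit with 1), one possible sketch is:

```shell
#!/bin/bash
# Sketch only: the real body of exit_if_failed is not shown in the article.
# $1 - error code from the previous command (0 means success)
# $2 onwards - text describing what was being attempted
exit_if_failed() {
    local code
    code=$1
    shift
    if [ "$code" -ne 0 ]; then
        echo "Error: $* failed with error code $code" >&2
        exit 1                     # abort the whole script
    fi
}
```

It is meant to be called immediately after the command being checked, passing $? as the first parameter, before any other command has a chance to overwrite it.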
# $1 log filename aka $LOG_FILE
# $2 file to get MD5 from aka $SOURCE_FILENAME
# example:- get_and_log_md5 "~/md5log.txt" "localhost_04_May_2015.tar.gz"
function get_and_log_md5()
This function calculates the MD5 checksum of the file (function parameter $2) and logs it to a logfile (function parameter $1). It is a ‘helper’ function used by function perform_backup (Listing 1) when global variable $STUFFTAR_VERBOSE is greater than zero.
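Again the function body is not shown here. A minimal sketch, assuming the md5sum utility from GNU coreutils is available, might look like this:

```shell
#!/bin/bash
# Sketch only: the real body of get_and_log_md5 is not shown in the article.
# Assumes md5sum (GNU coreutils) is installed.
# $1 - log filename
# $2 - file to get the MD5 checksum from
get_and_log_md5() {
    if [ $# -ne 2 ]; then
        echo "Error get_and_log_md5() needs 2 parameters ($#)" >&2
        return 1
    fi
    # md5sum prints "<checksum>  <filename>"; tee -a shows that line
    # on screen and appends it to the log file.
    md5sum "$2" | tee -a "$1"
}
```

Logging the checksum next to the archive name makes it possible to check later, on another machine, that a copied .tar.gz file arrived intact.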
# perform_backup
# $1 - stub of .tar.gz filename
# $2 - name of log file
#      e.g. "scripts/stufftarlog.txt"
# $3 - directory to do the tarring in
# $4 onwards - files/directories to put in .tar.gz
#      file relative to $3
function perform_backup()
{
  if [ $# -lt 4 ]
  then
    echo "Error perform_backup() insufficient no of parameters";
    return 1;
  fi;
  FILENAME_STUB=$1
  DESTINATION_FILENAME=$1`date "+_%d_%B_%Y.tar.gz"`
  LOGFILE=$2
  TAR_DIR=$3
  if [ $STUFFTAR_VERBOSE -gt 0 ]
  then
    echo $0 $1 $2 $3 $4 $5 $6 $7 $8 $9 ${10} ${11} ${12} ${13} ${14} ${15} ${16} ${17} ${18} ${19} ${20}
    echo DESTINATION_FILENAME=$DESTINATION_FILENAME e.g. Desktop_28_December_2014.tar.gz
    echo FILENAME_STUB=$FILENAME_STUB
    echo LOGFILE=$LOGFILE
    echo TAR_DIR=$TAR_DIR
  fi
  CURRENT_TIME=`date "+%H:%M:%S"`
  cd $TAR_DIR
  echo_and_log $LOGFILE $CURRENT_TIME Backing up key $TAR_DIR $4 $5 $6 $7 $8 $9 ${10} ${11} ${12} ${13} ${14} ${15} ${16} ${17} ${18} ${19} ${20} files to $DESTINATION_FILENAME
  /usr/bin/time -f "%E mins:secs " tar -czf $DESTINATION_FILENAME $4 $5 $6 $7 $8 $9 ${10} ${11} ${12} ${13} ${14} ${15} ${16} ${17} ${18} ${19} ${20}
  exit_if_failed $? "perform_backup to " $DESTINATION_FILENAME
  echo File count:-
  tar -tvf $DESTINATION_FILENAME | wc -l
  ls -lh $DESTINATION_FILENAME
  if [ $STUFFTAR_VERBOSE -gt 0 ]
  then
    get_and_log_md5 $LOGFILE $DESTINATION_FILENAME
  fi
  echo
  cd;
  return 0;
} # end function perform_backup
Listing 1
perform_backup is the ‘engine’ of the script. It validates its parameters. It does some logging, if running in verbose mode. It does the backup using both the time command and tar. The backup is created by tar and time outputs the amount of time taken. It also reports the number of files in the tar file by piping a list of its contents to the word count utility wc.
# show_last_line
# $1 is the name to show
# $2 is the log file to show the tail end of
# $3 is the number of lines to show
This ‘helper’ function, show_last_line, echoes some information about a logfile – the name of the archive (‘stuff’, ‘localhost’ etc) and the last $3 lines of the log file $2. See Listing 2.
function show_last_line()
{
  if [ $# -ne 3 ]
  then
    echo "Error show_last_line() insufficient no of parameters ($#)";
    echo usage "show_last_line name of file, source log file, no of lines to show"
    return 1;
  fi;
  echo -n "$1 - $2 "
  tail -$3 $2
  echo -n
  return 0;
} # end function show_last_line
Listing 2
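For comparison, here is a sketch of a variant that uses the POSIX tail -n form, quotes its parameters, and copes with a log file that does not exist yet. The name show_tail and the ‘no log file found’ message are hypothetical and not part of stufftar.

```shell
#!/bin/bash
# show_tail is a hypothetical variant of show_last_line, not part of stufftar.
# $1 - name to show
# $2 - log file to show the tail end of
# $3 - number of lines to show
show_tail() {
    if [ $# -ne 3 ]; then
        echo "usage: show_tail name logfile lines" >&2
        return 1
    fi
    if [ ! -r "$2" ]; then
        echo "$1 - $2 (no log file found)"
        return 0
    fi
    echo "$1 - $2"
    tail -n "$3" "$2"      # -n N is the POSIX spelling of the older -N form
}
```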
The function in Listing 3 uses show_last_line to display the tail end of every stufftarlog.txt file. The logfile serves two purposes. On the master computer, show_status() indicates when a particular folder was last backed up to tar file. On other computers, it shows the age of the data that has been transferred by tar file.
function show_status()
{
  echo STUFFTAR STATUS
  show_last_line "Desktop" "$HOME/Desktop/stufftarlog.txt" 3
  echo
  show_last_line "extra" "extra/stufftarlog.txt" 3
  echo
  show_last_line "localhost" "/var/www/html/stufftarlog.txt" 3
  echo
  show_last_line "RPG" "RPG/stufftarlog.txt" 3
  echo
  show_last_line "scripts and home" "scripts/stufftarlog.txt" 3
  echo
  show_last_line "stuff" "stuff/stufftarlog.txt" 3
  echo
  return 0;
} # end function show_status
Listing 3
This function, show_status, is triggered when a parameter of ‘status’ is passed on the command line. It can be used on its own or in conjunction with other commands.
This is the main part of this script. If no parameters are passed, a help message is displayed explaining the use and parameters of the script.
The worker variables BACKUP_DESKTOP, BACKUP_EXTRA, BACKUP_HOME, BACKUP_LOCALHOST, BACKUP_RPG and BACKUP_STUFF are initialised to zero here. Another variable, $STUFFTAR_VERBOSE, is initialised near the start of the script.
Listing 4 is where I loop through the script’s command line arguments, setting worker variables accordingly. Note that if the parameter ‘all’ is found, then a bunch of worker variables are set to 1.
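As a rough illustration of that kind of loop, here is a sketch that uses a case statement. It is an assumption about the general shape, not a copy of the script: parse_commands is a hypothetical wrapper, the variable names are the ones mentioned in this article, and the set of variables switched on by ‘all’ is taken from the help text in Figure 1.

```shell
#!/bin/bash
# Sketch only. parse_commands is a hypothetical wrapper function;
# the BACKUP_* and STUFFTAR_* names are the ones the article mentions.
parse_commands() {
    BACKUP_DESKTOP=0; BACKUP_EXTRA=0; BACKUP_HOME=0
    BACKUP_LOCALHOST=0; BACKUP_RPG=0; BACKUP_STUFF=0
    STUFFTAR_VERBOSE=0; STUFFTAR_STATUS=0
    local arg
    for arg in "$@"; do
        case $arg in
            desktop)   BACKUP_DESKTOP=1 ;;
            extra)     BACKUP_EXTRA=1 ;;
            home)      BACKUP_HOME=1 ;;
            localhost) BACKUP_LOCALHOST=1 ;;
            rpg)       BACKUP_RPG=1 ;;
            stuff)     BACKUP_STUFF=1 ;;
            all)       # per the help text: stuff, extra, desktop and home
                       BACKUP_STUFF=1; BACKUP_EXTRA=1
                       BACKUP_DESKTOP=1; BACKUP_HOME=1 ;;
            verbose)   STUFFTAR_VERBOSE=1 ;;
            status)    STUFFTAR_STATUS=1 ;;
            *)         echo "Unknown command: $arg" >&2
                       return 1 ;;
        esac
    done
}
```

A script would call it as parse_commands "$@" so that each command line word arrives as a separate parameter.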
For information purposes, the name of the script is echoed ($0) and, if in verbose mode, ls -lh is used to show even more information about the script file. Also for information purposes, the status of worker variables is displayed.
echo $# PARAMETER\(S\), BACKUP_STUFF=$BACKUP_STUFF BACKUP_EXTRA=$BACKUP_EXTRA, BACKUP_DESKTOP=$BACKUP_DESKTOP, BACKUP_HOME=$BACKUP_HOME, BACKUP_RPG=$BACKUP_RPG, BACKUP_LOCALHOST=$BACKUP_LOCALHOST, STUFFTAR_VERBOSE=$STUFFTAR_VERBOSE, STUFFTAR_STATUS=$STUFFTAR_STATUS
For consistency, the script changes the current directory to the current user’s home directory before doing any file handling.
cd ~
Then the .tar.gz files are created – basically checking to see if a worker variable is 1 and then calling perform_backup to do the work.
# backup to a stuff tar
if [ $BACKUP_STUFF -eq 1 ]
then
  perform_backup "Stuff" "stuff/stufftarlog.txt" "$HOME" "stuff"
fi
Similar clauses are used for the creation of the extra, rpg, desktop .tar.gz files. Backing up the key ‘home’ files is a little different:
# backup key /home/ian things e.g this backup
# script to a tar file
if [ $BACKUP_HOME -eq 1 ]
then
  perform_backup "Home" "scripts/stufftarlog.txt" "$HOME" refurb scripts synclamp stufftar coder removefiles.c;
fi
And ‘localhost’ is used to back up key LAMP files (see Listing 5).
As its final act, if the ‘status’ option has been activated, the status of every backup file is displayed.
if [ $STUFFTAR_STATUS -eq 1 ]
then
  show_status;
fi
And that is it.
Feedback from Overload technical reviewers
This script serves as a decent example of how to back things up in a reproducible way using tar. I didn’t see any glaring errors in it, but I would comment that it is not tolerant of spaces in filenames (fixing that would require liberal use of double-quotes, and I usually find some trial-and-error is required to get this right).
The script was written by me – a bash novice. Given the importance of the data, it was tested heavily as it evolved. As I expected to use the resulting .tar.gz files from the command line, I decided that my filenames would not contain spaces.
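For readers who do need spaces in filenames, the reviewer’s suggestion can be sketched as below. This is a hypothetical, cut-down cousin of perform_backup rather than a change to stufftar itself: the name backup_quoted and its parameter order are assumptions made for the example.

```shell
#!/bin/bash
# backup_quoted is a hypothetical, cut-down cousin of perform_backup.
# $1 - destination .tar.gz file
# $2 - directory to do the tarring in
# $3 onwards - files/directories to archive, relative to $2
backup_quoted() {
    if [ $# -lt 3 ]; then
        echo "Error backup_quoted() insufficient no of parameters" >&2
        return 1
    fi
    local destination tar_dir
    destination=$1
    tar_dir=$2
    shift 2
    # "$@" expands to each remaining parameter as one word, spaces intact.
    # -C makes tar change into the directory itself, so no cd is needed.
    tar -czf "$destination" -C "$tar_dir" "$@" || return 1
    echo "File count:- $(tar -tzf "$destination" | wc -l)"
}
```

Because "$@" passes on however many parameters were given, it also removes the need to spell out $4 to ${20}, and with it the upper limit on the number of items.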
The article as it is now is a bit specific to one use case – I think it would be more useful if it explained the ideas and techniques being used rather than presenting the details of the script itself. E.g.:
-
Interesting policy decisions like creating separate log files for each piece of work being done – it would be interesting to hear how this supports the workflow.
I thought about having a single log file, ~/stufftarlog.txt but it wasn’t flexible enough and I’d have to somehow process that log file, looking for status information about each type of backup. By having separate log files, I avoid that problem and it means I can decide to just backup certain bunches of files instead of a full-blown backup of everything.
-
How to get the last relevant line out of a log file (as the script does) – this seems more widely-applicable and a useful little nugget.
I added the function body of show_last_line to this article. It is quite simple and uses the tail command.
-
How to deal with command line arguments (the for loop used here looks quite convenient for simple applications like this).
Yes. With a bit of effort it can be more flexible. At the moment it handles one word parameters that act as flags or specify a certain backup to perform.
Supporting a syntax like -f some_kind_of_flag would be possible. I have some ideas about it, mainly involving extending the loop to set a flag ($ARGUMENT_F_EXPECTED) when a parameter of -f is detected. Then the head of the loop would need another set of if statements, followed by use of the continue loop modifier.
-
The benefits of writing a script rather than doing this manually (e.g. reduced errors and less time taken)
Spot on. Being able to run a command to do all the backups I wanted, have standard filenames and contents, and walk away was crucial. That dealt with errors during creation of the backups. However, I used to transport my files to a Samsung NC10 NetBook (when it wasn’t being wiped and used to test Lubuntu pre-releases), and I noticed that I was occasionally forgetting to install the contents of newer .tar.gz files. So I needed to know when a particular folder’s files were created. This resulted in the function show_last_line (it can show more than one line) which was discussed earlier. When I’m working on the Samsung NC10, typing in ./stufftar status means I can see how fresh this copy of my files is.
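The -f some_kind_of_flag idea raised in the feedback could be sketched along the following lines. Only ARGUMENT_F_EXPECTED comes from the discussion above; parse_with_f, F_VALUE and COMMANDS are hypothetical names, and this is one possible shape rather than planned stufftar code.

```shell
#!/bin/bash
# Sketch only. ARGUMENT_F_EXPECTED is the flag named in the article;
# parse_with_f, F_VALUE and COMMANDS are hypothetical names.
parse_with_f() {
    ARGUMENT_F_EXPECTED=0
    F_VALUE=""
    COMMANDS=""
    local arg
    for arg in "$@"; do
        if [ "$ARGUMENT_F_EXPECTED" -eq 1 ]; then
            F_VALUE=$arg             # this word belongs to the preceding -f
            ARGUMENT_F_EXPECTED=0
            continue
        fi
        if [ "$arg" = "-f" ]; then
            ARGUMENT_F_EXPECTED=1    # remember that the next word is a value
            continue
        fi
        COMMANDS="$COMMANDS $arg"    # ordinary one-word command
    done
    if [ "$ARGUMENT_F_EXPECTED" -eq 1 ]; then
        echo "Error: -f needs a parameter" >&2
        return 1
    fi
}
```

For anything more elaborate than one or two such options, the getopts builtin in bash does this bookkeeping for you.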
Reference
stufftar can be downloaded from: https://sites.google.com/site/ianbruntlett/home/free-software/linux