Backup Your Linux Desktop or Laptop

Here at Harmony Service, we assume you GNU/Linux users out there are already highly experienced, technically savvy power users who know all the ins and outs of Linux. So why write a tutorial on something as simple as a backup? Well, let's just say that we all need a reminder to back up, and, heck, maybe that new Linux user out there has become so absorbed with all things Linux that he's forgotten to back up his great new works. Besides, we'll show you an easy, secure, safe and complete backup solution.

Fifteen years in the technical profession have shown us just about every sort of data loss. Working in a college town has provided even more immediate experience. We've lost count of how many dissertations, term papers, multimedia projects, and overdue homework assignments we've been asked to recover from dead media. Although we can recover just about anything, recovery is usually expensive and never 100% certain. We love to give the "computers are worthless without a good backup" speech to grieving customers.

Here are some facts about computers:

    1. Hard drives are likely to fail.
    2. Writable CDs and DVDs get scratched and wear out.
    3. Zip disks are just about the least reliable hardware ever invented.
    4. Tape drives are expensive, slow and difficult to recover.
    5. Floppy disks are too small.
    6. Flash drives lose their formatting and get lost.

So, if everything in our computing world eventually dies, whatever shall we do to keep our valuable data among the living? The simple answer: back up to a remote host using rsync and ssh. This answers all local dilemmas: there is no hardware to buy, someone else is responsible for hardware integrity, and the data is off-site and quickly available for recovery to any system anywhere. The possible drawbacks to remote backups are speed and space. The speed depends on your Internet connection and the space depends on your remote host. If you have 200 gigs of music to back up, you are better off using a large external FireWire hard drive. But for 95% of your personal data backup needs, remote backups will work perfectly. Of course, Harmony Service always recommends redundant backups of important data on as many types of media as possible.

Requirements:

To get started, we first assume you have an Internet connection and a remote host. A high-speed Internet connection is best for all upload and download tasks but certainly not necessary. Because our backup strategy can run in the background, the backup will happen as long as your computer is on and connected to the Internet. A remote host is any other computer providing log-in and storage services. In this tutorial, we use our web host as the remote host. The advantages to using your web host as your backup destination are speed (they'll have the fastest connection), reliability (they must honor their "always on" promises), redundancy (they keep their own backups of your domain), availability (you can recover your data anywhere there is an Internet connection), and security (if you have a competent web host, they'll have cutting edge security measures installed by default). If you follow the Harmony How-To titled: Get Personalized Email and a Website, you'll have everything you need already set up.

Now, on to the how-to:

Rsync is a command line utility traditionally used in synchronizing files between two computers, but rsync can also be used as an effective backup tool. This free and powerful tool is simple enough for anyone to use on their Linux desktop.

First, make sure you have rsync by entering rsync --version at the command line. If you see rsync version 2.X.X protocol version X, you have it. If you see "command not found" or a similar message, you need to download and install rsync. Use your distribution's package management system to do this, or else download and install the source from the rsync Web site. Make sure your version is greater than 2.6.0.
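The check above can also be scripted. The following is a minimal sketch, not part of the original tutorial: the helper name version_at_least is made up for illustration, and treating 2.6.0 itself as acceptable is an assumption.

```shell
#!/bin/sh
# version_at_least HAVE WANT: succeed when dotted version HAVE is the same as
# or newer than WANT. (Hypothetical helper; compares field by field.)
version_at_least() {
    awk -v have="$1" -v want="$2" 'BEGIN {
        n = split(have, h, "."); m = split(want, w, ".")
        max = (n > m) ? n : m
        for (i = 1; i <= max; i++) {
            a = h[i] + 0; b = w[i] + 0      # missing fields count as 0
            if (a > b) exit 0
            if (a < b) exit 1
        }
        exit 0
    }'
}

if command -v rsync >/dev/null 2>&1; then
    # The first line of "rsync --version" starts with "rsync  version X.Y.Z"
    installed=$(rsync --version | sed -n '1s/^rsync  *version  *v\{0,1\}\([0-9][0-9.]*\).*/\1/p')
    if version_at_least "$installed" 2.6.0; then
        echo "rsync $installed is recent enough"
    else
        echo "rsync '$installed' looks too old; please upgrade"
    fi
else
    echo "rsync not found; install it with your distribution's package manager"
fi
```

Comparing the fields numerically matters here: a plain string comparison would wrongly rank 2.10.0 below 2.6.0.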

Now it's time to consider what to back up, where to put it, and when.

What should be backed up? Do you want to run a full system or a partial system backup? A full system backup creates a second copy of everything on your hard drive. This has the advantage of providing a means to quickly restore your system to the exact state it was in when you made the backup. Full system backups take a long time to complete, take up a lot of disk space, and are often unnecessary. When you run full system backups, make sure to use rsync's --exclude parameter. Certain directories, such as /proc, should not be backed up. See the backup script below as an example.

Partial system backups are faster and more space-efficient, because you copy only important hand-selected data. For instance, you may want to back up only the /home directory, which contains users' documents, music, and program settings. The operating system files, such as those under /usr (programs) and /var (log files, email, etc.) can be easily reinstalled and don't need to be backed up.
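To see how exclusions behave before trusting them with real data, you can try them in a scratch directory. This is only a sketch: the directory layout and file names below are invented for the demonstration, and it uses rsync's --exclude-from option, which reads one pattern per line from a file.

```shell
#!/bin/sh
# Everything happens under a throwaway directory created by mktemp.
demo=$(mktemp -d)
mkdir -p "$demo/src/docs" "$demo/src/tmp" "$demo/dest"
echo "chapter 1" > "$demo/src/docs/thesis.txt"
echo "partial"   > "$demo/src/docs/movie.part"
echo "scratch"   > "$demo/src/tmp/cache.dat"

# One pattern per line. A leading slash anchors the pattern at the top of
# the directory being transferred; a trailing slash matches directories only.
cat > "$demo/exclude_file.txt" <<'EOF'
/tmp/
*.part
EOF

if command -v rsync >/dev/null 2>&1; then
    rsync -a --exclude-from="$demo/exclude_file.txt" "$demo/src/" "$demo/dest/"
    find "$demo/dest" -type f      # lists what was actually copied
else
    echo "rsync is not installed; skipping the demonstration"
fi
# Remove the scratch directory with: rm -rf "$demo"
```

Only docs/thesis.txt should arrive in the destination; the tmp directory and the .part file are skipped.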

Where should it be backed up? Your imagination is the limit when it comes to rsync's backup destination options. The scope of this tutorial, however, is limited to a remote web host.

When should it be backed up? Automated daily backups are a good choice for most Linux desktop scenarios. You can use Linux's built-in scheduler, the cron daemon, with shell scripts to automate your backups.

Using rsync

The basic implementation of rsync is: rsync -a source/ target/. This command copies the source directory to the target as if you were executing cp -a source/. target/. Unlike cp, rsync uses the rsync algorithm to check for differences between source and destination files. Since it copies only new changes, a technique known as incremental backup, rsync provides a very fast method for updating your backups.

Make exact copies using the --delete flag. You can apply the --delete flag when making system backups, which causes rsync to delete any files found in the target that are not present in the source. This ensures that the target is an exact copy of the source, so that if you delete an unwanted document, it is also removed from your backup. Rsync preserves files found in the target and not in the source by default, allowing for multiple sources to be added to a single target destination. To get around this behavior, use a command like the following: rsync -a --delete source/ target/
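The difference that --delete makes is easy to see in a scratch directory. The sketch below uses invented file names and touches nothing outside the temporary directory it creates.

```shell
#!/bin/sh
work=$(mktemp -d)
mkdir -p "$work/source" "$work/target"
echo "keep me" > "$work/source/report.txt"
echo "stale"   > "$work/target/old-draft.txt"   # exists only in the target

if command -v rsync >/dev/null 2>&1; then
    # Without --delete, old-draft.txt survives alongside the new copy.
    rsync -a "$work/source/" "$work/target/"
    echo "after plain -a:";  ls "$work/target"

    # With --delete, the target becomes an exact mirror of the source.
    rsync -a --delete "$work/source/" "$work/target/"
    echo "after --delete:";  ls "$work/target"
else
    echo "rsync is not installed; skipping the demonstration"
fi
# Remove the scratch directory with: rm -rf "$work"
```

Because --delete removes files from the target, it is worth adding rsync's -n (dry run) option the first time you point it at a real backup directory.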

Keeping multiple backups. It is a good idea to keep a few days' worth of backups so that you can return to a particular day if necessary. You can do this by rotating the oldest backup to the current one and updating it using rsync. The following script executes a single day to day backup. You'll find the code for a 3 day rotating backup system at the end of the tutorial.

-----------------------------------------------------------------------------

#!/bin/sh
# Modified code from Brice Burgess -
# backup.sh -- back up to a local directory using rsync, then send a
# compressed copy to the remote host

# Directories to backup. Separate with a space. Exclude trailing slash!
SOURCES="/home/wendy /home/daisy/.thunderbird /var/mail"

# Directory to backup to. This is where your backup(s) will be stored before compressing and sending it to the remote host. Make this a directory inside your backup directory for easy cataloging.
# Exclude trailing slash!
TARGET="/backups/current"

# Your EXCLUDE_FILE tells rsync what NOT to backup. Leave it unchanged if you want
# to backup all files in your SOURCES. If performing a FULL SYSTEM BACKUP, ie.
# Your SOURCES is set to "/", you will need to make use of EXCLUDE_FILE.
# The file should contain directories and filenames, one per line.
# An example of a EXCLUDE_FILE would be:
# /proc/
# /tmp/
# /mnt/
# *.SOME_KIND_OF_FILE

# Comment out the following line to backup everything in the SOURCES variable.
EXCLUDE_FILE="/path/to/your/exclude_file.txt"

# Comment out the following line to disable verbose output
VERBOSE="-v"
###########################

if [ ! -x "$TARGET" ]; then
    echo "Backup target does not exist or you don't have permission!"
    echo "Exiting..."
    exit 2
fi

echo "Verifying Sources..."
for source in $SOURCES; do
    echo "Checking $source..."
    if [ ! -x "$source" ]; then
        echo "Error with $source!"
        echo "Directory either does not exist, or you do not have proper permissions."
        exit 2
    fi
done

# Only pass --exclude-from when EXCLUDE_FILE is set and the file exists.
if [ -n "$EXCLUDE_FILE" ] && [ -f "$EXCLUDE_FILE" ]; then
    EXCLUDE="--exclude-from=$EXCLUDE_FILE"
fi

echo "Sources verified. Running rsync..."
for source in $SOURCES; do

    # Create directories in $TARGET to mimic source directory hierarchy
    if [ ! -d "$TARGET/$source" ]; then
        mkdir -p "$TARGET/$source"
    fi

    rsync $VERBOSE --exclude="$TARGET/" $EXCLUDE -a --delete "$source/" "$TARGET/$source/"

done

tar -zcpf /backups/currentbackup.tar.gz /backups/current

# Make sure you have a directory on your web host named "backup."
cat /backups/currentbackup.tar.gz |ssh -l sshusername sshhostname "cd backup; cat > currentbackup.tar.gz"

exit 0

-----------------------------------------------------------------------------

Copy and paste the script into your favorite text editor (we've used vi), save it somewhere in your $PATH (/bin or /usr/local/bin) as backup.sh, and make it executable with the command chmod +x backup.sh. Change the SOURCES variable to the paths you'd like to back up. Consider how large your backup will be compared to how much space you have available on your host. You should have at least 1 gig on your host (good for most backups). Change the TARGET variable to the path where you'd like your backups to be saved on the local computer before they are compressed and sent to the remote host. Make this a directory inside your backup directory for easy cataloging.

You'll notice that we use the tar command at the end of the script to compress the backup, and then the cat and ssh commands to send the compressed backup to the remote host. We do this for compression and security: gzip shrinks the archive, and ssh encrypts it in transit. Before you make your first backup, it's important to make sure your ssh connection to the web host is working. Make sure you have your ssh (or FTP) username, password, and hostname URL (usually your domain name). If you are hosting at 1and1.com, go to your web control panel and check the Web Space/Access SSH area to get this info. Then execute the command ssh -l sshusername hostnameURL. Enter your password when asked. You should then see a welcome screen with a new command prompt. You are now inside your web host at their command line! Neat! You can automate the ssh login process if you want to execute automatic backups. This is best if you want your backup to run at a specific time every day; otherwise you'll have to enter your ssh password every time you back up.

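To log on to another server without being prompted for your password, you need to generate a key that will be trusted by the remote host where your backups will be sent. To accomplish this, follow these steps as the user who will run the backups (harmony here). Note that ssh-keygen only accepts DSA keys of exactly 1024 bits, and that recent OpenSSH releases disable DSA keys by default. If your host rejects the key, generate an RSA key instead (ssh-keygen -t rsa -b 4096) and substitute id_rsa for id_dsa in the file names below.

ssh-keygen -b 1024 -t dsa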

You will then be prompted for a file name. Leave it as the default by simply pressing "Enter".

Generating public/private dsa key pair.
Enter file in which to save the key (/home/harmony/.ssh/id_dsa):

The last step of the key creation is the passphrase. Since the whole point is to avoid typing a password so that the backup can be automated, just hit "Enter" twice to leave it blank.

Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/harmony/.ssh/id_dsa.
Your public key has been saved in /home/harmony/.ssh/id_dsa.pub.
The key fingerprint is:
a6:84:5a:a8:cf:ff:31:38:21:85:ca:46:93:88:7a:50 harmony@localmachine

This created two files in the .ssh directory under the user's home directory: ~/.ssh/id_dsa and ~/.ssh/id_dsa.pub. The id_dsa.pub file is your public key, which you share with the remote host. The id_dsa file is your private key, and it is only for you. Do not lose it or share it with anyone, as this is your passkey! Make sure the file is not readable by anyone else: chmod 600 ~/.ssh/id_dsa. Anyone with a copy of this key could steal your identity and log in to the host as you. This method is no more dangerous than using a traditional password, but we will not enter into a debate here.

Now upload the public key to your host with ssh:

cat ~/.ssh/id_dsa.pub |ssh -l sshusername sshhostname "cat > id_dsa.pub"

Log into your web host with ssh and create the .ssh directory if it isn't already there:

mkdir .ssh

Create the authorization key:

cat id_dsa.pub >> .ssh/authorized_keys

Delete the public dsa file:

rm id_dsa.pub

And protect the new authorization file:

chmod 600 .ssh/authorized_keys

Log out by typing exit and then test the connection using:

ssh -l sshusername sshhostname

You should be logged in automatically without being prompted for a password! Log out again and move on.
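If you expect to repeat the key installation on more than one host, the remote-side steps above (mkdir, cat, chmod) can be wrapped in a small function. This is a sketch under assumptions: install_pubkey is a hypothetical helper name, and the optional second argument exists only so the function can be tried against a scratch directory instead of your real home directory.

```shell
#!/bin/sh
# install_pubkey KEYFILE [HOME_DIR]: append the public key in KEYFILE to
# HOME_DIR/.ssh/authorized_keys unless that exact line is already present.
install_pubkey() {
    keyfile=$1
    home_dir=${2:-$HOME}
    auth="$home_dir/.ssh/authorized_keys"

    mkdir -p "$home_dir/.ssh" || return 1
    chmod 700 "$home_dir/.ssh"    # keep the directory private to its owner
    touch "$auth"

    # -F: fixed strings, -x: whole-line match, -q: quiet.
    # True only when a line of KEYFILE already appears in authorized_keys.
    if ! grep -Fxq -f "$keyfile" "$auth"; then
        cat "$keyfile" >> "$auth"
    fi
    chmod 600 "$auth"
}

# Hypothetical usage on the remote host, after uploading id_dsa.pub:
#   install_pubkey id_dsa.pub && rm id_dsa.pub
```

Checking for an existing entry keeps authorized_keys from collecting duplicate lines if the steps are run twice.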

You're now ready to make your first backup. Type backup.sh to start the process. The script can take some time to complete the first time you run it, because rsync must make a copy of each file rather than update just changed files. Later runs will complete much faster. So if your cursor blinks under the command for a long time, just be patient. If you notice something is wrong, press Ctrl-C to stop the process. Upon completion of the script, you should have a replica of your SOURCES in your TARGET and a gzip file of it on your host. Have a look.
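One way to "have a look" is to confirm that the uploaded archive can actually be read. The sketch below is an illustration rather than part of the original script: verify_archive is a made-up helper, and the usage lines reuse the placeholder names sshusername, sshhostname and backup/currentbackup.tar.gz from backup.sh.

```shell
#!/bin/sh
# verify_archive FILE: print the number of entries in a gzip-compressed tar
# archive, or fail if FILE cannot be read as one.
verify_archive() {
    listing=$(tar -ztf "$1" 2>/dev/null) || return 1
    [ -n "$listing" ] || return 1
    printf '%s\n' "$listing" | wc -l | tr -d ' '
}

# Hypothetical usage: pull the archive back from the host and test it.
#   ssh -l sshusername sshhostname "cat backup/currentbackup.tar.gz" > /tmp/check.tar.gz
#   verify_archive /tmp/check.tar.gz && echo "archive is readable"
```

A backup that cannot be listed cannot be restored, so a check like this is worth running at least after the first upload.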

Automating the process. Assuming backup.sh ran successfully and you now have a copy of your important files on the host, it is time to automate the process. We'll use Linux's built-in scheduler, the cron daemon, to do this. The cron daemon uses "crontab" files to schedule tasks. To edit root's crontab, become the superuser (either by logging in as root or typing su at the command line) and execute crontab -e. Cron runs each job as the user who owns the crontab, so the passwordless ssh key must belong to that same user; otherwise, schedule the job from the crontab of the user who created the key.

You'll want to schedule a time for your backup.sh to execute. Crontab syntax is:

[minute] [hour] [day] [month] [dayofweek] [command]

Thus, adding the line:

0 4 * * * /path/to/backup.sh

will execute backup.sh at 4:00am every day. When you're finished adding the line, save the file and exit.
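If you would rather add the entry from a script than through an editor, a small filter can append it without creating duplicates. This is a sketch: add_cron_line is a hypothetical helper, and the /usr/local/bin/backup.sh path in the usage comment is an assumption about where you saved the script.

```shell
#!/bin/sh
# add_cron_line ENTRY: copy a crontab from standard input to standard output,
# appending ENTRY only if an identical line is not already present.
add_cron_line() {
    awk -v entry="$1" '
        { print }
        $0 == entry { found = 1 }
        END { if (!found) print entry }
    '
}

# Hypothetical usage (replaces the current user's crontab with the result):
#   crontab -l 2>/dev/null | add_cron_line '0 4 * * * /usr/local/bin/backup.sh' | crontab -
```

Running the usage line a second time leaves the crontab unchanged, since the entry is already there.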

That's all there is to it. Rsync is a very powerful tool, and you should pat yourself on the back for applying some of its potential. In the future we'll cover how to back up to a remote machine, show examples of how to keep multiple backups in rotation, and even run rsync within Microsoft Windows. In the meantime, check out Mike Rubel's excellent resource on rsync to learn how to perform daily and even hourly backups. If you have any problems with this script, please send us an email or log in and open a help desk ticket.

If you have enough space on your remote host, you don't have to include the compression part of the script. If you back up everything "as is" to the server, the first run of the script will take a long time, but later runs will be much quicker because rsync then transfers only new or changed data. To do this, copy and paste this script into your favorite text editor, name it backup.sh (or anything you like), and chmod +x it. Change the source paths to match your backup directories. Change the destination path to match a backup directory on your remote host. Use a forward slash at the end of a source path if you want to back up only the contents of that directory. Omit the trailing slash to copy the directory itself, along with its contents, to the remote host.

-----------------------------------------------------------------------------

#!/bin/sh

rsync -av --delete -e ssh /path/to/source /another/path/to/source sshusername@remotehostURL:~/destination

exit 0

-----------------------------------------------------------------------------
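The trailing-slash rule is easier to remember once you have seen it. The following sketch tries both forms locally in a scratch directory; the directory and file names are invented for the demonstration.

```shell
#!/bin/sh
demo=$(mktemp -d)
mkdir -p "$demo/photos" "$demo/with_slash" "$demo/without_slash"
echo "img" > "$demo/photos/a.jpg"

if command -v rsync >/dev/null 2>&1; then
    # Trailing slash on the source: copy the CONTENTS of photos.
    rsync -a "$demo/photos/" "$demo/with_slash/"       # -> with_slash/a.jpg

    # No trailing slash: copy the directory itself, then its contents.
    rsync -a "$demo/photos" "$demo/without_slash/"     # -> without_slash/photos/a.jpg

    find "$demo/with_slash" "$demo/without_slash" -type f
else
    echo "rsync is not installed; skipping the demonstration"
fi
# Remove the scratch directory with: rm -rf "$demo"
```

The same rule applies when the destination is a remote host reached over ssh.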

Here is the script for multiple backup rotations. The modifications will keep a designated number of backups in the target directory named after the date they were executed (YYYY-MM-DD_Hour-Minute).

-----------------------------------------------------------------------------

#!/bin/sh
# Modified code from Brice Burgess -
# multi_backup.sh -- backup to a remote host using rsync.
# Uses hard-link rotation to keep multiple backups.

# Directories to backup. Separate with a space. Exclude trailing slash!
SOURCES="/home/wendy /home/daisy/.thunderbird /var/mail"

# Directory to backup to. This is where your backup(s) will be stored. No spaces in names!
# :: NOTICE :: -> Make sure this directory is empty or contains ONLY backups created by
# this script and NOTHING else. Exclude trailing slash!
TARGET="/backup/current"

# Set the number of backups to keep (greater than 1). Ensure you have adequate space.
ROTATIONS=3

# Your EXCLUDE_FILE tells rsync what NOT to backup. Leave it unchanged if you want
# to backup all files in your SOURCES. If performing a FULL SYSTEM BACKUP, ie.
# Your SOURCES is set to "/", you will need to make use of EXCLUDE_FILE.
# The file should contain directories and filenames, one per line.
# A good example would be:
# /proc
# /tmp
# *.SOME_KIND_OF_FILE

# Comment out the following line to backup everything in the SOURCES variable.
EXCLUDE_FILE="/path/to/your/exclude_file.txt"

# Comment out the following line to disable verbose output
VERBOSE="-v"

#######################################
########DO_NOT_EDIT_BELOW_THIS_POINT#########
#######################################

# Set name (date) of backup.
BACKUP_DATE="`date +%F_%H-%M`"

if [ ! -x "$TARGET" ]; then
    echo "Backup target does not exist or you don't have permission!"
    echo "Exiting..."
    exit 2
fi

if [ ! $ROTATIONS -gt 1 ]; then
    echo "You must set ROTATIONS to a number greater than 1!"
    echo "Exiting..."
    exit 2
fi

#### BEGIN ROTATION SECTION ####

BACKUP_NUMBER=1
# incrementor used to determine current number of backups

# list all backups in reverse (newest first) order, set name of oldest backup to $backup
# if the retention number has been reached.
for backup in `ls -dXr $TARGET/*/`; do
    if [ $BACKUP_NUMBER -eq 1 ]; then
        NEWEST_BACKUP="$backup"
    fi

    if [ $BACKUP_NUMBER -eq $ROTATIONS ]; then
        OLDEST_BACKUP="$backup"
        break
    fi

    # POSIX arithmetic; "let" is a bash builtin that some /bin/sh shells lack
    BACKUP_NUMBER=$(($BACKUP_NUMBER + 1))
done

# Check if $OLDEST_BACKUP has been found. If so, rotate. If not, create new directory for this backup.
if [ -n "$OLDEST_BACKUP" ]; then
    # Set oldest backup to current one
    mv "$OLDEST_BACKUP" "$TARGET/$BACKUP_DATE"
else
    mkdir "$TARGET/$BACKUP_DATE"
fi

# Update current backup using hard links from the most recent backup
if [ -n "$NEWEST_BACKUP" ]; then
    cp -al "$NEWEST_BACKUP." "$TARGET/$BACKUP_DATE"
fi
#### END ROTATION SECTION ####

# Check to see if rotation section created backup destination directory
if [ ! -d "$TARGET/$BACKUP_DATE" ]; then
    echo "Backup destination not available. Make sure you have write permission in TARGET!"
    echo "Exiting..."
    exit 2
fi

echo "Verifying Sources..."
for source in $SOURCES; do
    echo "Checking $source..."
    if [ ! -x "$source" ]; then
        echo "Error with $source!"
        echo "Directory either does not exist, or you do not have proper permissions."
        exit 2
    fi
done

# Only pass --exclude-from when EXCLUDE_FILE is set and the file exists.
if [ -n "$EXCLUDE_FILE" ] && [ -f "$EXCLUDE_FILE" ]; then
    EXCLUDE="--exclude-from=$EXCLUDE_FILE"
fi

echo "Sources verified. Running rsync..."
for source in $SOURCES; do

    # Create directories in $TARGET to mimic source directory hierarchy
    if [ ! -d "$TARGET/$BACKUP_DATE/$source" ]; then
        mkdir -p "$TARGET/$BACKUP_DATE/$source"
    fi

    rsync $VERBOSE --exclude="$TARGET/" $EXCLUDE -a --delete "$source/" "$TARGET/$BACKUP_DATE/$source/"

done

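# Compress the backup that was just made. The archive is stored next to
# $TARGET (not inside it) so that it is never mistaken for a rotation.
ARCHIVE="$(dirname "$TARGET")/currentbackup.tar.gz"
tar -zcpf "$ARCHIVE" "$TARGET/$BACKUP_DATE"

# Make sure you have a directory on your web host named "backup"
cat "$ARCHIVE" |ssh -l sshusername sshhostname "cd backup; cat > currentbackup.tar.gz"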

exit 0

-----------------------------------------------------------------------------
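The rotation script stays small on disk because cp -al does not duplicate file data; it creates hard links. The sketch below shows the effect in a scratch directory. It assumes GNU cp (the norm on Linux), and the dated directory names are invented to match the script's naming pattern.

```shell
#!/bin/sh
# Assumes GNU cp, which accepts -a (archive) and -l (hard link) together.
demo=$(mktemp -d)
mkdir "$demo/2006-01-01_04-00"
echo "chapter 1" > "$demo/2006-01-01_04-00/thesis.txt"

# "Copy" yesterday's backup into today's directory using hard links only.
cp -al "$demo/2006-01-01_04-00" "$demo/2006-01-02_04-00"

# Both names now refer to the same data on disk. The second column of
# "ls -l" is the link count, which is 2 for each thesis.txt.
ls -l "$demo"/*/thesis.txt
# Remove the scratch directory with: rm -rf "$demo"
```

When rsync later updates a changed file, it writes a new copy and renames it into place by default, so the older dated directory keeps the old version while unchanged files continue to share storage.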


© Copyright 2006 ::HarmonyService:: All rights reserved