ISiBackup is not magic. It is, more or less, a wrapper script for collection, compression and encryption tools. ISiBackup runs in five phases:
Initialisation, configuration file and command line options interpretation
Directory selection
For each directory
File selection
Collection
Compression
Encryption
Statistics
The initialisation sequence is approximately:
Primary setting of a lot of internal constants, e.g. network, host and date constants, paths to expected programs, maximum file and directory sizes, names for temporary files etc.
Determination of one of the four main commands:
--backup starts a backup (it needs more arguments)
--copy starts the copying of backup data to another system
--help shows the command line help and exits
--version shows the program version and exits
--backup and --copy can be combined so that the newly created backup is pushed to another machine.
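The command determination can be pictured as a small option loop. The following is only a sketch, not ISiBackup's actual code: the function name parse_main_commands and the variables DO_BACKUP and DO_COPY are hypothetical, and only the four option names are taken from the list above.

```shell
# Hypothetical sketch: detect the four main commands described above.
# DO_BACKUP and DO_COPY are illustrative names, not ISiBackup variables.
parse_main_commands() {
    DO_BACKUP=0
    DO_COPY=0
    for arg in "$@"; do
        case "$arg" in
            --backup)  DO_BACKUP=1 ;;
            --copy)    DO_COPY=1 ;;
            # Return code 2 tells the caller to exit after printing.
            --help)    echo "usage: --backup | --copy | --help | --version"; return 2 ;;
            --version) echo "version: (placeholder)"; return 2 ;;
        esac
    done
    # At least one of --backup / --copy is required to do any work.
    [ "$DO_BACKUP" -eq 1 ] || [ "$DO_COPY" -eq 1 ]
}
```

All other arguments (such as --set) are ignored by this loop and would be fetched in a later pass.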
Fetching all the other command line parameters
Detailed initialization in function init_program:
sourcing general configuration file /etc/isibackup/isibackup.conf
sourcing the set configuration file /etc/isibackup/{Set}/set.conf, where the set has been given with --set {Set}
Check if there is another instance of the same set running; if so, abort.
Initialize the log files (main and error)
Start the actual backup process
This takes place inside the do_backup function.
Determine the input pattern for the directory selection. The directories can be selected using the configuration file /etc/isibackup/{Set}/include_dirs.lst. Alternatively, the fstab can be used as the source of all directories to back up (this is used by the set all), or the output of the mount command can be used.
Is the selected compression method (CMD_PACK) available as a command (e.g., is zip etc. installed)?
Is the selected encryption method (CMD_CRYPT) available as a command (e.g., is gpg etc. installed), and is the key needed for encryption (CRYPT_KEY) available?
Determine the date of the last full backup in order to be able to do a differential backup
Make sure the target directory is available and can be written to
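The availability checks above could look roughly like the following sketch. CMD_PACK and CMD_CRYPT are named in the text; the function names require_command and check_prerequisites and their arguments are assumptions made for illustration. The check for the encryption key (CRYPT_KEY) is not covered here.

```shell
# Hypothetical pre-flight check, not taken from the ISiBackup source.
require_command() {
    # command -v is the POSIX way to test whether a program can be run.
    command -v "$1" >/dev/null 2>&1
}

check_prerequisites() {
    # $1 = packer command, $2 = crypt command (may be empty), $3 = target dir
    require_command "$1" || { echo "packer '$1' not found" >&2; return 1; }
    if [ -n "$2" ]; then
        require_command "$2" || { echo "crypt tool '$2' not found" >&2; return 1; }
    fi
    # The target must exist as a directory and be writable.
    [ -d "$3" ] && [ -w "$3" ] || { echo "target '$3' not writable" >&2; return 1; }
}
```

A call such as check_prerequisites "$CMD_PACK" "$CMD_CRYPT" /some/target would then return non-zero as soon as one prerequisite is missing.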
Record the settings and backup start time. ISiBackup records most of the parameters it was called with so that later identification of those parameters is possible.
Create the list of directories to be backed up. This is a loop through all the mounted volumes, or those given in fstab, or those directories given in include_dirs.lst, using find. In case of the include_dirs.lst list, it is easily possible to use shell patterns to select directories.
The find results in a list of all subdirectories of all those volumes/directories. This is saved in a file ending in .out. Due to restrictions (or features?) of bash, we search not only for the entries themselves but also for the quoted entries. That way, we get directory names like "My Documents", whereas an unquoted find for My Documents equals a find for My followed by a find for Documents, neither of which is likely to yield any files. Running two searches makes duplicate entries likely; these are removed immediately.
The next step is the removal of the directories from exclude_dirs.lst. This is done using grep, hence any regular expression is allowed in these files (but not shell patterns).
The whole process renders five files containing directory names:
.in: The list of patterns used as input
.out: The result of the find before the exclusion
.full.sorted: The same list, sorted and without duplicate entries
.filtered: The final list of directories to back up
.skipped: The rest, i.e. those directories that will not get backed up
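How these five files relate to each other can be sketched as follows. The file suffixes come from the list above; the function build_dir_lists, its arguments, and the simplification that each line of the .in file is a literal directory (no shell patterns, no second unquoted search) are assumptions of this sketch.

```shell
# Hypothetical sketch of the directory selection, not the original code.
build_dir_lists() {
    # $1 = base name for the list files; "$1.in" must already exist
    # $2 = file with exclude regular expressions (may be empty or missing)
    base=$1
    excludes=$2
    : > "$base.out"
    while IFS= read -r dir; do
        # Quoting "$dir" keeps names such as "My Documents" in one piece.
        find "$dir" -type d >> "$base.out" 2>/dev/null
    done < "$base.in"
    # Sort and drop duplicate entries.
    sort -u "$base.out" > "$base.full.sorted"
    if [ -s "$excludes" ]; then
        # grep -f reads one regular expression per line.
        grep -v -f "$excludes" "$base.full.sorted" > "$base.filtered" || true
        grep -f "$excludes" "$base.full.sorted" > "$base.skipped" || true
    else
        cp "$base.full.sorted" "$base.filtered"
        : > "$base.skipped"
    fi
    return 0
}
```

Because the exclusion uses grep, a line such as /cache$ in the exclude file removes every directory whose path ends in /cache.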
Finally, a special "directory list" item is appended to the list. This is due to the fact that we do not actually back up directories, but files. Hence, while we are able to preserve the ownerships and permissions of all files, we lose them for all directories. So this special entry later triggers the creation of a backup of all directories, including their ownerships and permissions, while excluding all the files inside them. This DIRLIST_ENTRY defaults to ###isibackup-dirlist###
Now the backup of the individual directories starts, using the steps of "file selection", "collection", "compression" and "encryption".
File selection determines which files to back up.
In case of a differential backup, we have to back up only files that have changed after the last full backup. As file selection uses the find command as well, we add the parameters of -newer {Date}, where {Date} is the date of the last full backup (which was determined during initialisation).
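A minimal sketch of such a differential selection is shown below. Note that find's -newer test compares against a reference file rather than a date string, so the sketch assumes that the last full backup left a time stamp file behind. The function name list_changed_files and the restriction to the top level of the directory (-maxdepth 1) are assumptions as well.

```shell
# Hypothetical sketch: list only files changed since the last full backup.
list_changed_files() {
    # $1 = directory, $2 = time stamp file of the last full backup (optional)
    if [ -n "$2" ] && [ -e "$2" ]; then
        # -newer selects files modified more recently than the stamp file.
        find "$1" -maxdepth 1 -type f -newer "$2"
    else
        # No stamp available: behave like a full backup and list everything.
        find "$1" -maxdepth 1 -type f
    fi
}
```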
Resetting the statistic counters allows for a progress indication later.
Now the loop through the selected directories begins.
First, the current statistic values are fetched and progress is shown.
Source and target directory are stripped of double slashes and characters incompatible with the VFAT filesystem are converted (recodeVFATChars) in order to prevent later write errors [This is due to the fact that at IMSEC, early target directories were on VFAT filesystems]. The only character currently being replaced is the colon (:). The replacement is the sequence %1A.
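The clean-up described above amounts to two substitutions. The function name recodeVFATChars and the mapping of the colon to %1A are taken from the text; the sed-based body is an assumption and may differ from the real implementation.

```shell
# Sketch of the path clean-up; the body is an illustration only.
recodeVFATChars() {
    # Collapse runs of slashes into one, then replace every colon with %1A.
    printf '%s\n' "$1" | sed -e 's|//*|/|g' -e 's|:|%1A|g'
}
```

For example, recodeVFATChars '/backup//home/a:b' prints /backup/home/a%1Ab.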
The target directory is created. Before doing so, it is checked whether the name fits within the path length restriction. The limit is given as 220 in MAX_PATHLEN, but may be overridden in the configuration files. Be aware that placing the backup directory somewhere inside the tree makes every target path grow by the number of characters in that location's path! The limit is not as far away as it may seem. Paths longer than this limit are simply skipped and will not be part of the backup (however, a warning is displayed).
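The length check itself is simple. In this sketch, MAX_PATHLEN and its default of 220 come from the text, while the function name path_fits and the wording of the warning are made up for illustration.

```shell
# MAX_PATHLEN may already have been set by a configuration file.
MAX_PATHLEN=${MAX_PATHLEN:-220}

# Hypothetical helper: succeed only if the path is short enough.
path_fits() {
    # ${#1} is the length of the first argument in characters.
    if [ "${#1}" -gt "$MAX_PATHLEN" ]; then
        echo "WARNING: skipping '$1' (longer than $MAX_PATHLEN characters)" >&2
        return 1
    fi
    return 0
}
```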
The function createFileList is used to determine the files to be backed up inside this directory.
In case the list of include file patterns (from include_files.lst) is empty, all files in the directory will be found; otherwise a loop through all the include patterns makes sure that only matching files are selected. In any case, the "-newer" restriction from above applies in case of a differential backup.
The next step is to exclude all files from the file exclusion list in exclude_files.lst. If no pattern was defined, no exclusion takes place, i.e. the file selection remains the same.
If the file selection renders an empty list (no inclusion, or too many exclusions, or no newer files in case of a differential backup), the actual process of backing up the files is skipped, and an empty target directory remains (it will be deleted later).
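The include and exclude logic can be sketched as below. This is not the createFileList function itself: the name select_files, its arguments, the use of shell patterns for inclusion and of grep regular expressions for exclusion are all assumptions of this sketch, and the "-newer" restriction is left out for brevity.

```shell
# Hypothetical sketch of per-directory file selection.
select_files() {
    # $1 = directory, $2 = include patterns file, $3 = exclude regex file
    dir=$1
    includes=$2
    excludes=$3
    if [ -s "$includes" ]; then
        # One find per include pattern; only matching files are listed.
        while IFS= read -r pattern; do
            find "$dir" -maxdepth 1 -type f -name "$pattern"
        done < "$includes"
    else
        # Empty include list: every file in the directory is a candidate.
        find "$dir" -maxdepth 1 -type f
    fi | sort -u | if [ -s "$excludes" ]; then
        grep -v -f "$excludes"
    else
        # No exclude patterns: the selection stays as it is.
        cat
    fi
    return 0
}
```

An empty output corresponds to the case described above in which the directory is skipped.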
The normal mode of operation is to pack the files inside the directory into one single archive, which is called "collect". The program tar is a typical collector.
But there is a second operating mode called "separate". When "collect" runs into size limitations (imagine a directory with two or more files of 1.5 GB each), the mode is switched from "collect" to handling on a per-file basis (mode "separate"), that is, each file is put into its own archive instead of being collected with the others in this directory. So in "collect" we end up with one file per directory, whereas in "separate" we get as many files as there were before. The definition of "size limitations" is given in the constant MAX_FSFILESIZE, which defaults to 2 GB, but can be overridden in either of the two configuration files.
To check if we run into size trouble, we first need to count the files and sum their sizes. But let's look into that step by step.
As explained above, the DIRLIST_ENTRY is used to back up the directory information. This needs space as well and enters the size calculation with DEF_PACKEDBLOCKSIZE per directory (DEF_PACKEDBLOCKSIZE defaults to 1).
The actual size calculation is done in the function countFiles. Here we also check whether the list contains files which the selected packer cannot pack (e.g. zoo cannot handle symbolic links). To do so, we filter the file list again by file type and skip the files that cannot be handled (a warning is issued).
The file size calculations rely on the output of stat. In order not to stress the internal bash arithmetic too much, the size is rounded to the kilobyte. Nevertheless, checks for arithmetic overflow have been added, and detecting such an error no longer breaks the backup process; it just sets the size requirement for the target directory to the maximum (which is MAX_FSFILESIZE).
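A size summation in kilobytes might look like the following sketch. It is not the countFiles function: the name sum_sizes_kb, rounding each file up to a whole kilobyte, and the fallback to wc -c when GNU stat is not available are assumptions. The overflow checks and the file type filter mentioned above are omitted.

```shell
# Hypothetical sketch: sum the sizes of the files named on stdin, in KB.
sum_sizes_kb() {
    total_kb=0
    while IFS= read -r file; do
        [ -f "$file" ] || continue
        # GNU stat prints the size in bytes with -c %s; fall back to wc -c.
        bytes=$(stat -c %s "$file" 2>/dev/null) || bytes=$(wc -c < "$file")
        # Round each file up to whole kilobytes to keep the numbers small.
        total_kb=$(( total_kb + (bytes + 1023) / 1024 ))
    done
    echo "$total_kb"
}
```

The function reads one file name per line, so it can be fed directly from a previously created file list.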
So if the target path length is short enough (see the restriction further up), the calculated size determines whether we run in "collect" or "separate" mode.
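The mode decision then reduces to one comparison. In this sketch the limit is expressed in kilobytes (2 GB = 2097152 KB) under the hypothetical name MAX_FSFILESIZE_KB; the unit in which the real MAX_FSFILESIZE is stored is not stated in the text, and the function name choose_mode is made up.

```shell
# Assumption: sizes are compared in kilobytes; 2 GB = 2097152 KB.
MAX_FSFILESIZE_KB=${MAX_FSFILESIZE_KB:-2097152}

# Hypothetical helper: print the operating mode for a directory.
choose_mode() {
    # $1 = total size of the directory's files in KB
    if [ "$1" -ge "$MAX_FSFILESIZE_KB" ]; then
        echo separate
    else
        echo collect
    fi
}
```

With the default limit, a directory holding two files of 1.5 GB each (3145728 KB in total) yields "separate".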
Special attention is needed when we back up the root directory. Normally, the name of the individual archive is the same as that of the directory that has been backed up inside it (except for VFAT character conversions). In the case of the root directory, there is no name, so we call that one "rootdir". This will lead to trouble if there actually happens to be a directory called "rootdir", but this is an uncommon case that has not been handled yet.
The next step is to loop over all the output files. This is kind of a trick, as in "collect" mode there is just one output file, named like the directory. Only in "separate" mode are there several files.
The target file name gets recoded to prevent any characters not compatible with the VFAT filesystem.
The available space on the target is checked. The conservative assumption is that when there is less space left on the target than the total size of the (uncompressed) directory, space is too tight, and the backup aborts immediately. There is no way to override that, but it has proven to be a good indicator of a nearly full backup drive, and hence this is a good thing as it explicitly requires measures.
Next, the target file extension is determined. This follows the usual convention that each processing tool adds its extension to the file (e.g. using a sequence of cpio and bzip2 results in the target file having the extension of .cpio.bz2).
There is another restriction that can lead to an abort of the ISiBackup script: Normally, file operations are executed neither on the source drive nor on the target drive, but in a temporary directory that defaults to /var/tmp. As this temporary directory may need to hold the "collected" file (e.g. collected by cpio) as well as the compressed file (e.g. bzip2), and as compression may not reduce the size, the temporary directory must have 220% of the total size of the directory contents available (100% of the added source file sizes for collection, up to 100% for the compressed file and a 20% reserve). If that check fails, ISiBackup checks whether that much space is available in the target directory, in which case file operations are done there. If neither has that much space, ISiBackup aborts. While doing file operations on the target drive works, performance may be very low (e.g. on a network-mounted target directory, or on a removable media target directory).
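The 220% rule and the fallback from the temporary directory to the target directory can be sketched like this. The function names required_kb, free_kb and choose_workdir are hypothetical, and reading the free space from column 4 of df -Pk is an assumption about how the space could be measured, not a description of the original code.

```shell
# Hypothetical sketch of the working-directory choice.
required_kb() {
    # 100% collected + up to 100% compressed + 20% reserve = 220%.
    echo $(( $1 * 220 / 100 ))
}

free_kb() {
    # df -P prints a portable format; column 4 is the available space in KB.
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

choose_workdir() {
    # $1 = directory size in KB, $2 = temporary dir, $3 = target dir
    need=$(required_kb "$1")
    for candidate in "$2" "$3"; do
        if [ "$(free_kb "$candidate")" -ge "$need" ]; then
            echo "$candidate"
            return 0
        fi
    done
    # Neither location has enough room: the caller has to abort.
    return 1
}
```

A directory of 1000 KB therefore needs 2200 KB of free working space.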
Next, the name of the encrypted target file is determined. If encryption is enabled and set to gpg, for example, then the extension .gpg will be added to the resulting file name.
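The extension convention can be illustrated with a small helper that appends one extension per processing tool. The function target_name and the particular mapping from tool to extension are assumptions of this sketch; only the examples .cpio.bz2 and .gpg are taken from the text.

```shell
# Hypothetical sketch: build the target file name from the tool chain.
target_name() {
    # $1 = base name, $2 = collector, $3 = packer, $4 = crypt tool
    # (any of the three tools may be empty)
    name=$1
    case "$2" in
        tar)  name="$name.tar" ;;
        cpio) name="$name.cpio" ;;
    esac
    case "$3" in
        bzip2) name="$name.bz2" ;;
        zip)   name="$name.zip" ;;
    esac
    case "$4" in
        gpg) name="$name.gpg" ;;
    esac
    echo "$name"
}
```

For instance, target_name home cpio bzip2 gpg prints home.cpio.bz2.gpg.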
As creating a collected, compressed, and encrypted file can take some time, that time can be saved in case there already is a target file with the same content. This is called "in-place refresh of a previously created backup". But how can one determine if the contents of an encrypted file are the same as the ones in the source directory? One can't, so we need another trick here. When backing up files, ISiBackup sets the file date to the date of the newest file included in the backup (e.g. using the zip -o option). If none of the files in the source is newer than the pre-existing backup file, backing up the directory can be skipped, as the target still holds a complete backup file. If there is a newer file, and if the file was not encrypted, then some of the compression programs offer an update option (such as zip -u) which can be used to further decrease the time needed for the backup. As a result, we have a "skip", an "update" and a "create" mode for the backup file.
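The three modes can be derived from modification times alone, as in the following sketch. The function name refresh_mode and its arguments are hypothetical, and the sketch assumes that the configured packer supports updating whenever the archive is not encrypted.

```shell
# Hypothetical sketch: decide between "skip", "update" and "create".
refresh_mode() {
    # $1 = source directory, $2 = existing backup file, $3 = "yes" if encrypted
    if [ ! -e "$2" ]; then
        # No previous backup file: it has to be created.
        echo create
    elif [ -z "$(find "$1" -maxdepth 1 -type f -newer "$2" | head -n 1)" ]; then
        # No source file is newer than the backup file: nothing to do.
        echo skip
    elif [ "$3" != "yes" ]; then
        # Newer files exist and the archive is readable: update in place.
        echo update
    else
        # Encrypted archives cannot be updated and are created anew.
        echo create
    fi
}
```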
The next step is to actually produce the collected file (function createPackedArchive). Note that some programs are just "collectors", such as tar and cpio, others are just packers, like bzip2, and still others, such as zip, integrate both functions in a way that can hardly be separated. Here, collection and compression are done in two integrated steps, resulting in a collected, compressed archive. Additionally, the size of the input directory is recorded for statistics.
The collected, compressed file is then encrypted (function createCryptedFile). Encryption is done to the configured backup key. Regardless of whether encryption is enabled or disabled, the size of the resulting output file is recorded for statistics.
If file operations were not on the target directory, the file is transferred there from the temporary working directory.
This concludes the actual backup process, which is repeated for each output file and for each input directory. The rest is outputting the statistical information, writing the termination messages to log and stdout, and cleaning up the various temporary files that were used. Also, a time stamp is used to record the date of the backup formally; this is later needed for differential backups.