Backing up files into archives and recovering them with tar

Last revision July 20, 2004


tar is an archiving program. It takes a list of files or directories and copies them into one long file in a special packed format, from which individual files or all files can later be recovered. tar does not compress the data as it creates the archive. You can compress the archive later (or in a pipeline with tar) using the compress or gzip compression programs.
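
As a rough illustration of the paragraph above, the following sketch creates an archive, lists it, recovers a single file, and compresses the archive both afterwards and in a pipeline. The directory and file names (project, notes.txt, todo.txt, piped.tar.gz) are made up for the example, and everything happens in a scratch directory created with mktemp.

```shell
# Work in a scratch directory so no real files are touched.
work=$(mktemp -d)
cd "$work"
mkdir project
echo "hello" > project/notes.txt
echo "world" > project/todo.txt

# c = create an archive, f = the next argument is the archive file name.
tar cf project.tar project

# t = list the table of contents of the archive.
tar tf project.tar

# x = extract; naming a member recovers just that one file.
mkdir single
(cd single && tar xf ../project.tar project/notes.txt)

# Compress the finished archive later (replaces project.tar with project.tar.gz).
gzip project.tar

# Or compress in a pipeline: "-" makes tar write the archive to standard output.
tar cf - project | gzip > piped.tar.gz

# Recover everything: uncompress to standard output and let tar read standard input.
mkdir restore
gzip -dc piped.tar.gz | (cd restore && tar xf -)
```

The single-letter options c, t, x, and f are the plain ones that essentially every version of tar accepts.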

tar was originally designed to archive files to tape. Now it is more commonly used to archive a group of files into a single disk file which can then be easily transferred over the Internet.

You can even pipe two tar commands together (one writing an archive which is piped to the other to read) in order to move an entire directory tree to a different disk while preserving dates, ownership, etc. Using the cp command to move directory trees like this will not preserve all the date and owner information.
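
A minimal sketch of such a pipeline follows. The source and destination directories here are temporary ones created only for the demonstration; in practice you would substitute your own directory names. The p option on the extracting side asks tar to keep the original file permissions; restoring ownership generally also requires running as the superuser.

```shell
# Hypothetical source and destination trees, created just for this demonstration.
src=$(mktemp -d)
dest=$(mktemp -d)
mkdir -p "$src/data/sub"
echo "sample" > "$src/data/sub/file.txt"

# The first tar writes an archive of the current directory to standard output;
# the second tar reads that archive from standard input and unpacks it in $dest.
(cd "$src/data" && tar cf - .) | (cd "$dest" && tar xpf -)

# Show the copied tree.
ls -lR "$dest"
```

Each tar runs inside parentheses so that its cd affects only that side of the pipe.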

Remember that tar creates a specially formatted archive file; the individual files contained within the archive cannot normally be recovered by any other program.

Not all versions of the tar program are exactly alike. If you want to create a tar archive to be read on another computer, use only the plain options shown here. Better yet, if the other computer has the "GNU tar" version, which is a free-software version from the Free Software Foundation that works identically on all systems, use that. GNU tar is also good at reading the slightly varying formats of archive files created on different Unix systems. The command for accessing GNU tar on pangea is simply gtar. This program, like all GNU programs, is documented with the info program, which is an on-line help reader. Type

info info

for information about this help system, and

info tar

for information about the GNU tar program. The rest of this section describes the basic Unix tar program, which is documented with man tar.

WARNING: the normal Unix tar program cannot be used to archive directory trees that are deeply nested with many subdirectories within subdirectories. This is because it puts a limit of 100 characters on the length of the complete pathname of any file to be archived. If the complete pathname is longer than 100 characters, the file is skipped and not archived.

If you need to archive a directory tree that is deeply nested, or has files with very long filenames, you should use the GNU tar program instead (gtar on pangea), which does not have an arbitrary limit on pathname length.
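
Before archiving, it can help to check whether a tree contains any pathnames over the 100-character limit. The following is one possible check, not part of tar itself: it uses find and awk to list files whose pathname, as it would be given to tar, is longer than 100 characters. The tree, its file names, and the output file too_long.txt are invented for the example.

```shell
# Build a small sample tree containing one deliberately long pathname.
work=$(mktemp -d)
cd "$work"
long=$(printf '%060d' 0)            # a 60-character directory name
mkdir -p "tree/$long/$long"
echo "x" > "tree/$long/$long/deep.txt"
echo "x" > tree/short.txt

# Print every file under tree, keeping only pathnames longer than 100 characters.
find tree -type f -print | awk 'length($0) > 100' > too_long.txt

# Any pathname listed here would be skipped by the basic tar program.
cat too_long.txt
```

If the list is empty, the basic tar program should be able to archive the tree; otherwise use GNU tar as described above.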
