Tar (file format)

[Figure: GNU tar 1.16 showing three common types of tarballs (shown in red).]
Name: tar
Filename extension: .tar
Internet media type: application/x-tar
Uniform Type Identifier: public.tar-archive
Magic number: "ustar" at byte offset 257
Type of format: file archiver
Contained by: gzip, bzip2, lzma, compress

In computing, tar (derived from "tape archive") is both a file format (a type of archive bitstream) and the name of the program used to handle such files. The format was standardized by POSIX.1-1988 and later by POSIX.1-2001. It was originally developed as a raw format for backups on tape and other sequential-access devices. It is now commonly used to collect many files into one larger file for distribution or archiving, while preserving file system information such as user and group permissions, dates, and directory structures.

Design

tar's origins in linear tape storage can still be seen in two properties. It can work on any data stream, and partial extraction is slow, because tar has to read through the whole archive even to extract only the final file. When a tar file (somefile.tar) is compressed with a utility such as gzip, bzip2, lzma or compress, the result is a compressed tar file whose filename extension indicates the type of compression (e.g. somefile.tar.gz). A tar file is commonly called a tarball, and tarballs are usually compressed to save disk space.

As is common with Unix utilities, tar is a single specialist program. It follows the Unix philosophy of doing only one thing (archiving) and doing it well. tar has no built-in data compression facilities, so it is most commonly used in tandem with an external compression utility. Such utilities generally compress only a single file, which is why they are paired with tar: it produces a single file from many files.

To ease this common usage, the BSD and GNU versions of tar support the command-line options -z (gzip) and -j (bzip2), which compress or decompress the archive being processed. Both versions can also extract compressed archives without these options. In the case of GNU tar, the compression and decompression are actually performed by an external program.
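The tar-plus-compressor pairing can be illustrated with a minimal Python sketch built on the standard `tarfile` and `gzip` modules. The member name `hello.txt` and the helper `make_tgz` are illustrative only. The sketch builds a gzip-compressed tarball in memory and shows that gzip wraps the tar stream as a whole: removing the gzip layer yields an ordinary tar archive with the `ustar` magic at offset 257. The reader opens the archive with `mode="r:*"`, which asks `tarfile` to detect the compression on its own. This loosely parallels how modern BSD and GNU tar extract compressed archives without -z or -j.

```python
import gzip
import io
import tarfile


def make_tgz(name: str, payload: bytes) -> bytes:
    """Build a gzip-compressed tar archive (a .tar.gz) in memory."""
    buf = io.BytesIO()
    # "w:gz": tarfile writes an ordinary tar stream and gzip compresses all of it.
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


tgz = make_tgz("hello.txt", b"hello, tar\n")
print(tgz[:2])  # gzip magic bytes: b'\x1f\x8b'

# Undoing the gzip layer yields a plain tar archive with "ustar" at offset 257.
raw_tar = gzip.decompress(tgz)
print(raw_tar[257:262])  # b'ustar'

# "r:*" asks tarfile to detect the compression method by itself.
with tarfile.open(fileobj=io.BytesIO(tgz), mode="r:*") as tar:
    print(tar.extractfile("hello.txt").read())  # b'hello, tar\n'
```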

Format details

A tar file is the concatenation of one or more files, each preceded by a header block. The file data is written unaltered, except that its length is rounded up to a multiple of 512 bytes and the extra space is zero-filled. The end of an archive is marked by at least two consecutive zero-filled blocks.
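The layout rule above can be checked with a short Python sketch: one 512-byte header, data rounded up to whole blocks, then two zero-filled blocks. The helper `member_span` is a hypothetical name introduced for illustration. The member name `data.bin` and the 1000-byte payload are arbitrary. The archive itself is written by the standard `tarfile` module in the POSIX ustar format.

```python
import io
import tarfile

BLOCK = 512  # bytes per tar block


def member_span(data_len: int) -> int:
    """Bytes one member occupies: a header block plus its data rounded up to whole blocks."""
    data_blocks = -(-data_len // BLOCK)  # ceiling division
    return BLOCK + data_blocks * BLOCK


payload = b"A" * 1000
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
    info = tarfile.TarInfo("data.bin")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))
raw = buf.getvalue()

span = member_span(len(payload))                # 512 + 1024 = 1536
data = raw[BLOCK:BLOCK + len(payload)]          # data starts right after the header block
padding = raw[BLOCK + len(payload):span]        # zero fill up to the next block boundary
end_marker = raw[span:span + 2 * BLOCK]         # two zero-filled blocks end the archive

print(span, data == payload, padding == bytes(len(padding)), end_marker == bytes(2 * BLOCK))
# 1536 True True True
```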

A limitation of early tape drives was that data could only be written to them in 512-byte blocks.[citation needed] As a result, data in tar files is arranged in 512-byte blocks.

The tar command can write data to tape in chunks of several 512-byte blocks, which minimizes the wasted gaps on the tape between write operations. Each chunk is called a record. The user can specify a blocking factor, which is the number of blocks per record. The end of an archive is padded with additional zero-filled blocks so that its total size is a multiple of the record size. This padding is added whether or not tape is the storage medium.
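The size arithmetic can be sketched in Python with a hypothetical helper, `archive_size`. It adds up the members, appends the two zero-filled end blocks, and rounds the total up to a whole number of records. The sketch assumes regular files with short names, so no extended headers are written. It also assumes a default blocking factor of 20 (10,240-byte records), which matches GNU tar's default and the record size Python's `tarfile` module pads to. `tarfile` is used only as a cross-check.

```python
import io
import tarfile

BLOCK = 512  # bytes per tar block


def archive_size(file_sizes: list[int], blocking_factor: int = 20) -> int:
    """Expected archive size, assuming regular files with short names (no extended headers)."""
    total = 0
    for size in file_sizes:
        total += BLOCK + -(-size // BLOCK) * BLOCK  # header + data rounded up to blocks
    total += 2 * BLOCK                               # two zero-filled end-of-archive blocks
    record = blocking_factor * BLOCK
    return -(-total // record) * record              # pad up to a whole number of records


# Cross-check against Python's tarfile, which pads archives to 20-block records.
sizes = [1000, 3, 0]
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
    for i, size in enumerate(sizes):
        info = tarfile.TarInfo(f"file{i}")
        info.size = size
        tar.addfile(info, io.BytesIO(b"x" * size))

print(archive_size(sizes), len(buf.getvalue()))  # 10240 10240
print(archive_size(sizes, blocking_factor=1))    # 4096
```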

File header

The file header block contains metadata about a file. To ensure portability across architectures with different byte orders, the information in the header block is encoded in ASCII, with numeric values written as octal digits. Consequently, if all the files in an archive are text files, the archive as a whole is essentially an ASCII file.

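The fields defined by the original Unix tar format are listed in the table below (offsets and sizes in bytes). When a field is unused it is zero-filled. The header is padded with zero bytes to fill a 512-byte block.

Offset  Size  Field
0       100   File name
100     8     File mode
108     8     Owner's numeric user ID
116     8     Group's numeric user ID
124     12    File size in bytes
136     12    Last modification time in numeric Unix time format
148     8     Checksum for header block
156     1     Link indicator (file type)
157     100   Name of linked file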
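The following Python sketch decodes the original header fields from a 512-byte block. It assumes the standard layout: name at offset 0, mode at 100, uid at 108, gid at 116, size at 124, mtime at 136, checksum at 148, type flag at 156 and link name at 157. Numeric fields are read as octal ASCII terminated by NUL or space. The sketch also verifies the checksum, which is the sum of all 512 header bytes taken as unsigned values, with the eight checksum bytes counted as ASCII spaces. `FIELDS` and `parse_v7_header` are hypothetical names. The sample header comes from Python's `tarfile` module.

```python
import tarfile

# (field, offset, length) in the original Unix tar header; offsets in bytes.
FIELDS = [
    ("name", 0, 100),
    ("mode", 100, 8),
    ("uid", 108, 8),
    ("gid", 116, 8),
    ("size", 124, 12),
    ("mtime", 136, 12),
    ("chksum", 148, 8),
    ("typeflag", 156, 1),
    ("linkname", 157, 100),
]
NUMERIC = {"mode", "uid", "gid", "size", "mtime", "chksum"}


def parse_v7_header(block: bytes) -> dict:
    """Decode the original tar header fields and verify the header checksum."""
    if len(block) != 512:
        raise ValueError("a tar header is exactly one 512-byte block")
    fields = {}
    for name, offset, length in FIELDS:
        raw = block[offset:offset + length]
        if name in NUMERIC:
            # Octal ASCII digits, terminated by NUL and/or space.
            fields[name] = int(raw.strip(b"\0 ") or b"0", 8)
        else:
            # Text fields end at the first NUL when shorter than the field.
            fields[name] = raw.split(b"\0", 1)[0].decode("ascii")
    # Unsigned byte sum of the block, counting the checksum field as 8 spaces.
    computed = sum(block[:148]) + 8 * ord(" ") + sum(block[156:])
    if computed != fields["chksum"]:
        raise ValueError(f"bad checksum: stored {fields['chksum']:o}, computed {computed:o}")
    return fields


info = tarfile.TarInfo("etc/passwd")
info.size = 1006
info.mode = 0o644
header = info.tobuf(format=tarfile.USTAR_FORMAT)
print(parse_v7_header(header))
```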

Example

The example below shows an ASCII dump of a header block from a tar file created with GNU tar, produced with the od program. The "ustar" magic string followed by two spaces can be seen at offset 257. This means that the tar file is in GNU format, which is partially incompatible with the true ustar standard (POSIX.1-1988), whose signature is "ustar" followed by a NUL character. (The tar in OpenBSD 3.7 writes NUL characters instead of the two spaces after "ustar". By default, GNU tar creates incompatible archives when they contain path names longer than 100 characters. It also writes an incorrect size field when a sparse file has more than four holes.)

0000000   e   t   c   /   p   a   s   s   w   d nul nul nul nul nul nul
0000020 nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul
*
0000140 nul nul nul nul   0   1   0   0   6   4   4 nul   0   0   0   0
0000160   0   0   0 nul   0   0   0   0   0   0   0 nul   0   0   0   0
0000200   0   0   4   1   3   5   5 nul   1   0   1   5   5   0   6   1
0000220   1   0   5 nul   0   1   1   5   5   6 nul  sp   0 nul nul nul
0000240 nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul
*
0000400 nul   u   s   t   a   r  sp  sp nul   r   o   o   t nul nul nul
0000420 nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul
0000440 nul nul nul nul nul nul nul nul nul   r   o   o   t nul nul nul
0000460 nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul
*
0001000
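To tell the two signatures apart programmatically, one can compare the eight bytes at offsets 257 to 264, which hold the 6-byte magic field and the 2-byte version field. The POSIX ustar format stores `ustar`, NUL, `00` there, while GNU tar's format stores `ustar`, two spaces, NUL. The Python sketch below classifies a header block on that basis. The function name `header_flavor` and its labels are hypothetical. The test headers come from the standard `tarfile` module's `USTAR_FORMAT` and `GNU_FORMAT` modes.

```python
import tarfile

POSIX_USTAR = b"ustar\x0000"  # magic "ustar" NUL, then version "00"
GNU_USTAR = b"ustar  \x00"    # magic "ustar" space, then version space NUL


def header_flavor(block: bytes) -> str:
    """Classify a 512-byte tar header by its magic (257-262) and version (263-264) fields."""
    signature = block[257:265]
    if signature == POSIX_USTAR:
        return "POSIX ustar"
    if signature == GNU_USTAR:
        return "GNU"
    if signature.startswith(b"ustar"):
        return "other ustar variant"
    return "no magic (original Unix format)"


for fmt in (tarfile.USTAR_FORMAT, tarfile.GNU_FORMAT):
    print(header_flavor(tarfile.TarInfo("etc/passwd").tobuf(format=fmt)))
# POSIX ustar
# GNU
```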

Naming of tar archive files

Like most Unix utilities, tar doesn't require any particular filename suffix in order to recognize a file as an archive. Conventionally, uncompressed tar archive files have names ending in ".tar". If an archive is compressed with an external tool, the compression program adds its own suffix as usual, resulting in filename endings like ".tar.Z", ".tar.gz", and ".tar.bz2".

Names like those can't exist on MS-DOS due to its 8.3 filename limitations, so a second set of conventions appeared for storing compressed tar archives on an MS-DOS file system:

* ".tgz" is equivalent to ".tar.gz".
* ".tbz" (or ".tbz2" or ".tb2") is equivalent to ".tar.bz2".
* ".taz" is equivalent to ".tar.Z".

These shortened filename suffixes are still in common use.
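Because tar itself ignores the suffix, any mapping from filenames to compressors is only a convention. The Python sketch below encodes the conventions listed above. The dictionaries and the functions `long_form` and `compressor_for` are hypothetical. The sketch guesses from the name alone; a real tool would inspect the file's magic bytes instead.

```python
from typing import Optional

# MS-DOS friendly suffixes and the long forms they abbreviate.
SHORT_SUFFIXES = {
    ".tgz": ".tar.gz",
    ".tbz": ".tar.bz2",
    ".tbz2": ".tar.bz2",
    ".tb2": ".tar.bz2",
    ".taz": ".tar.Z",
}
# Long suffixes and the external compressor conventionally behind them.
# Matching is case-sensitive on purpose: ".Z" (compress) is not ".z".
COMPRESSORS = {
    ".tar.gz": "gzip",
    ".tar.bz2": "bzip2",
    ".tar.Z": "compress",
    ".tar": "none",
}


def long_form(filename: str) -> str:
    """Rewrite a shortened suffix such as .tgz to its .tar.* equivalent."""
    for short, long_suffix in SHORT_SUFFIXES.items():
        if filename.endswith(short):
            return filename[: -len(short)] + long_suffix
    return filename


def compressor_for(filename: str) -> Optional[str]:
    """Guess the compressor from the name alone; None if it does not look like a tarball."""
    name = long_form(filename)
    for suffix, tool in COMPRESSORS.items():
        if name.endswith(suffix):
            return tool
    return None


print(long_form("project-1.0.tgz"))   # project-1.0.tar.gz
print(compressor_for("backup.taz"))   # compress
```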

tarbombs

"tarbomb" is derogatory hacker slang used to refer to a tarball containing files that extract to the working directory instead of into a separate directory of their own (such a separate directory ideally being a subdirectory of the working directory). This will sometimes create problems by overwriting files of the same name in the working directory. It is almost always an inconvenience to the user, who is obliged to identify and delete a number of files scattered throughout the directory's contents. Such behavior is considered bad etiquette on the part of the archive's creator.

A related problem is the use of absolute paths when creating tarballs. Files extracted from such tarballs are often created in unexpected locations outside the working directory and, like a tarbomb, can overwrite existing files. On the other hand, an attempt to extract files to a location where the current user lacks write access will fail.

A user can avoid both of these problems by issuing the command

tar -tf archive.tar

This variant of the tar command will not extract any files, but will display the paths of all files that would be created by the extraction process. The user thus has the opportunity to create a container directory for the tarball's contents, and to assess the contents of any absolute paths, before carrying out the extraction.
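The same inspection can be automated. The Python sketch below uses the standard `tarfile` module to list an archive's member names without extracting anything, as `tar -tf` does. It then applies a simple heuristic. More than one distinct top-level entry suggests a tarbomb. Names that are absolute or contain `..` components could land outside the working directory. The function names, the `build` test helper and the heuristic itself are illustrative assumptions, not a standard rule.

```python
import io
import posixpath
import tarfile


def list_names(data: bytes) -> list[str]:
    """Return member names without extracting anything, like `tar -tf`."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:*") as tar:
        return tar.getnames()


def audit_names(names: list[str]) -> dict:
    """Heuristic check for tarbomb layouts and paths that may leave the working directory."""
    top_level = set()
    for name in names:
        first = posixpath.normpath(name.lstrip("/")).split("/")[0]
        if first != ".":
            top_level.add(first)
    return {
        "top_level": sorted(top_level),
        "tarbomb": len(top_level) > 1,  # several entries would land in the working directory
        "absolute": [n for n in names if n.startswith("/")],
        "parent_refs": [n for n in names if ".." in n.split("/")],
    }


def build(names: list[str]) -> bytes:
    """Hypothetical test helper: an in-memory archive of empty regular files."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in names:
            tar.addfile(tarfile.TarInfo(name))
    return buf.getvalue()


print(audit_names(list_names(build(["README", "src/main.c"])))["tarbomb"])            # True
print(audit_names(list_names(build(["proj/README", "proj/src/main.c"])))["tarbomb"])  # False
```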

tarpipe

A tarpipe is the process of writing a tar archive to standard output and piping it to a second tar process, which extracts the archive from standard input in another directory. This is a useful way to copy directories and their subdirectories, especially if they contain special files such as symbolic links and character or block devices.

tar -cf - "${srcdir}" | ( cd "${destdir}" && tar -xvf - )
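The same producer-to-consumer streaming can be sketched in Python. The standard `tarfile` module's stream modes (`"w|"` to write, `"r|"` to read) are connected by an `os.pipe()`, and the producer runs in a thread. The function name `tarpipe` is hypothetical. Like the shell version, the sketch stores symbolic links as links rather than following them. On Pythons that support extraction filters, `extractall` is passed `filter="fully_trusted"`, on the assumption that the source directory is trusted; the aim of a tarpipe is to keep special files intact.

```python
import os
import tarfile
import threading


def tarpipe(srcdir: str, destdir: str) -> None:
    """Copy srcdir into destdir by streaming a tar archive through an OS pipe."""
    read_fd, write_fd = os.pipe()
    arcname = os.path.basename(os.path.normpath(srcdir))

    def produce() -> None:
        # Like `tar -cf - srcdir`: write the archive sequentially into the pipe.
        with os.fdopen(write_fd, "wb") as out:
            with tarfile.open(fileobj=out, mode="w|") as tar:
                tar.add(srcdir, arcname=arcname)  # symlinks are archived, not followed

    # Sketch only: an exception in the producer thread is not re-raised here.
    producer = threading.Thread(target=produce)
    producer.start()
    # Like `(cd destdir && tar -xf -)`: extract members as the stream arrives.
    with os.fdopen(read_fd, "rb") as inp:
        with tarfile.open(fileobj=inp, mode="r|") as tar:
            # Python 3.12+ (and some backports) accept an extraction filter;
            # "fully_trusted" keeps links and device files as older versions did.
            extra = {"filter": "fully_trusted"} if hasattr(tarfile, "fully_trusted_filter") else {}
            tar.extractall(destdir, **extra)
    producer.join()
```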

Remote tarpipe

A remote tarpipe, or "ssh tarpipe", uses the same method as a tarpipe. Instead of changing to a new directory on the local host to extract the archive, the user logs into a remote host with ssh and runs the extraction there.

tar -cf - "${srcdir}" | ssh "${user}@${remotehost}" "cd '${destdir}' && tar -xvf -"

To reverse the process (downloading instead of uploading), the user logs into the remote host to create the archive there, and the extraction takes place on the local host.

ssh "${user}@${remotehost}" "tar -cf - '${srcdir}'" | tar -C "${destdir}" -xf -

tarpit

"Tarpit" is a term for a method of revision control in which a tarball captures the state of development of a software module at a particular point in time. Through the use of descriptive archive names, a tarpit typically loosely mirrors the tagging and branching of revision control software.

See also

* List of archive formats
* List of file archivers
* Comparison of file archivers
* List of Unix programs

External links

* [http://www.linfo.org/tar.html The tar Command] by The Linux Information Project (LINFO)
* [http://www.mkssoftware.com/docs/man4/tar.4.asp Detailed information on Tar and USTAR file headers]
* [http://www.openbsd.org/cgi-bin/man.cgi?query=tar tar(1) man page] via OpenBSD


Wikimedia Foundation. 2010.
