Pipeline (Unix)


In Unix-like computer operating systems, a pipeline is the original "software pipeline": a set of processes chained by their standard streams, so that the output of each process ("stdout") feeds directly as input ("stdin") to the next one. Each connection is implemented by an anonymous pipe. Filter programs are often used in this configuration. The concept was invented by Douglas McIlroy for Unix shells and was named by analogy to a physical pipeline.

Examples

Simple example

ls -l | less

In this example, ls is the Unix directory lister, and less is an interactive text pager with searching capabilities. The pipeline lets the user scroll up and down a directory listing that may not fit on the screen.

Pipelines ending in less (or more, a similar text pager) are among the most commonly used. They let the user navigate potentially large (or infinite) amounts of text, which otherwise would have scrolled past the top of the terminal and been lost. Put differently, they relieve programmers of the burden of implementing text pagers in their applications: they can pipe output through less, or assume that the user will do so when needed.

Complex example

Below is an example of a pipeline that implements a kind of spell checker for the web resource indicated by a URL. An explanation of what it does follows. (Some machines have /usr/share/dict/words instead.)

curl "http://en.wikipedia.org/wiki/Pipeline_(Unix)" | \
sed 's/[^a-zA-Z ]/ /g' | \
tr 'A-Z ' 'a-z\n' | \
grep '[a-z]' | \
sort -u | \
comm -23 - /usr/dict/words

*Note: The character "\" is used to place all six lines into a single command line.
*First, curl obtains the HTML contents of a web page (could use wget on some systems).
*Second, sed removes all characters that are not spaces or letters from the web page's content, replacing them with spaces.
*Third, tr changes all of the uppercase letters into lowercase and converts the spaces in the lines of text to newlines (each 'word' is now on a separate line).
*Fourth, grep includes only lines that contain at least one lowercase alphabetical character (removing any blank lines).
*Fifth, sort sorts the list of 'words' into alphabetical order, and the -u switch removes duplicates.
*Finally, comm compares two sorted files line by line; -23 suppresses the lines unique to the second file and the lines common to both, leaving only those found in the first file alone. The - in place of a filename causes comm to use its standard input (from the pipeline in this case). This results in a list of "words" (lines) that are not found in /usr/dict/words.
*The special character "|" tells the shell to pipe the output from the previous command in the line into the next command in the line. That is, the output of the curl command is given as the input of the sed command.
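
The same stages can be tried without network access. The following is a minimal sketch, assuming a POSIX-style shell with mktemp available; the helper name unknown_words, the sample sentence and the three-word dictionary are all invented for illustration, with printf standing in for curl.

```shell
#!/bin/sh
# unknown_words DICT: print the words on stdin that are absent from DICT
# (a sorted word list, one word per line). Hypothetical helper name.
unknown_words() {
  sed 's/[^a-zA-Z ]/ /g' |   # non-letters become spaces
  tr 'A-Z ' 'a-z\n' |        # lowercase; spaces become newlines
  grep '[a-z]' |             # drop empty lines
  sort -u |                  # sort and remove duplicates
  comm -23 - "$1"            # keep lines missing from the dictionary
}

dict=$(mktemp)                       # made-up three-word dictionary
printf '%s\n' a is test > "$dict"
printf 'This is a tset, a TEST.\n' | unknown_words "$dict"
rm -f "$dict"
# prints:
# this
# tset
```

Note that "this" is reported only because the toy dictionary omits it; a real word list would leave just the misspelling.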

Pipelines in command line interfaces

Most Unix shells have a special syntax construct for the creation of pipelines. Typically, one simply writes the filter commands in sequence, separated by the ASCII vertical bar character "|" (which, for this reason, is often called "pipe character" by Unix users). The shell starts the processes and arranges for the necessary connections between their standard streams (including some amount of buffer storage).

Error stream

By default, the standard error streams ("stderr") of the processes in a pipeline are not passed on through the pipe; instead, they are merged and directed to the console. However, many shells have additional syntax for changing this behaviour. In the csh shell, for instance, using "|&" instead of "|" signifies that the standard error stream too should be merged with the standard output and fed to the next process. The Bourne Shell can also merge standard error, using 2>&1, as well as redirect it to a different file.
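
The difference can be observed by counting the lines that actually reach the next process. This is a small sketch for a Bourne-style shell; the function name emit is hypothetical and exists only to produce one line on each stream.

```shell
#!/bin/sh
# Hypothetical helper: writes one line to stdout and one to stderr.
emit() {
  echo "to stdout"
  echo "to stderr" >&2
}

# Only stdout travels through the pipe, so wc counts a single line;
# the stderr line bypasses wc and appears wherever the shell's stderr points.
emit | wc -l

# With 2>&1 placed before the pipe symbol, both lines reach wc.
emit 2>&1 | wc -l
```

The redirection has to be written before the "|", since it applies to the command on the left-hand side of the pipe.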

Pipemill

In the most commonly used simple pipelines the shell connects a series of sub-processes via pipes, and executes external commands within each sub-process. Thus the shell itself is doing no direct processing of the data flowing through the pipeline.

However, it's possible for the shell to perform processing directly. This construct generally looks something like:

command | while read var1 var2 ...; do
    # process each line, using variables as parsed into $var1, $var2, etc
done

... which is referred to as a "pipemill" (since the while is "milling" over the results from the initial command).

Example of Pipemill

find / /usr /var -mount -user foo -printf "%m %p\n" | while read mode filename; do
    chown "$NEWOWNER" "$filename"
    chmod "$mode" "$filename"
done

(This example traverses file directory trees, changing the ownership of all files while preserving all permissions, including those that are often stripped off by many versions of the chown command.)

There are a number of variations of the pipemill construct, including:

ps lax | { read x; while read x owner pid parent x x x x x stat x; do
    [ "$owner" = "foo" ] && [ "$stat" = "Z" ] && kill "$parent"
done; }

(This example kills the parent processes of zombies owned/created by the user "foo".)

Here the while loop is enclosed in a command group (the braces) and preceded by a read command, which effectively "throws away" the first line from the ps command. (Of course, in this particular example it would be harmless to process the header line, as it wouldn't match the "$owner" = "foo" test.) Note that the other references to the "x" variable are simply being used as placeholders for "throwing away" irrelevant fields from each line.

The defining characteristics of a "pipemill" are: some command or series of commands feeds data into a pipe from which a shell while loop reads and processes it.
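
A pipemill that can be run as-is, without touching files or processes, might look like the sketch below. The input data and the function names scores and pass_list are invented for illustration.

```shell
#!/bin/sh
# Hypothetical input: "name score" pairs, one per line.
scores() {
  printf '%s\n' 'alice 90' 'bob 55' 'carol 72'
}

# The pipemill: scores feeds the pipe, and the while loop reads each
# line from it, splitting the fields into the variables name and score.
pass_list() {
  scores | while read -r name score; do
    if [ "$score" -ge 60 ]; then
      echo "$name passed"
    fi
  done
}

pass_list
# prints:
# alice passed
# carol passed
```

In several common shells (bash in its default configuration and dash, for example) the loop runs in a subshell, so variables assigned inside it are not visible after the pipeline ends.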

Creating pipelines programmatically

Pipelines can be created under program control. The pipe() system call asks the operating system to construct a new anonymous pipe object. This results in two new, opened file descriptors in the process: the read-only end of the pipe, and the write-only end. The pipe ends appear to be normal, anonymous file descriptors, except that they have no ability to seek.

To avoid deadlock and exploit parallelism, the process with one or more new pipes will then, generally, call fork() to create new processes. Each process will then close the end(s) of the pipe that it will not be using before producing or consuming any data. Alternatively, a process might create a new thread and use the pipe to communicate between them.

"Named pipes" may also be created using mkfifo() or mknod() and then presented as the input or output file to programs as they are invoked. They allow multi-path pipes to be created, and are especially effective when combined with standard error redirection, or with tee.
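
A named pipe can also be tried from the shell with the mkfifo utility, the command-line counterpart of the mkfifo() function. The following is a sketch only: it assumes mktemp -d is available, and the directory and file names are illustrative.

```shell
#!/bin/sh
# Create a FIFO inside a fresh temporary directory.
dir=$(mktemp -d)
fifo="$dir/channel"
mkfifo "$fifo"

# Writer runs in the background; opening the FIFO for writing
# blocks until some process opens it for reading.
printf '%s\n' two one three > "$fifo" &

# Reader: sort is handed the FIFO as if it were an ordinary input file.
result=$(sort "$fifo")
wait                 # reap the background writer
rm -rf "$dir"        # the FIFO persists until it is removed
echo "$result"
# prints:
# one
# three
# two
```

Unlike an anonymous pipe, the FIFO has a name in the file system, so the writer and the reader do not need to be started by the same shell command.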

Implementation

In most Unix-like systems, all processes of a pipeline are started at the same time, with their streams appropriately connected, and managed by the scheduler together with all other processes running on the machine. An important aspect of this, setting Unix pipes apart from other pipe implementations, is the concept of buffering: a sending program may produce 5000 bytes per second, and a receiving program may only be able to accept 100 bytes per second, but no data are lost. Instead, the output of the sending program is held in a buffer, or queue. When the receiving program is ready to read data, the operating system sends it data from the buffer, then removes that data from the buffer. If the buffer fills up, the sending program is suspended (blocked) until the receiving program has had a chance to read some data and make room in the buffer.
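
The blocking behaviour can be illustrated with a fast producer and a deliberately slow consumer. This is a sketch under the assumption that the producer's output (several hundred kilobytes) is larger than the pipe buffer, which is commonly 64 KiB on Linux; the function names produce and slow_count are hypothetical.

```shell
#!/bin/sh
# Producer: emits 100000 numbered lines as fast as it can.
produce() {
  awk 'BEGIN { for (i = 1; i <= 100000; i++) print i }'
}

# Consumer: waits one second before it starts reading, then counts lines.
slow_count() {
  sleep 1
  wc -l
}

# While the consumer sleeps, the producer fills the buffer and is then
# suspended; once reading starts it resumes, and no line is dropped.
count=$(( $(produce | slow_count) ))
echo "$count"
# prints: 100000
```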

Network pipes

Tools like netcat and socat can connect pipes to TCP/IP sockets, following the Unix philosophy of "everything is a file".

History

The pipeline concept and the vertical-bar notation were invented by Douglas McIlroy, one of the authors of the early command shells, after he noticed that much of the time they were processing the output of one program as the input to another. His ideas were implemented in 1973 when Ken Thompson added pipes to the UNIX operating system. [http://www.linfo.org/pipe.html Pipes: A Brief Introduction by The Linux Information Project (LINFO)] The idea was eventually ported to other operating systems, such as DOS, OS/2, Microsoft Windows, and BeOS, often with the same notation.

Although developed independently, Unix pipes are similar to, and were preceded by, the 'communication files' developed by Ken Lochner [http://www.cs.rit.edu/~swm/history/DTSS.doc] in the 1960s for the Dartmouth Time Sharing System. [http://cm.bell-labs.com/who/dmr/hist.html]

The robot in the icon for Apple's Automator, which also uses a pipeline concept to chain repetitive commands together, holds a pipe.

Other operating systems

This feature of Unix was borrowed by other operating systems, such as Taos and MS-DOS, and eventually became the pipes and filters design pattern of software engineering.

See also

* Tee (command) for fitting together two pipes
* Pipeline (software) for the general software engineering concept.
* Pipeline (computing) for other computer-related pipelines.
* Hartmann pipeline
* Python Pipelines
* Anonymous pipe, a FIFO structure used for interprocess communication
* Named pipe, persistent pipes used for interprocess communication
* Redirection (computing)
* XML pipeline for processing of XML files

External links

* pipe: create an interprocess channel, System Interfaces reference, The Single UNIX Specification
* [http://www.linfo.org/pipe.html Pipes: A Brief Introduction] by The Linux Information Project (LINFO)
* [http://www.softpanorama.org/Scripting/pipes.shtml] Unix Pipes -- powerful and elegant programming paradigm (Softpanorama)
* [http://en.wikibooks.org/w/index.php?title=Ad_Hoc_Data_Analysis_From_The_Unix_Command_Line "Ad Hoc Data Analysis From The Unix Command Line" at Wikibooks] shows how to use pipelines composed of simple filters to do complex data analysis.
* [http://www.pixelbeat.org/programming/stdio_buffering/ stdio buffering]
* [http://www.debian-administration.org/articles/145 Use And Abuse Of Pipes With Audio Data] gives an introduction to using and abusing pipes with netcat, nettee and fifos to play audio across a network.
* [http://www.codecrazy.net/hitting-pipe-sunday.html hitting the pipe] A program that forks two processes that communicate with each other using pipes.

References

* Sal Soghoian on MacBreak Episode 3 "Enter the Automatrix"


Wikimedia Foundation. 2010.
