Guide to Unix/Explanations/awk

The name 'awk' is derived from the surnames of the three people who originally developed it: Aho, Weinberger and Kernighan. It is a programming language built around pattern-action statements that transform the input into the output. awk reads its input (usually a file of data) one line at a time and tests each line against the given pattern. Every line that matches has the corresponding action applied to it, and the results of those actions make up the output. A line that does not match is ignored.

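Each input line is divided into fields by a field separator. By default, fields are separated by runs of spaces and tabs, and leading and trailing blanks are ignored. The -F option selects a different separator. Fields are referenced with a dollar sign, much like shell positional parameters: $1 is the first field, $2 the second, and so on, while $0 is the entire input line. Patterns can test individual fields as well as the whole line.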
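A quick way to see default field splitting is to feed awk a single line and print individual fields. The sketch below is a minimal illustration, assuming a POSIX sh and an awk on the PATH. The sample strings are made up. It shows that leading blanks are skipped, that a tab separates fields just like spaces do, and that $0 is the line exactly as read.

```shell
#!/bin/sh
# Made-up input: leading blanks, several spaces, and a tab (printf expands \t).
first=$(printf '   alpha   beta\tgamma\n' | awk '{ print $1 }')
third=$(printf '   alpha   beta\tgamma\n' | awk '{ print $3 }')
# $0 is the whole input line, with its original spacing kept.
whole=$(printf 'a  b\n' | awk '{ print $0 }')
printf '%s\n' "$first" "$third" "$whole"
```

This prints alpha, gamma, and "a  b" with its double space intact.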

If no pattern is specified, every input line is selected. If no action is specified, the default action is to print the entire line. So if you only want to print a subset of the input, it is enough to supply a pattern that selects the desired lines. awk then prints those lines exactly as they appear in the input.
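The two defaults can be checked side by side. This is a small sketch, assuming a POSIX sh and awk. The three sample lines follow the format of the numeric.dat file used later on this page. A program with only a pattern prints the matching lines whole. A program with only an action applies that action to every line.

```shell
#!/bin/sh
# A few lines in the same format as numeric.dat.
data='1 one i
6 six vi
9 nine ix'
# Pattern only: the default action prints each matching line unchanged.
pattern_only=$(printf '%s\n' "$data" | awk '/ix/')
# Action only: with no pattern, every line is selected.
action_only=$(printf '%s\n' "$data" | awk '{ print }')
printf '%s\n' "$pattern_only" "$action_only"
```

Here pattern_only holds "6 six vi" and "9 nine ix", because "six" also contains "ix". action_only is identical to the input.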

You can also choose which fields are output, using the same notation. For example, { print $1 } prints only the first field of each selected line.
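When printing several fields, separating them with a comma differs from writing them next to each other. The sketch below assumes a POSIX sh and awk, with a made-up input line. A comma inserts the output field separator, which is a single space by default. Adjacent items are simply concatenated.

```shell
#!/bin/sh
line='7 seven vii'
# Comma: awk inserts the output field separator (a space by default).
with_comma=$(printf '%s\n' "$line" | awk '{ print $1, $3 }')
# Adjacent items are concatenated, so an explicit " " is needed for a space.
with_space=$(printf '%s\n' "$line" | awk '{ print $1 " " $3 }')
joined=$(printf '%s\n' "$line" | awk '{ print $1 $3 }')
printf '%s\n' "$with_comma" "$with_space" "$joined"
```

The first two forms both print "7 vii". The last prints "7vii".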

A simple example:

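awk -F: '$1 ~ /A/ { print $2 " " $3 }' /etc/passwd

Because /etc/passwd separates its fields with colons, -F: makes the colon the field separator. For every line whose first field (the user name) contains a capital A, the program prints the second and third fields (the password field and the numeric user ID) separated by a space.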
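The effect of the separator is easiest to see on a fixed sample rather than on a real /etc/passwd. The sketch below assumes a POSIX sh and awk. The two accounts, "Alice" and "bob", are invented for illustration. With default splitting, $1 runs only up to the first blank, so it holds most of the colon-separated record instead of the user name. With -F:, each colon-separated item becomes its own field.

```shell
#!/bin/sh
# Two invented lines in /etc/passwd format: name:password:UID:GID:comment:home:shell
sample='Alice:x:1001:1001:Alice Example:/home/Alice:/bin/sh
bob:x:1002:1002:Bob Example:/home/bob:/bin/sh'
# Default splitting: $1 ends at the first blank (inside the comment field).
default_split=$(printf '%s\n' "$sample" | awk '{ print $1 }')
# -F: splits on colons, so $1 is the user name and $3 the UID.
colon_split=$(printf '%s\n' "$sample" | awk -F: '$1 ~ /A/ { print $2 " " $3 }')
printf '%s\n' "$default_split" "$colon_split"
```

Only Alice's name contains a capital A, so colon_split holds the single line "x 1001".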

Program Structure
awk programs consist of a sequence of one or more pattern-action statements:

pattern { action }
pattern { action }
...

awk scans input lines of data and performs actions on those lines that match any of the specified patterns.
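Because every line is tested against every statement, a single line can trigger more than one action. This is a minimal sketch, assuming a POSIX sh and awk. The three sample lines follow the numeric.dat format, and the labels "first:", "later:" and "has t:" are arbitrary. Each line's actions run in the order the statements appear in the program.

```shell
#!/bin/sh
data='1 one i
2 two ii
3 three iii'
# Three pattern-action statements; each input line is checked against all of them.
out=$(printf '%s\n' "$data" | awk '
$1 == 1 { print "first:", $2 }
$1 >= 2 { print "later:", $2 }
/t/     { print "has t:", $2 }
')
printf '%s\n' "$out"
```

Lines 2 and 3 match both the second and the third pattern, so each of them produces two output lines.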

Running AWK
Here we call awk from a shell script, awk1.sh:

#!/bin/bash
# awk1.sh
awk '{ print }' "$1"

There is no pattern, so every line fed into awk is selected and the action is applied to it. A bare print is the same as print $0, so every line of the file is printed on the screen. Thus awk1.sh behaves like cat.

To demonstrate, create the file numeric.dat with the contents:

1 one i
2 two ii
3 three iii
4 four iv
5 five v
6 six vi
7 seven vii
8 eight viii
9 nine ix
10 ten x

Run awk1.sh on numeric.dat (don't forget to make the script executable first, for example with chmod +x awk1.sh):

./awk1.sh numeric.dat

This prints:

1 one i
2 two ii
3 three iii
4 four iv
5 five v
6 six vi
7 seven vii
8 eight viii
9 nine ix
10 ten x

(The ./ prefix tells the shell to run the script from the current directory, which is normally not listed in PATH.)
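The steps in this section can be collected into one script: create the data file, write awk1.sh, make it executable, and run it. The sketch below assumes a POSIX sh and awk, and writes numeric.dat and awk1.sh into the current directory, overwriting any existing files with those names. It uses #!/bin/sh in awk1.sh instead of bash, which makes no difference for this one-line script.

```shell
#!/bin/sh
# Create numeric.dat with the sample data.
cat > numeric.dat <<'EOF'
1 one i
2 two ii
3 three iii
4 four iv
5 five v
6 six vi
7 seven vii
8 eight viii
9 nine ix
10 ten x
EOF
# Write awk1.sh: no pattern, so every input line is printed.
cat > awk1.sh <<'EOF'
#!/bin/sh
awk '{ print }' "$1"
EOF
# Make the script executable, then run it from the current directory.
chmod +x awk1.sh
output=$(./awk1.sh numeric.dat)
printf '%s\n' "$output"
```

The output is identical to what cat numeric.dat would print.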

Expressions
If the first field is equal to 1, print the entire line:

#!/bin/sh
# awk1.sh
awk '$1 == 1 { print $0 }' "$1"

Results in:

1 one i

If the second field is equal to "two", print the entire line:

$2 == "two" { print $0 }

Results in:

2 two ii

If the first field is greater than 5, print the third field:

$1 > 5 { print $3 }

Results in:

vi
vii
viii
ix
x
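The three comparison patterns can be checked in one go. This sketch assumes a POSIX sh and awk, and recreates numeric.dat in the current directory so that it runs on its own. One detail deserves attention: the first field of each line looks numeric, so awk compares it against 5 as a number. That is why line 10 is selected. A plain string comparison of "10" with "5" would reject it.

```shell
#!/bin/sh
# Recreate numeric.dat so this sketch is self-contained.
printf '%s\n' '1 one i' '2 two ii' '3 three iii' '4 four iv' '5 five v' \
  '6 six vi' '7 seven vii' '8 eight viii' '9 nine ix' '10 ten x' > numeric.dat
eq_num=$(awk '$1 == 1 { print $0 }' numeric.dat)
eq_str=$(awk '$2 == "two" { print $0 }' numeric.dat)
# $1 holds numeric-looking text, so > compares numerically and 10 > 5 holds.
gt=$(awk '$1 > 5 { print $3 }' numeric.dat)
printf '%s\n' "$eq_num" "$eq_str" "$gt"
```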

Regular Expressions
Print the input line if the pattern "ix" matches anywhere in it:

/ix/ { print $0 }

Results in:

6 six vi
9 nine ix
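A bare /regex/ pattern is shorthand for matching against the whole line, $0 ~ /regex/. The sketch below assumes a POSIX sh and awk, and recreates numeric.dat in the current directory. It runs both forms and confirms they select the same lines. Line 6 is included because "six" contains "ix".

```shell
#!/bin/sh
# Recreate numeric.dat so this sketch is self-contained.
printf '%s\n' '1 one i' '2 two ii' '3 three iii' '4 four iv' '5 five v' \
  '6 six vi' '7 seven vii' '8 eight viii' '9 nine ix' '10 ten x' > numeric.dat
# Short form: the regex is matched against the whole line.
short=$(awk '/ix/ { print $0 }' numeric.dat)
# Explicit form: the same test written with ~ and $0.
long=$(awk '$0 ~ /ix/ { print $0 }' numeric.dat)
printf '%s\n' "$short"
```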

Print the input line if the pattern "ix" is matched in the third field:

$3 ~ /ix/ { print $0 }

Results in:

9 nine ix

Print the input lines that do not contain the pattern "x":

$0 !~ /x/ { print }

Results in:

1 one i
2 two ii
3 three iii
4 four iv
5 five v
7 seven vii
8 eight viii
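The field match and the negated match can be reproduced as follows. This sketch assumes a POSIX sh and awk, and recreates numeric.dat in the current directory. It also uses !/x/, the standard shorthand for $0 !~ /x/, and checks that both forms give the same result.

```shell
#!/bin/sh
# Recreate numeric.dat so this sketch is self-contained.
printf '%s\n' '1 one i' '2 two ii' '3 three iii' '4 four iv' '5 five v' \
  '6 six vi' '7 seven vii' '8 eight viii' '9 nine ix' '10 ten x' > numeric.dat
# ~ restricted to one field: "six" no longer matches, only the third field counts.
third=$(awk '$3 ~ /ix/ { print $0 }' numeric.dat)
# !~ selects the lines that do NOT match.
no_x=$(awk '$0 !~ /x/ { print }' numeric.dat)
# Shorthand: a negated bare regex also tests the whole line.
also=$(awk '!/x/' numeric.dat)
printf '%s\n' "$third" "$no_x"
```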

Compound expressions
Print lines where the third field matches the pattern "x" OR the first field is less than or equal to 3:

$3 ~ /x/ || $1 <= 3 { print $0 }

Results in:

1 one i
2 two ii
3 three iii
9 nine ix
10 ten x
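Combining conditions with || in one pattern is not quite the same as writing two separate pattern-action statements. With ||, a line that satisfies both conditions is still printed once. With two statements, it is printed once per matching statement. The sketch below assumes a POSIX sh and awk, and recreates numeric.dat in the current directory. It first runs the example above, then compares line counts for $1 <= 3 and $3 ~ /i/, a pair chosen so that lines 1 to 3 satisfy both conditions.

```shell
#!/bin/sh
# Recreate numeric.dat so this sketch is self-contained.
printf '%s\n' '1 one i' '2 two ii' '3 three iii' '4 four iv' '5 five v' \
  '6 six vi' '7 seven vii' '8 eight viii' '9 nine ix' '10 ten x' > numeric.dat
either=$(awk '$3 ~ /x/ || $1 <= 3 { print $0 }' numeric.dat)
# Two separate statements: lines 1-3 match both and are printed twice.
separate=$(( $(awk '
$1 <= 3   { print }
$3 ~ /i/  { print }
' numeric.dat | wc -l) ))
# One combined pattern: every selected line is printed exactly once.
combined=$(( $(awk '$1 <= 3 || $3 ~ /i/ { print }' numeric.dat | wc -l) ))
printf '%s\n' "$either" "$separate" "$combined"
```

The counts come out as 11 and 8. The arithmetic expansion around wc -l strips the leading spaces some wc implementations emit.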

Print lines where the third field matches the pattern "vi" AND the second field begins with the letter "s":

$3 ~ /vi/ && $2 ~ "^s" { print $0 }

Results in:

6 six vi
7 seven vii
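The example uses the string "^s" on the right of ~. awk treats a string in that position as a regular expression, so here it behaves like /^s/. One caveat: backslashes inside such a string must be doubled. The sketch below assumes a POSIX sh and awk, and recreates numeric.dat in the current directory. Line 8 is rejected because, although "viii" contains "vi", "eight" does not start with "s".

```shell
#!/bin/sh
# Recreate numeric.dat so this sketch is self-contained.
printf '%s\n' '1 one i' '2 two ii' '3 three iii' '4 four iv' '5 five v' \
  '6 six vi' '7 seven vii' '8 eight viii' '9 nine ix' '10 ten x' > numeric.dat
# String used as a regex on the right of ~.
both=$(awk '$3 ~ /vi/ && $2 ~ "^s" { print $0 }' numeric.dat)
# Same test written with a regex literal.
same=$(awk '$3 ~ /vi/ && $2 ~ /^s/ { print $0 }' numeric.dat)
printf '%s\n' "$both"
```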

Ranges
Print the line where the second field equals "three", the line where the third field equals "vii", and all lines in between:

$2 == "three", $3 == "vii" { print $0 }

Results in:

3 three iii
4 four iv
5 five v
6 six vi
7 seven vii
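A range starts at the first line matching its first pattern and includes every line up to and including the next line matching its second pattern. If the second pattern never matches, the range simply runs to the end of the input. The sketch below assumes a POSIX sh and awk, and recreates numeric.dat in the current directory. It checks the example above, then runs a range whose end pattern ($2 == "none") is deliberately one that no line satisfies.

```shell
#!/bin/sh
# Recreate numeric.dat so this sketch is self-contained.
printf '%s\n' '1 one i' '2 two ii' '3 three iii' '4 four iv' '5 five v' \
  '6 six vi' '7 seven vii' '8 eight viii' '9 nine ix' '10 ten x' > numeric.dat
range=$(awk '$2 == "three", $3 == "vii" { print $0 }' numeric.dat)
# No line has "none" as its second field, so this range stays open to the end.
open=$(awk '$2 == "eight", $2 == "none" { print $1 }' numeric.dat)
printf '%s\n' "$range" "$open"
```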

BEGIN and END
BEGIN is a special pattern whose action runs before the first input line is read. Similarly, the action for END runs after the last input line has been processed.

BEGIN { print "start at 3..." }
$2 == "three", $2 ~ /^e/ { print $1 }
END { print "...and end at eight" }

Results in:

start at 3...
3
4
5
6
7
8
...and end at eight
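Once a program spans several lines, it can be kept in a file of its own and passed to awk with the standard -f option instead of being quoted on the command line. The sketch below assumes a POSIX sh and awk. The file name range.awk is hypothetical, and any name works. The script recreates numeric.dat, writes the BEGIN/END program above into range.awk, and runs it.

```shell
#!/bin/sh
# Recreate numeric.dat so this sketch is self-contained.
printf '%s\n' '1 one i' '2 two ii' '3 three iii' '4 four iv' '5 five v' \
  '6 six vi' '7 seven vii' '8 eight viii' '9 nine ix' '10 ten x' > numeric.dat
# range.awk is a hypothetical file name for the program text.
cat > range.awk <<'EOF'
BEGIN { print "start at 3..." }
$2 == "three", $2 ~ /^e/ { print $1 }
END { print "...and end at eight" }
EOF
# -f reads the awk program from a file instead of the command line.
out=$(awk -f range.awk numeric.dat)
printf '%s\n' "$out"
```

Keeping the program in a file also avoids having to nest its quotes inside shell quotes.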