First steps in awk
We will be learning how to use awk in this book totally via examples. We typically run awk programs on data files or by piping it data from some other unix utility like cat/grep/sed etc.
For this book, we will be using various methods, for now, we'll be using a file called sales.csv
.
Create a folder for all the data files/programs that you are going to use in this book. In that folder, make a file named sales.csv
.
The content of the file should be as below:
Program #1
Output: Prints the entire file
This is one of the shortest awk programs one can write. What it does is basically mimics the cat
command by printing the entire file to the commandline.
How did it work?
Syntax
<command>
: should be enclosed in a'
. With code enclosed in one or more curly brackets{}
.<file1 ... n>
: One or more files on which theawk
program will be executed upon. At least one file is mandatory, otherwise awk will take input from stdin.
Execution of an awk program
When any awk
command is executed, it executes the code (enclosed in the '
) over each line of the input file(s).
Working with fields
We saw how to print the entire line to the commandline. In this section, we'll see how to selectively print fields. In awk
, a file consists of lines and every line contains fields which are separated using delimiters.
Let's say that we have this file:
Record
month,sales
is the first record. Jan,100 is the second record. i.e. records are separated with a newlineField month and sales are called fields. Whatever separates the two fields is the delimiter
Program #2
The output of the above command is exactly same as Program #1.
Program #3
$0: the entire line that's being processed. (Since awk processes an entire file on line by line basis from top to bottom)
$1: the first field of the line being processed
$2: the second field of the line being processed
$NF: the last field of the line being processed
-F','
: The delimiter in this case is a comma. You can have any delimiter which your data file contains, just enclose it in quotes.
Note:
-F ','
is wrong because there is a space between the two fields. We need to have it as -F','
The need of a delimiter?
Run the above code and see the output. After analysing it, read the next paragraph.
The default delimiter in awk is a space, thus, if we are processing a comma separated file and we don't give a delimiter, then awk will consider the entire line as the first field.
In Program #3, when we printed $1, $2
you would've noticed that the month and the sales got printed with a space between them. This is because we have a delimiter for the output fields as well.
We set the output field separator/delimiter by adding it like this before the awk code block (which is enclosed in single quotes) -v OFS=","
Now run the below code and play with the OFS value.
The printing order does not matter, we can print any field in any location for any number of times that we want.
You can run an awk script on multiple files as well, just give the names after the command.
Multiple code blocks
The statements enclosed in {}
form a block of code. These blocks are executed one at a time from left to right. When we run the above code, we'll first get the fields printed in reverse order and then the entire file as is, on the command line.
You can have as many blocks as you want.
The above blocks of code are redundant, we can have both the print statements in one block.
Within a block, a semi-colon can be used to separate the lines of code.
Links
Last updated