sauravomar: AWK Predefined Variables

We have already covered the basic text-filtering part of awk in the previous article. If you have not read it yet, please go through it first, since it is a prerequisite for this one. In this article we look at the predefined variables that awk provides.

Predefined Variables in AWK:

Let's start with the predefined variables. awk maintains a set of built-in variables that hold values it uses while reading input and writing output. The most commonly used ones are:

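RS : input record separator (default: newline)

FS : input field separator (default: a single space, meaning any run of spaces and tabs)

ORS : output record separator (default: newline)

OFS : output field separator (default: a single space)

NF : number of fields in the current input record

NR : number of input records (lines) read so far

FILENAME : name of the current input file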
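A quick way to confirm the default values listed above is to print them from a BEGIN block. This is a minimal sketch that assumes a POSIX shell and a POSIX-compatible awk such as gawk or mawk; because the program has only a BEGIN block, awk exits without reading any input. The shell variable name defaults is just for illustration.

```shell
# Print the default separators; FS and OFS are wrapped in [] so the space is visible.
# RS and ORS are newlines, so they are compared with "\n" instead of being printed.
defaults=$(awk 'BEGIN {
    printf "FS=[%s] OFS=[%s]\n", FS, OFS
    printf "RS is newline: %s\n", (RS == "\n" ? "yes" : "no")
    printf "ORS is newline: %s\n", (ORS == "\n" ? "yes" : "no")
}')
echo "$defaults"
```

With a standard awk the first line reads FS=[ ] OFS=[ ], and both newline checks report yes.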

BEGIN and END in AWK:

awk is also a programming language, and an awk program may include two special patterns, BEGIN and END. Both are optional: a program can have either of them, both, or neither.

The BEGIN block runs once, before awk reads any line of input, and the END block runs once, after all the input lines have been read.

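Let's see an example:

awk '
    BEGIN { printf "Number of people whose marks are 85 or more: " }
    NR > 1 && $4 >= 85 { counter += 1 }
    END { printf "%d\n", counter }
' example.txt

OUTPUT:
Number of people whose marks are 85 or more: 4

awk executes the BEGIN block before it reads the first input line, runs the middle rule for every line, and executes the END block once after the last line, just before it exits. Two details matter here. First, the opening brace of a BEGIN or END action must be on the same line as the keyword. Second, the NR > 1 condition skips the header line of example.txt: on that line $4 is the string "Marks", which awk compares with 85 as a string, and "Marks" compares greater than "85", so the header would otherwise be counted as well.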
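To see the order in which the three kinds of blocks run, here is a small sketch that feeds three hypothetical lines (a, b, c) to awk instead of example.txt. It assumes a POSIX shell and awk; the variable name order is only illustrative.

```shell
# Hypothetical three-line input (a, b, c) stands in for example.txt.
order=$(printf 'a\nb\nc\n' | awk '
    BEGIN { print "BEGIN: runs once, before any input" }
          { print "main:  line " NR " = " $0 }
    END   { print "END:   runs once, after " NR " lines" }
')
echo "$order"
```

The BEGIN line appears first, then one "main" line per input line, and the END line comes last, reporting that NR reached 3.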

FILENAME variable:

Print the name of the current input file with awk:

awk '{ print FILENAME }' example.txt

OUTPUT:

example.txt
example.txt
example.txt
example.txt
example.txt
example.txt

or

awk 'END { print FILENAME }' example.txt

OUTPUT:

example.txt

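In the first command the main block runs once for every input line, so the file name is printed once per line: six times for the six lines of example.txt. In the second command the END block runs only once, so the name is printed only once. Note that FILENAME is not yet set inside a BEGIN block, because no input file has been opened at that point.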
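FILENAME becomes more useful when awk is given several files, because its value changes as awk moves from one file to the next. The sketch below assumes two hypothetical files, first.txt and second.txt, created in a temporary directory with mktemp -d (available on Linux and macOS, though not strictly POSIX). It prints the file name, together with the overall line number NR, once at the start of each file.

```shell
# Create two hypothetical input files in a temporary directory.
dir=$(mktemp -d)
printf 'x\ny\n' > "$dir/first.txt"
printf 'z\n' > "$dir/second.txt"

# FILENAME changes when awk moves to the next file; printing only when it differs
# from the previous value gives one line per file (prefixed with the line number NR).
names=$(cd "$dir" && awk 'FILENAME != prev { print NR ": " FILENAME; prev = FILENAME }' first.txt second.txt)

rm -r "$dir"   # clean up the temporary files
echo "$names"
```

This prints "1: first.txt" and then "3: second.txt", since the second file starts at the third line read overall.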

RS: Input Record Separator. By default awk splits its input into records at each newline (\n), so each record is one line. RS can be changed to a different separator to suit the input.

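Let's see an example:

printf '%s' "SEQ Name Subject Marks;1) Saurav Physics 80;2) Deepak Maths 90;3) Dhoni Biology 87;4) Kedar English 85;5) Pandya History 89;" |
awk 'BEGIN { RS = ";" } { print $0 }'

OUTPUT:

SEQ Name Subject Marks
1) Saurav Physics 80
2) Deepak Maths 90
3) Dhoni Biology 87
4) Kedar English 85
5) Pandya History 89

The input string separates its entries with semicolons (;), so after RS is set to ";" awk treats each semicolon-terminated chunk as one record and prints each on its own line. printf '%s' is used instead of echo so that no newline follows the last semicolon; with echo, that trailing newline would be read as one more record and print extra blank lines.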
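Changing RS does not change how each record is split into fields: within every semicolon-separated record, the default FS still splits on blanks. A minimal sketch, reusing the first three entries of the same string (it assumes a POSIX shell and awk; the variable name report is illustrative):

```shell
# Same semicolon-separated layout as above, shortened to three entries.
# RS=";" makes each entry a record; the default FS then splits it on blanks,
# so $2 is the name and $4 is the mark.
report=$(printf '%s' "1) Saurav Physics 80;2) Deepak Maths 90;3) Dhoni Biology 87;" |
    awk 'BEGIN { RS = ";" } { print NR ": " $2 " scored " $4 }')
echo "$report"
```

Each record produces one line, such as "1: Saurav scored 80".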

FS: Field Separator. By default FS is a single space, which tells awk to split fields on any run of spaces and tabs and to ignore leading and trailing blanks. Setting FS is very helpful when columns are separated by a different delimiter, such as a comma (,) in a CSV file or a colon (:). The -F command-line option achieves the same thing.

Let's see an example:

cat example_new.txt

SEQ:Name:Subject:Marks
1):Saurav:Physics:80
2):Deepak:Maths:90
3):Dhoni:Biology:87
4):Kedar:English:85
5):Pandya:History:89

awk 'BEGIN {FS=":"} {print $1,$2,$3,$4 }' example_new.txt

SEQ Name Subject Marks
1) Saurav Physics 80
2) Deepak Maths 90
3) Dhoni Biology 87
4) Kedar English 85
5) Pandya History 89

or

awk -F':' '{print $1,$2,$3,$4 }' example_new.txt
SEQ Name Subject Marks
1) Saurav Physics 80
2) Deepak Maths 90
3) Dhoni Biology 87
4) Kedar English 85
5) Pandya History 89

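Here the input fields are delimited by colons (:), so setting FS to ":" (or passing -F':') lets awk split each line into its four columns. print then joins the fields with the default output field separator, a single space.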
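Because the default FS is a single space rather than a tab, it is worth seeing how it differs from an explicit tab separator. The sketch below uses a hypothetical line with two leading spaces, a run of three spaces and one tab; it assumes an awk that treats -F'\t' as a tab character, as POSIX specifies (gawk, mawk and BusyBox awk do).

```shell
# Hypothetical sample line: two leading spaces, a run of three spaces, then a tab.
line=$(printf '  Saurav   Physics\t80')

# Default FS (a single space): any run of spaces/tabs separates fields and
# leading blanks are ignored, so awk sees 3 fields.
default_split=$(printf '%s\n' "$line" | awk '{ print NF ": " $1 }')

# FS set to a tab: only the tab separates fields, so awk sees 2 fields.
tab_split=$(printf '%s\n' "$line" | awk -F'\t' '{ print NF ": " $2 }')

echo "$default_split"   # 3: Saurav
echo "$tab_split"       # 2: 80
```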

OFS: Output Field Separator. When print is given several comma-separated arguments, awk places OFS between them. OFS defaults to a single space, but it can be changed to any delimiter. Let's see an example:

awk 'BEGIN { OFS = ":" } { print $1, $2, $3, $4 }' example.txt

OUTPUT:

SEQ:Name:Subject:Marks
1):Saurav:Physics:80
2):Deepak:Maths:90
3):Dhoni:Biology:87
4):Kedar:English:85
5):Pandya:History:89

If we want to convert it to CSV:

awk 'BEGIN { OFS = "," } { print $1, $2, $3, $4 }' example.txt

SEQ,Name,Subject,Marks
1),Saurav,Physics,80
2),Deepak,Maths,90
3),Dhoni,Biology,87
4),Kedar,English,85
5),Pandya,History,89

Similarly, OFS can be set to any string you need.
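One detail often surprises people: OFS is only used when awk builds output from separate fields. Printing an unmodified $0 writes the record exactly as it was read, while assigning to any field, even $1 = $1, makes awk rebuild $0 with OFS. A small sketch with a hypothetical input line "a b c" (assuming a POSIX shell and awk):

```shell
# Hypothetical input line "a b c", with OFS set to a comma in both commands.
# print $0 on an unmodified record writes it exactly as read: OFS is not used.
unchanged=$(printf 'a b c\n' | awk 'BEGIN { OFS = "," } { print $0 }')

# Assigning a field (even to itself) makes awk rebuild $0, joining fields with OFS.
rebuilt=$(printf 'a b c\n' | awk 'BEGIN { OFS = "," } { $1 = $1; print $0 }')

echo "$unchanged"   # a b c
echo "$rebuilt"     # a,b,c
```

This is why the examples above list the fields explicitly as print $1, $2, $3, $4.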

ORS: Output Record Separator. awk writes ORS after every record that print outputs. It defaults to a newline ("\n"), but it can be changed to any delimiter. Let's see an example:

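awk 'BEGIN { ORS = ":" } { print $1, $2, $3, $4 }' example.txt

OUTPUT:

SEQ Name Subject Marks:1) Saurav Physics 80:2) Deepak Maths 90:3) Dhoni Biology 87:4) Kedar English 85:5) Pandya History 89:

Because ORS is written after every record, including the last one, the output ends with a colon and no trailing newline.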
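ORS is applied by print only; printf writes exactly what its format string says and never adds ORS. The sketch below, using three hypothetical input lines, relies on that to finish the output with a single newline in the END block (it assumes a POSIX shell and awk).

```shell
# Hypothetical three-line input. print adds ORS (";") after every record,
# including the last; printf in END adds one newline and is not affected by ORS.
joined=$(printf 'x\ny\nz\n' | awk 'BEGIN { ORS = ";" } { print $1 } END { printf "\n" }')
echo "$joined"   # x;y;z;
```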

NF: Number of fields in the current record. As awk reads each line, it counts how many fields the line splits into, based on FS.
NR: Number of records read so far. awk increments NR for each record it reads (a line, with the default RS), so NR is the number of the current line.

Let's see an example:

awk '{ print "CURRENT LINE: " NR ", TOTAL FIELDS in present line: " NF }' example.txt

Output:

CURRENT LINE: 1, TOTAL FIELDS in present line: 4
CURRENT LINE: 2, TOTAL FIELDS in present line: 4
CURRENT LINE: 3, TOTAL FIELDS in present line: 4
CURRENT LINE: 4, TOTAL FIELDS in present line: 4
CURRENT LINE: 5, TOTAL FIELDS in present line: 4
CURRENT LINE: 6, TOTAL FIELDS in present line: 4
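NF and NR are also commonly used in two other ways: $NF refers to the last field of the line, whatever the number of fields, and NR in the END block holds the total number of records read. A sketch using the first three lines of the same marks data (it assumes a POSIX shell and awk; the variable names are illustrative):

```shell
# First three lines of the marks data used in this article (header + two rows).
data=$(printf 'SEQ Name Subject Marks\n1) Saurav Physics 80\n2) Deepak Maths 90\n')

# $NF is the last field of the current line, whatever NF happens to be.
last=$(printf '%s\n' "$data" | awk 'NR > 1 { print $2 ": " $NF }')

# NR keeps its final value in the END block: the total number of lines read.
total=$(printf '%s\n' "$data" | awk 'END { print NR }')

echo "$last"    # Saurav: 80, then Deepak: 90 on the next line
echo "$total"   # 3
```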

That is the end of this tutorial. In the next article we will look at coding constructs such as if/else and loops.

Feedback is always welcome.

Happy Coding :)

sauravomar

Saturday, 23 February 2019
