Saturday, 23 February 2019

AWK Predefined Variables

We have already covered the basic text filtering features of awk. If you have not read that article yet, please go through it first, since it is the prerequisite for this one. Now let's look at the predefined variables available in awk.

Predefined Variables in AWK:

AWK maintains a set of predefined (built-in) variables that it fills in or consults while it processes input. The most commonly used ones are:
RS : input record separator
FS : input field separator
ORS : output record separator
OFS : output field separator
NF : number of fields in current input line
NR : number of the current input line
FILENAME : current input file name

BEGIN and END in AWK:

AWK is also a programming language, and an awk program can contain two special blocks: BEGIN and END. Both are optional.
AWK runs the BEGIN block once, before reading any line of input, and runs the END block once, after all the lines of input have been read.
Let's see an example:
awk '
   BEGIN { printf "no of people whose marks are 85 or more is " }
   NR > 1 && $4 >= 85 { counter += 1 }
   END { printf "%s\n", counter }
' example.txt
OUTPUT:
no of people whose marks are 85 or more is 4
The opening brace must be on the same line as BEGIN (and END). The NR > 1 test skips the header row: its fourth field is the text "Marks", which awk would compare as a string and count as a match.
Before awk reads the first line of input it executes the BEGIN block; similarly, it executes the END block just before it exits.
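The BEGIN / per-line / END flow can also be used to total and average the marks. This is only a minimal sketch: it assumes the six-line example.txt used in this series (recreated inline below), and the names sample, total, rows and summary are purely illustrative.

```shell
# Sample data: assumed contents of example.txt, recreated inline.
sample='SEQ Name Subject Marks
1) Saurav Physics 80
2) Deepak Maths 90
3) Dhoni Biology 87
4) Kedar English 85
5) Pandya History 89'

summary=$(printf '%s\n' "$sample" | awk '
  BEGIN  { total = 0; rows = 0 }          # runs once, before any input is read
  NR > 1 { total += $4; rows += 1 }       # runs for every line after the header
  END    { printf "rows=%d total=%d average=%.1f\n", rows, total, total / rows }')
echo "$summary"
```

With this sample data it should print rows=5 total=431 average=86.2.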
  • FILENAME: the name of the file awk is currently reading.
Print the file name using awk:
awk '{print FILENAME}' example.txt
OUTPUT:
example.txt
example.txt
example.txt
example.txt
example.txt
example.txt
or 
awk ' BEGIN {} END { print FILENAME} ' example.txt
OUTPUT:
example.txt
In the first example the file name is printed once per line of input, because awk runs the action for every line it reads; example.txt has six lines, so the name appears six times. In the second example the name is printed only once, from the END block.
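If we only want each file name once, even with several input files, one possible approach is to test FNR, the per-file line counter (it also appears later in this series). The sketch below is an assumption-laden illustration: the files first.txt and second.txt and the temporary directory are made up for the demo.

```shell
# Create two small throwaway files (hypothetical names) in a temp directory.
workdir=$(mktemp -d)
printf 'a\nb\n' > "$workdir/first.txt"
printf 'c\n' > "$workdir/second.txt"

# FNR restarts at 1 for every input file, so this prints each name exactly once.
names=$(cd "$workdir" && awk 'FNR == 1 { print FILENAME }' first.txt second.txt)
echo "$names"

rm -rf "$workdir"
```

This should print first.txt and then second.txt, one per line.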
  • RS: Input Record Separator. While parsing text, the default record separator is the newline (\n). We can change it to suit the input.
Let's see an example.
echo "SEQ Name Subject Marks;1) Saurav Physics 80;2) Deepak Maths 90;3) Dhoni Biology 87;4) Kedar English 85;5) Pandya History 89;" | awk 'BEGIN { RS=";" } {print $0}'
SEQ Name Subject Marks
1) Saurav Physics 80
2) Deepak Maths 90
3) Dhoni Biology 87
4) Kedar English 85
5) Pandya History 89
As we can see, the text passed to echo is separated by semicolons (;), so with RS set to ";" awk reads it record by record, splitting at each semicolon.
  • FS: Input Field Separator. By default it is a single space, which awk treats specially: fields are separated by runs of spaces and tabs. Changing FS is helpful when columns are separated by a different delimiter, such as a comma (in a CSV file) or a colon. We can also use the -F flag to achieve the same thing.
Let's see some examples:
cat example_new.txt
SEQ:Name:Subject:Marks
1):Saurav:Physics:80
2):Deepak:Maths:90
3):Dhoni:Biology:87
4):Kedar:English:85
5):Pandya:History:89
awk 'BEGIN {FS=":"} {print $1,$2,$3,$4 }' example_new.txt
SEQ Name Subject Marks
1) Saurav Physics 80
2) Deepak Maths 90
3) Dhoni Biology 87
4) Kedar English 85
5) Pandya History 89
or
awk -F':' '{print $1,$2,$3,$4 }' example_new.txt
SEQ Name Subject Marks
1) Saurav Physics 80
2) Deepak Maths 90
3) Dhoni Biology 87
4) Kedar English 85
5) Pandya History 89
As the example shows, the input is delimited by colons (":"), and awk splits each line into fields at that delimiter.
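FS does not have to be a single character. When it is longer than one character, awk treats it as a regular expression, so one setting can cover mixed delimiters. A small sketch, using a made-up input line:

```shell
# Either a colon or a comma ends a field, because FS is the regex [:,].
fields=$(echo 'Saurav:Physics,80' | awk -F'[:,]' '{ print NF, $1, $3 }')
echo "$fields"
```

Here awk should report 3 fields and print: 3 Saurav 80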
  • OFS: Output Field Separator. The default output field separator is a single space; we can change it to any delimiter. It is inserted wherever a comma appears in a print statement. Let's see some examples:
awk 'BEGIN {OFS=":"} {print $1,$2,$3,$4}' example.txt
OUTPUT:
SEQ:Name:Subject:Marks
1):Saurav:Physics:80
2):Deepak:Maths:90
3):Dhoni:Biology:87
4):Kedar:English:85
5):Pandya:History:89
If we want to convert it to CSV:
awk 'BEGIN {OFS=","} {print $1,$2,$3,$4}' example.txt
SEQ,Name,Subject,Marks
1),Saurav,Physics,80
2),Deepak,Maths,90
3),Dhoni,Biology,87
4),Kedar,English,85
5),Pandya,History,89
Similarly, we can update OFS as we want.
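One detail worth knowing: OFS is only applied when awk builds the output from separate fields. A plain print (which prints $0) leaves the line untouched unless a field has been assigned to. The sketch below, on two made-up lines, uses the common $1 = $1 idiom to force awk to rebuild the record; the variable names plain and csv are just for illustration.

```shell
rows='1) Saurav Physics 80
2) Deepak Maths 90'

# print with no arguments prints $0 as read, so OFS has no visible effect here.
plain=$(printf '%s\n' "$rows" | awk 'BEGIN { OFS = "," } { print }')

# Assigning any field (even to itself) makes awk rebuild $0 using OFS.
csv=$(printf '%s\n' "$rows" | awk 'BEGIN { OFS = "," } { $1 = $1; print }')

echo "$plain"
echo "$csv"
```

The first command should echo the lines unchanged, while the second should produce comma-separated lines such as 1),Saurav,Physics,80.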
  • ORS: Output Record Separator. The default output record separator is a newline ("\n"); we can change it to any delimiter. Let's see an example:
awk 'BEGIN {ORS=":"} {print $1,$2,$3,$4}' example.txt
OUTPUT:
SEQ Name Subject Marks:1) Saurav Physics 80:2) Deepak Maths 90:3) Dhoni Biology 87:4) Kedar English 85:5) Pandya History 89:
Note that ORS is written after every record, including the last one, so the output ends with a colon instead of a newline.
  • NF: Number of fields in the current line. While reading input, awk keeps track of how many fields the current record contains, which depends on the field separator (FS).
  • NR: Number of the current input line. While reading input, awk keeps track of how many records it has read so far, which depends on the record separator (RS).
Let's see an example:
awk '{print "CURRENT LINE: " NR "\t TOTAL FIELDS in present line:" NF}' example.txt
Output:
CURRENT LINE: 1    TOTAL FIELDS in present line:4
CURRENT LINE: 2    TOTAL FIELDS in present line:4
CURRENT LINE: 3    TOTAL FIELDS in present line:4
CURRENT LINE: 4    TOTAL FIELDS in present line:4
CURRENT LINE: 5    TOTAL FIELDS in present line:4
CURRENT LINE: 6    TOTAL FIELDS in present line:4
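Two handy consequences of NF and NR: $NF refers to the last field of the current line, and NR still holds the number of the last line read when the END block runs. A minimal sketch on made-up input with a varying number of fields:

```shell
report=$(printf 'a b c\nd e\nf\n' | awk '
  { print NR ": " NF " fields, last=" $NF }   # $NF is the last field of this line
  END { print "total lines: " NR }')          # NR keeps its final value in END
echo "$report"
```

For this input the last line of the report should read total lines: 3.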
That is the end of this tutorial. We will look at coding constructs like if and else, loops, etc. in the next article.
Feedback is always welcome.
Happy Coding :)

Awk Tutorial for Beginners

What is AWK?

  • AWK is one of the most prominent text-processing and text-filtering utilities on GNU/Linux. It is also a very powerful programming language that can solve complex problems in very few lines of code.
  • Its name is derived from the family names of its authors − Alfred Aho, Peter Weinberger, and Brian Kernighan.
  • GNU awk (gawk), the implementation found on many GNU/Linux systems, is maintained under the FSF (Free Software Foundation).
  • The basic syntax of awk is awk [options] 'program' file.

Print file using awk?

It is similar to cat /etc/resolv.conf. It prints the file content in the console.
awk '//{print}' /etc/resolv.conf
       or
awk '{print}' /etc/resolv.conf
Both commands print the whole file. The difference is that the first form has a pattern between the slashes (here an empty one, which matches every line), so it can be used to print only the lines that match a pattern, whereas the second form has no pattern at all. For example,
awk '/8.8.8.8/{print}' /etc/resolv.conf
will print the lines which contain "8.8.8.8". (An unescaped dot matches any character, so write /8\.8\.8\.8/ to match literal dots only.) The basic syntax of the first form is awk '/pattern/{print}' file.
pattern: can be a regex or a plain string.
awk '/^saurav/{print}' /etc/passwd
In the above example, the lines which start with saurav will be printed.
awk '/sql$/{print}' /etc/passwd
In the above example, the lines which end with sql will be printed. Likewise, we can use any regex to print the lines matching a pattern.
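Because a dot in a regex matches any character, a pattern like /8.8.8.8/ can match more than intended. The sketch below compares the loose and the escaped pattern on two made-up lines; the variable names are illustrative only.

```shell
input='nameserver 8.8.8.8
nameserver 8x8y8z8'

# Unescaped dots match any character, so both lines match.
loose=$(printf '%s\n' "$input" | awk '/8.8.8.8/ { print }')

# Escaped dots match only a literal ".", so only the first line matches.
strict=$(printf '%s\n' "$input" | awk '/8\.8\.8\.8/ { print }')

echo "$loose"
echo "$strict"
```

The loose pattern should print both lines, the strict one only nameserver 8.8.8.8.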

Print Column using awk?

By default, IFS (Internal Field Separator) in bash is space, tab, and newline. Similarly, in AWK the default field separator (the FS variable) splits fields on spaces and tabs.
Here is the file, which contains 4 columns, that I am going to use to explain:
SEQ Name Subject Marks
1) Saurav Physics 80
2) Deepak Maths 90
3) Dhoni Biology 87
4) Kedar English 85
5) Pandya History 89
Printing the 3rd column: here we are going to print the 3rd column.
awk '//{print $3}' example.txt
Output:
Subject
Physics
Maths
Biology
English
History
Let's see how to print columns 2 and 4:
awk '//{print $2 $4}' example.txt
Output:
NameMarks
Saurav80
Deepak90
Dhoni87
Kedar85
Pandya89
Here we can see that awk prints the columns joined together, with nothing between them. If you want to separate the columns, use ',' (comma).
awk '//{print $2, $4;}' example.txt
Output:
Name Marks
Saurav 80
Deepak 90
Dhoni 87
Kedar 85
Pandya 89

Using printf in awk?

printf helps format the output.
For example:
awk 'NR>1 {printf "Marks=%d Subject=%s\n",$4, $3 }' example.txt
Output:
Marks=80 Subject=Physics
Marks=90 Subject=Maths
Marks=87 Subject=Biology
Marks=85 Subject=English
Marks=89 Subject=History
As you can see in the above example, printf works much like the printf function in C.
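As in C, the format specifiers accept widths, which is useful for lining up columns. A small sketch on two made-up rows: %-8s left-aligns a string in 8 characters and %5d right-aligns a number in 5.

```shell
table=$(printf '1) Saurav Physics 80\n2) Deepak Maths 90\n' |
  awk '{ printf "%-8s|%5d|\n", $2, $4 }')   # name padded to 8, marks padded to 5
echo "$table"
```

The vertical bars are only there to make the padding visible.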

Comparison Operators in AWK:

In awk, you can compare columns and print the matching lines in the console.
For example:
awk '$4 > 85 {print;}' example.txt
SEQ Name Subject Marks
2) Deepak Maths 90
3) Dhoni Biology 87
5) Pandya History 89
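The above example prints the lines whose 4th column (marks) is greater than 85. The header line is printed too: its 4th field is the text "Marks", so awk compares it as a string, and "Marks" sorts after "85".
The available comparison operators are: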
  1. >: greater than
  2. <: less than
  3. >=: greater than or equal to
  4. <=: less than or equal to
  5. ==: equal to
  6. !=: not equal to
  7. some_value ~ /pattern/: true if some_value matches the pattern
  8. some_value !~ /pattern/: true if some_value does not match the pattern
If we want to print the marks of Deepak:
awk '$2 ~ "Deepak" { print $0 ; }' example.txt
Output:
2) Deepak Maths 90
Similarly, we can get the matching rows using the other comparison operators.
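A point that is easy to trip over: when a field does not look like a number, awk compares it as a string. One common way to force a numeric comparison is to add 0 to the field. The sketch below, on a made-up three-line sample, shows the difference; the variable names as_is and numeric are illustrative.

```shell
sample='SEQ Name Subject Marks
1) Saurav Physics 80
2) Deepak Maths 90'

# "Marks" is not numeric, so the header is compared as a string and matches.
as_is=$(printf '%s\n' "$sample" | awk '$4 > 85 { print $2 }')

# $4 + 0 turns "Marks" into 0, so only real marks above 85 match.
numeric=$(printf '%s\n' "$sample" | awk '$4 + 0 > 85 { print $2 }')

echo "$as_is"
echo "$numeric"
```

The first command should print Name and Deepak, the second only Deepak.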

Compound operation in AWK:

In awk, we can combine multiple expressions to filter text, using the && (and) and || (or) operators.
Let's see some examples.
Print the rows of the people who have marks of 85 or more in History.
awk '($4 >= 85) && ($3 ~ "History") { print $0 ; }' example.txt
OUTPUT:
5) Pandya History 89
Print the rows of the people who have marks of 85 or more, or whose subject is History.
awk '($4 >= 85 ) || ($3 ~ "History") { print  $0 ; }' example.txt
OUTPUT:
SEQ Name Subject Marks
2) Deepak Maths 90
3) Dhoni Biology 87
4) Kedar English 85
5) Pandya History 89
The header line appears as well, because its 4th field, the text "Marks", is compared as a string and sorts after "85".
Similarly, we can combine multiple expressions to filter the text.

Next Keyword in AWK:

The next keyword is somewhat similar to continue in other programming languages like Java and Scala: it stops processing the current line and moves on to the next one. This really helps when there are multiple rules to evaluate and you want to skip all the remaining rules once one has matched.
For example:
awk ' FNR == 1 {next};
      $4 >= 85 { printf "%s\t%s\n", $0,"EXEMPTION" ; next}
      $4 < 85 {printf "%s\t%s\n", $0,"PASSED";} ' example.txt
Output:
1) Saurav Physics 80 PASSED
2) Deepak Maths 90 EXEMPTION
3) Dhoni Biology 87 EXEMPTION
4) Kedar English 85 EXEMPTION
5) Pandya History 89 EXEMPTION
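In the above example, as we can see:
the first rule, FNR == 1 {next}, checks whether this is the first line (the header row) and, if so, skips to the next line.
the second rule, $4 >= 85 { printf "%s\t%s\n", $0,"EXEMPTION" ; next}, checks whether the 4th column (marks) is 85 or more and, if so, prints the line and goes to the next line.
the third rule, $4 < 85 {...}, is therefore reached only by lines with marks below 85, which it labels PASSED.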
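To see what next actually buys us, it helps to compare overlapping rules with and without it. This is a sketch on made-up marks (the grade labels and thresholds are assumptions for the demo): with next, each line is handled by the first matching rule only; without it, a line can fall through to later rules as well.

```shell
# With next: the first matching rule wins, later rules are skipped.
with_next=$(printf '95\n60\n40\n' | awk '
  $1 >= 90 { print $1, "A"; next }
  $1 >= 50 { print $1, "B"; next }
           { print $1, "FAIL" }')

# Without next: 95 satisfies both rules, so it is printed twice.
without_next=$(printf '95\n' | awk '
  $1 >= 90 { print $1, "A" }
  $1 >= 50 { print $1, "B" }')

echo "$with_next"
echo "$without_next"
```

The first command should print 95 A, 60 B and 40 FAIL; the second should print both 95 A and 95 B.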

Variables and Numeric Expressions:

Variables are placeholders that store a value in memory, as in other programming languages.
Syntax:
variable=value
Example:
marks=10
name="saurav"
String values must be enclosed in double quotes; an unquoted word such as saurav would be read as another (empty) variable.
Numeric expressions are expressions that perform arithmetic, like adding or dividing numbers, similar to other programming languages.
Syntax: operand operator operand
Example:
var1=1
var2=2
var3=var1 + var2
Let's see an example:
Print a running line number with every data line in the console.
awk 'FNR == 1 { next }     # skip the header row
     { line = $0           # store the record awk has just read
       line_no += 1        # increment line_no for every record
       printf "%d\t%s\n", line_no, line }' example.txt
OUTPUT:
1 1) Saurav Physics 80
2 2) Deepak Maths 90
3 3) Dhoni Biology 87
4 4) Kedar English 85
5 5) Pandya History 89
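Variables are also handy for remembering something across lines. As a minimal sketch, assuming the same six-line example.txt (recreated inline), the following keeps track of the highest marks seen so far; the names max, name and top are illustrative.

```shell
# Sample data: assumed contents of example.txt, recreated inline.
sample='SEQ Name Subject Marks
1) Saurav Physics 80
2) Deepak Maths 90
3) Dhoni Biology 87
4) Kedar English 85
5) Pandya History 89'

top=$(printf '%s\n' "$sample" | awk '
  BEGIN { max = 0 }                              # start below any real marks
  NR > 1 && $4 > max { max = $4; name = $2 }     # remember the best row so far
  END { print name, max }')
echo "$top"
```

For this data it should print Deepak 90.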
Happy Coding :).

Tuesday, 12 February 2019

Basic tutorial of SED (Stream Editor) for beginners

What is SED?

Sed is a stream editor: a non-interactive text editor for modifying files automatically, commonly used on Linux/Unix based systems. Sed takes its input in the form of a stream and updates that stream according to the instructions it is given.
Many system developers and admins use this command on a daily basis to update, replace, or filter text in strings or files.

How to use?

I will use the following file as a reference to explain the commands:
for seq in `seq 1 5`; do echo "CAT_$seq" >> exp.txt; done
The above command will create the file "exp.txt", which contains CAT_1 to CAT_5, one per line.

Delimiter IN SED:

Most people believe that only the slash ('/') can be used as a delimiter. This is a myth: you can also use characters like "|", ",", "_", ":", etc.
Example:
echo "CAT"| sed 's:CAT:DOG:'
echo "CAT"| sed 's|CAT|DOG|'
echo "CAT"| sed 's_CAT_DOG_'
echo "CAT"| sed 's;CAT;DOG;'
echo "CAT"| sed 's,CAT,DOG,'
All of the above commands yield the same result.
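Choosing a different delimiter is most useful when the text itself contains slashes, such as file paths. A small sketch with a made-up path: both commands below do the same substitution, but the second has to escape every slash.

```shell
# With | as the delimiter, slashes in the pattern need no escaping.
result=$(echo "/usr/local/bin" | sed 's|/usr/local|/opt|')

# With / as the delimiter, every slash in the pattern must be escaped.
escaped=$(echo "/usr/local/bin" | sed 's/\/usr\/local/\/opt/')

echo "$result"
echo "$escaped"
```

Both should print /opt/bin.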

How to print line no using sed:

Using "=" we can print each line preceded by its line number:
Example:
sed '=' exp.txt
OUTPUT:
1
CAT_1
2
CAT_2
3
CAT_3
4
CAT_4
5
CAT_5

Print file using SED:

Example: Print lines 1 to 3.
sed '1,3p' exp.txt
OUTPUT:
CAT_1
CAT_1
CAT_2
CAT_2
CAT_3
CAT_3
CAT_4
CAT_5
By default, each line of input is printed to the standard output after all of the commands have been applied to it, which is why lines 1 to 3 appear twice above. To suppress this behavior we have the -n flag:
sed -n '1,3p' exp.txt
OUTPUT:
CAT_1
CAT_2
CAT_3
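Addresses do not have to be line numbers. As a sketch, assuming the same five CAT lines as exp.txt (recreated inline here), a range can also be given as two patterns, and $ stands for the last line:

```shell
lines=$(printf 'CAT_1\nCAT_2\nCAT_3\nCAT_4\nCAT_5')

# Print from the first line matching CAT_2 through the next line matching CAT_4.
by_pattern=$(printf '%s\n' "$lines" | sed -n '/CAT_2/,/CAT_4/p')

# $ addresses the last line of the input.
last_line=$(printf '%s\n' "$lines" | sed -n '$p')

echo "$by_pattern"
echo "$last_line"
```

The first command should print CAT_2 to CAT_4, the second only CAT_5.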

Print Non-consecutive lines:

How to print non-consecutive lines, for example lines 1 to 3 and line 5.
Example:
sed -n -e '1,3p' -e '5p' exp.txt
Output:
CAT_1
CAT_2
CAT_3
CAT_5
Here we have used the -e flag, which appends the editing commands given in its argument to the list of commands to run.
It is similar to executing multiple sed commands, as below.
sed -n '1,3p'  exp.txt ; sed -n 5p exp.txt

Delete Lines and Print:

How to delete some of the lines and print all the rest.
Example:
sed '3d'  exp.txt
The above command deletes the 3rd line and prints all the other lines.
Output:
CAT_1
CAT_2
CAT_4
CAT_5
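The d command accepts the same kinds of addresses as p. A minimal sketch, again assuming the five CAT lines of exp.txt (recreated inline), deletes once by line range and once by pattern:

```shell
lines=$(printf 'CAT_1\nCAT_2\nCAT_3\nCAT_4\nCAT_5')

# Delete lines 2 through 4, keeping the first and the last line.
range_deleted=$(printf '%s\n' "$lines" | sed '2,4d')

# Delete every line that ends in _3.
pattern_deleted=$(printf '%s\n' "$lines" | sed '/_3$/d')

echo "$range_deleted"
echo "$pattern_deleted"
```

The first command should leave CAT_1 and CAT_5; the second should leave everything except CAT_3.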

Inserting spaces in files:

Using "G" we can insert an empty line after every line in the file.
Example:
sed 'G' exp.txt
Output:
CAT_1

CAT_2

CAT_3

CAT_4

CAT_5

You can also run sed 'G;G' exp.txt to insert 2 blank lines. In general, the number of blank lines inserted equals the number of G commands separated by semicolons.

In Place Editing in Sed:

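Using the "-i" flag we can edit the file in place: the changes are written back to the same file and nothing is printed in the console.
Example:
sed -i 's/CAT/DOG/' exp.txt
Do not write -in: sed reads the text attached to -i as a backup suffix, so -in would leave a backup copy named exp.txtn. On BSD/macOS sed, -i requires an explicit suffix argument, e.g. -i ''.
Content of exp.txt afterwards: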
DOG_1
DOG_2
DOG_3
DOG_4
DOG_5
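Since in-place editing overwrites the file, it can be safer to keep a backup. Attaching a suffix directly to -i asks sed to save the original under that name first. The sketch below works on a throwaway copy in a temporary directory; the .bak suffix and the two-line file are assumptions for the demo.

```shell
workdir=$(mktemp -d)
printf 'CAT_1\nCAT_2\n' > "$workdir/exp.txt"

# Edit in place, keeping the original as exp.txt.bak.
sed -i.bak 's/CAT/DOG/' "$workdir/exp.txt"

edited=$(cat "$workdir/exp.txt")
backup=$(cat "$workdir/exp.txt.bak")
echo "$edited"
echo "$backup"

rm -rf "$workdir"
```

The edited file should contain the DOG lines, while the .bak copy still holds the original CAT lines.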

Occurrences of pattern in SED:

Without any flag, only the first match on each line is replaced; with the "g" flag, all occurrences are replaced. If you want to replace one particular occurrence, you can give its number as the flag, as below.
Example:
echo "CAT CAT CAT CAT CAT"| sed 's/CAT/DOG/2'
OUTPUT: CAT DOG CAT CAT CAT
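As you can see in the above example, only the second occurrence is replaced.
If you want to replace every occurrence from the second onwards, you can combine the number with g (POSIX leaves this combination unspecified; GNU sed behaves as shown):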
echo "CAT CAT CAT CAT CAT"| sed 's/CAT/DOG/2g'
OUTPUT: CAT DOG DOG DOG DOG

Command S for substitution:

The s command replaces occurrences of a pattern with a newly given string.

Replace String using String:

Example: Let's replace CAT_2 with DOG_2
sed 's/CAT_2/DOG_2/' exp.txt
          or
cat exp.txt | sed 's/CAT_2/DOG_2/'
OUTPUT:
CAT_1
DOG_2
CAT_3
CAT_4
CAT_5
It will replace CAT_2 with DOG_2.
Note: Most Linux utilities read a file line by line, and sed works the same way: it replaces the first occurrence of the pattern on a line and then moves on to the next line. If you want to replace all the occurrences, use the "g" (global) flag.
Example:
sed 's/CAT_2/DOG_2/g' exp.txt
or
cat exp.txt | sed 's/CAT_2/DOG_2/g'
We can also use a number instead of "g", which tells sed to replace only the occurrence at that position.
Example:
echo "my name is name and name" | sed 's/name/saurav/2'
Output: my name is saurav and name
The second occurrence of name is replaced with saurav.

Replace String using REGEX:

Example: Replace CAT with DOG at the start of each line (^ anchors the match to the beginning of the line)
sed 's/^CAT/DOG/' exp.txt
OUTPUT:
DOG_1
DOG_2
DOG_3
DOG_4
DOG_5
Sometimes we use the -E flag for regex matching in sed, for example:
sed -E 's/^CAT/DOG/' exp.txt
This is the extended regular expression flag. With it, a few characters such as '?', '+', '()', '{}' act as special characters without needing to be escaped, whereas in basic mode (without the -E flag) they are literal unless escaped. Extended regular expressions are more convenient than basic ones, but sed scripts that treated "+" as a literal character will behave differently under -E.
Example:
echo "123 abc" | sed 's/[0-9]+//'
Output: 123 abc
echo "123 abc" | sed -E 's/[0-9]+//'
Output:  abc
In the above example, "+" is a special character when we use "-E", so sed removes the run of digits (leaving the space before abc); without "-E", sed treats "+" as a literal plus sign, finds no match, and leaves the input unchanged.
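The difference shows up most clearly with groups. As a sketch on the same "123 abc" input, the two commands below swap the number and the word: the -E version uses bare parentheses and +, while the basic version has to escape the parentheses and spell out "one or more" by hand. The variable names are illustrative.

```shell
# Extended syntax: ( ) group and + repeats without any backslashes.
swapped=$(echo "123 abc" | sed -E 's/([0-9]+) ([a-z]+)/\2 \1/')

# Basic syntax: groups are written \( \) and "one or more" as xx*.
basic=$(echo "123 abc" | sed 's/\([0-9][0-9]*\) \([a-z][a-z]*\)/\2 \1/')

echo "$swapped"
echo "$basic"
```

Both should print abc 123.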
That's it. After going through this article you should have an idea of how sed works and of the different flags it offers. Some flags are not covered, like "r" (for reading from a file) and "w" (for writing to a file); the ones shown here are the basic and easy ones.
In case of any doubts or concerns, please comment below.
Happy Coding . :)

Generating Unique Id in Distributed Environment in high Scale:

Recently I was working on a project which requires unique id in a distributed environment which we used as a  primary  key to store in dat...