Multiple input files awk command line

Question

I am an awk newbie and admittedly don't understand how FNR and NR drive looping through files. I have two input files working, and I need to add another (inputFile3).

I am running this from the command line:

awk -f parseField.awk inputFile1.csv inputFile2.csv ./inputFile3.TXT

Currently, I loop through inputFile3 using:

FNR!=NR {...}

I loop through inputFile1 using:

FNR==NR {...}

I need to add another file to the mix (inputFile2). What is the syntax that I can use in my awk script (parseField) to access that third input file?

FNR is "The input record number in the current input file." NR is "The total number of input records seen so far." So FNR==NR is true while reading the first file and false for every other file. What are you trying to do with your third file?
– Etan Reisner, Commented Oct 18, 2015 at 20:46
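To make the comment above concrete, here is a minimal sketch that prints both counters for every record of three files. The file names (f1, f2, f3) and their contents are made up for illustration.

```shell
# Create three small sample files (hypothetical names and contents).
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/f1"
printf 'c\n'    > "$dir/f2"
printf 'd\ne\n' > "$dir/f3"

# Print the file name, NR and FNR for every record.
out=$(cd "$dir" && awk '{ print FILENAME, "NR=" NR, "FNR=" FNR }' f1 f2 f3)
printf '%s\n' "$out"
# f1 NR=1 FNR=1
# f1 NR=2 FNR=2
# f2 NR=3 FNR=1
# f3 NR=4 FNR=1
# f3 NR=5 FNR=2

rm -rf "$dir"
```

NR keeps counting across files while FNR restarts at 1, so FNR!=NR matches the second and the third file alike and cannot tell them apart.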

Community · Accepted Answer · 2017-05-23 10:27:07Z

To add to @EtanReisner's good information, you can keep a counter: FNR==1 {file_number++}. This increments the counter whenever the first line of a file is read.

All together, you can say:

#!/bin/awk -f

BEGIN {print "start program"}
NR==1 {print "reading first file"}
FNR==1 {filenum++; print "I am in file number", filenum}
{ ... }
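Applied to the three files in the question, the counter can select a separate block per file. This is only a sketch: the file contents and the print actions are placeholders, not the asker's real parsing logic.

```shell
# Sample inputs using the file names from the question (contents are made up).
dir=$(mktemp -d)
printf 'k1,one\n'   > "$dir/inputFile1.csv"
printf 'k1,two\n'   > "$dir/inputFile2.csv"
printf 'k1 three\n' > "$dir/inputFile3.TXT"

out=$(cd "$dir" && awk '
    FNR == 1     { filenum++ }                  # a new input file has started
    filenum == 1 { print "file 1:", $0; next }  # runs only for inputFile1.csv
    filenum == 2 { print "file 2:", $0; next }  # runs only for inputFile2.csv
    filenum == 3 { print "file 3:", $0 }        # runs only for inputFile3.TXT
' inputFile1.csv inputFile2.csv ./inputFile3.TXT)
printf '%s\n' "$out"
# file 1: k1,one
# file 2: k1,two
# file 3: k1 three

rm -rf "$dir"
```

One caveat: an empty input file produces no records, so FNR==1 never fires for it and the files after it are numbered one lower than their position on the command line.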

If you are using a ~~GNU~~ POSIX awk (thanks, Jonathan Leffler), you can also use the FILENAME variable, or the ARGC variable and the ARGV array.
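As a rough sketch of the FILENAME and ARGV approach, the script below lists the arguments in BEGIN and then compares FILENAME against each ARGV entry. The file names a.csv, b.csv and c.txt are hypothetical.

```shell
# Three one-line sample files (hypothetical names and contents).
dir=$(mktemp -d)
printf 'x\n' > "$dir/a.csv"
printf 'y\n' > "$dir/b.csv"
printf 'z\n' > "$dir/c.txt"

out=$(cd "$dir" && awk '
    # ARGV[0] is the command name, so the file arguments start at index 1.
    BEGIN { print "ARGC=" ARGC; for (i = 1; i < ARGC; i++) print i, ARGV[i] }
    FILENAME == ARGV[1] { print "first:", $0 }
    FILENAME == ARGV[2] { print "second:", $0 }
    FILENAME == ARGV[3] { print "third:", $0 }
' a.csv b.csv c.txt)
printf '%s\n' "$out"
# ARGC=4
# 1 a.csv
# 2 b.csv
# 3 c.txt
# first: x
# second: y
# third: z

rm -rf "$dir"
```

Comparing names only works when each file appears once on the command line. If the same file is listed twice, the FNR==1 counter is the safer choice.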

Also see information about this in Idiomatic awk:

Another construct that is often used in awk is as follows:
$ awk 'NR == FNR { # some actions; next} # other condition {# other actions}' file1.txt file2.txt
This is used when processing two files. When processing more than one file, awk reads each file sequentially, one after another, in the order they are specified on the command line. The special variable NR stores the total number of input records read so far, regardless of how many files have been read. The value of NR starts at 1 and always increases until the program terminates. Another variable, FNR, stores the number of records read from the current file being processed. The value of FNR starts at 1, increases until the end of the current file is reached, then is set again to 1 as soon as the first line of the next file is read, and so on. So, the condition NR == FNR is only true while awk is reading the first file.
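A typical use of the NR == FNR construct is a lookup: remember keys from the first file, then filter the second. The sketch below assumes two made-up files, wanted.txt and people.txt, keyed on the first field.

```shell
# Sample data (hypothetical names and contents).
dir=$(mktemp -d)
printf 'alice\ncarol\n'               > "$dir/wanted.txt"
printf 'alice 30\nbob 25\ncarol 41\n' > "$dir/people.txt"

out=$(cd "$dir" && awk '
    NR == FNR { wanted[$1] = 1; next }   # first file: remember each key, skip the rest
    $1 in wanted                         # second file: print lines whose key was seen
' wanted.txt people.txt)
printf '%s\n' "$out"
# alice 30
# carol 41

rm -rf "$dir"
```

The next statement is what keeps records of the first file from reaching the second rule.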

edited May 23, 2017 at 10:27

CommunityBot

answered Oct 18, 2015 at 20:57

fedorqui

4 Comments

Jonathan Leffler Over a year ago

FILENAME is part of POSIX awk. So too is the ARGV array, and ARGC variable — the indexes of ARGV start from 0 (rather than 1), and the arguments recorded exclude the options to awk and the program.

fedorqui Over a year ago

@JonathanLeffler yes, so that's why I suggest using a counter whenever FNR==1 as the most reliable way to do this.

Jonathan Leffler Over a year ago

I agree that FNR == 1 is a good way of detecting a change of file. Your comment about GNU Awk is more restrictive than need be (FILENAME is not exclusively in GNU Awk). And knowing that ARGC and ARGV exist can be helpful.

fedorqui Over a year ago

@JonathanLeffler ah, now I see your point. Thanks for it, updated!

Mark Setchell · Accepted Answer · 2015-10-19 09:06:19Z

Not as elegant as the POSIX FILENAME solution, but handy for dusty, old awks that lack too many features. You can make a compound statement that manipulates your data before sending it to awk in a couple of ways...

Option 1

First, you could output the file number on a line of its own before each file that you send to awk. So, if your files look like this:

file1

Line 1 of 1

file2

Line 1 of 2
Line 2 of 2

file3

Line 1 of 3
Line 2 of 3
Line 3 of 3

You could do this:

{ echo 1; cat file1; echo 2; cat file2; echo 3; cat file3; }
1
Line 1 of 1
2
Line 1 of 2
Line 2 of 2
3
Line 1 of 3
Line 2 of 3
Line 3 of 3

Then pipe that into awk and pick up the file number every time the number of fields is 1 (this assumes no data line consists of a single field):

{ echo 1; cat file1; echo 2; cat file2; echo 3; cat file3; } | awk 'NF==1{file=$1;next} {print file,$0}'
1 Line 1 of 1
2 Line 1 of 2
2 Line 2 of 2
3 Line 1 of 3
3 Line 2 of 3
3 Line 3 of 3

Option 2

Or, you could add the file number to the start of every line so it is available as $1 inside awk (or to the end, where it would be $NF), like this:

{ sed 's/^/1 /' file1; sed 's/^/2 /' file2; sed 's/^/3 /' file3; }
1 Line 1 of 1
2 Line 1 of 2
2 Line 2 of 2
3 Line 1 of 3
3 Line 2 of 3
3 Line 3 of 3

So, now you can do:

{ sed 's/^/1 /' file1; sed 's/^/2 /' file2; sed 's/^/3 /' file3; } | awk '{file=$1; ...}'
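One way the elided awk body might continue is sketched below: it reads the file number from $1 and recovers the original line by cutting off the prefix. The variable names and print actions are illustrative assumptions, not part of the answer.

```shell
# Recreate the three sample files shown earlier in this answer.
dir=$(mktemp -d)
printf 'Line 1 of 1\n'                           > "$dir/file1"
printf 'Line 1 of 2\nLine 2 of 2\n'              > "$dir/file2"
printf 'Line 1 of 3\nLine 2 of 3\nLine 3 of 3\n' > "$dir/file3"

out=$(cd "$dir" && { sed 's/^/1 /' file1; sed 's/^/2 /' file2; sed 's/^/3 /' file3; } | awk '
    {
        file = $1                            # file number that sed put in front
        line = substr($0, length($1) + 2)    # original line without prefix and space
    }
    file == 1 { print "from file1: " line }
    file == 2 { print "from file2: " line }
    file == 3 { print "from file3: " line }
')
printf '%s\n' "$out"
# from file1: Line 1 of 1
# from file2: Line 1 of 2
# from file2: Line 2 of 2
# from file3: Line 1 of 3
# from file3: Line 2 of 3
# from file3: Line 3 of 3

rm -rf "$dir"
```

Using substr on $0 rather than reassigning $1 keeps the original spacing of the line intact.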

I'm still voting for @fedorqui's solution though :-)
