Apply a gawk script to multiple files in a folder

Question

I would like to use the following awk line to remove every even line (and keep the odd lines) in a text file.

awk 'NR%2==1' filename.txt > output

The problem is that I struggle to either loop properly in awk or build a shell script to apply this to all *.txt files in a folder. I tried to use this one-liner

gawk 'FNR==1{if(o)close(o);o=FILENAME;
sub(/\.txt/,"_oddlines.txt",o)}{NR%2==1; print>o}'

but that didn't remove the even lines. And I am even less familiar with shell scripting. I use gawk under win7 or cygwin with bash. Many thanks for any kind of idea.

torek · Accepted Answer · 2013-09-22 00:49:45Z

Your existing gawk one-liner is really close. Here it is formatted as a more readable script:

FNR == 1 {
    if (o)
        close(o)
    o = FILENAME
    sub(/\.txt/, "_oddlines.txt", o)
}
{
    NR % 2 == 1
    print > o
}

This should make the error obvious¹. So now we remove that error:

FNR == 1 {
    if (o)
        close(o)
    o = FILENAME
    sub(/\.txt/, "_oddlines.txt", o)
}
NR % 2 == 1 {
    print > o
}

$ awk -f foo.awk *.txt

and it works (and of course you can re-one-line-ize this).

(Normally I would do this with a for like the other answers, but I wanted to show you how close you were!)

¹Per comment, maybe not quite so obvious?

Awk's basic language construct is the "pattern-action" statement. An awk program is just a list of such statements. The "pattern" is so named because originally they were mostly grep-like regular expression patterns:

$ awk '/^be.*st$/' < /usr/share/dict/web2
beanfeast
beast
[snip]

(Except for the slashes, this is basically just running grep, since it uses the default action, print.)

Patterns can actually be ranges (two patterns separated by a comma), but it's more typical to use a single one, as in these cases. Patterns not enclosed within slashes allow tests like FNR == 1 (File-specific Number of this Record equals 1) or NR % 2 == 1 (Number of this Record, cumulative across all files, mod 2 equals 1).

Once you hit the open brace, though, you're into the "action" part. Now NR % 2 == 1 simply calculates the result (true or false) and then throws it away. If you leave out the "pattern" part entirely, the "action" part is run on every input line. So this prints every line.
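To see the difference on a small scale, here is a minimal sketch that runs the same comparison once inside the action and once as a pattern. The three-line input and the variable names are made up for illustration.

```shell
# Made-up three-line input, for illustration only.
input=$(printf 'one\ntwo\nthree\n')

# Comparison inside the action: it is evaluated, the result is discarded,
# and the unconditional print runs for every record.
in_action=$(printf '%s\n' "$input" | awk '{ NR % 2 == 1; print }')

# Comparison as the pattern: the default action (print) runs only for
# records where the test is true.
as_pattern=$(printf '%s\n' "$input" | awk 'NR % 2 == 1')

printf 'inside the action:\n%s\n\nas the pattern:\n%s\n' "$in_action" "$as_pattern"
```

The first command echoes all three lines; the second keeps only "one" and "three".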

Note that the test NR % 2 == 1 is testing the cumulative record-number. So if some file has an odd number of lines ("records"), the next file will print out every even-numbered line (and this will persist until you hit another file with an odd number of lines).

For instance, suppose the two input files are A.txt and B.txt. Awk starts reading A.txt and has both FNR and NR set to 1 for the first line, which might be, e.g., file A, line 1. Since FNR == 1 the first "action" is done, setting o. Then awk tests the second pattern. NR is 1, so NR % 2 is 1, so the second "action" is done, printing that line to A_oddlines.txt.

Now suppose file A.txt contains only that one line. Awk now goes on to file B.txt, resetting FNR but leaving NR cumulative. The first line of B might be file B, line 1. Awk tries the first "pattern", and indeed, FNR == 1 so this closes the old o and sets up the new one.

But NR is 2, because NR is cumulative across all input files. So the second pattern (NR % 2 == 1) computes 2 % 2 (which is 0) and compares == 1 which is false, and thus awk skips the second "action" for line 1 of file B.txt. Line 2, if it exists, will have FNR == 2 and NR == 3, so that line will be copied out.
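The walk-through above can be reproduced with a short sketch. It assumes a scratch directory holding a one-line A.txt and a two-line B.txt (contents invented to match the example), and compares the NR test with the FNR test.

```shell
# Scratch directory with the two hypothetical files from the walk-through.
tmp=$(mktemp -d)
printf 'file A, line 1\n' > "$tmp/A.txt"
printf 'file B, line 1\nfile B, line 2\n' > "$tmp/B.txt"

# Show both counters for every record: FNR restarts per file, NR keeps counting.
counters=$(cd "$tmp" && awk '{ print FILENAME, "FNR=" FNR, "NR=" NR }' A.txt B.txt)
printf '%s\n\n' "$counters"

# Lines selected by each test when both files are read in one awk run.
by_nr=$(cd "$tmp" && awk 'NR % 2 == 1' A.txt B.txt)
by_fnr=$(cd "$tmp" && awk 'FNR % 2 == 1' A.txt B.txt)
printf 'NR test:\n%s\n\nFNR test:\n%s\n' "$by_nr" "$by_fnr"

rm -r "$tmp"
```

With the NR test, line 1 of B.txt is skipped and line 2 is kept; with the FNR test, each file contributes its own first line. If per-file odd lines are what you want, FNR is the counter to test.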

(I originally assumed, since your script was close to working, that you intended this and were just stuck a bit on syntax.)

Thanks, it works and I see the error, yet I do not really understand it. Would you mind giving an example? Thanks.
Hi torek, thanks for this excellent explanation. I learned something for the next awk-problem!

Chris Seymour · Accepted Answer · 2013-09-22 14:28:58Z

3

With GNU awk you could just do:

$ awk 'FNR%2{print > (FILENAME".odd")}' *.txt

This will create a .odd file for every .txt file in the current directory containing only the odd lines.

However sed has the upper hand on conciseness here. The following GNU sed command will remove all even lines and store the old file with the extension .bck for all .txt files in the current directory:

$ sed -ni.bck '1~2p' *txt

Demo:

$ ls
f1.txt  f2.txt

$ cat f1.txt
1
2
3
4
5

$ cat f2.txt
6
7
8
9
10

$ sed -ni.bck '1~2p' *txt

$ ls
f1.txt  f1.txt.bck  f2.txt  f2.txt.bck

$ cat f1.txt
1
3
5

$ cat f1.txt.bck
1
2
3
4
5

$ cat f2.txt
6
8
10

$ cat f2.txt.bck
6
7
8
9
10

If you don't want the backup files, then simply:

$ sed -ni '1~2p' *txt
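Both the first~step address and -i are extensions, so they may not be available in every sed. As a rough sketch for such systems, assuming it is acceptable to write the result next to the original with an added .odd suffix (a naming choice made up here), the same selection can be done with the standard p and n commands in a loop:

```shell
# Scratch directory reproducing the two demo files.
tmp=$(mktemp -d)
printf '1\n2\n3\n4\n5\n' > "$tmp/f1.txt"
printf '6\n7\n8\n9\n10\n' > "$tmp/f2.txt"

for f in "$tmp"/*.txt; do
    # p prints the current line; n then loads the next line, which -n
    # leaves unprinted, so only lines 1, 3, 5, ... are written out.
    sed -n 'p;n' "$f" > "$f.odd"
done

f1_odd=$(cat "$tmp/f1.txt.odd")
f2_odd=$(cat "$tmp/f2.txt.odd")
printf 'f1.txt.odd:\n%s\n\nf2.txt.odd:\n%s\n' "$f1_odd" "$f2_odd"

rm -r "$tmp"
```

Running sed once per file matters here: without -i (or GNU's -s), sed treats all named files as one continuous stream, so line numbering would carry over from one file to the next.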

edited Sep 22, 2013 at 14:28

answered Sep 22, 2013 at 0:22


6 Comments

greta Over a year ago

thanks for the solution with sed. I worked it out with awk, this was just simpler because I already used awk.

Ed Morton Over a year ago

The posted awk script won't select the odd lines from each file, it will select the odd lines across all files. If the first 2 files both have 3 lines, then it'll select lines 1 and 3 from file1 but line 2 from file2. You need to use FNR in the test, not NR. Also, ENDFILE is gawk-specific but if you're using gawk you don't need to close the files as you change files. Finally, you don't need the variable f since you could just do print > (FILENAME".odd") but for efficiency you could keep it but only set it when FNR==1, I'd do 'FNR==1{if(f)close(f); f=FILENAME".odd"} FNR%2{print > f}'

Chris Seymour Over a year ago

@EdMorton I originally posted FNR, not sure why I changed it. I do state using GNU awk, and gawk still limits the number of open files, right? You don't need a variable but it's good practice, although it makes sense to only initialise it in BEGINFILE.

Ed Morton Over a year ago

No, gawk doesn't limit the number of open files, it magically handles it internally. I know you said GNU awk, I'm just saying that since it's GNU awk you don't need to use ENDFILE and to close the files.

Chris Seymour Over a year ago

@EdMorton In that case scratch the file handling.

Gedge · Accepted Answer · 2017-02-09 10:04:40Z

1

Personally, I'd use

for filename in *.txt; do
    awk 'NR%2==1' "$filename" > "oddlines-$filename"
done

EDIT: quote filenames
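One thing to watch with this naming scheme is that the output files also match *.txt, so a second run would pick them up as input. Here is a sketch of the same loop with two guards added. The function name keep_odd_lines and the sample files are invented for illustration.

```shell
# Scratch directory with two made-up three-line files.
tmp=$(mktemp -d)
printf 'a1\na2\na3\n' > "$tmp/a.txt"
printf 'b1\nb2\nb3\n' > "$tmp/b.txt"

# Hypothetical helper; the body runs in a subshell so the cd stays local.
keep_odd_lines() (
    cd "$1" || exit 1
    for filename in *.txt; do
        [ -e "$filename" ] || continue                  # glob matched nothing
        case $filename in oddlines-*) continue ;; esac  # output of an earlier run
        awk 'NR % 2 == 1' "$filename" > "oddlines-$filename"
    done
)

keep_odd_lines "$tmp"
keep_odd_lines "$tmp"   # a second run should not create oddlines-oddlines-*.txt

a_odd=$(cat "$tmp/oddlines-a.txt")
b_odd=$(cat "$tmp/oddlines-b.txt")
listing=$(ls "$tmp")
printf '%s\n' "$listing"

rm -r "$tmp"
```

Because each file gets its own awk process, NR starts at 1 every time, so both outputs contain lines 1 and 3 of their own input.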

edited Feb 9, 2017 at 10:04

answered Sep 21, 2013 at 23:46


3 Comments

greta Over a year ago

thanks it works but I solved the issue with an awk script already. Tried yours as well and will use it in the future.

Ed Morton Over a year ago

Won't select the odd numbered lines in each file and will fail cryptically for various file names.

Gedge Over a year ago

updated for 'various file names' issue, but Ed Morton is wrong on the first issue

iamauser · Accepted Answer · 2013-09-21 23:54:25Z

1

You can try a for loop :

#!/bin/bash

for file in dir/*.txt
do    
   oddfile=$(echo "$file" | sed -e 's|\.txt|_odd\.txt|g')  #This will create file_odd.txt
   awk 'NR%2==1' "$file" > "$oddfile"  # This will output it in the same dir.
done
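The output name can also be derived without calling sed at all. The sketch below compares three ways of building it, using a made-up path that contains ".txt" twice to show why anchoring the match at the end matters. The variable names are illustrative only.

```shell
# Hypothetical path in which ".txt" also appears in a directory name.
file='dir/notes.txt.d/report.txt'

# Unanchored substitution with the g flag: rewrites every ".txt" in the path.
unanchored=$(printf '%s\n' "$file" | sed -e 's|\.txt|_odd\.txt|g')

# Substitution anchored at the end of the name: only the extension changes.
anchored=$(printf '%s\n' "$file" | sed 's/[.]txt$/_odd.txt/')

# Shell parameter expansion: strip a trailing ".txt", then append the new suffix.
expanded="${file%.txt}_odd.txt"

printf 'unanchored: %s\nanchored:   %s\nexpanded:   %s\n' "$unanchored" "$anchored" "$expanded"
```

The anchored sed call and the parameter expansion agree on dir/notes.txt.d/report_odd.txt, and the expansion avoids starting an extra process for each file.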

edited Sep 21, 2013 at 23:54

answered Sep 21, 2013 at 23:46


1 Comment

Chris Seymour Over a year ago

You don't need to escape . in the replacement string and -e and g are redundant sed 's/[.]txt$/_odd.txt/'

iruvar · Accepted Answer · 2013-09-22 16:19:42Z

1

Your problem is that NR%2==1 is inside the {NR%2==1; print>o} 'action block' and is not kicking in as a 'condition'. Use this instead:

gawk 'FNR==1{if(o)close(o);o=FILENAME;sub(/\.txt/,"_oddlines.txt",o)};
     FNR%2==1{print > o}' *.txt

edited Sep 22, 2013 at 16:19

answered Sep 21, 2013 at 23:47


3 Comments

greta Over a year ago

Thanks a lot for this answer. It does the job perfectly. But even if it looks obvious to you, I do not fully understand why the NR%2==1 must be outside the curly brackets. I thought awk always has a main part with a possible BEGIN and END. It works perfectly, but does that not leave NR%2==1 outside one of the parts? Apologies if that looks like a dull question. Thanks.

iruvar Over a year ago

@greta, good question. Here's a basic introduction to how awk works, this should get you going with the basic concepts.

Ed Morton Over a year ago

Almost, just change NR%2 to FNR%2.
