2

I would like to use the following awk line to remove every even line (and keep the odd lines) in a text file.

awk 'NR%2==1' filename.txt > output

The problem is that I struggle to either loop properly in awk or build a shell script to apply this to all *.txt files in a folder. I tried to use this one-liner:

gawk 'FNR==1{if(o)close(o);o=FILENAME;
sub(/\.txt/,"_oddlines.txt",o)}{NR%2==1; print>o}'  

but that didn't remove the even lines. And I am even less familiar with shell scripting. I use gawk under win7 or cygwin with bash. Many thanks for any kind of idea.

5 Answers

3

Your existing gawk one-liner is really close. Here it is formatted as a more readable script:

FNR == 1 {
    if (o)
        close(o)
    o = FILENAME
    sub(/\.txt/, "_oddlines.txt", o)
}
{
    NR % 2 == 1
    print > o
}

This should make the error obvious [1]. So now we remove that error:

FNR == 1 {
    if (o)
        close(o)
    o = FILENAME
    sub(/\.txt/, "_oddlines.txt", o)
}
NR % 2 == 1 {
    print > o
}

$ awk -f foo.awk *.txt

and it works (and of course you can re-one-line-ize this).

(Normally I would do this with a for like the other answers, but I wanted to show you how close you were!)


[1] Per comment, maybe not quite so obvious?

Awk's basic language construct is the "pattern-action" statement. An awk program is just a list of such statements. The "pattern" is so named because originally they were mostly grep-like regular expression patterns:

$ awk '/^be.*st$/' < /usr/share/dict/web2
beanfeast
beast
[snip]

(Except for the slashes, this is basically just running grep, since it uses the default action, print.)

A pattern can actually be a range, written as two patterns separated by a comma, but it's more typical to use a single one, as in these cases. Patterns not enclosed within slashes allow tests like FNR == 1 (File-specific Number of this Record equals 1) or NR % 2 == 1 (Number of this Record, cumulative across all files, mod 2 equals 1).

Once you hit the open brace, though, you're into the "action" part. Now NR % 2 == 1 simply calculates the result (true or false) and then throws it away. If you leave out the "pattern" part entirely, the "action" part is run on every input line. So this prints every line.
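To see the difference concretely, here is a minimal sketch. The scratch directory and demo.txt are made-up names used only for illustration. The first awk call has the test inside the braces, the second uses it as the pattern:

```shell
# Hypothetical four-line sample file in a scratch directory.
tmp=$(mktemp -d)
printf 'a\nb\nc\nd\n' > "$tmp/demo.txt"

# Test inside the action: it is evaluated and discarded, so every line prints.
in_action=$(awk '{ NR % 2 == 1; print }' "$tmp/demo.txt")

# Test used as the pattern: the action runs only for odd-numbered records.
as_pattern=$(awk 'NR % 2 == 1 { print }' "$tmp/demo.txt")

printf 'inside the action:\n%s\n\n' "$in_action"
printf 'as a pattern:\n%s\n' "$as_pattern"

rm -rf "$tmp"
```

The first variant should print all four lines, the second only a and c.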

Note that the test NR % 2 == 1 is testing the cumulative record-number. So if some file has an odd number of lines ("records"), the next file will print out every even-numbered line (and this will persist until you hit another file with an odd number of lines).

For instance, suppose the two input files are A.txt and B.txt. Awk starts reading A.txt and has both FNR and NR set to 1 for the first line, which might be, e.g., file A, line 1. Since FNR == 1 the first "action" is done, setting o. Then awk tests the second pattern. NR is 1, so NR % 2 is 1, so the second "action" is done, printing that line to A_oddlines.txt.

Now suppose file A.txt contains only that one line. Awk now goes on to file B.txt, resetting FNR but leaving NR cumulative. The first line of B might be file B, line 1. Awk tries the first "pattern", and indeed, FNR == 1 so this closes the old o and sets up the new one.

But NR is 2, because NR is cumulative across all input files. So the second pattern (NR % 2 == 1) computes 2 % 2 (which is 0) and compares == 1 which is false, and thus awk skips the second "action" for line 1 of file B.txt. Line 2, if it exists, will have FNR == 2 and NR == 3, so that line will be copied out.

(I originally assumed, since your script was close to working, that you intended this and were just stuck a bit on syntax.)
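The A.txt / B.txt walkthrough above can be reproduced with a small sketch. The file contents and the scratch directory are assumptions chosen to match the example (one line in A.txt, two in B.txt):

```shell
# Hypothetical files matching the walkthrough.
tmp=$(mktemp -d)
printf 'file A, line 1\n' > "$tmp/A.txt"
printf 'file B, line 1\nfile B, line 2\n' > "$tmp/B.txt"

# Show both counters for every record.
counters=$(cd "$tmp" && awk '{ print FILENAME ": FNR=" FNR ", NR=" NR }' A.txt B.txt)

# Select by the cumulative counter, then by the per-file counter.
by_nr=$(cd "$tmp" && awk 'NR % 2 == 1' A.txt B.txt)
by_fnr=$(cd "$tmp" && awk 'FNR % 2 == 1' A.txt B.txt)

printf '%s\n\nNR %% 2 == 1:\n%s\n\nFNR %% 2 == 1:\n%s\n' "$counters" "$by_nr" "$by_fnr"

rm -rf "$tmp"
```

With NR, line 1 of B.txt is skipped and line 2 is kept. With FNR, the first line of each file is kept, which is usually what "odd lines of every file" means.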

2 Comments

Thanks, it works and I see the error, yet I do not really understand it. Would you mind giving an example? Thanks.
Hi torek, thanks for this excellent explanation. I learned something for the next awk-problem!
3

With GNU awk you could just do:

$ awk 'FNR%2{print > (FILENAME".odd")}' *.txt

This will create a .odd file for every .txt file in the current directory containing only the odd lines.
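A quick way to try this out, as a sketch with invented sample files (f1.txt and f2.txt, three lines each, in a scratch directory). Note that the output name is the full input name plus .odd, e.g. f1.txt.odd:

```shell
# Hypothetical sample files; names and contents are for illustration only.
tmp=$(mktemp -d)
printf '1\n2\n3\n' > "$tmp/f1.txt"
printf '4\n5\n6\n' > "$tmp/f2.txt"

# Run the one-liner from inside the scratch directory.
(cd "$tmp" && awk 'FNR%2{print > (FILENAME".odd")}' *.txt)

f1_odd=$(cat "$tmp/f1.txt.odd")
f2_odd=$(cat "$tmp/f2.txt.odd")
printf 'f1.txt.odd:\n%s\n\nf2.txt.odd:\n%s\n' "$f1_odd" "$f2_odd"

rm -rf "$tmp"
```

Because the test uses FNR, the second file keeps its own lines 1 and 3 (4 and 6 here) even though the first file has an odd number of lines.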


However sed has the upper hand on conciseness here. The following GNU sed command will remove all even lines and store the old file with the extension .bck for all .txt files in the current directory:

$ sed -ni.bck '1~2p' *txt

Demo:

$ ls
f1.txt  f2.txt

$ cat f1.txt
1
2
3
4
5

$ cat f2.txt
6
7
8
9
10

$ sed -ni.bck '1~2p' *txt

$ ls
f1.txt  f1.txt.bck  f2.txt  f2.txt.bck

$ cat f1.txt
1
3
5

$ cat f1.txt.bck
1
2
3
4
5

$ cat f2.txt
6
8
10

$ cat f2.txt.bck
6
7
8
9
10

If you don't want the backup files, then simply:

$ sed -ni '1~2p' *txt

6 Comments

thanks for the solution with sed. I worked it out with awk, this was just simpler because I already used awk.
The posted awk script won't select the odd lines from each file, it will select the odd lines across all files. If the first 2 files both have 3 lines, then it'll select lines 1 and 3 from file1 but line 2 from file2. You need to use FNR in the test, not NR. Also, ENDFILE is gawk-specific but if you're using gawk you don't need to close the files as you change files. Finally, you don't need the variable f since you could just do print > (FILENAME".odd") but for efficiency you could keep it but only set it when FNR==1, I'd do 'FNR==1{if(f)close(f); f=FILENAME".odd"} FNR%2{print > f}'
@EdMorton I originally posted FNR, not sure why I changed it. I do state using GNU awk, and gawk still limits the number of open files, right? You don't need a variable but it's good practice, although it makes sense to only initialise it in BEGINFILE.
No, gawk doesn't limit the number of open files, it magically handles it internally. I know you said GNU awk, I'm just saying that since it's GNU awk you don't need to use ENDFILE and to close the files.
@EdMorton In that case scratch the file handling.
1

Personally, I'd use

for filename in *.txt; do
    awk 'NR%2==1' "$filename" > "oddlines-$filename"
done

EDIT: quote filenames
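Each pass through the loop starts a fresh awk process, so NR restarts at 1 for every file. A small sketch to check that, using made-up three-line files in a scratch directory:

```shell
# Hypothetical sample files; names and contents are for illustration only.
tmp=$(mktemp -d)
printf '1\n2\n3\n' > "$tmp/f1.txt"
printf '4\n5\n6\n' > "$tmp/f2.txt"

(
    cd "$tmp" || exit 1
    # The glob is expanded once, before any oddlines-* file exists.
    for filename in *.txt; do
        awk 'NR%2==1' "$filename" > "oddlines-$filename"
    done
)

loop_f1=$(cat "$tmp/oddlines-f1.txt")
loop_f2=$(cat "$tmp/oddlines-f2.txt")
printf 'oddlines-f1.txt:\n%s\n\noddlines-f2.txt:\n%s\n' "$loop_f1" "$loop_f2"

rm -rf "$tmp"
```

One caveat: the output files also end in .txt, so running the loop a second time in the same directory would pick up the oddlines-* files as input.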

3 Comments

thanks it works but I solved the issue with an awk script already. Tried yours as well and will use it in the future.
Won't select the odd numbered lines in each file and will fail cryptically for various file names.
updated for 'various file names' issue, but Ed Morton is wrong on the first issue
1

You can try a for loop:

#!/bin/bash

for file in dir/*.txt
do    
   oddfile=$(echo "$file" | sed -e 's|\.txt|_odd\.txt|g')  #This will create file_odd.txt
   awk 'NR%2==1' "$file" > "$oddfile"  # This will output it in the same dir.
done
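As an alternative sketch, the output name can be built with shell parameter expansion instead of calling sed, which only touches a trailing .txt. The directory dir and the file notes.txt below are hypothetical:

```shell
# Hypothetical layout: one sample file under dir/ in a scratch directory.
tmp=$(mktemp -d)
mkdir "$tmp/dir"
printf '1\n2\n3\n' > "$tmp/dir/notes.txt"

(
    cd "$tmp" || exit 1
    for file in dir/*.txt; do
        # ${file%.txt} strips a trailing ".txt"; then append the new suffix.
        oddfile="${file%.txt}_odd.txt"
        awk 'NR%2==1' "$file" > "$oddfile"
    done
)

odd_notes=$(cat "$tmp/dir/notes_odd.txt")
printf 'dir/notes_odd.txt:\n%s\n' "$odd_notes"

rm -rf "$tmp"
```

This avoids one extra process per file and does not rewrite a ".txt" that happens to appear in the middle of a path.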

1 Comment

You don't need to escape . in the replacement string, and -e and g are redundant: sed 's/[.]txt$/_odd.txt/'
1

Your problem is that NR%2==1 is inside the {NR%2==1; print>o} 'action block' and is not kicking in as a 'condition'. Use this instead:

gawk 'FNR==1{if(o)close(o);o=FILENAME;sub(/\.txt/,"_oddlines.txt",o)};
     FNR%2==1{print > o}' *.txt

3 Comments

Thanks a lot for this answer. It does the job perfectly. But while it may look obvious to you, I do not fully understand why the NR%2==1 must be outside the curly brackets. I thought awk always has a main part with a possible BEGIN and END. It works perfectly, but does that not leave NR%2==1 outside one of the parts? Apologies if that looks like a dull question. Thanks.
@greta, good question. Here's a basic introduction to how awk works, this should get you going with the basic concepts.
Almost, just change NR%2 to FNR%2.
