
I have 1000 text files, each tab-delimited, in the following format:

John    32     NY     12     USA
Peter   78.    CA.    8.     USA
Stef.   67.    CA.    12.    USA

I want to extract all lines where the fourth column is exactly 12. This is what I've done:


file='random'

FILES=/home/user/data/*.txt
for f in $FILES; 
do 
echo $f
filename=$(basename $f)
awk -F"\t" '$4 == 12' $f >  /home/user/extra/$file/$filename; 
done

But this produces empty files and I am not sure what I am doing wrong here. Insights will be appreciated.

  • You don't want to match the 12. line, right? Use string comparison: $4 == "12". Also run your script through shellcheck.net and implement its suggestions. Commented Sep 6, 2021 at 21:26
  • Oh, and for f in /home/user/data/*.txt;. Commented Sep 6, 2021 at 21:27
  • Yeah, the glob * does not expand in the assignment. Commented Sep 6, 2021 at 21:30

1 Answer

Please read "Correct Bash and shell script variable capitalization" and https://mywiki.wooledge.org/Quotes to understand some of the issues in your script, and copy/paste any shell script you write into https://www.shellcheck.net/ until you get the fundamentals down.

Regarding "But this produces empty files": sure, for any given command cmd with

for f in *; do
    cmd "$f" > "out$f"
done

you're creating an output file for each input file in the shell loop, because the shell opens the redirection target before cmd even runs. So if an input file has no line matching $4==12 in your awk script (the cmd in this case), you'll still get an output file; it'll just be empty. If you don't want that, you could do:

tmp=$(mktemp)
for f in *; do
    cmd "$f" > "$tmp" &&
    mv -- "$tmp" "out$f"
done

and write cmd to exit with a success/failure status, like grep does when it finds a match (trivial in awk), or you could check that "$tmp" is non-empty before the mv:

tmp=$(mktemp)
for f in *; do
    cmd "$f" > "$tmp" &&
    [[ -s "$tmp" ]] &&
    mv -- "$tmp" "out$f"
done
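
As a rough sketch of the exit-status variant mentioned above, awk can set a flag when a line matches and use it in an END block to choose its exit status. The scratch directory, the sample files a.txt and b.txt, and the variable names work and found are made up for illustration and are not from the question:

```shell
# Hypothetical sample data in a scratch directory (not from the original post).
work=$(mktemp -d)
mkdir "$work/in" "$work/out"
printf 'John\t32\tNY\t12\tUSA\nPeter\t78\tCA\t8\tUSA\n' > "$work/in/a.txt"
printf 'Peter\t78\tCA\t8\tUSA\n' > "$work/in/b.txt"

tmp=$(mktemp)
for f in "$work"/in/*.txt; do
    # awk prints matching lines and exits 0 only if at least one line matched,
    # so the mv (and therefore the output file) happens only on a match.
    awk -F'\t' '$4 == 12 { print; found = 1 } END { exit !found }' "$f" > "$tmp" &&
    mv -- "$tmp" "$work/out/$(basename -- "$f")"
done
rm -f -- "$tmp"

# Only inputs that had a match should have an output file.
ls "$work/out"
```

With this sample data only a.txt should end up in the output directory, since b.txt has no line whose fourth field is 12.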

You don't need a shell loop or other commands for this, though, just 1 call to awk to process all of your files at once. Using any awk in any shell on every Unix box, do only this:

awk -v file='random' -F'\t' '
    FNR == 1 {
        close(out)
        f = FILENAME
        sub(".*/","",f)
        out = "/home/user/extra/" file "/" f
    }
    $4 == 12 {
        print > out
    }
' /home/user/data/*.txt

If you want a string comparison instead of a numeric one, so that 12. doesn't match 12, then use $4 == "12" instead of $4 == 12.
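
To see the difference between the two comparisons, here is a small sketch on a made-up two-line sample where the fourth field is 12 on one line and 12. on the other. The variable names sample, numeric and string are only illustrative:

```shell
# Hypothetical sample: field 4 is "12" on the first line and "12." on the second.
sample=$(printf 'John\t32\tNY\t12\tUSA\nStef.\t67.\tCA.\t12.\tUSA\n')

# Numeric comparison: "12." looks like a number, so it is expected to match too.
numeric=$(printf '%s\n' "$sample" | awk -F'\t' '$4 == 12 { n++ } END { print n + 0 }')

# String comparison: only the exact text "12" matches.
string=$(printf '%s\n' "$sample" | awk -F'\t' '$4 == "12" { n++ } END { print n + 0 }')

printf 'lines matched numerically: %s\n' "$numeric"
printf 'lines matched as a string: %s\n' "$string"
```

The string comparison should count exactly one line here, while the numeric comparison is expected to count the 12. line as well.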

In the above, file is a poor choice of variable name to hold the name of a directory, but I left it alone to avoid changing anything I didn't have to.
