I am trying to filter file_to_filter using another file, filter_file, which is just a list of strings in $1 (the first field of each line). I think I am close, but I cannot get the header row included in the output. file_to_filter is tab-delimited. Thank you :).

file_to_filter

Chr Start   End Ref Alt Func.refGene    Gene.refGene
chr1    160098543   160098543   G   A   exonic  ATP1A2
chr1    172410967   172410967   G   A   exonic  PIGC

filter_file

PIGC

desired output (header included)

Chr Start   End Ref Alt Func.refGene    Gene.refGene
chr1    172410967   172410967   G   A   exonic  PIGC

awk with current output (header not included)

awk -F'\t' 'NR==1{A[$1];next}$7 in A' file test

chr1    172410967   172410967   G   A   exonic  PIGC

1 Answer

Assuming your fields really are tab-separated:

awk -F'\t' 'NR==FNR{tgts[$1]; next} (FNR==1) || ($7 in tgts)' filter_file file_to_filter
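
As a sketch of why this works where the attempt in the question does not: `NR==1` only matches the very first line awk reads, so the header of the second file is never treated specially and fails the `$7 in A` test. `NR==FNR` is true for every line of the first file, and `FNR==1` is true for the first line of each file. The snippet below recreates the sample data from the question in a temporary directory and runs the same command with comments; the `result` variable and the temporary directory are only there for illustration.

```shell
# Work in a scratch directory so no existing files are overwritten.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# Recreate the sample inputs from the question (tab-delimited).
printf 'Chr\tStart\tEnd\tRef\tAlt\tFunc.refGene\tGene.refGene\n' > file_to_filter
printf 'chr1\t160098543\t160098543\tG\tA\texonic\tATP1A2\n' >> file_to_filter
printf 'chr1\t172410967\t172410967\tG\tA\texonic\tPIGC\n' >> file_to_filter
printf 'PIGC\n' > filter_file

result=$(awk -F'\t' '
    # First file only (NR equals FNR): store each gene name as an array key.
    NR == FNR { tgts[$1]; next }
    # Second file: print the header line, or any row whose 7th field is a stored key.
    FNR == 1 || ($7 in tgts)
' filter_file file_to_filter)

printf '%s\n' "$result"
```

One caveat with the `NR==FNR` idiom: if filter_file is empty, the condition is also true for every line of file_to_filter, so nothing is printed.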

To start learning awk, read the book Effective AWK Programming, 4th Edition, by Arnold Robbins.
