awk load one file into array, test against another file

Question

I have two files:

seqs.fa:

>seq000007;size=72768;
ACTGTGAG
>seq000010;size=53132;
GTAAGATC
GAATTCTT
>seq00045;size=40321;
ACCCATTT
...

numbers.txt:

72768
53132

My desired output is the lines from the first file that match a number from the second file:

>seq000007;size=72768;
>seq000010;size=53132;

I attempted to use awk, but it only returns lines matching the first number:

awk -F"\n" -v RS=">" 'NR==FNR{for(i=1;i<=NF;i++) A[$i]; next} END {for (header in A) {if ( match(header,$1) ) {print header}}}'  seqs.fa numbers.txt

seq000007;size=72768;
seq072768;size=1;

Why does awk only loop through the "header" array for the first line in numbers.txt? And if this is an XY problem, is there a better way to accomplish this goal?

karakfa · Accepted Answer · 2016-05-03 21:24:23Z

Score: 2

After fixing the typo in your numbers file:

$ awk -F'=|;' 'NR==FNR{a[$1]; next}; $3 in a' numbers.txt seqs.fa

>seq000007;size=72768;
>seq000010;size=53132;
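Here is the same command spread over several lines with comments, run against a copy of the sample data from the question. It is a sketch for illustration only: the scratch directory from `mktemp -d` is there just so the sample files don't overwrite real ones.

```shell
#!/bin/sh
# Work in a scratch directory so the sample files don't overwrite real ones.
cd "$(mktemp -d)" || exit 1

# Sample data from the question.
cat > seqs.fa <<'EOF'
>seq000007;size=72768;
ACTGTGAG
>seq000010;size=53132;
GTAAGATC
GAATTCTT
>seq00045;size=40321;
ACCCATTT
EOF
printf '72768\n53132\n' > numbers.txt

# -F'=|;' splits a header like ">seq000007;size=72768;" into
#   $1=">seq000007", $2="size", $3="72768", $4=""
# NR==FNR holds only while reading the first file (numbers.txt):
#   store each number as an array key and skip to the next line.
# For seqs.fa, "$3 in a" is a pattern without an action, so awk prints
#   every line whose third field is one of the stored numbers.
awk -F'=|;' '
    NR == FNR { a[$1]; next }
    $3 in a
' numbers.txt seqs.fa
```

Each header costs a single hash lookup (`$3 in a`) instead of a loop over every number. One thing to watch: a blank line in numbers.txt creates the key "", and because sequence lines have no third field (their `$3` is empty), every sequence line would then be printed too.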


4 Comments

elsherbini Over a year ago

Thanks, I edited the question to remove the typo. This gives the desired output. Any idea why my awk command above doesn't work?

karakfa Over a year ago

You have to match $1 in header, not the other way around, but it's an inefficient approach.

elsherbini Over a year ago

I think that's what I'm doing: the call is match(string, regex) (unlike the match functions I'm used to in Python).

karakfa Over a year ago

Right, perhaps it's your record structure then. In your second file there will be only one record.
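A quick way to see the record structure described above is to run the question's separators (`RS=">"` and `-F"\n"`) over numbers.txt alone and print `$1`. The file contains no `>`, so it is read as a single record whose first field is the first number. In an END block, gawk, mawk and BWK awk keep the fields of the last record read (POSIX is less explicit about `$0` there). In the question's command that last record comes from numbers.txt, so every header is tested with the regex `72768` only, which is also why the unrelated `seq072768` header shows up. The scratch directory is just a precaution for this sketch.

```shell
#!/bin/sh
# Scratch directory so the sample file doesn't overwrite a real one.
cd "$(mktemp -d)" || exit 1
printf '72768\n53132\n' > numbers.txt

# Same separators as the question: records end at '>', fields at newlines.
# numbers.txt has no '>', so the whole file is one record.
awk -F'\n' -v RS='>' '
    { printf "record %d: $1=[%s]\n", FNR, $1 }
    END { printf "in END: $1=[%s]\n", $1 }
' numbers.txt
```

This prints `record 1: $1=[72768]` followed by `in END: $1=[72768]`: there is only one record, and `53132` is never the value of `$1`.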

Lars Fischer · Accepted Answer · 2016-05-03 21:22:27Z

Score: 0

In this special case you can use GNU grep like this:

grep -F -f numbers.txt seqs.fa

The option -f filename tells grep to use every pattern found in filename for the search. The option -F tells grep that the patterns are fixed strings rather than regular expressions.


1 Comment

karakfa Over a year ago

Note that this will match any occurrence of the numbers as substrings anywhere in the file.
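One way to narrow the match, assuming every header uses the `size=N;` form shown in the question, is to turn each number into the fixed string `size=N;` before handing it to grep. In this sketch the `seq072768` header is a hypothetical decoy modeled on the output quoted in the question, and `patterns.txt` is just a scratch file name.

```shell
#!/bin/sh
# Scratch directory so the sample files don't overwrite real ones.
cd "$(mktemp -d)" || exit 1

# Sample headers; seq072768 is a hypothetical decoy whose name contains 72768.
cat > seqs.fa <<'EOF'
>seq000007;size=72768;
ACTGTGAG
>seq072768;size=1;
GTAAGATC
>seq000010;size=53132;
GAATTCTT
EOF
printf '72768\n53132\n' > numbers.txt

# Plain fixed-string search: "72768" also matches inside "seq072768".
grep -F -f numbers.txt seqs.fa
echo ---

# Wrap each number as "size=N;" so it can only match the size field.
sed 's/.*/size=&;/' numbers.txt > patterns.txt
grep -F -f patterns.txt seqs.fa
```

The first search prints all three headers; the second prints only `>seq000007;size=72768;` and `>seq000010;size=53132;`. A side benefit: a blank line in numbers.txt becomes the harmless pattern `size=;` instead of an empty pattern, which grep would treat as matching every line.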
