Awk With Input File Match and Pattern Search

Question

Sorry but I have never asked a question on a board such as this, please excuse inexperience.

I am trying to take a field from an input file, say field two from abc.txt, and match it in the def.txt. The problem is I also need to match an additional pattern in the def.txt file.

For example, field 2 in abc.txt is "3", and the pattern I want to search for in def.txt is "efg". I need it to return all lines that match pattern "efg" and that contain "3".

As an additional constraint I want it to stop searching after it reaches a certain value, say "END". I have exhausted my efforts to find a simple one liner for this in awk or any variant.

I am befuddled on all of these points, is it ok to ask for help on this as a novice? Any assistance is appreciated, thanks.

Here is the code, which is not working at all: awk 'BEGIN { FS = " " } ;NR==FNR{a[$2]=++i;next} '{if ( $5 in a) && ($0 ~ '/efg/')} {print $0}' abc.txt def.txt

I am trying to achieve 3 things:

Match input file field to def.txt fields
Match a pattern in def.txt
Stop the search when a value is encountered, for example "END".

Hoping for a one line solution if possible, I am just too much of an AWK beginner.

Sample Input 
Abc.txt
1
2
3
4

Def.txt
1 abc
1 efg
1 efg some more data
END
2 ghi
2 efg
2 efg some more data
END
3 jkl
3 efg
3 efg some more data
END

and so on...

Expected Output 
1 efg
1 efg some more data
2 efg
2 efg some more data
3 efg 
3 efg some more data

and with any help to have it stop upon reaching "END." Instead of going through the entire file and printing the subsequent instances of 1 efg, 2 efg, etc.

“3” in abc.txt matches "efg" in def.txt and print lines in both files? are those two files both space separated? which file contains "END"?
– Haifeng Zhang, Commented Apr 28, 2015 at 19:28
What are you trying to accomplish with '/efg'/? In any case, post some sample input and expected output.
– Ed Morton, Commented Apr 28, 2015 at 19:50
@haifzhan - The file I need the line from is in def.txt. I am seeking the lines in def.txt which match both "3" in abc.txt and "efg" in def.txt. The "END" statement is also in the file I need to get the result from. Sorry for the lack of detail, I am learning how to post effectively.
– question33, Commented Apr 28, 2015 at 20:06

ghoti · Accepted Answer · 2015-04-28 21:51:09Z

There are some obvious concerns with your existing code. You provided:

awk 'BEGIN { FS = " " } ;NR==FNR{a[$2]=++i;next} '{if ( $5 in a) && ($0 ~ '/efg'/)} {print $0}' abc.txt def.txt

I see where you're going with this. I think what you mean is:

awk '

  # Step through first file, recording $2 in an array...
  NR==FNR {
    a[$2];
    next;
  }

  # Hard stop if we get a signal...
  $0 == "END" {
    exit;
  }

  # In the second+ file, test a condition.
  $5 in a && /efg/

' abc.txt def.txt

You can of course compress this into a one liner by removing comments and newlines:

awk 'NR==FNR{a[$2];next} $0=="END"{exit} $5 in a && /efg/' abc.txt def.txt
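To see how the one-liner behaves, here is a minimal sketch that runs it against the sample input from the question. It rests on two assumptions that are not in the answer: the sample files keep their key in field 1, so $1 stands in for both $2 and $5, and the hard stop uses awk's exit statement. The scratch directory and the result variable are only there for illustration.

```shell
#!/bin/sh
# Recreate the sample files from the question in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' 1 2 3 4 > "$dir/abc.txt"
printf '%s\n' \
  '1 abc' '1 efg' '1 efg some more data' 'END' \
  '2 ghi' '2 efg' '2 efg some more data' 'END' \
  '3 jkl' '3 efg' '3 efg some more data' 'END' > "$dir/def.txt"

# Assumption: the key is field 1 in both sample files, so $1 replaces $2 and $5.
result=$(awk 'NR==FNR{a[$1];next} $0=="END"{exit} $1 in a && /efg/' \
  "$dir/abc.txt" "$dir/def.txt")
printf '%s\n' "$result"

rm -rf "$dir"
```

On this input the script prints "1 efg" and "1 efg some more data" and then stops, because exit ends the whole run at the first END line in def.txt. If the lines for 2 and 3 are wanted as well, a single global stop like this is probably not the right condition.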

Notable changes:

The single quotes need to wrap your entire script. One at the start, one at the end, none "inside".
Awk splits on runs of whitespace (spaces and tabs) by default, and FS = " " is exactly that default, so the BEGIN block is unnecessary (unless your fields are tab-separated and contain spaces, in which case set FS = "\t").
You don't need to increment a counter. In awk, if you simply mention an array element, it is created with an empty value, so you can use conditions like $5 in a without storing anything you never read.
The extra if statement was removed. An awk program is a series of condition { statement } pairs. A condition is a condition whether it's in this format or inside an if.
The second element of your condition was shrunk to just a regex. By default, awk will take this to mean "does this regex match the current input line", i.e. $0 ~ /efg/.
The print $0 command was removed, because this is the default behaviour if no statement is provided.
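A few of these points can be checked in isolation. The sketch below is only an illustration with made-up input lines and variable names (keys, short, long): it shows that mentioning an array element creates the key with an empty value, and that a bare regex pattern with no action gives the same result as the explicit if and print $0.

```shell
#!/bin/sh
# Mentioning a[$1] creates the key; the "in" operator tests keys, not values.
keys=$(printf '3\n' | awk '
  { a[$1] }                      # mentioning the element creates the key "3"
  END {
    if ("3" in a)    print "3 is a key"
    if (!("9" in a)) print "9 is not a key"
    print length(a["3"])         # the stored value is empty, so this prints 0
  }')
printf '%s\n' "$keys"

# A bare regex pattern with no action is shorthand for the explicit test and print.
short=$(printf '%s\n' '3 efg' '3 jkl' | awk '/efg/')
long=$(printf '%s\n' '3 efg' '3 jkl' | awk '{ if ($0 ~ /efg/) { print $0 } }')
printf '%s\n' "$short" "$long"
```

Both the short and the long form print only the "3 efg" line, which is why the if and the print $0 could be dropped from the original command.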

edited Apr 28, 2015 at 21:51

answered Apr 28, 2015 at 19:39

ghoti


4 Comments

question33 Over a year ago

I am getting an error that says: awk: fatal: can't open source file NR==FNR { a[$2]; next; } $5 in a && /efg/ ' for reading (No such file or directory) I think you meant for me to format the command myself after the second comment. I just am not sure.

ghoti Over a year ago

Ah, silly me. Remove the -f from the line. I've removed it from my answer.

ghoti Over a year ago

Oh, and I added your "hard stop" condition.

question33 Over a year ago

Dear @ghoti, An interesting result with awk: the results aren't given in sequential order. In other words it seems any condition that is met by the array contents is outputted. I was hoping it would perform the test in order of the content of abc.txt. It will output the first result of def.txt it comes across that is present in abc.txt. I wanted it to put the first instance of the first entry in def.txt, then the second, and so on. Please, is there any way that you could possibly help me accomplish that? I thought awk was a line reading command this is a major setback.
