
I have a file with the following 3 columns:

1   a1  abcd
2   b1  acdb
3   c1  abcd 

I need to extract/print rows based on a substring of column 3 with a position filter (2=="b", i.e. the 2nd character must be "b"), so the output should be:

1   a1  abcd
3   c1  abcd 

Based on the question "Print substring of column in awk based on filter", I have tried:

awk -F '\t' -v OFS='\t' '{ $3=substr($3,2,1); print $0 }' a.txt 
  • The question you linked to is quite unclear, but it asks about an entirely different topic: it's about replacing part of a field with a smaller substring, and indeed, that's what your attempt also does. Commented Oct 11, 2024 at 4:56
  • I'm not trying to replace it; I want to extract certain rows based on a substring filter. Thank you for letting me know, I will frame it better next time. Commented Oct 11, 2024 at 9:16
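To see the first comment's point concretely, here is a self-contained run of the attempt on the sample data. It assumes the columns are separated by real TAB characters (the post displays them as spaces) and recreates a.txt with printf:

```shell
# Recreate the sample input as a TAB-separated file
# (assumption: the question's columns are TAB-separated).
printf '1\ta1\tabcd\n2\tb1\tacdb\n3\tc1\tabcd\n' > a.txt

# The attempt from the question: it has an action but no pattern,
# so it rewrites $3 on every line and prints every line.
awk -F '\t' -v OFS='\t' '{ $3=substr($3,2,1); print $0 }' a.txt
# prints (TAB-separated):
# 1  a1  b
# 2  b1  c
# 3  c1  b
```

Every line is printed with column 3 replaced by its 2nd character; no line is filtered out.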

3 Answers


You may use this awk:

awk -F '\t' 'substr($3, 2, 1) == "b"' file

1   a1  abcd
3   c1  abcd
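A self-contained way to try this answer, assuming the columns are separated by real TAB characters (the question shows them as spaces), is to build the input with printf first. The file name file is taken from the command above:

```shell
# Sample input with TABs between columns (assumed from -F '\t' in the question).
printf '1\ta1\tabcd\n2\tb1\tacdb\n3\tc1\tabcd\n' > file

# Pattern only, no action: awk applies its default action (print the line)
# to every line whose 3rd field has "b" as its 2nd character.
awk -F '\t' 'substr($3, 2, 1) == "b"' file
```

Because there is no { ... } action, nothing in the line is modified; matching lines are printed unchanged.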

5 Comments

May I know whether it is possible to write awk -F '\t' 'substr($3,2=="b",1)' file? Thank you.
That's not valid syntax, no. 2=="b" could never be true and the arithmetic result of that boolean is 0, not 2.
@PanduC just curious, have you ever seen a programming language where substr($3,2=="b",1) is valid syntax? If so, which language and what did it mean in that language?
@Ed Morton and AKA: to be frank, I am new and trying to learn, so I asked in case it is possible to use 2=="b"; the reason I asked is that you can write awk '$3=="b"', for example. I think Daweo's response is what I am looking for. Thanks to all for your answers/responses.
@PanduC context is extremely important in programming. You can write 2=="b" just like you can write $3=="b" but what either of them means/does depends on the context in which you use them. Using either of them as the 2nd argument to substr() means they'd be evaluated as a condition with the resulting value of 1 for true or 0 for false which means your code would execute as substr($3,0,1) (but 0 is an invalid value for the 2nd argument so would be treated as 1) or substr($3,1,1) so either way it's the first char of $3. See substr() in the awk man page for more info.
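A quick check of the point made in these comments: in awk a comparison yields 1 for true and 0 for false, and 2=="b" (the number 2 compared with the string "b") is false. The variable names below are arbitrary:

```shell
# Store the results of two comparisons and print them.
awk 'BEGIN { false_val = (2 == "b"); true_val = (2 == 2); print false_val, true_val }'
# prints: 0 1
```

So passing 2=="b" as the 2nd argument of substr() passes the number 0, not 2.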

Extending anubhava's answer to allow for dynamic designation of the various parameters:

# column = 3, position = 2, value = "b"

$ awk -F'\t' -v col=3 -v pos=2 -v val=b 'substr($col,pos,1)==val' a.txt
1   a1  abcd
3   c1  abcd

# column = 2, position = 1, value = "a"

$ awk -F'\t' -v col=2 -v pos=1 -v val=a 'substr($col,pos,1)==val' a.txt
1   a1  abcd
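If the same filter is needed repeatedly from a shell script, the parameterized command can be wrapped in a small shell function. The name filter_by_char and its argument order are hypothetical, chosen only for illustration:

```shell
# Hypothetical helper (name and argument order are illustrative only):
#   filter_by_char COLUMN POSITION VALUE FILE
# Prints the lines of FILE whose COLUMN-th TAB-separated field has
# VALUE as its POSITION-th character.
filter_by_char() {
    awk -F'\t' -v col="$1" -v pos="$2" -v val="$3" 'substr($col,pos,1)==val' "$4"
}

# Sample input (assumed TAB-separated).
printf '1\ta1\tabcd\n2\tb1\tacdb\n3\tc1\tabcd\n' > a.txt

filter_by_char 3 2 b a.txt   # same filter as the first example above
filter_by_char 2 1 a a.txt   # same filter as the second example above
```

Quoting "$1" to "$4" keeps each shell argument intact when it is handed to awk's -v assignments.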

Or, for a more compact input at the cost of an extra split():

$ awk -F'\t' -v params=3,2,b 'BEGIN { split(params,p,",") } substr($p[1],p[2],1)==p[3]' a.txt
1   a1  abcd
3   c1  abcd

$ awk -F'\t' -v params=2,1,a 'BEGIN { split(params,p,",") } substr($p[1],p[2],1)==p[3]' a.txt
1   a1  abcd
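If the value to match might be longer than one character (a variation beyond what the question asks), one option is to compare length(val) characters instead of exactly one. This is a sketch under that assumption, reusing the col/pos/val variables from above:

```shell
# Sample input (assumed TAB-separated).
printf '1\ta1\tabcd\n2\tb1\tacdb\n3\tc1\tabcd\n' > a.txt

# Take length(val) characters starting at pos, so val may be "b", "bc", etc.
awk -F'\t' -v col=3 -v pos=2 -v val=bc 'substr($col,pos,length(val))==val' a.txt
```

With a single-character val this behaves exactly like the substr($col,pos,1)==val version.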



Your code

awk -F '\t' -v OFS='\t' '{ $3=substr($3,2,1); print $0 }' a.txt

does replace the 3rd field with the 2nd character of the 3rd field and does print the changed line. You are using an action without a pattern, whereas you should be using a pattern without an action (so that the default action, printing the line as-is, is applied). You can use the substr function as already shown, or use a regular expression on the 3rd field as follows. Let file.tsv content be

1   a1  abcd
2   b1  acdb
3   c1  abcd

then

awk 'BEGIN{FS="\t"}$3~/^.b/' file.tsv

gives output

1   a1  abcd
3   c1  abcd

Explanation: I inform GNU AWK that fields are separated by the TAB character (note that this is not strictly necessary for your example file, as its lines do not contain white-space characters other than TAB), then I use a pattern to find lines where the 3rd field starts with (^) any character (.) followed by b. This solution assumes that the position of the letter in the 3rd field is etched in stone; if this is not the case, do not use this solution.

(tested in GNU Awk 5.1.0)
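One way to relax the fixed-position assumption while keeping the regex approach is to build the pattern at run time: "^", then pos-1 dots, then the value. This is a sketch that assumes val contains no regex metacharacters (such as . * [ or ^); the variable names pos, val and re are illustrative:

```shell
# Sample input (assumed TAB-separated).
printf '1\ta1\tabcd\n2\tb1\tacdb\n3\tc1\tabcd\n' > file.tsv

# BEGIN builds re = "^" + (pos-1) dots + val, e.g. "^.b" for pos=2, val=b.
# Then "$3 ~ re" is a pattern without an action, so matching lines are printed.
awk -F'\t' -v pos=2 -v val=b '
    BEGIN { re = "^"; for (i = 1; i < pos; i++) re = re "."; re = re val }
    $3 ~ re
' file.tsv
```

For pos=2 and val=b this is the same test as $3~/^.b/, but the position is now a parameter instead of being written into the regex.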

