Modify a column with awk and a bash script

Question

I have a test.txt file that looks like this:

a,1,A
b,2,B
c,3,C
d,4,D
e,5,E
f,6,F

I want to modify the second field according to these conditions:

if the value is 1, change it to 1_PLUS

if the value is 4, change it to 4_PLUS

if the value is 6, change it to 6_PLUS

otherwise, replace it with an empty field.

The final file will look like this:

a,1_PLUS,A
b,,B
c,,C
d,4_PLUS,D
e,,E
f,6_PLUS,F

I wrote a bash script, test.sh, to do the substitution:

ITEM=$1
case $ITEM in
  1)
    LOC=1_PLUS
    ;;
  4)
    LOC=4_PLUS
    ;;
  6)
    LOC=6_PLUS
    ;;
  *)
    LOC=
    ;;
esac
echo $LOC

Then I run the command below. It passes $2 as the argument to my test.sh script and tries to assign the script's result back to $2 in awk.

cat test.txt | awk -F, '{$2=$(system("bash ./test.sh "$2))}'

The result is:

1_PLUS


4_PLUS 
 
6_PLUS

So I think I'm close to the solution, but I don't understand why modifying the second field with $2=(result of my bash script) doesn't work.

I need to keep the cat test.txt | at the start because in real life I have a longer command...

Thanks for your help.

This might help: Assigning system command's output to variable
– Cyrus, Commented Jul 18 at 14:38
Will all modifications consist of appending the same string (_PLUS in this case) to the 2nd field? If you need a different suffix depending on the 2nd field's value, consider updating the question to demonstrate such an example. Do you need to dynamically designate the field number, the field values and/or the suffix? If so, update the question with these details and add some examples.
– markp-fuso, Commented Jul 18 at 16:28

Cyrus · Accepted Answer · 2025-07-18 19:05:09Z

2

To "modify a column with awk and a bash script":

awk 'BEGIN{FS=OFS=","} {cmd="bash ./test.sh " $2; cmd | getline $2; close(cmd); print}' test.txt

Output:

a,1_PLUS,A
b,,B
c,,C
d,4_PLUS,D
e,,E
f,6_PLUS,F

If you use system() its output is printed automatically. That's not what you want here.
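
To illustrate the difference, here is a minimal sketch that uses a harmless echo in place of test.sh (the echo strings are made up for the demonstration). system() lets the command write to stdout itself and returns only its exit status, whereas cmd | getline var reads the command's first output line into var. That is likely why the attempt in the question printed only the script output: system() returned 0, so $(system(...)) referred to $0, and the awk program contained no print.

```shell
# Contrast system() with "cmd | getline" inside a single awk BEGIN block.
result=$(awk 'BEGIN {
    status = system("echo printed-by-system")   # command writes to stdout itself
    print "system() returned: " status          # only the exit status comes back

    cmd = "echo captured-by-getline"
    cmd | getline line                          # first output line is stored in line
    close(cmd)                                  # close so the same command can be rerun
    print "getline stored: " line
}')
printf '%s\n' "$result"
```

The close(cmd) call matters in the per-line version above: without it, awk would keep each pipe open, and a repeated command string would not be run again.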

edited Jul 18 at 19:05

answered Jul 18 at 15:05

Cyrus


1 Comment

Fabrice Jul 21 at 8:31

Thank you, you are the only one who understood my need, as you mentioned in your header: "modify a column with awk and a bash script". And your solution works. I still need to dig into it to understand what the "getline" part does.

Ionuț G. Stan · Accepted Answer · 2025-07-18 15:46:06Z

2

You can do it all just with AWK. This sample uses Bash just to glue things together:

#!/usr/bin/env bash

set -euo pipefail

declare -r awk_program='
  $2  ~ /^[146]$/ { $2 = $2 "_PLUS" } # add suffix if it matches
  $2 !~ /^[146]$/ { $2 = ""         } # remove if it does not match
                  { print $0        } # print the modified line
'

awk -F, -vOFS=, "$awk_program" <<SAMPLE
a,1,A
b,2,B
c,3,C
d,4,D
e,5,E
f,6,F
SAMPLE

Outputs:

a,1_PLUS,A
b,,B
c,,C
d,4_PLUS,D
e,,E
f,6_PLUS,F

edited Jul 18 at 15:46

answered Jul 18 at 14:38

Ionuț G. Stan


4 Comments

jhnc Jul 18 at 15:30

It would be safer to anchor the regex (/^[146]$/, etc.), otherwise a value like 12345 is processed by the first case when it probably shouldn't be.
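
A small sketch of that difference, using a made-up input line whose second field is 12345: the unanchored pattern matches because the value merely contains a 1 and a 4, while the anchored pattern requires the whole field to be exactly one of 1, 4 or 6.

```shell
# Compare an unanchored and an anchored match on the hypothetical value 12345.
result=$(printf 'x,12345,X\n' | awk -F, '{
    print "unanchored:", ($2 ~ /[146]/   ? "match" : "no match")
    print "anchored:",   ($2 ~ /^[146]$/ ? "match" : "no match")
}')
printf '%s\n' "$result"
```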

Ionuț G. Stan Jul 18 at 15:46

@jhnc you are right, I've adjusted the snippet. Thanks!

Fabrice Jul 21 at 8:43

Thanx, I will try this solution too

Fabrice Jul 21 at 8:53

Sorry, I don't see the cat I need at the start of the command, nor the bash script in the command...

Ed Morton · Accepted Answer · 2025-07-19 13:03:54Z

Using any awk:

$ awk 'BEGIN{FS=OFS=","} {$2 = (($2 ~ /^[146]$/) ? $2"_PLUS" : "")} 1' test.txt
a,1_PLUS,A
b,,B
c,,C
d,4_PLUS,D
e,,E
f,6_PLUS,F

If you WERE going to implement this as a shell script called from awk, as you intended (but DO NOT DO THIS: it requires a lot more code, more complicated code, and will run orders of magnitude slower than doing it all in a single awk script, because it spawns a shell for every input line), then the syntax to do that robustly would be:

$ cat test.sh
#!/usr/bin/env bash

item=$1
case $item in
  1)
    loc=1_PLUS
    ;;
  4)
    loc=4_PLUS
    ;;
  6)
    loc=6_PLUS
    ;;
  *)
    loc=
    ;;
esac
printf '%s\n' "$loc"

$ awk '
    BEGIN { FS=OFS="," }
    {
        cmd = "./test.sh \047" $2 "\047"
        if ( (cmd | getline line) > 0 ) {
            $2 = line
        }
        close(cmd)
        print
    }
' test.txt
a,1_PLUS,A
b,,B
c,,C
d,4_PLUS,D
e,,E
f,6_PLUS,F

References:

Please read why-is-using-a-shell-loop-to-process-text-considered-bad-practice to learn some of the reasons not to do this using a shell script, and awk is not a shell so don't use it to call other tools unless there's a very specific reason to do so (which is not the case here).
See https://awk.freeshell.org/AllAboutGetline (or its archive if that site is unavailable) for why I'm calling getline that way, though I could have just done cmd | getline $2 in this case since we want $2 unchanged if/when getline fails.
See Correct Bash and shell script variable capitalization for why I made your shell variables lower case.
The \047s (single quote escape sequences) around $2 are to ensure that $2 is quoted when passed to the shell, see https://mywiki.wooledge.org/Quotes.
I used printf instead of echo in the shell to ensure it'll work robustly, see why-is-printf-better-than-echo, though it's not strictly necessary given your script's few possible output values.
I added a shebang at the top of the shell script to ensure which shell it runs in and to allow you to modify the shell by changing PATH, see Why is #!/usr/bin/env bash superior to #!/bin/bash?.
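
As a rough illustration of the \047 quoting point above, the sketch below runs a throwaway helper script (hypothetical, created with mktemp just for this demonstration) that reports how many arguments it received. The input value "1 2" is also made up. Without the quotes the shell splits the field into two arguments; with them it arrives as one. Note that a field containing a single quote would still break this simple quoting.

```shell
# Hypothetical helper script: prints its argument count and first argument.
script=$(mktemp)
printf '%s\n' 'echo "$# argument(s), first: $1"' > "$script"

result=$(printf 'a,1 2,A\n' | awk -F, -v script="$script" '{
    unquoted = "sh " script " " $2              # shell sees: sh script 1 2
    quoted   = "sh " script " \047" $2 "\047"   # shell sees: sh script (single-quoted) 1 2
    unquoted | getline u; close(unquoted)
    quoted   | getline q; close(quoted)
    print "unquoted -> " u
    print "quoted   -> " q
}')
printf '%s\n' "$result"
rm -f "$script"                                 # remove the temporary helper
```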

Upvoted because your answer is similar to Cyrus's. Thanks.

Paolo · Accepted Answer · 2025-07-18 14:29:15Z

1

Using awk, check whether the second field is 1, 4 or 6 and, if so, append _PLUS to it; otherwise set it to an empty string:

$ awk 'BEGIN {FS=OFS=","}{if ($2 == "1" || $2 == "4" || $2 == "6"){$2 = $2"_PLUS"} else {$2=""}}1' file
a,1_PLUS,A
b,,B
c,,C
d,4_PLUS,D
e,,E
f,6_PLUS,F

answered Jul 18 at 14:29

Paolo


2 Comments

Fabrice Jul 21 at 8:43

It doesn't fit my needs : awk and bash script. Thank you anyway

Paolo Jul 21 at 11:38

So you downvote a working answer?

Daweo · Accepted Answer · 2025-07-18 14:53:14Z

1

I would use GNU AWK for this task in the following way. Let the content of file.txt be

a,1,A
b,2,B
c,3,C
d,4,D
e,5,E
f,6,F

then

awk 'BEGIN{FS=OFS=",";arr[1];arr[4];arr[6]}{$2=($2 in arr)?$2"_PLUS":"";print}' file.txt

gives output

a,1_PLUS,A
b,,B
c,,C
d,4_PLUS,D
e,,E
f,6_PLUS,F

Explanation: I tell GNU AWK to use a comma as both the field separator and the output field separator, then I place the numbers that should be replaced with number_PLUS as keys of the array arr. For each line I use the so-called ternary operator condition?valueiftrue:valueiffalse to check whether the 2nd field value ($2) is among the keys of arr. If it is, I assign that value concatenated with _PLUS, otherwise an empty string, and then I print the whole line.
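
A variation on the same idea, as a sketch only: the key list is built with split() from a space-separated string instead of being written out as arr[1];arr[4];arr[6]. The names list and keep are assumptions chosen for this illustration, and the two input lines are a shortened sample.

```shell
# Build the lookup set from a string, then test membership with "in".
result=$(printf 'a,1,A\nb,2,B\n' | awk '
BEGIN {
    FS = OFS = ","
    n = split("1 4 6", list, " ")            # list[1]="1", list[2]="4", list[3]="6"
    for (i = 1; i <= n; i++) keep[list[i]]   # referencing an element creates its key
}
{ $2 = ($2 in keep) ? $2 "_PLUS" : ""; print }
')
printf '%s\n' "$result"
```

The in operator only tests whether a key exists, so the array elements never need a value.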

(tested in GNU Awk 5.3.1)

answered Jul 18 at 14:53

Daweo


1 Comment

Fabrice Jul 21 at 8:41

It doesn't fit my needs : awk and bash script. Thank you anyway

Timur Shtatland · Accepted Answer · 2025-07-18 15:12:50Z

Use this Perl one-liner:

perl -F',' -lane 'BEGIN { %val = map { $_ => "${_}_PLUS" } qw( 1 4 6 ); } print join ",", $F[0], $val{ $F[1] }, $F[2];' test.txt > out.txt

To modify the input file in-place, use:

perl -i.bak -F',' -lane 'BEGIN { %val = map { $_ => "${_}_PLUS" } qw( 1 4 6 ); } print join ",", $F[0], $val{ $F[1] }, $F[2];' test.txt

The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
-a : Split $_ into array @F on whitespace or on the regex specified in -F option.
-F',' : Split into @F on comma, rather than on whitespace. -F implicitly sets both "-a" and "-n".
-i.bak : Edit input files in-place (overwrite the input file). Before overwriting, save a backup copy of the original file by appending to its name the extension .bak. If you want to skip writing a backup file, just use -i and skip the extension.

BEGIN { ... } : Execute the code ... before iterating over the input file.
qw( 1 4 6 ) : List of 3 elements, same as ('1', '4', '6').
%val = map { $_ => "${_}_PLUS" } qw( 1 4 6 ); : Create hash %val, with keys being 1, 4, 6, and values being "1_PLUS", "4_PLUS", "6_PLUS".
$val{ $F[1] } : Look up the 2nd element of array @F (arrays in Perl are 0-indexed) in the hash %val, so 1 becomes 1_PLUS, etc. Note that keys not listed in %val yield undef, which in string context evaluates to an empty string.

3 Comments

Ed Morton Jul 19 at 12:19

The while read _line will strip any leading/trailing white space and backslashes and the unquoted variables in the printf will interpret wildcards, split into separate lines at white space, etc. Also see correct-bash-and-shell-script-variable-capitalization and consider the relatively tiny amount of awk code needed to do this same job so even if a bash script was faster for tiny input it wouldn't be worth writing it vs an awk script unless you were calling it thousands of times in a loop.

Fabrice Jul 21 at 8:33

It doesn't fit my needs : awk and bash script. Thank you anyway

Itération 122442 Jul 21 at 19:27

You don't need awk. It is just an external program. It's like saying I need C++ and awk.
