
I have a test.txt file that looks like this:

a,1,A
b,2,B
c,3,C
d,4,D
e,5,E
f,6,F

I want to modify the second field based on these conditions:

if the value is 1, change it to 1_PLUS

if the value is 4, change it to 4_PLUS

if the value is 6, change it to 6_PLUS

otherwise, change it to an empty field.

The final file should look like this:

a,1_PLUS,A
b,,B
c,,C
d,4_PLUS,D
e,,E
f,6_PLUS,F

I wrote a bash script, test.sh, to do the substitution:

ITEM=$1
case $ITEM in
  1)
    LOC=1_PLUS
    ;;
  4)
    LOC=4_PLUS
    ;;
  6)
    LOC=6_PLUS
    ;;
  *)
    LOC=
    ;;
esac
echo $LOC

Then I run the command below: I pass $2 as the argument to my test.sh script, which does the substitution, and I want to replace $2 in awk with the value it returns.

cat test.txt | awk -F, '{$2=$(system("bash ./test.sh "$2))}'

The result is :

1_PLUS


4_PLUS

6_PLUS

So I think I'm close to the solution, but I don't understand why assigning the result of my bash script to the second field ($2=...) doesn't work.

I need to keep cat test.txt | at the start because in real life it's a longer command...

Thanks for your help.

  • This might help: Assigning system command's output to variable Commented Jul 18 at 14:38
  • Will all modifications consist of appending the same string (_PLUS in this case) to the 2nd field? If you need to provide a different suffix based on the 2nd field's values, then consider updating the question to demonstrate such an example. Do you need to dynamically designate the field #, the field values and/or the suffix? If 'yes', then update the question with these details and add some examples. Commented Jul 18 at 16:28

7 Answers


To "modify a column with awk and a bash script":

awk 'BEGIN{FS=OFS=","} {cmd="bash ./test.sh " $2; cmd | getline $2; close(cmd); print}' test.txt

Output:

a,1_PLUS,A
b,,B
c,,C
d,4_PLUS,D
e,,E
f,6_PLUS,F

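If you use system(), the command's output is written directly to standard output and system() returns only the command's exit status (0 here). So $(system(...)) becomes $0, the whole record, and nothing from test.sh ends up in $2. That's not what you want here; cmd | getline $2 reads the command's output into $2 instead.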
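The difference is easy to see in isolation. In this sketch echo stands in for ./test.sh: system() lets the command print directly and hands back only the exit status, while cmd | getline var captures the command's first output line in var.

```shell
#!/usr/bin/env bash
# system() lets the command write straight to stdout and returns only
# its exit status, which is why $(system(...)) ends up as $(0), i.e. $0.
awk 'BEGIN { status = system("echo from-system"); print "system() returned: " status }'

# cmd | getline var reads the command's first output line into var instead.
awk 'BEGIN {
    cmd = "echo from-getline"
    if ((cmd | getline result) > 0)
        print "getline captured: " result
    close(cmd)
}'
```

Expected output:

from-system
system() returned: 0
getline captured: from-getline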


1 Comment

Thank you, you are the only one who understood my need, as you mentioned in your header: "modify a column with awk and a bash script"! And your solution works... I need to dig into it to fully understand what the "getline" part does...

You can do it all just with AWK. This sample uses Bash just to glue things together:

#!/usr/bin/env bash

set -euo pipefail

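declare -r awk_program='
  $2 !~ /^[146]$/ { $2 = ""         } # clear the field if it does not match
  $2  ~ /^[146]$/ { $2 = $2 "_PLUS" } # add suffix if it matches
                  { print $0        } # print the modified line
'

# The clearing rule runs first: otherwise "1_PLUS" would no longer match
# the anchored regex and the second rule would wipe it out again.
awk -F, -v OFS=, "$awk_program" <<SAMPLE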
a,1,A
b,2,B
c,3,C
d,4,D
e,5,E
f,6,F
SAMPLE

Outputs:

a,1_PLUS,A
b,,B
c,,C
d,4_PLUS,D
e,,E
f,6_PLUS,F
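Since awk reads standard input when it is given no file operand, the same logic can sit at the end of an existing pipeline such as cat test.txt | .... A minimal sketch, assuming printf as a stand-in for the real upstream command and using a single conditional assignment; the function name plus_suffix is hypothetical:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Reads CSV on standard input: with no file operand, awk reads stdin,
# so it can be the last stage of a longer pipeline.
plus_suffix() {
    awk -F, -v OFS=, '{ $2 = (($2 ~ /^[146]$/) ? $2 "_PLUS" : ""); print }'
}

# printf stands in for the real upstream command, e.g. cat test.txt.
printf '%s\n' a,1,A b,2,B c,3,C d,4,D e,5,E f,6,F | plus_suffix
```

The anchored regex also means a value such as 12345 is cleared rather than suffixed.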

4 Comments

It would be safer to anchor the regex (/^[146]$/, etc.); otherwise a value like 12345 is processed by the first case when it probably shouldn't be.
@jhnc you are right, I've adjusted the snippet. Thanks!
Thanks, I will try this solution too.
Sorry, I don't see the cat I need at the start of the command, nor the bash script in the command...

Using any awk:

$ awk 'BEGIN{FS=OFS=","} {$2 = (($2 ~ /^[146]$/) ? $2"_PLUS" : "")} 1' test.txt
a,1_PLUS,A
b,,B
c,,C
d,4_PLUS,D
e,,E
f,6_PLUS,F

If you WERE going to implement this as a shell script called from awk (but DO NOT DO THIS as it requires a lot more code, more complicated code, and will run orders of magnitude slower than doing it all in a single awk script as it'll spawn a subshell for every input line) as you intended then the syntax to do that robustly would be:

$ cat test.sh
#!/usr/bin/env bash

item=$1
case $item in
  1)
    loc=1_PLUS
    ;;
  4)
    loc=4_PLUS
    ;;
  6)
    loc=6_PLUS
    ;;
  *)
    loc=
    ;;
esac
printf '%s\n' "$loc"

$ awk '
    BEGIN { FS=OFS="," }
    {
        cmd = "./test.sh \047" $2 "\047"
        if ( (cmd | getline line) > 0 ) {
            $2 = line
        }
        close(cmd)
        print
    }
' test.txt
a,1_PLUS,A
b,,B
c,,C
d,4_PLUS,D
e,,E
f,6_PLUS,F

References:

  1. Please read why-is-using-a-shell-loop-to-process-text-considered-bad-practice to learn some of the reasons not to do this using a shell script, and awk is not a shell so don't use it to call other tools unless there's a very specific reason to do so (which is not the case here).
  2. See https://awk.freeshell.org/AllAboutGetline (or its archive if that site is unavailable) for information on why I'm calling getline that way, though I could have just done cmd | getline $2 in this case since we want $2 unchanged if/when getline fails.
  3. See Correct Bash and shell script variable capitalization for why I made your shell variables lower case.
  4. The \047s (single quote escape sequences) around $2 are to ensure that $2 is quoted when passed to the shell, see https://mywiki.wooledge.org/Quotes.
  5. I used printf instead of echo in the shell to ensure it'll work robustly, see why-is-printf-better-than-echo, though it's not strictly necessary given your script's few possible output values.
  6. I added a shebang at the top of the shell script to ensure which shell it runs in and to allow you to modify the shell by changing PATH, see Why is #!/usr/bin/env bash superior to #!/bin/bash?.
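Reference 4 can be illustrated with a value containing a space. This sketch passes a hypothetical value x y to printf through system(), first unquoted and then wrapped in \047 quotes. Note that the \047 wrapping still breaks if the value itself contains a single quote.

```shell
#!/usr/bin/env bash
# Unquoted: the shell splits "x y" into two arguments, so printf's
# format is applied once per argument.
awk 'BEGIN { v = "x y"; system("printf \"<%s>\\n\" " v) }'

# Wrapped in \047 (single quotes): the shell receives one argument.
awk 'BEGIN { v = "x y"; system("printf \"<%s>\\n\" \047" v "\047") }'
```

The first command prints <x> and <y> on separate lines; the second prints <x y>.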

1 Comment

Upvoted because your answer is similar to Cyrus's. Thanks

Using awk, check whether the second field is 1, 4 or 6 and, if so, append _PLUS to it; otherwise empty it:

$ awk 'BEGIN {FS=OFS=","}{if ($2 == "1" || $2 == "4" || $2 == "6"){$2 = $2"_PLUS"} else {$2=""}}1' file
a,1_PLUS,A
b,,B
c,,C
d,4_PLUS,D
e,,E
f,6_PLUS,F

2 Comments

It doesn't fit my needs: awk and bash script. Thank you anyway.
So you downvote a working answer?

I would use GNU AWK for this task in the following way. Let the content of file.txt be

a,1,A
b,2,B
c,3,C
d,4,D
e,5,E
f,6,F

then

awk 'BEGIN{FS=OFS=",";arr[1];arr[4];arr[6]}{$2=($2 in arr)?$2"_PLUS":"";print}' file.txt

gives output

a,1_PLUS,A
b,,B
c,,C
d,4_PLUS,D
e,,E
f,6_PLUS,F

Explanation: I tell GNU AWK to use a comma as both the field separator and the output field separator, then I place the numbers that should be replaced with number_PLUS as keys of the array arr. For each line I use the so-called ternary operator condition?valueiftrue:valueiffalse to check whether the 2nd field's value ($2) is among the keys of arr; if it is, I put that value concatenated with _PLUS, otherwise an empty string. Then I print the whole line.

(tested in GNU Awk 5.3.1)
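If the matching values or the suffix have to come from the shell (as asked in one of the comments on the question), the same in-array lookup can be fed from -v variables and split(). A sketch only; the function name add_suffix and its two parameters are hypothetical:

```shell
#!/usr/bin/env bash
set -euo pipefail

# $1: space-separated values to match, $2: suffix to append (hypothetical parameters)
add_suffix() {
    awk -v keys="$1" -v suffix="$2" '
        BEGIN {
            FS = OFS = ","
            n = split(keys, k, " ")    # e.g. k[1]="1", k[2]="4", k[3]="6"
            for (i = 1; i <= n; i++)
                wanted[k[i]]           # referencing an element creates the key
        }
        { $2 = (($2 in wanted) ? $2 suffix : ""); print }
    '
}

add_suffix "1 4 6" "_PLUS" <<'SAMPLE'
a,1,A
b,2,B
c,3,C
d,4,D
e,5,E
f,6,F
SAMPLE
```

With these arguments it prints the same six lines as the output above.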

1 Comment

It doesn't fit my needs: awk and bash script. Thank you anyway.

Use this Perl one-liner:

perl -F',' -lane 'BEGIN { %val = map { $_ => "${_}_PLUS" } qw( 1 4 6 ); } print join ",", $F[0], $val{ $F[1] }, $F[2];' test.txt > out.txt

To modify the input file in-place, use:

perl -i.bak -F',' -lane 'BEGIN { %val = map { $_ => "${_}_PLUS" } qw( 1 4 6 ); } print join ",", $F[0], $val{ $F[1] }, $F[2];' test.txt

The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
-a : Split $_ into array @F on whitespace or on the regex specified in -F option.
-F',' : Split into @F on comma, rather than on whitespace. -F implicitly sets both "-a" and "-n" (Perl 5.20 and later).
-i.bak : Edit input files in-place (overwrite the input file). Before overwriting, save a backup copy of the original file by appending to its name the extension .bak. If you want to skip writing a backup file, just use -i and skip the extension.

BEGIN { ... } : Execute the code ... before iterating over the input file.
qw( 1 4 6 ) : List of 3 elements, same as ('1', '4', '6').
%val = map { $_ => "${_}_PLUS" } qw( 1 4 6 ); : Create hash %val, with keys being 1, 4, 6, and values being "1_PLUS", "4_PLUS", "6_PLUS".
$val{ $F[1] } : Look up the 2nd element of array @F (arrays in Perl are 0-indexed) in the hash %val, so 1 becomes 1_PLUS, etc. Note that the value for any key not listed in %val is undef, which in string context evaluates to an empty string.


1 Comment

It doesn't fit my needs: awk and bash script. Thank you anyway.

For a simple find-and-replace task on a small quantity of data, there is no real need for an external tool. Pure bash will likely be faster on small inputs (if speed is important for you), since it avoids starting another program. So here is a pure bash solution (it could be improved, even simplified, but it is a good enough starting point).

set -eu

# Contains the element to look for and the value
# to replace them with
declare -A DICT
DICT[1]=1_PLUS
DICT[4]=4_PLUS
DICT[6]=6_PLUS

FIELD_SEPARATOR=","

while read _line; do
    # Split the line into an array
    IFS=$FIELD_SEPARATOR read -ra _line_fields <<< "$_line"

    _field_value=${_line_fields[1]}

    # Check if the value of the line is in the dictionary
    # :- means use empty string if not defined
    if [[ -n ${DICT[$_field_value]:-} ]]; then
        # Replace the old value with the new value
        _line_fields[1]=${DICT[$_field_value]}
    else
        _line_fields[1]=""
    fi
    # You may find a more elegant solution to print
    # the final output
    printf '%s%s%s%s%s%s\n' ${_line_fields[0]} $FIELD_SEPARATOR\
        ${_line_fields[1]} $FIELD_SEPARATOR\
        ${_line_fields[2]}
done < input.txt

$ bash test.sh
a,1_PLUS,A
b,,B
c,,C
d,4_PLUS,D
e,,E
f,6_PLUS,F
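A sketch of the same loop with IFS= read -r (so backslashes and surrounding whitespace survive), quoted printf arguments (no globbing or word splitting), and a guard for an empty second field, which bash can reject as a bad associative-array subscript. The function name replace_second_field is hypothetical, and like the original it assumes exactly three fields per line:

```shell
#!/usr/bin/env bash
set -eu

declare -A dict=([1]=1_PLUS [4]=4_PLUS [6]=6_PLUS)

replace_second_field() {
    local line value
    local -a fields
    while IFS= read -r line; do
        # Split on commas only; -r keeps backslashes literal.
        IFS=, read -ra fields <<< "$line"
        value=${fields[1]-}
        # Test for an empty value before using it as a subscript.
        if [[ -n $value ]] && [[ -n ${dict[$value]-} ]]; then
            fields[1]=${dict[$value]}
        else
            fields[1]=
        fi
        printf '%s,%s,%s\n' "${fields[0]-}" "${fields[1]}" "${fields[2]-}"
    done
}

replace_second_field <<'SAMPLE'
a,1,A
b,2,B
c,3,C
d,4,D
e,5,E
f,6,F
SAMPLE
```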

3 Comments

The while read _line will strip any leading/trailing white space and backslashes and the unquoted variables in the printf will interpret wildcards, split into separate lines at white space, etc. Also see correct-bash-and-shell-script-variable-capitalization and consider the relatively tiny amount of awk code needed to do this same job so even if a bash script was faster for tiny input it wouldn't be worth writing it vs an awk script unless you were calling it thousands of times in a loop.
It doesn't fit my needs: awk and bash script. Thank you anyway.
You don't need awk. It is just an external program. It's like saying I need C++ and awk.
