I've started to learn bash and I'm totally stuck on this task. I have a comma-separated CSV file with records like:

id,location_id,organization_id,service_id,name,title,email,department
1,1,,,Name surname,department1 department2 department3,,
2,1,,,name Surname,department1,,
3,2,,,Name Surname,"department1 department2, department3",,
etc.

I need to format it this way:

  • name and surname must start with a capital letter
  • add an email record that consists of the first letter of the name and the full surname in lowercase
  • create a new csv with records from the old csv with corrected fields.

I split the csv into records using awk (because some fields contain a comma between quotes, e.g. "department1 department2, department3").

#!/bin/bash
input="$HOME/test.csv"

exec 0<$input

while read line; do

awk -v FPAT='"[^"]*"|[^,]*' '{ 
  ...
}' $input)

done

Inside awk {...} (NF=8 for each record), I tried to use the field values ($1 $2 $3 $4 $5 $6 $7 $8):

#it doesn't work 

IFS=' ' read -a name_surname<<<$5 # Field 5 match to *name* in heading of csv

# Could I use inner awk with field values of outer awk ($5) to separate the field value of outer awk $5 ? 
# as an example:                                  
# $5="${awk '{${1^}${2^}}' $5}"
# where ${1^} and ${2^} fields of inner awk
  
name_surname[0]=${name_surname[0]^}
name_surname[1]=${name_surname[1]^}
  
$5="${name_surname[0]}' '${name_surname[1]}"

email_name=${name_surname[0]:0:1}
email_surname=${name_surname[1]}
domain='@domain'

$7="${email_name,}${email_surname,,}$domain" # match to field 7 *email* in heading of csv

How can I add the field values ($1 $2 $3 $4 $5 $6 $7 $8) to an array and call the join function on each loop iteration to add the record to the new csv file?

function join { local IFS="$1"; shift; echo "$*"; }
result=$(join , ${arr[@]})
echo $result >> new.csv  
  • You don't need to involve bash (except to call awk once) and you don't need to call awk within awk. Please edit your question to include a minimal reproducible example with concise, testable sample input (showing representative values, not just placeholder strings like "name" everywhere as in your current sample input) and the expected output given that input, and then we can help you. Use example.com for the fake email addresses you want added, as that's what that domain exists for. Commented Jan 8, 2021 at 23:35
  • Don't use while read loops in shell just to manipulate text btw - see why-is-using-a-shell-loop-to-process-text-considered-bad-practice. Commented Jan 8, 2021 at 23:50

2 Answers


This may be what you're trying to do (using gawk for FPAT as you already were doing) but without more representative sample input and the expected output it's a guess:

$ cat tst.sh
#!/usr/bin/env bash

awk '
BEGIN {
    OFS = ","
    FPAT = "[^"OFS"]*|\"[^\"]*\""
}
NR > 1 {
    n = split($5,name,/\s*/)
    $7 = tolower(substr(name[1],1,1) name[n]) "@example.com"
    print
}
' "${@:--}"

$ ./tst.sh test.csv
1,1,,,Name surname,department1 department2 department3,nsurname@example.com,
2,1,,,name Surname,department1,nsurname@example.com,
3,2,,,Name Surname,"department1 department2, department3",nsurname@example.com,

I put the awk script inside a shell script since that looks like what you want, but you don't need to do that. You could just save the awk script in a file and invoke it with awk -f.
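The script above fills in the email column but does not print the header line and leaves the name as typed. The following is a minimal sketch of one way to also keep the header and capitalise each word of the name. It assumes GNU awk is installed as gawk (FPAT is a gawk extension); the function name fix_names and the example.com domain are placeholders, not part of the original answer.

```shell
#!/usr/bin/env bash

# fix_names is a hypothetical helper name, not from the original answer.
# It reads CSV from the files given as arguments, or from stdin if none.
fix_names() {
    gawk '
    BEGIN {
        OFS  = ","
        FPAT = "[^,]*|\"[^\"]*\""      # a field is unquoted text or a quoted string
    }
    NR == 1 { print; next }            # pass the header through unchanged
    {
        n = split($5, part, " ")       # " " splits on runs of blanks
        if (n > 0) {
            full = ""
            for (i = 1; i <= n; i++) {
                # upper-case the first letter, keep the rest as typed
                part[i] = toupper(substr(part[i], 1, 1)) substr(part[i], 2)
                full = full (i > 1 ? " " : "") part[i]
            }
            $5 = full
            # first letter of the name plus the whole surname, lower-cased
            $7 = tolower(substr(part[1], 1, 1) part[n]) "@example.com"
        }
        print
    }' "${@:--}"
}
```

It could be called as fix_names test.csv > new.csv, or at the end of a pipeline. Records whose name field is empty are printed unchanged.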

2 Comments

Thanks for the quick response. Could you explain "${@:--}" in more detail? As I understand it (according to the link), this is ${parameter:offset} and @ is all positional parameters, but why is there a double dash? When I deleted one dash, nothing changed.
It's not ${parameter:offset}, it's the first item in the list at that link, i.e. ${parameter:-word}. The :- in the middle is the operator. In the same way that x="${@:-foo}" means "set x to the args if any are present, otherwise set x to foo", cmd "${@:--}" means "run cmd on the args if any are present, otherwise run cmd on stdin (-)". It's a common way to write shell scripts that can be called as script file or cat file | script while passing the input to some other internal command.
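To see the ${parameter:-word} behaviour described in this comment in isolation, here is a small sketch. The function name count_lines is hypothetical and only serves as an illustration; inside a function, "$@" refers to the function's own arguments.

```shell
#!/usr/bin/env bash

# count_lines is a hypothetical helper used only for illustration.
# It counts the lines of the files passed as arguments, or of
# standard input when no arguments are given.
count_lines() {
    # "${@:--}" expands to all arguments if there are any,
    # otherwise to the single word "-", which awk reads as stdin.
    awk 'END { print NR }' "${@:--}"
}
```

With this definition, count_lines test.csv and cat test.csv | count_lines should both report the number of lines in test.csv.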

Ed Morton's answer works completely.

In case it helps someone, I added one more check: if more than one email address in the CSV file has the same name, an index number is appended to the email local part. The output is written to a file.

#!/usr/bin/env bash
input="$HOME/test.csv"
exec 0<"$input"

awk '
BEGIN {
  OFS = ","
  FPAT = "[^"OFS"]*|\"[^\"]*\""
}

(NR == 1) {print} #header of csv
(NR > 1) {

  if (length($0) > 1) { #exclude empty lines
    count = 0
    n = split($5,name,/\s*/)
    email_local_part = tolower(substr(name[1],1,1) name[n])
   
    #array stores emails from csv file
    a[i++] = email_local_part
    
    #find amount of occurrences of the same email address
    for (el in a) {
      ret=match(a[el], email_local_part)
  
      if (ret == 1) { count++ }
    } 

    #add number of occurrence to email address
    if (count == 1) { $7 = email_local_part "@abc.com" }
    else { --count; $7 = email_local_part count "@abc.com" }

    print 
  }
} 
' "${@:--}" > new.csv
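The loop above compares addresses with match(), which treats the local part as a regular expression and also counts longer addresses that merely start with it. As a hedged alternative, the numbering can be sketched with an associative array keyed on the exact local part. The helper name number_duplicates is hypothetical, and for brevity it reads one local part per line instead of a full CSV record.

```shell
#!/usr/bin/env bash

# number_duplicates is a hypothetical helper used only for illustration.
# Input: one email local part per line (from files or stdin).
# Output: the address, with a counter appended from the second occurrence on.
number_duplicates() {
    awk '{
        c = seen[$0]++                     # earlier occurrences of this exact value
        print $0 (c ? c : "") "@abc.com"   # no suffix for the first occurrence
    }' "${@:--}"
}
```

For example, the input lines nsurname, nsurname, nsurnameson, nsurname would produce nsurname@abc.com, nsurname1@abc.com, nsurnameson@abc.com and nsurname2@abc.com. In the full script the same idea would replace the for loop, with the counter looked up by email_local_part.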
