
The file /tmp/file.csv contains the following:

name,age,gender
bob,21,m
jane,32,f

The CSV file will always have headers, but may contain a different number of fields:

id,title,url,description
1,foo name,foo.io,a cool foo site
2,bar title,http://bar.io,a great bar site
3,baz heading,https://baz.io,some description

In either case, I want to convert my CSV data into an array of associative arrays.

What I need

So, I want a Bash 4.3 function that takes CSV as piped input and sends the array to stdout:

/tmp/file.csv:

name,age,gender
bob,21,m
jane,32,f

It needs to be used in my templating system, like this:

{{foo | csv_to_array | foo2}}

^ This is a fixed API; I must use that syntax. foo2 must receive the array on standard input.

The csv_to_array func must do its thing, so that afterwards I can do this:

$ declare -p row1; declare -p row2; declare -p new_array;

and it would give me this:

declare -A row1=([gender]="m" [name]="bob" [age]="21" )
declare -A row2=([gender]="f" [name]="jane" [age]="32" )
declare -a new_array=([0]="row1" [1]="row2")

Once I have this array structure (an indexed array of associative array names), I have a shell-based templating system to access them, like so:

{{#new_array}}
  Hi {{item.name}}, you are {{item.age}} years old.
{{/new_array}}
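For what it's worth, once that structure exists, Bash 4.3's namerefs (`declare -n`) are one way such a templating loop could dereference the row names. A minimal sketch using the example data above (the loop here stands in for whatever the templating engine does internally):

```shell
#!/usr/bin/env bash
# Recreate the target structure from the example above.
declare -A row1=([name]="bob" [age]="21" [gender]="m")
declare -A row2=([name]="jane" [age]="32" [gender]="f")
declare -a new_array=(row1 row2)

# Walk the indexed array of names; a nameref (Bash 4.3+) makes each
# associative array reachable through the generic name "item".
for rowName in "${new_array[@]}"; do
    declare -n item="$rowName"
    echo "Hi ${item[name]}, you are ${item[age]} years old."
    unset -n item   # drop the nameref before the next iteration
done
```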

But I'm struggling to generate the arrays I need.

Things I tried:

I have already tried using this as a starting point to get the array structure I need:

while IFS=',' read -r -a my_array; do
    echo "${my_array[0]}" "${my_array[1]}" "${my_array[2]}"
done < /tmp/file.csv

(from Shell: CSV to array)

..and also this:

cat /tmp/file.csv | while read -r line; do
  line=( ${line//,/ } )   # unquoted on purpose, so the shell word-splits
  echo "0: ${line[0]}, 1: ${line[1]}, all: ${line[*]}"
done

(from https://www.reddit.com/r/commandline/comments/1kym4i/bash_create_array_from_one_line_in_csv/cbu9o2o/)

but I didn't really make any progress in getting what I want out the other end.

EDIT:

Accepted the 2nd answer, but I had to hack the library I am using to make either solution work.

I'd be happy to look at other answers that do not export the declare commands as strings to be run in the current env, but instead somehow hoist the resulting arrays of the declare commands into the current env (the current env being wherever the function is run from).

Example:

$ cat file.csv | csv_to_array
$ declare -p row2 # gives the data 

To be clear: if the above ^ works in a terminal, it'll work in the library I'm using without the hacks I had to add (which involved grepping STDIN for ^declare -a and using source <(cat); eval $STDIN... in other functions).

See my comments on the 2nd answer for more info.

"if the above ^ works in a terminal" — the above will never work in any terminal, as the right side of a pipe runs in a subshell. It is not possible to change the environment of the parent from a subshell. You have to use some external entity, e.g. a temporary file, to do that, and read that file (and remove it) in your parent shell. Commented Jul 28, 2019 at 14:22
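A sketch of the temp-file workaround the commenter describes (the function name and file handling here are hypothetical, not part of the question's fixed API). The function still runs in a pipeline subshell, but it persists its declare -p output to a file, which the parent shell then sources:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: instead of trying to mutate the parent shell from
# inside a pipeline subshell, write the `declare -p` statements to a file
# and source that file from the parent. Assumes header names are plain
# words (no %q quoting of the keys here, for brevity).
csv_to_array_file() {
    local out=$1 rowName f i=1 j
    local -a header fields rowNames=()
    IFS=, read -ra header
    while IFS=, read -ra fields; do
        rowName="row$i"
        declare -Ag "$rowName"
        j=0
        for f in "${fields[@]}"; do
            printf -v "$rowName[${header[j++]}]" %s "$f"
        done
        rowNames+=("$rowName")
        ((i++))
    done
    declare -p "${rowNames[@]}" rowNames > "$out"
}

tmp=$(mktemp)
printf 'name,age\nbob,21\njane,32\n' | csv_to_array_file "$tmp"
source "$tmp" && rm -f "$tmp"   # hoist the arrays into the current shell
declare -p row1                  # now visible in the parent
```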

2 Answers


The approach is straightforward:

  • Read the column headers into an array
  • Read the file line by line, in each line …
    • Create a new associative array and register its name in the array of array names
    • Read the fields and assign them according to the column headers

In the last step we cannot use read -a, mapfile, or the like, since they only create regular arrays with numeric indices; we want an associative array instead, so we have to populate the array manually.

However, the implementation is a bit convoluted because of bash's quirks.

The following function parses stdin and creates the arrays accordingly. I took the liberty of renaming your array new_array to rowNames.

#! /bin/bash
csvToArrays() {
    IFS=, read -ra header
    rowIndex=0
    while IFS= read -r line; do
        ((rowIndex++))
        rowName="row$rowIndex"
        declare -Ag "$rowName"
        IFS=, read -ra fields <<< "$line"
        fieldIndex=0
        for field in "${fields[@]}"; do
            printf -v quotedFieldHeader %q "${header[fieldIndex++]}"
            printf -v "$rowName[$quotedFieldHeader]" %s "$field"
        done
        rowNames+=("$rowName")
    done
    declare -p "${rowNames[@]}" rowNames
}

Calling the function in a pipe has no lasting effect: Bash executes the commands in a pipe in a subshell, so you won't have access to the arrays created by someCommand | csvToArrays. Instead, call the function in either of the following ways:

csvToArrays < <(someCommand) # when input comes from a command, except "cat file"
csvToArrays < someFile       # when input comes from a file

Bash scripts like these tend to be very slow. That's why I didn't bother to hoist printf -v quotedFieldHeader … out of the inner loop, even though it repeats the same work over and over.
I think the whole templating system and everything related to it would be easier to program and faster to execute in a language like Python or Perl.


8 Comments

Thanks so much... I am so close. Sorry to be a pain, but I need it working from inside a function, piping the CSV to it, which is what I need. I need to call it like so: cat /tmp/file.csv | csv_to_array. But it won't work: when I change the CSV file and re-run the func, the output of declare -p doesn't change. See example: `cat /tmp/file.csv | csv_to_array` prints `row1 row2`, then `declare -p row2` gives `bash: declare: row2: not found`. Any ideas? (And sorry, SE won't let me add the func I'm using, too long)
That's because the right-hand side of pipe runs in a subshell (and because my script expects a file, but you used stdin). The arrays only exist inside that subshell. After csv_to_array finishes the subshell is closed and all variables are lost. There is no way for the subshell to modify its parent. Here's a solution: Pack my script into a function and change the first assignment to file="$1". Then call csv_to_array /tmp/file.csv. That's it. No need for a useless use of cat.
Still doesn't work. And I changed the while read line to for line in .. — still didn't work.
And I need the func to work with piped input anyway... as that is the only input it will ever receive...
Found the problem. From help declare: "When used in a function, declare makes NAMEs local, as with the local command. The ‘-g’ option suppresses this behavior.". I converted the script into a function that reads stdin for you.
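The behavior quoted from help declare is easy to demonstrate in isolation; a minimal sketch (function names are illustrative):

```shell
#!/usr/bin/env bash
# Inside a function, plain `declare` creates a local variable (as with
# `local`); adding -g creates the variable at the global scope instead.
make_local()  { declare -A a=([k]="local");   }
make_global() { declare -Ag b=([k]="global"); }

make_local
make_global

declare -p a 2>/dev/null || echo "a: not found (it was local to make_local)"
declare -p b   # b survives, thanks to -g
```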
0

The following script:

csv_to_array() {
    local -a values
    local -a headers
    local counter

    IFS=, read -r -a headers
    declare -a new_array=()
    counter=1
    while IFS=, read -r -a values; do
        new_array+=( row$counter )
        declare -A "row$counter=($(
            paste -d '' <(
                printf "[%s]=\n" "${headers[@]}"
            ) <(
                printf "%q\n" "${values[@]}"
            )
        ))"
        (( counter++ ))
    done
    declare -p new_array ${!row*}
}

foo2() {
    source <(cat)
    declare -p new_array ${!row*} |
    sed 's/^/foo2: /'
}

echo "==> TEST 1 <=="

cat <<EOF |
id,title,url,description
1,foo name,foo.io,a cool foo site
2,bar title,http://bar.io,a great bar site
3,baz heading,https://baz.io,some description
EOF
csv_to_array |
foo2 

echo "==> TEST 2 <=="

cat <<EOF |
name,age,gender
bob,21,m
jane,32,f
EOF
csv_to_array |
foo2 

will output:

==> TEST 1 <==
foo2: declare -a new_array=([0]="row1" [1]="row2" [2]="row3")
foo2: declare -A row1=([url]="foo.io" [description]="a cool foo site" [id]="1" [title]="foo name" )
foo2: declare -A row2=([url]="http://bar.io" [description]="a great bar site" [id]="2" [title]="bar title" )
foo2: declare -A row3=([url]="https://baz.io" [description]="some description" [id]="3" [title]="baz heading" )
==> TEST 2 <==
foo2: declare -a new_array=([0]="row1" [1]="row2")
foo2: declare -A row1=([gender]="m" [name]="bob" [age]="21" )
foo2: declare -A row2=([gender]="f" [name]="jane" [age]="32" )

The output comes from the foo2 function.

The csv_to_array function first reads the headers. Then, for each line read, it appends a new element to the new_array array and also creates a new associative array named row$counter, whose elements are built by joining the header names with the values read from the line. At the end, the function emits the output of declare -p.
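The paste/printf combination at the heart of the answer can be seen in isolation. A small sketch of what it produces for one row (sample headers and values only):

```shell
#!/usr/bin/env bash
# The answer builds "[header]=value" words by zipping two columns:
# one printf emits "[header]=" lines, the other emits %q-escaped values,
# and paste -d '' joins them line by line.
headers=(id title)
values=(1 "foo name")

paste -d '' \
    <(printf '[%s]=\n' "${headers[@]}") \
    <(printf '%q\n' "${values[@]}")
# prints:
#   [id]=1
#   [title]=foo\ name
```

Those lines are exactly what declare -A "row$counter=( ... )" expects as its initializer body.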

The foo2 function sources its standard input, so the arrays come into scope for it. It then outputs those values again, prefixing each line with foo2:.
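For completeness, a sketch of how a receiving function like foo2 could consume the sourced arrays directly with a nameref, rather than through eval (the render function here is illustrative, not from the answer):

```shell
#!/usr/bin/env bash
# Illustrative consumer: source the declare statements from stdin,
# then walk new_array with a nameref to reach each row by name.
render() {
    source <(cat)   # bring row1, row2, ... and new_array into scope
    local rowName
    for rowName in "${new_array[@]}"; do
        local -n row="$rowName"
        printf 'Hi %s, you are %s years old.\n' "${row[name]}" "${row[age]}"
        unset -n row
    done
}

printf '%s\n' \
    'declare -A row1=([name]="bob" [age]="21")' \
    'declare -A row2=([name]="jane" [age]="32")' \
    'declare -a new_array=(row1 row2)' |
render
```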

2 Comments

Thank you for the answer. It is 99% of what I need, but I'd prefer the generated declare output be made available to the parent env, using eval or something, rather than exported as a string to be sourced/eval'd. I am having to add source <(cat); eval "$STDIN"; STDIN="${new_array[@]}"; to the start of the other funcs, which shouldn't be needed (it's a hack).
To be clear: the "receiving" funcs (foo2, fooX, etc) all use read to set $STDIN, which can be row1 row2 row3, as long as something like eval and "echo ${row1[@]}" can be used to access the array data...
