bash: How to get one column from a CSV file of variable row length?

Question

I have 2 CSV files: a.txt contains the data and a_props.txt describes the type of each column, e.g.:

a.txt:

john,smith,[email protected],30,
peter,jones,27

a_props.txt:

name,surname,email,age
name,surname,age

How can I get one type of data from a.txt according to its index obtained from a_props.txt?

E.g.: age

30,27

or

30
27

It is more usual to split different file layouts into separate files, so you would need an a.txt and a b.txt with corresponding a_props.txt and b_props.txt (assuming that you're dealing with more than 2 lines of data). Can you refactor the creation of those data files, or is that a hard constraint? You could also build filters that take the a files and turn them into b, c, ... as needed, depending on how many layouts you have. Good luck.
– shellter, Commented Feb 1, 2012 at 18:33

Jakub Stejskal · Accepted Answer · 2012-02-01 23:14:25Z

3

You can use paste to merge the two files line by line and awk to check whether any field matches the property name you're looking for:

paste -d, a_props.txt a.txt | awk -v PROP='age' -v FS=',' '{for (i=1; i<=NF/2; i++) if ($i == PROP) print $(NF/2+i)}'

In this example, the output would be:

30
27

Note that you just need to change PROP=<property> to get the value of some other column.
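
For the single-line form of the output that the question asks about (30,27), the same pipeline can be wrapped in a function and the matches joined with paste -s. This is only a sketch: get_column is a hypothetical name, and it assumes that line N of a_props.txt describes line N of a.txt. The int() call makes the field offset explicit for rows such as the first one in a.txt, where the trailing comma makes the combined field count odd.

```shell
# Hypothetical helper: print every value of one property, comma-separated.
# Assumes line N of the props file describes line N of the data file.
get_column() {
    local prop=$1 props_file=$2 data_file=$3
    paste -d, "$props_file" "$data_file" |
        awk -v PROP="$prop" -F, '{
            # The first half of the merged line holds the property names,
            # the second half holds the matching values.
            for (i = 1; i <= NF / 2; i++)
                if ($i == PROP) print $(int(NF / 2) + i)
        }' |
        paste -sd, -   # join the matches into a single comma-separated line
}
```

With the sample files, get_column age a_props.txt a.txt prints 30,27.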

EDIT: Fixed for cases where PROP is not the last field of a record.

edited Feb 1, 2012 at 23:14

Jakub Stejskal


answered Feb 1, 2012 at 18:38

jcollado



3 Comments

Zsolt Botykai Over a year ago

It works only if the properties file has the same number of lines as the data file.

jcollado Over a year ago

@ZsoltBotykai That's correct, this is an assumption I've made based on the example given by the OP.

Jakub Stejskal Over a year ago

Nice solution, thanks. With just a little fix (see EDIT) it does just what I needed ;) print $(i*2) -> print $(NF/2+i)

evil otto · Accepted Answer · 2012-02-01 18:43:35Z

Use process substitution and extra file descriptors (FDs) to get additional streams to read from, and read the props and data files in parallel:

key=age

exec 9< <(tr , " " < a_props.txt) 10< <( tr , " " < a.txt )

while read -r -u 9 -a props ; do
  read -r -u 10 -a data
  for (( ix=0 ; ix < ${#props[@]} ; ix++ )); do
      if [ "${props[$ix]}" == "$key" ]; then
          echo "${data[$ix]}"
      fi
  done
done

Process substitution is bash-specific and won't work in vanilla sh.

Also, be very careful about what a "csv" file is. Once you add in quoted fields and the like, they become much more difficult to parse. At that point I'd use an existing csv package in some other language (e.g., Text::CSV in perl, or the csv package in tcllib).
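
A variant of the loop above sets IFS=, for read instead of translating commas to spaces, so that fields containing spaces stay intact. This is a sketch only: column_by_key and the choice of FDs 3 and 4 are illustrative and not part of the original answer, it still needs bash for arrays and read -u, and quoted CSV fields are still not handled.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the values of one property, one per line.
# Assumes plain comma-separated fields with no quoting, and that line N of
# the props file describes line N of the data file.
column_by_key() {
    local key=$1 props_file=$2 data_file=$3
    local -a props data
    local ix
    # FD 3 carries the props file and FD 4 the data file; read both in lockstep.
    while IFS=, read -r -u 3 -a props && IFS=, read -r -u 4 -a data; do
        for (( ix = 0; ix < ${#props[@]}; ix++ )); do
            if [[ ${props[ix]} == "$key" ]]; then
                printf '%s\n' "${data[ix]}"
            fi
        done
    done 3< "$props_file" 4< "$data_file"
}
```

Called as column_by_key age a_props.txt a.txt, it prints one matching value per line.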

potong · Accepted Answer · 2012-02-02 00:44:55Z

1

This might work for you:

paste a_props.txt a.txt | 
awk '{split($1,a,",");split($2,b,",");for(x in a){if(a[x]==v)print b[x]}}' v=age

answered Feb 2, 2012 at 0:44

potong


Zsolt Botykai · Accepted Answer · 2012-02-01 18:43:43Z

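awk -F "," 'BEGIN { a = 1
              # load every layout line from the second argument
              while ((getline p < ARGV[2]) > 0) {
                  props[a] = p
                  a++
              }
              close(ARGV[2])
              ARGV[2] = ""    # do not read the props file again as data
            }
            { for (elem in props) {
                  # pick the layout whose field count matches this record
                  if (split(props[elem], header, ",") == NF) {
                      for (item in header) {
                          data[header[item]] = data[header[item]] $item ","
                      }
                  }
              }
            }
            END {
                  for (elem in data) {
                      n = split(data[elem], d, ",")
                      print elem ":"
                      # the last piece is empty because of the trailing comma
                      for (e = 1; e < n; e++) {
                          print d[e]
                      }
                  }
                }' a.txt a_props.txt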

This might work, but I have not tested it. I would not recommend it for really large files, as the script slurps them into memory. And what happens if a_props.txt contains two or more lines with the same number of fields, e.g.:

name,age
name,email

This case is not handled in the above script! And the order of arguments to the script is important.
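
If each line of a_props.txt describes the line with the same number in a.txt (the assumption made elsewhere on this page, not something this answer relies on), both concerns can be sidestepped by reading the two files in lockstep. The following is a rough sketch under that assumption; stream_column and its variable names are hypothetical.

```shell
# Hypothetical helper: stream both files in lockstep instead of loading them.
# Assumes line N of the props file describes line N of the data file.
stream_column() {
    local prop=$1 props_file=$2 data_file=$3
    awk -F, -v prop="$prop" -v props_file="$props_file" '
        {
            # Read the layout line that belongs to the current data line.
            if ((getline layout < props_file) <= 0)
                exit                      # props file ran out of lines
            n = split(layout, header, ",")
            for (i = 1; i <= n; i++)
                if (header[i] == prop)
                    print $i
        }
    ' "$data_file"
}
```

Only one line of each file is held at a time, and two layouts with the same number of fields no longer clash because the layout is chosen by line number rather than by field count.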
