1

I have 2 CSV files: a.txt contains the data, and a_props.txt names the columns of each corresponding line, e.g.:

a.txt:

john,smith,[email protected],30,
peter,jones,27

a_props.txt:

name,surname,email,age
name,surname,age

How can I extract one kind of value from every line of a.txt, using the column index found on the matching line of a_props.txt?

E.g.: age

30,27

or

30
27
  • It is more usual to keep each file layout in its own file, so you would have a.txt and b.txt with corresponding a_props.txt and b_props.txt (assuming that you're dealing with more than 2 lines of data). Can you change how those data files are created, or is that a hard constraint? You could also build filters that turn the a files into b, c, ... as needed, depending on how many layouts you have. Good luck. Commented Feb 1, 2012 at 18:33

4 Answers

3

You can use paste to merge the two files line by line, and awk to check each merged line for the property name you're looking for:

paste -d, a_props.txt a.txt | awk -v PROP='age' -v FS=',' '{for (i=1; i<=NF/2; i++) if ($i == PROP) print $(NF/2+i)}'

In this example, the output would be:

30
27

Note that you just need to change PROP=<property> to get the value of some other column.

EDIT: Fixed for cases where PROP is not the last field of a record.
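
For reference, here is a minimal, self-contained sketch of the same paste + awk approach wrapped in a shell function, with the result joined into the single-line form (30,27) from the question. The function name get_column, the scratch directory and the sample email address are made up for illustration, and the sample data assumes each data line has exactly as many fields as its props line (no trailing comma).

```shell
#!/bin/sh
# Recreate the sample files in a scratch directory (sample values are illustrative).
workdir=$(mktemp -d)
printf '%s\n' 'john,smith,john@example.com,30' 'peter,jones,27' > "$workdir/a.txt"
printf '%s\n' 'name,surname,email,age' 'name,surname,age' > "$workdir/a_props.txt"

# get_column PROP PROPS_FILE DATA_FILE  (hypothetical helper name)
# Prints, one per line, the value of PROP from every data line.
get_column() {
    paste -d, "$2" "$3" |
        awk -F, -v PROP="$1" '{
            # First half of the merged record holds names, second half the values.
            for (i = 1; i <= NF / 2; i++)
                if ($i == PROP) print $(NF / 2 + i)
        }'
}

# One value per line from get_column, then joined with commas by paste -s.
ages=$(get_column age "$workdir/a_props.txt" "$workdir/a.txt" | paste -sd, -)
emails=$(get_column email "$workdir/a_props.txt" "$workdir/a.txt" | paste -sd, -)
printf '%s\n' "$ages" "$emails"
rm -r "$workdir"
```

Dropping the final paste -sd, - stage gives the one-value-per-line form instead.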

3 Comments

It works only if the properties file has the same number of lines as the data field.
@ZsoltBotykai That's correct, this is an assumption I've made based on the example given by the OP.
Nice solution, thanks. With just a little fix (see EDIT) it does just what I needed ;) print $(i*2) -> print $(NF/2+i)
1

Use process substitution and extra file descriptors to get additional streams to read from, then read the props and data files in parallel:

key=age

exec 9< <(tr , " " < a_props.txt) 10< <( tr , " " < a.txt )

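# Read one props line and the matching data line per iteration
while read -u 9 -a props ; do
  read -u 10 -a data
  for (( ix=0 ; ix < ${#props[@]} ; ix++ )); do
      # Quote both sides so empty or unusual values compare safely
      if [ "${props[$ix]}" = "$key" ]; then
          echo "${data[$ix]}"
      fi
  done
done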

Process substitution is a bash extension (it is not part of POSIX) and won't work in vanilla sh.

Also, be very careful about what a "CSV" file is. Once you add quoted fields and the like, such files become much more difficult to parse. At that point I'd use an existing CSV package in some other language (e.g., Text::CSV in Perl, or the csv package in tcllib).
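
If bash is not available, the same lock-step reading can be sketched in plain POSIX sh by opening the two files directly on spare file descriptors, with no process substitution or arrays. This is only a sketch: the function name lookup and the sample email address are hypothetical, and like the loop above it assumes simple comma-separated fields with no quoting.

```shell
#!/bin/sh
# Recreate the sample files in a scratch directory (sample values are illustrative).
workdir=$(mktemp -d)
printf '%s\n' 'john,smith,john@example.com,30' 'peter,jones,27' > "$workdir/a.txt"
printf '%s\n' 'name,surname,email,age' 'name,surname,age' > "$workdir/a_props.txt"

# lookup KEY PROPS_FILE DATA_FILE  (hypothetical helper name)
lookup() {
    key=$1
    # fd 3 reads the props file, fd 4 the data file, one line of each per pass.
    while IFS= read -r props <&3 && IFS= read -r data <&4; do
        # Walk both comma-separated lists in step, one field at a time.
        while [ -n "$props" ]; do
            name=${props%%,*}
            value=${data%%,*}
            [ "$name" = "$key" ] && printf '%s\n' "$value"
            # Drop the field just examined; clear the list when no comma is left.
            case $props in *,*) props=${props#*,} ;; *) props= ;; esac
            case $data in *,*) data=${data#*,} ;; *) data= ;; esac
        done
    done 3< "$2" 4< "$3"
}

result=$(lookup age "$workdir/a_props.txt" "$workdir/a.txt")
printf '%s\n' "$result"
rm -r "$workdir"
```

Single-digit descriptors (3 and 4) are used here because multi-digit ones such as 10 are not guaranteed to work in every sh.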

1

This might work for you:

paste a_props.txt a.txt | 
awk '{split($1,a,",");split($2,b,",");for(x in a){if(a[x]==v)print b[x]}}' v=age

0
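awk -F "," 'BEGIN {
              # Requires gawk (ARGIND, gensub).
              # Read every line of the props file (second argument) into props[]
              a = 1
              while ((getline p < ARGV[2]) > 0) {
                  props[a] = p
                  a++
              }
              close(ARGV[2])
            }
            # Stop once awk moves on from the data file to the props file
            ARGIND > 1 { exit }
            { for (elem in props) {
                  # Pick the props line with the same number of fields
                  if (split(props[elem], header, ",") == NF) {
                      for (item in header) {
                          data[header[item]] = data[header[item]] $item ","
                      }
                  }
              }
            }
            END {
                  for (elem in data) {
                      # Strip the trailing comma, then print one value per line
                      split(gensub(",$", "", "g", data[elem]), d, ",")
                      print elem ":"
                      for (e in d) {
                          print d[e]
                      }
                  }
                }' a.txt a_props.txt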

This might work, but I have not tested it. I would not recommend it for really large files, as the script slurps them into memory. And what happens if a_props.txt contains two or more lines with the same number of fields, e.g.:

name,age
name,email

This case is not handled in the above script! And the order of arguments to the script is important.
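
A possible way around both caveats, assuming a_props.txt describes a.txt line by line as in the question, is to read the matching props line with getline while awk walks the data file, so nothing is stored and lines are paired by position rather than by field count. The variable names prop and propsfile and the sample rows below are made up for illustration.

```shell
#!/bin/sh
# Sample files where both props lines have the same number of fields.
workdir=$(mktemp -d)
printf '%s\n' 'ann,41' 'bob,bob@example.com' > "$workdir/a.txt"
printf '%s\n' 'name,age' 'name,email' > "$workdir/a_props.txt"

by_line=$(awk -F, -v prop=age -v propsfile="$workdir/a_props.txt" '
{
    # Fetch the props line that sits at the same position as this data line.
    if ((getline header < propsfile) <= 0) exit
    n = split(header, names, ",")
    for (i = 1; i <= n; i++)
        if (names[i] == prop) print $i
}' "$workdir/a.txt")

printf '%s\n' "$by_line"
rm -r "$workdir"
```

Here only the first line has an age column, so the second line contributes nothing even though it has the same number of fields.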
