
I'm working on a bash script in which I need to process two CSV files at once with awk and produce several output files. In short, a main file holds the records to be dispatched to the output files, and a second file lists each output file's name together with the number of records it should hold. The first n records go to the first output file, the next k records (n+1 to n+k) go to the second one, and so on.

To be clearer, here's an example of how the main record file might look:

x11,x21
x12,x22
x13,x23
x14,x24
x15,x25
x16,x26
x17,x27
x18,x28
x19,x29

and how the other file might look:

out_file_name_1,2
out_file_name_2,3
out_file_name_3,4

Then the first output file, named out_file_name_1, should look like:

x11,x21
x12,x22

The second output file, named out_file_name_2, should look like:

x13,x23
x14,x24
x15,x25

And the last one should look like:

x16,x26
x17,x27
x18,x28
x19,x29

Hopefully it is clear enough.

2 Comments

The description is quite vague. To get useful answers, you will probably need to spell everything out clearly. For example: "there's a main file which keeps the record of a content to be dispatched to some other output files whose names and number of records will be derived from another file." In what format is the "record of content" kept? Precisely how should it be "dispatched"? How will those names and numbers "be derived from another file"? For best results, show a small sample of all the required input files and the resulting output files. Commented Mar 13, 2015 at 0:50
... and what do the output files look like? Commented Mar 13, 2015 at 0:52

2 Answers


I wouldn't use Awk for this.

# fd 3 supplies the name,count pairs; stdin supplies the records that head consumes
while IFS=, read -r -u 3 filename lines; do
    head -n "$lines" >"$filename"
done 3<other.csv <main.csv

Using read -u to read from a particular file descriptor is not completely portable, I believe, but your question is tagged bash, so I am assuming that is not a problem here.
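For shells without read -u, the same descriptor can be read by redirecting read's standard input with <&3, which is ordinary POSIX redirection. A small sketch, assuming the sample other.csv from the question; it only adds up the requested record counts and writes no output files, and the total variable is a name introduced here for illustration:

```shell
# Sample list file, same content as in the question
printf '%s\n' 'out_file_name_1,2' 'out_file_name_2,3' 'out_file_name_3,4' > other.csv

total=0
# <&3 makes read take its input from descriptor 3, like read -u 3 does in bash
while IFS=, read -r filename lines <&3; do
    printf '%s should receive %s records\n' "$filename" "$lines"
    total=$((total + lines))
done 3<other.csv
printf 'total records requested: %s\n' "$total"
```

Comparing that total with the output of wc -l < main.csv before splitting is a cheap way to catch a mismatch between the two files.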

Demo: http://ideone.com/6FisHT

If you end up with empty files after the first, which can happen when head reads ahead and consumes more input than it prints, try replacing head with an inner loop of read statements.

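while IFS=, read -r -u 3 filename lines; do
    for ((i = 0; i < lines; i++)); do
        # IFS= and -r keep each record unchanged; stop early if main.csv runs out
        IFS= read -r line || break
        printf '%s\n' "$line"
    done >"$filename"
done 3<other.csv <main.csv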
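Another way to avoid depending on how much input head consumes is to track line offsets and let sed extract each range from main.csv directly. This is a sketch under assumptions, not a drop-in replacement: it assumes every count in other.csv is a positive integer, and it rescans main.csv once per output file, which is fine for small inputs but wasteful for large ones. The start and end variables are names introduced here.

```shell
# Sample inputs, same content as in the question
printf '%s\n' x11,x21 x12,x22 x13,x23 x14,x24 x15,x25 \
              x16,x26 x17,x27 x18,x28 x19,x29 > main.csv
printf '%s\n' 'out_file_name_1,2' 'out_file_name_2,3' 'out_file_name_3,4' > other.csv

start=1
while IFS=, read -r filename lines; do
    end=$((start + lines - 1))
    # Print lines start..end only, then quit so sed does not scan the rest
    sed -n "${start},${end}p;${end}q" main.csv > "$filename"
    start=$((end + 1))
done < other.csv
```

Because each sed invocation opens main.csv itself, nothing is shared through standard input and the loop needs no extra file descriptor.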

7 Comments

This seems like a great approach, but when I run it on OP's data the second two output files are empty. Is it different for you?
Yeah, I tested it here before posting, and again now just to confirm; Bash 4.1.5(1)-release (x86_64-pc-linux-gnu), Debian Squeeze.
Cool, I didn't imagine you'd post without verifying first. I'm on Mac OS X, bash 3.2.57. I think it boils down to (head -n 2; head -n 2) < main.csv only outputting two lines for me.
You can work around that with read but it's kind of clunky. I'll update with a suggestion.

Here's a solution in awk since you asked, but clearly tripleee's answer is the nicer approach.

$ cat oak.awk
BEGIN { FS = ","; fidx = 1 }

# Processing files.txt, init parallel arrays with filename and number of records
# to print to each one.
NR == FNR {
    file[NR] = $1
    records[NR] = $2
    next
}

# Processing main.txt. Print record to current file. Decrement number of records to print,
# advancing to the next file when number of records to print reaches 0
fidx in file && records[fidx] > 0 {
    print > file[fidx]
    if (! --records[fidx]) ++fidx
    next
}

# If we get here, either we ran out of files before reading all the records
# or a file was specified to contain zero records    
{ print "Error: Insufficient number of files or file with non-positive number of records"
  exit 1 }


$ cat files.txt
out_file_name_1,2
out_file_name_2,3
out_file_name_3,4

$ cat main.txt
x11,x21
x12,x22
x13,x23
x14,x24
x15,x25
x16,x26
x17,x27
x18,x28
x19,x29

$ awk -f oak.awk files.txt main.txt

$ cat out_file_name_1
x11,x21
x12,x22

$ cat out_file_name_2
x13,x23
x14,x24
x15,x25

$ cat out_file_name_3
x16,x26
x17,x27
x18,x28
x19,x29

2 Comments

Yes, thank you. Actually this was the answer I was looking for, but since @tripleee answered it so elegantly, I agree with you and will go with his answer.
You aren't closing open file handles, so you will run out when you have more than just a handful of files. Some Awk implementations are really constrained in this regard. It was the one problem I wanted to avoid by moving to shell script; but all things counted, it should not be a very major addition to this script (just close the old file when moving to the next one).
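Following that suggestion, here is a sketch of the same awk logic with each output file closed as soon as it is full. It assumes the files.txt and main.txt samples shown in the answer, and it is run inline from the shell rather than from oak.awk purely to keep the example self-contained.

```shell
# Sample inputs, same content as in the answer
printf '%s\n' 'out_file_name_1,2' 'out_file_name_2,3' 'out_file_name_3,4' > files.txt
printf '%s\n' x11,x21 x12,x22 x13,x23 x14,x24 x15,x25 \
              x16,x26 x17,x27 x18,x28 x19,x29 > main.txt

awk '
BEGIN { FS = ","; fidx = 1 }

# First file: remember each output name and its record count
NR == FNR { file[NR] = $1; records[NR] = $2; next }

# Second file: write to the current output and close it once it is full
fidx in file && records[fidx] > 0 {
    print > (file[fidx])
    if (--records[fidx] == 0) {
        close(file[fidx])   # release the handle before moving to the next file
        ++fidx
    }
    next
}

# Ran out of output files, or a file was given a non-positive count
{ print "Error: Insufficient number of files or file with non-positive number of records"
  exit 1 }
' files.txt main.txt
```

With close() in place, at most one output file is open at a time, so the number of output files is no longer limited by the awk implementation's handle limit.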
