
I have a question about processing a file line by line in UNIX. Here is what I have so far:

Source file:

header-1 header-sub1
field1|field2|field3|field4
field5|field6|field7|field8
header-2
field9|field0|fieldA|fieldB

Now I want to process this file line by line and generate an output file. The header should be prepended as the first column of every line until the next header is found. In essence, the output file should look like this:

Output:

header-1 header-sub1|field1|field2|field3|field4
header-1 header-sub1|field5|field6|field7|field8
header-2|field9|field0|fieldA|fieldB

This is the shell script loop I have:

while read line 
do
    echo "Line ---> ${line}"
    if [ $line = "header-1" -o $line = "header-2" ]
    then
        first_col=$line
    else
        complete_line=`echo $first_col"|"$line`
        echo "$complete_line" >> out.csv
    fi
done < input.txt

Shouldn't this read the input file line by line and build an appended "complete line"? The problem is that the script treats header-1 and header-sub1 as two separate words, so the test does not match the complete first header line. Since they are on the same line, I expect them to be treated as a single line. Or am I missing something in the logic and/or syntax?
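The splitting comes from the unquoted $line: inside [ ... ] the shell expands "header-1 header-sub1" into two words, so the test never compares the whole line. Even quoted, comparing against "header-1" would not match "header-1 header-sub1". Below is a minimal corrected sketch of the loop, assuming header lines are exactly the ones starting with header. The function name prefix_headers is hypothetical.

```shell
# Hypothetical helper: reads lines on stdin, writes prefixed lines to stdout.
# Assumes header lines are exactly those beginning with "header".
prefix_headers() {
    first_col=
    # IFS= keeps leading/trailing spaces; -r stops backslash processing.
    # The "|| [ -n ... ]" part also handles a last line without a newline.
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            header*)
                # The whole line, spaces included, becomes the current header.
                first_col=$line
                ;;
            *)
                # Print the current header, a "|", then the data line.
                printf '%s|%s\n' "$first_col" "$line"
                ;;
        esac
    done
}

# Usage with the file names from the question:
# prefix_headers < input.txt > out.csv
```

A case pattern is used instead of [ ... = ... ] because the word after case is not split on spaces, so the full header line is matched and stored as one value.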

Also, is there a way to use sed or awk to create such a file? Thanks in advance for any suggestions.

2 Answers


You can use this awk:

$ awk 'BEGIN{OFS="|"} /^header/ {h=$0; next} {print h, $0}' file
header-1 header-sub1|field1|field2|field3|field4
header-1 header-sub1|field5|field6|field7|field8
header-2|field9|field0|fieldA|fieldB

Explanation

  • BEGIN{OFS="|"} sets the output field separator to |.
  • /^header/ {h=$0; next} stores the line in h if it starts with header, then skips to the next line without printing.
  • {print h, $0} prints the stored header followed by the current line for every other line, joined by OFS (|).

Comments

Thanks. I will try this and post the results here.
+1, though instead of matching header with a regex, you could use NF and store the lines where NF==1 as headers.
Since you split the lines on |, a header line will still be treated as a header. But it all depends on the OP's data.
Aaaaah sorry, @jaypal, I misunderstood your previous comment. I thought you were saying NR==1, but now I see you said NF==1. That makes a lot of sense, yes. I will leave it as it is, as you said, because we don't know what the OP's data looks like. Thanks master :)
@jaypal I also cannot help checking SO from my phone :) Yes, it initially had FS set, but then I saw it was not necessary. Regarding OFS, it is needed for print h, $0; otherwise the two values would be separated by a space instead of |.
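The NF==1 variant suggested in the comments can be sketched as below. It assumes header lines never contain a | and data lines always do. A data line without any | would be taken as a header, which is why it depends on the OP's data. The function name prefix_headers_nf is hypothetical.

```shell
# Hypothetical helper: reads the files given as arguments, or stdin if none.
prefix_headers_nf() {
    # -F'|' splits on "|", so a line with no "|" has exactly one field
    # and is stored as the current header.
    # OFS="|" makes "print h, $0" join the header and the line with "|".
    awk -F'|' 'BEGIN{OFS="|"} NF==1 {h=$0; next} {print h, $0}' "$@"
}

# Usage:
# prefix_headers_nf input.txt > out.csv
```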

This might work for you (GNU sed):

sed -r '/^header/{h;d};G;s/(.*)\n(.*)/\2|\1/' file

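This stores the header in the hold space and inserts it before each non-header line: G appends a newline and the held header to the current line, and the substitution swaps the two parts, joining them with |.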
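The same hold-space approach can be written without the GNU-specific -r flag by using basic regular expressions. This is a sketch that assumes a POSIX-conforming sed, where \n in a regex matches the newline that G inserts. The function name prefix_headers_sed is hypothetical.

```shell
# Hypothetical helper: portable form of the sed one-liner above.
# Reads the files given as arguments, or stdin if none.
prefix_headers_sed() {
    # /^header/{h;d;}  copy a header line to the hold space, print nothing
    # G                append "\n" plus the held header to the current line
    # s/.../\2|\1/     swap the parts around the newline, joined by "|"
    sed -e '/^header/{h;d;}' -e 'G' -e 's/\(.*\)\n\(.*\)/\2|\1/' "$@"
}

# Usage:
# prefix_headers_sed input.txt > out.csv
```

The ; before } is there because some non-GNU seds require it.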
