I want to paste two files together, but with vertical alignment after a section delimiter line ::. Here’s what I mean.

Contents of file1:

Apple
Banana
Carrot
::
Durian

Contents of file2:

Energy
Flight
::
Gravity
Heartbreak

Desired output:

Apple Energy
Banana Flight
Carrot 
::
Durian Gravity
 Heartbreak

So far I know paste will almost do what I want (without the nice vertical alignment); another option is to split file1 and file2 into multiple files, then concatenate the results together, but I want to avoid that if I can. How can I do this?

I don’t strictly need a solution that uses paste. Anything that works works!
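
For reference, here is a minimal sketch of what plain paste gives on the sample files above. The scratch directory and the variable name plain are only for illustration and are not part of the original question.

```shell
# Recreate the two sample files from the question in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' Apple Banana Carrot :: Durian > "$dir/file1"
printf '%s\n' Energy Flight :: Gravity Heartbreak > "$dir/file2"

# Plain paste pairs lines purely by line number, so the two "::" lines
# land on different rows instead of lining up.
plain=$(paste -d ' ' "$dir/file1" "$dir/file2")
printf '%s\n' "$plain"
# Apple Energy
# Banana Flight
# Carrot ::
# :: Gravity
# Durian Heartbreak
```

The third and fourth rows show the problem: Carrot is paired with file2's delimiter, and file1's delimiter is paired with Gravity.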

  • See man column for something that provides this feature. I'm not clear how paste can deal with re-alignment after the :: group separator lines -- we might be into awk territory here. Commented Dec 14, 2022 at 10:31
  • @Paul_Pedant column can't insert blank fields to adjust alignment vertically. Commented Dec 14, 2022 at 13:20
  • @Kusalananda Quite so: the vertical alignment needs to be padded to match up the :: parts before the paste. Also, I recently had a battle with column because it treats multiple separators as one, so fails to columnise empty fields. You need a placeholder for empty fields. Hence awk may be the easiest solution. Commented Dec 14, 2022 at 13:59
  • @Kusalananda Here we are: the column problem described, and my solution in awk for a similar problem. https://unix.stackexchange.com/questions/724928/text-processing-rows-to-columns-for-a-block-of-lines-awk/725043#725043 Commented Dec 14, 2022 at 14:03
  • Okay, it sounds like I should be on the lookout for an awk solution. Should I edit this question to clarify that I’m not strictly asking for a paste solution, or is that clear already? Commented Dec 14, 2022 at 14:38

2 Answers

Using any awk:

$ cat tst.awk
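BEGIN {
    blockSep = "::"    # a line consisting of exactly this separates blocks
}
# First line of each input file: start counting that file's blocks.
FNR == 1 {
    numBlocks[++fileNr] = 0
}
# A new block starts at the top of a file and at every separator line.
(FNR == 1) || ($0 == blockSep) {
    numLines[fileNr,++numBlocks[fileNr]] = 0
}
# Store every non-separator line under (file, block, line-within-block).
$0 != blockSep {
    vals[fileNr,numBlocks[fileNr],++numLines[fileNr,numBlocks[fileNr]]] = $0
}
END {
    # Print as many blocks as the file with more blocks has.
    maxBlocks = ( numBlocks[1] > numBlocks[2] ? numBlocks[1] : numBlocks[2] )
    for ( blockNr=1; blockNr<=maxBlocks; blockNr++ ) {
        # Within a block, print as many rows as the longer side has;
        # a missing entry is an unset array element, printed as an empty string.
        maxLines = ( numLines[1,blockNr] > numLines[2,blockNr] ? numLines[1,blockNr] : numLines[2,blockNr] )
        for ( lineNr=1; lineNr<=maxLines; lineNr++ ) {
            print vals[1,blockNr,lineNr], vals[2,blockNr,lineNr]
        }
        # Emit the separator between blocks, but not after the last one.
        if ( blockNr < maxBlocks ) {
            print blockSep
        }
    }
}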

$ awk -f tst.awk file1 file2
Apple Energy
Banana Flight
Carrot
::
Durian Gravity
 Heartbreak
  • Thank you, works great!! Could you add what assumptions this makes about the file contents? (For example: whether the string “::” appears anywhere else besides the delimiter lines, whether spaces appear in each line, and whether the two files have the same number of sections.) Commented Dec 16, 2022 at 18:04
  • No assumptions spring to mind. I expect it'll work for input files with any number of blocks (sections) per file and any number of lines per block, with spaces anywhere and with :: appearing anywhere other than on a line of its own. One assumption, I suppose, is that you want all of the output for both input files rather than truncating either of them if one is smaller than the other in some way. If that's wrong, change the > to < in the ternary expressions that compute maxBlocks and/or maxLines, as appropriate. Commented Dec 16, 2022 at 18:29
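
A minimal sketch of the truncating variant described in the comment above, assuming truncation is wanted for both the block count and the per-block line count. The function name paste_blocks_min and the minBlocks/minLines variable names are made up for this illustration; the rest follows tst.awk from the answer.

```shell
# Hypothetical helper: same approach as tst.awk, but with ">" swapped for "<"
# in both ternaries, so only blocks and rows present in BOTH files are printed.
paste_blocks_min() {
    awk '
    BEGIN { blockSep = "::" }
    FNR == 1 { numBlocks[++fileNr] = 0 }
    (FNR == 1) || ($0 == blockSep) { numLines[fileNr,++numBlocks[fileNr]] = 0 }
    $0 != blockSep { vals[fileNr,numBlocks[fileNr],++numLines[fileNr,numBlocks[fileNr]]] = $0 }
    END {
        # Smaller of the two block counts.
        minBlocks = ( numBlocks[1] < numBlocks[2] ? numBlocks[1] : numBlocks[2] )
        for ( blockNr=1; blockNr<=minBlocks; blockNr++ ) {
            # Smaller of the two line counts for this block.
            minLines = ( numLines[1,blockNr] < numLines[2,blockNr] ? numLines[1,blockNr] : numLines[2,blockNr] )
            for ( lineNr=1; lineNr<=minLines; lineNr++ ) {
                print vals[1,blockNr,lineNr], vals[2,blockNr,lineNr]
            }
            if ( blockNr < minBlocks ) print blockSep
        }
    }' "$1" "$2"
}

# Sample files from the question, written to a scratch directory.
dir=$(mktemp -d)
printf '%s\n' Apple Banana Carrot :: Durian > "$dir/file1"
printf '%s\n' Energy Flight :: Gravity Heartbreak > "$dir/file2"

paste_blocks_min "$dir/file1" "$dir/file2"
# Apple Energy
# Banana Flight
# ::
# Durian Gravity
```

With the sample data, Carrot and Heartbreak are dropped because they have no partner in the other file.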

This is an extremely hackish solution that works with your example data because the two files have only the :: line in common and because the files only contain single words (no spaces). I would therefore consider it extremely fragile and not at all generic.

It parses the side-by-side diff output, so it is diff that does the vertical alignment.

$ diff -y file1 file2 | awk -v OFS='\t' 'NF == 3 { print $1, $3; next } $2 == "<" { print $1; next } $1 == ">" { print "", $2; next } { print $1 }'
Apple   Energy
Banana  Flight
Carrot
::
Durian  Gravity
        Heartbreak

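The awk code decides what to print from the shape of each diff -y line: a line with three fields (left word, diff marker, right word) yields both words; a line whose second field is < yields only the left word; a line whose first field is > yields an empty first column followed by the right word; any other line is common to both files (here, ::) and is printed once.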
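
The same awk program can be spread over several lines with a comment per rule, which may make the cases easier to follow. This is only a sketch: the function name parse_side_by_side is invented here, and it inherits the fragility described above (single words per line, no lines in common other than ::).

```shell
# Hypothetical wrapper around the one-liner above; reads "diff -y" output on stdin.
parse_side_by_side() {
    awk -v OFS='\t' '
    # Three fields: left word, a diff marker such as "|", right word.
    NF == 3    { print $1, $3; next }
    # "word <": the line exists only in the left file.
    $2 == "<"  { print $1; next }
    # "> word": the line exists only in the right file, so column one stays empty.
    $1 == ">"  { print "", $2; next }
    # Anything else is a line common to both files (here "::"); print it once.
               { print $1 }
    '
}

# Usage, assuming a diff that supports -y (side-by-side output):
#   diff -y file1 file2 | parse_side_by_side
```

Keeping the rules in this order matters, since the NF == 3 test has to run before the marker tests and each matching rule ends with next.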
