
I'm writing a bash script to read a set of files line by line and perform some edits. To begin with, I'm simply trying to move the files to backup locations and write them out as-is, to test the script is working. However, it is failing to copy the last line of each file. Here is the snippet:

    while IFS= read -r line
    do
            echo "Line is ***$line***"
            echo "$line" >> $POM
    done < $POM.backup

I obviously want to preserve whitespace when I copy the files, which is why I have set the IFS to null. I can see from the output that the last line of each file is being read, but it never appears in the output.

I've also tried an alternative variation, which does print the last line, but adds a newline to it:

    while IFS= read -r line || [ -n "$line" ]
    do
            echo "Line is ***$line***"
            echo "$line" >> $POM
    done < $POM.backup

What is the best way to do this read-write operation, so that the files are written out exactly as they are, with the correct whitespace and no newlines added?

  • I can see that the last line is being read, as it is output by the echo command. However it does not appear in the new file. Commented Feb 2, 2015 at 17:46
  • Then $POM.backup might have a \r before the \n (see the inspection sketch after these comments). Commented Feb 2, 2015 at 17:47
  • How would that affect writing to the new file? Commented Feb 2, 2015 at 17:48
  • the POSIX definition of a line is: a sequence of zero or more non-<newline> characters plus a terminating <newline> character. If the file doesn't end with a newline character, the last line is called an incomplete line. Text-processing tools are generally not good at handling incomplete lines, since a file containing one is not, strictly speaking, a text file. Commented Feb 2, 2015 at 17:54
  • What's wrong with cp $POM.backup $POM? And when you actually start editing the data, something like sed '<some_commands>' $POM.backup > $POM...? Commented Feb 2, 2015 at 21:05
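To check both suggestions above (a stray carriage return and a missing final newline), the backup file can be inspected directly, e.g.:

    # dump the last few bytes: any \r shows up explicitly, and it is also
    # visible whether or not the file ends with \n
    tail -c 16 "$POM.backup" | od -c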

3 Answers


The command that is adding the line feed (LF) is not the read command, but the echo command. read does not return the line with the delimiter still attached to it; rather, it strips the delimiter off (that is, it strips it if one was present, in other words if it just read a complete line).

So, to solve the problem, you have to use echo -n to avoid adding back the delimiter, but only when you have an incomplete line.

Secondly, when you give read a NAME (line in your case), it trims leading and trailing whitespace under the default IFS, which I don't think you want. This can be avoided either by keeping IFS= as you already do, or by not providing a NAME at all and using the default variable REPLY, which preserves leading and trailing whitespace regardless of IFS.
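To see both effects in isolation, here is a small illustration (the sample text is arbitrary); note how read reports failure on the unterminated line but still fills REPLY, and how a NAME is trimmed under the default IFS while REPLY is not:

# one complete line with padding, then one line with no trailing newline
printf '  padded  \nno newline at end' | {
    while true; do
        read -r; rc=$?
        [[ $rc -ne 0 && -z $REPLY ]] && break
        echo "rc=$rc REPLY=\"$REPLY\""
    done
}
# prints:  rc=0 REPLY="  padded  "          (whitespace kept in REPLY)
#          rc=1 REPLY="no newline at end"   (incomplete line still captured)

printf '  padded  \n' | { read -r line; echo "line=\"$line\""; }
# prints:  line="padded"                    (NAME + default IFS trims the padding)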

So, this should work:

#!/bin/bash

inFile=in;
outFile=out;

rm -f "$outFile";

rc=0;
while [[ $rc -eq 0 ]]; do
    read -r;
    rc=$?;
    if [[ $rc -eq 0 ]]; then ## complete line
        echo "complete=\"$REPLY\"";
        echo "$REPLY" >>"$outFile";
    elif [[ -n "$REPLY" ]]; then ## incomplete line
        echo "incomplete=\"$REPLY\"";
        echo -n "$REPLY" >>"$outFile";
    fi;
done <"$inFile";

exit 0;

Edit: Wow! Three excellent suggestions from Charles Duffy; here's an updated script:

#!/bin/bash

inFile=in;
outFile=out;

while { read -r; rc=$?; [[ $rc -eq 0 || -n "$REPLY" ]]; }; do
    if [[ $rc -eq 0 ]]; then ## complete line
        echo "complete=\"$REPLY\"";
        printf '%s\n' "$REPLY" >&3;
    else ## incomplete line
        echo "incomplete=\"$REPLY\"";
        printf '%s' "$REPLY" >&3;
    fi;
done <"$inFile" 3>"$outFile";

exit 0;
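As a quick check (using the in/out names from the script above), cmp confirms the copy is byte-for-byte identical, whether or not the input ends with a newline:

cmp in out && echo "files are byte-identical"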

3 Comments

This works, but it's a bit hard to read. Using a compound command in the while's conditional might help on that count, perhaps?
Also, see the "APPLICATION USAGE" section of pubs.opengroup.org/onlinepubs/009604599/utilities/echo.html for notes straight from the POSIX spec on echo's portability limitations. It's safer to use printf '%s\n' "$REPLY" (or printf '%s' "$REPLY" when no newline is desired) if you want this to work on systems with plain POSIX echo, XSI-extended echo, or GNU's implementation (which conforms to neither standard); see the short illustration after these comments.
Also, more efficient to open your output file only once rather than reopening it every time you want to append another line to the end. Just put 3>"$outFile" on the end of your loop, and redirect >&3 every time you want to add a line; not only is this more efficient, but it also means you don't need the rm -f.
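On the printf versus echo point, a tiny illustration (the sample value is arbitrary): bash's builtin echo swallows input that happens to look like one of its options, while printf reproduces it verbatim:

line='-n'               # legitimate data that looks like an echo option
echo "$line"            # bash's echo treats -n as a flag and prints nothing
printf '%s\n' "$line"   # prints the two literal characters: -n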

After review, I wonder whether the following:

{
line=
while IFS= read -r line
do
    echo "$line"
    line=
done
echo -n "$line"
} <"$INFILE" >"$OUTFILE"

isn't already enough...
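This works because read still assigns whatever it managed to read before hitting end-of-file, even though it reports failure, so an incomplete last line is left in $line after the loop; for example:

printf 'complete\nincomplete' | {
    while IFS= read -r line; do
        echo "loop saw: ***$line***"
    done
    echo "left over in \$line: ***$line***"
}
# loop saw: ***complete***
# left over in $line: ***incomplete***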

Here is my initial proposal:

#!/bin/bash

INFILE=$1

if [[ -z $INFILE ]]
then
    echo "[ERROR] missing input file" >&2
    exit 2
fi

OUTFILE=$INFILE.processed

# a way to know whether the last line is complete or not:
lastline=$(tail -n 1 "$INFILE" | wc -l)

if [[ $lastline == 0 ]]
then
    echo "[WARNING] last line is incomplete -" >&2
fi

# append a newline unconditionally; if the last line was already complete,
# the extra newline just shows up as a final empty "line"
echo | cat "$INFILE" - | {
    first=1
    while IFS= read -r line
    do
        if [[ $first == 1 ]]
        then
        echo "First Line is ***$line***" >&2
        first=0
        else
        echo "Next Line is ***$line***" >&2
        echo
        fi
        echo -n "$line" 
    done
} > "$OUTFILE"

if diff "$OUTFILE" "$INFILE"
then
    echo "[OK]"
    exit 0
else
    echo "[KO] processed file differs from input"
    exit 1
fi

The idea is to always add a newline at the end of the file and to print newlines only BETWEEN the lines that are read.

This should work for practically all text files, provided they contain no NUL (\0) bytes; any NUL bytes would be lost.

The initial test can be used to decide whether an incomplete text file is acceptable or not.
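To illustrate that completeness test on its own: tail -n 1 emits the last line exactly as stored, so wc -l reports 0 when the final newline is missing and 1 when it is present:

printf 'a\nb'   | tail -n 1 | wc -l    # 0 -> last line is incomplete
printf 'a\nb\n' | tail -n 1 | wc -l    # 1 -> last line ends with a newline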



Add the newline back yourself, but only when the line read was complete (i.e. read actually consumed a newline). Like this:

eof=
while [[ -z $eof ]]
do
    IFS= read -r line || eof=1
    [[ -n $eof && -z $line ]] && break   # clean EOF: nothing left to write
    echo "Line is ***$line***"
    printf '%s' "$line" >&3
    if [[ -z $eof ]]                     # complete line: restore its newline
    then
        printf '\n' >&3
    fi
done < "$POM.backup" 3>"$POM"

2 Comments

echo "\n" will echo the literal characters \ and n on several systems. printf '\n' would be the safer approach. Likewise, printf '%s\n' "$line" will handle content where echo "$line" will (on many systems) mess things up -- like a line containing the literal contents -n.
Also, as I commented on the other answer, reopening the output file for every line is a substantial unneeded performance penalty, rather than just opening it once and reusing the file descriptor.
