
I'm trying to read in a spreadsheet that has this format:

username,   lastname,   firstname,    x1,      x2,       x3,      x4
user1,       dudette,    mary,         7,       2,                 4
user2,       dude,       john,         6,       2,        4,
user3,       dudest,     rad,
user4,       dudaa,      pad,          3,       3,        5,       9

Basically, it has usernames, the names those usernames correspond to, and a value for each x. I want to read this in from a CSV file, find all of the blank cells, and fill them in with 5s. My approach was to read in the whole array and then substitute all null cells with 0s. This is the code so far:

#!/bin/bash

while IFS=$'\t' read -r -a myarray
do
echo $myarray
done < something.csv

for e in ${myarray[@]
do
echo 'Can you see me #1?'
if [[-z $e]]
echo 'Can you see me #2?'
sed 's//0'
fi
done

The code isn't really changing my csv file at all. EDITED NOTE: the data is all comma separated.

What I've figured out so far:

Okay, the 'Can you see me' and the echo myarray are test code. I wanted to see if the whole csv file was being read in from echo myarray (which according to the output of the code seems to be the case). It doesn't seem, however, that the code is running through the for loop at all...which I can't seem to understand.

Help is much appreciated! :)

6
  • Why are there commas after x1, x2, and x3, but none elsewhere? Commented Mar 10, 2014 at 4:09
  • there are supposed to be! sorry! Commented Mar 10, 2014 at 4:15
  • So is it comma separated, tab separated, or whitespace separated? That makes a big difference for a shell implementation. Commented Mar 10, 2014 at 4:18
  • It's all comma separated. Commented Mar 10, 2014 at 4:18
  • Originally my code had while IFS=, read -a line in it, and I think that's ultimately the right approach, but when the test code printed the array, only the first column was output. After some googling, I found while IFS=$'\t' read -r -a and tried that. It seemed to work for the output of the array, but nothing happened to my csv file. Commented Mar 10, 2014 at 4:21

3 Answers

1

The format of your .csv file is not comma separated; it's left-aligned, with a non-constant number of whitespace characters separating each field. This makes it difficult to accurately find and replace empty columns that are followed by non-empty columns.

Here is a Bash-only solution that would be entirely accurate if the fields were comma separated.

#!/bin/bash

n=5
while IFS=, read -r username lastname firstname x1 x2 x3 x4; do
    # Replace each empty field with the default value
    [[ -z $x1 ]] && x1=$n
    [[ -z $x2 ]] && x2=$n
    [[ -z $x3 ]] && x3=$n
    [[ -z $x4 ]] && x4=$n
    echo "$username,$lastname,$firstname,$x1,$x2,$x3,$x4"
done < something.csv > newfile.csv && mv newfile.csv something.csv

Output:

username,lastname,firstname,x1,x2,x3,x4
user1,dudette,mary,7,2,5,4
user2,dude,john,6,2,4,5
user3,dudest,rad,5,5,5,5
user4,dudaa,pad,3,3,5,9
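
The loop above names each of the seven columns explicitly. For a variable number of columns, one possible approach is to read each row into an array and take the column count from the header row. The following is only a sketch under some assumptions: `fill_blanks` is a hypothetical helper name, the data is cleanly comma separated (no padding whitespace, no quoted commas), and the first three columns are the name fields that should be left alone.

```shell
#!/bin/bash

# fill_blanks FILE [DEFAULT] [FIRST_COL]
# Hypothetical helper: prints FILE with every empty cell from the 0-based
# column FIRST_COL onward replaced by DEFAULT. The number of columns is
# taken from the first non-empty row (assumed to be the header).
fill_blanks() {
    local file=$1 default=${2:-5} first=${3:-3}
    local IFS=,            # split on commas in read, join on commas in "${cells[*]}"
    local -a cells
    local ncols=0 i

    while read -r -a cells; do
        # The first row that has any cells fixes the column count.
        (( ncols == 0 )) && ncols=${#cells[@]}

        # Only touch rows that at least contain the leading name columns.
        if (( ${#cells[@]} >= first )); then
            for (( i = first; i < ncols; i++ )); do
                # An unset (missing) cell expands to empty, so this also pads short rows.
                [[ -z ${cells[i]} ]] && cells[i]=$default
            done
        fi

        printf '%s\n' "${cells[*]}"
    done < "$file"
}
```

It could be used in the same way as the loop above, for example `fill_blanks something.csv > newfile.csv && mv newfile.csv something.csv`.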

3 Comments

Thank you for your response! As with my code, for some bizarre reason, running your code in my computer doesn't seem to change the csv file at all. I don't know...maybe there's something up with my terminal? I've tried all sorts of approaches to this but they don't seem to work. Does it change your csv file?
Works fine for me: it outputs to newfile.csv, however. If it didn't, then it would clobber the input file something.csv as it is being read in, which is a bad thing. You could always output to newfile.csv and, after that entire process is finished, mv newfile.csv something.csv.
Would you happen to know how I could modify this code for a variable number of columns?
0

I realize you asked for bash, but if you don't mind perl in lieu of bash, perl is a great tool for record-oriented files.

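#!/usr/bin/perl
use strict;
use warnings;

# Three-argument open with error checking
open(my $in,  '<', 'something.csv') or die "Cannot open something.csv: $!";
open(my $out, '>', 'outdata.txt')   or die "Cannot open outdata.txt: $!";
while (my $line = <$in>) {
        chomp $line;
        # A limit of -1 keeps trailing empty fields instead of dropping them
        my ($username, $lastname, $firstname, $x1, $x2, $x3, $x4) = split(/\t/, $line, -1);
        # Fill in missing or empty values with 5
        for my $x ($x1, $x2, $x3, $x4) {
                $x = 5 if !defined($x) || $x eq "";
        }
        print $out "$username\t$lastname\t$firstname\t$x1\t$x2\t$x3\t$x4\n";
}
close($in);
close($out);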

This reads your input file, something.csv, which is assumed to have tab-separated fields, and writes a new file, outdata.txt, with the rewritten records.

6 Comments

Thank you so much for your response! I really appreciate it. Unfortunately, my project specifications state that it has to be shell script. Would you know how I translate this to bash?
You've said it has to be Bash, not Perl, but you've used sed. What are you allowed and not allowed to use? I could see an easy awk solution, for example, but I don't know whether awk is allowed (like sed) or not (like Perl).
awk is allowed. But it has to be bash script.
Also, unless I'm horribly mistaken, I believe sed is also in bash. At least my terminal doesn't seem to be complaining about it...
Sed isn't “in bash”. It's a separate executable for its own language (a very compact stream-oriented editing language).
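
The distinction can be checked from the shell itself. As a small illustration, Bash's `type -t` builtin reports how a command name would be resolved; the results noted in the comments are what one would typically expect on a system where sed and awk are installed and not shadowed by an alias or function.

```shell
#!/bin/bash

# `type -t` prints a single word describing how Bash resolves a name.
type -t read   # "builtin": implemented inside Bash itself
type -t sed    # typically "file": a separate executable found via PATH
type -t awk    # typically "file" as well
```

A script that calls sed or awk is still a Bash script; it just depends on those external programs being installed.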
0

I'm sure there's a better or more idiomatic solution, but this works:

#!/bin/bash

infile=bashcsv.csv     # Input filename
declare -i i           # Iteration variable
declare -i defval=5    # Default value for missing cells
declare -i n_cells=7   # Total number of cells per line
declare -i i_start=3   # Starting index for numeric cells
declare -a cells       # Array variable for cells

# We'd usually save/restore the old value of IFS, but there's no need here:
IFS=','

# Convenience function to bail/bug out on error:
bail () {
    echo "$@" >&2
    exit 1
}

# Strip whitespace and replace empty cells with `$defval`:
sed 's/[[:space:]]//g' "$infile" | while read -r -a cells; do

    # Skip empty/malformed lines:
    if [ ${#cells[*]} -lt $i_start ]; then
        continue
    fi

    # If there are fewer cells than $n_cells, pad to $n_cells
    # with $defval; if there are more, bail:
    if [ ${#cells[*]} -lt $n_cells ]; then
        for ((i=${#cells[*]}; $i<$n_cells; i++)); do
            cells[$i]=$defval
        done
    elif [ ${#cells[*]} -gt $n_cells ]; then
        bail "Too many cells."
    fi

    # Replace empty cells with default value:
    for ((i=$i_start; $i<$n_cells; i++)); do
        if [ -z "${cells[$i]}" ]; then
            cells[$i]=$defval
        fi
    done

    # Print out whole line, interpolating commas back in:
    echo "${cells[*]}"
done

Here's a gratuitous awk one-liner that gets the job done:

awk -F'[[:space:]]*,[[:space:]]*' 'BEGIN{OFS=","} /,/ {NF=7; for(i=4;i<=7;i++) if($i=="") $i=5; print}' infile.csv

2 Comments

Emmet, thank you so much! The first solution seems to result in some odd compiler errors. The second solution with awk works great though! Do you know how I can get the awk output to simply replace the input file, so that I wind up with a csv with the 5s filled in? I tried simply redirecting it with > but the output is very weird. It seems to put ,,,0,0,0,0,0,0,0 below each row...
If you have a very recent GNU awk, it can do inplace edits, but even then it's probably better to use output redirection to write to a temporary file and then copy that over the original (be careful not to use a static file name if you can have more than one copy of your script running at once). As a general rule, you can't redirect to and from the same file. Unfortunately, there are a few limited cases where it works (more by luck than by design), and sometimes people will stumble on one of these and think it works in general, but it's far better to avoid it completely.
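
The temporary-file approach described above can be sketched as a small Bash function. This is only an illustration: `rewrite_in_place` is a hypothetical helper name, and it assumes `mktemp` is available and that the filter command reads standard input and writes standard output.

```shell
#!/bin/bash

# rewrite_in_place FILE COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND with FILE on stdin, collects its stdout in
# a uniquely named temporary file, and replaces FILE only if COMMAND succeeds.
rewrite_in_place() {
    local file=$1 tmp
    shift

    # Create the temporary file next to the original so that mv is a rename
    # on the same filesystem; the random suffix avoids clashes between
    # concurrent runs of the script.
    tmp=$(mktemp "${file}.XXXXXX") || return 1

    if "$@" < "$file" > "$tmp"; then
        mv -- "$tmp" "$file"
    else
        # Leave the original untouched if the filter failed.
        rm -f -- "$tmp"
        return 1
    fi
}
```

With the awk one-liner from the answer, the call might look like `rewrite_in_place infile.csv awk -F'[[:space:]]*,[[:space:]]*' 'BEGIN{OFS=","} /,/ {NF=7; for(i=4;i<=7;i++) if($i=="") $i=5; print}'`, since awk reads standard input when no file name is given. Note that the replaced file takes on the temporary file's permissions, which `mktemp` keeps restrictive.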
