
I have a file script.sh:

script.sh

cd /folder
mv a.csv result.csv

The a.csv file will have lots of records (GBs of data) in this form:

id,name
1,"platinum"
2,"joe"
1,"platinum"
...

What I want to do is use a script to create a file called records.txt, which will contain the total number of records and the number of records with unique IDs.

records.txt

Total Records: 3
Unique Records: 2

Both totals exclude the header line (id,name).

I want to do this in the script, after the mv. How can I do it?

  • Please add your efforts in the form of code in your question; this is highly encouraged on SO. Commented Dec 11, 2020 at 13:29
  • Please add more details to your question. What exactly does "records with unique IDs" mean? In your example you have two identical lines 1,"platinum". Can the file have two lines with the same ID but different name fields? What is the expected result in that case? Example: two lines 3,"foo" and 3,"bar". Commented Dec 11, 2020 at 13:50

3 Answers


Use sort followed by uniq to get the unique lines (sort first, because uniq only removes adjacent duplicates), then use wc -l to count them:

#!/bin/bash

total=$(tail -n +2 result.csv | wc -l)
unique=$(tail -n +2 result.csv | sort | uniq | wc -l)

echo "Total Records: ${total}"
echo "Unique Records: ${unique}"

Total Records: 3
Unique Records: 2


Note: tail -n +2 skips the first (header) line of the CSV, since you don't want to count it.
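As an aside (not part of the original answer): sort | uniq can be collapsed into sort -u, saving one process, which is worth doing on multi-GB files. A minimal sketch, assuming the post-mv filename result.csv and the sample data from the question:

```shell
#!/bin/sh
# Recreate the sample data from the question (assumed layout).
cat > result.csv <<'EOF'
id,name
1,"platinum"
2,"joe"
1,"platinum"
EOF

total=$(tail -n +2 result.csv | wc -l)
# sort -u sorts and deduplicates in one step (equivalent to sort | uniq here)
unique=$(tail -n +2 result.csv | sort -u | wc -l)

echo "Total Records: ${total}"
echo "Unique Records: ${unique}"
```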


2 Comments

The problem here is that the file isn't necessarily sorted -- hence, you need to sort the lines before applying uniq.
@costaparas Thanks for pointing that out, I'd posted an outdated attempt ;) Should be fine now :D
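To illustrate the point in the comment above: uniq only collapses adjacent duplicate lines, so on unsorted input non-adjacent duplicates slip through unchanged. A small demonstration (data.csv is a hypothetical sample file):

```shell
#!/bin/sh
# Two duplicate lines that are NOT adjacent (hypothetical sample file).
printf '%s\n' '1,"platinum"' '2,"joe"' '1,"platinum"' > data.csv

# uniq alone misses the duplicate because the two copies are not adjacent
without_sort=$(uniq data.csv | wc -l)
# sorting first makes the duplicates adjacent, so uniq removes one
with_sort=$(sort data.csv | uniq | wc -l)

echo "uniq alone: $without_sort lines; sort | uniq: $with_sort lines"
```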

Awk alternative:

awk -F, 'NR > 1 { fil[$1]++ } END { for (i in fil) { utot++ }; print "Total Records: " (NR - 1); print "Unique Records: " utot }' result.csv > records.txt

Set the field separator to , and, skipping the header line with NR > 1, build an array indexed by id, incrementing the count for each occurrence. At the end, loop through the array to count the distinct ids (utot); the total number of records is NR - 1 (all lines read, minus the header). Print both values.
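The one-liner can also be laid out across several lines for readability (a sketch that skips the header with NR > 1 and counts distinct ids, assuming the post-mv filename result.csv):

```shell
#!/bin/sh
# Recreate the sample data from the question (assumed layout).
cat > result.csv <<'EOF'
id,name
1,"platinum"
2,"joe"
1,"platinum"
EOF

awk -F, '
    NR > 1 { fil[$1]++ }          # skip the header; tally rows per id
    END {
        for (i in fil) utot++     # utot = number of distinct ids
        print "Total Records: " (NR - 1)
        print "Unique Records: " utot
    }
' result.csv > records.txt

cat records.txt
```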

1 Comment

you could add some newlines for readability

Your script.sh could look like this:

cd /folder
mv a.csv result.csv

total=$((`wc -l < result.csv` - 1))
unique=$((`sort result.csv | uniq | wc -l` - 1))

cat > records.txt <<eof
Total Records: $total
Unique Records: $unique
eof

This just uses a simple pair of pipelines, each ending in wc, to count the lines.

Note, we subtract 1 in each case because of the header line.

Also, in the case of the unique count, we use sort followed by uniq to correctly remove the duplicates.

The counts are then written to the records.txt file via a here-document.

(Note I've used backticks here purely to avoid having too many parentheses; you can instead use the $(...) command substitution syntax, but it's not essential here since there's no nesting required.)

5 Comments

Using $(( numeric_producing_cmds )) means you can use $( any_cmd ) rather than ancient backticks. So please, $(($(cat result.csv | wc -l) - 1)), and with fewer process invocations, $(( $( wc -l < result.csv ) - 1 )). Good luck to all.
I disagree -- using $(...) inside $((...)) makes it less readable, hence the use of back-ticks in this case. As for the 2nd point, I've updated in the answer to address the inefficiency.
Well, maybe $(($(cat... is harder to read. I think adding the spaces as in the 2nd example is easier to read. But of course, we're all entitled to our opinions. When you get to the point in scripting where you need to nest command substitution, using backticks will be a challenge (not impossible), just much easier with $( cmd1 $(cmd 2) ). Good luck to all.
And, as a general answer, the original #!/bin/sh doesn't allow $(( )), only expr and backticks, so this won't help someone using a true #!/bin/sh, so I can't upvote this.
Of course, for nested commands, you'd go for nesting $(..). But even in that case, using back-ticks on the inner-most nested command, makes it more readable.
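Picking up the #!/bin/sh point from the comments: in a shell without $(( )) arithmetic, expr can do the subtraction instead. A minimal sketch under that assumption, using the sample data from the question:

```shell
#!/bin/sh
# Recreate the sample data from the question (assumed layout).
cat > result.csv <<'EOF'
id,name
1,"platinum"
2,"joe"
1,"platinum"
EOF

# Count lines first, then subtract the header line with expr (no $(( )) needed)
lines=`wc -l < result.csv`
total=`expr $lines - 1`

deduped=`sort result.csv | uniq | wc -l`
unique=`expr $deduped - 1`

cat > records.txt <<eof
Total Records: $total
Unique Records: $unique
eof

cat records.txt
```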
