I am quite new to these things and would really appreciate some help with this.

I am trying to write a shell script that extracts data from one or more databases, exports it to CSV, merges that data into one file, and applies some formulas to the file, such as SUM or the difference between two numbers. I should be able to update or replace the file, as long as the formulas still get applied to the new file.

What I got so far:

mysql -h host -u user -ppassword -P port -e "query" | tee file1.csv
# I didn't know how to have multiple queries for the same DB
mysql -h host2 -u user2 -ppassword2 -P port -e "query2" | tee file2.csv

sed -i '1i\FILE1' file1.csv # just to add a title as the first line
echo '' >> file1.csv # just to add an empty line at the end
sed -i '1i\FILE2' file2.csv
echo '' >> file2.csv
cat file1.csv file2.csv > file.csv

This is an example of what my file.csv looks like; the real file contains more blocks like these:

       A         B       C
1   C.Installs      
2   date        
3   2019-02-01  100 
4   2019-02-02  131 
5   2019-02-03  222 
6   2019-02-04  180 
7   2019-02-05  213 
8           
9   A.Installs      
10  Date        
11  2019-02-01  23  
12  2019-02-02  42  
13  2019-02-03  34  
14  2019-02-04  35  
15  2019-02-05  21  

Every time I run the shell script, it should update or replace file.csv while keeping or re-adding the formulas for the specific cells. An example of BEFORE and AFTER:

First run of the shell script:

         A       B      C
1   C.Installs      
2   date        
3   2019-02-01  100 
4   2019-02-02  131 
5   2019-02-03  222 
6   2019-02-04  180 
7   2019-02-05  213 
8               846 #Formula of SUM for the 5 values
9   A.Installs      
10  Date        
11  2019-02-01  23  
12  2019-02-02  42  
13  2019-02-03  34  
14  2019-02-04  35  
15  2019-02-05  21  
16              155 #Formula of SUM for the 5 values
17          
18              691 #Formula of the difference between the two totals

Second run of the Shell script:

        A        B     C
1   C.Installs      
2   date        
3   2019-02-02  131 
4   2019-02-03  222 
5   2019-02-04  180 
6   2019-02-05  213 
7   2019-02-06  158 
8               904 #Formula of SUM for the 5 values
9   A.Installs      
10  Date        
11  2019-02-02  42  
12  2019-02-03  34  
13  2019-02-04  35  
14  2019-02-05  21  
15  2019-02-06  31  
16              163 #Formula of SUM for the 5 values
17          
18              741 #Formula of the difference between the two totals

So I think the first step is to find a way to apply the formulas to the CSV file.

I need to build on top of what I have, maybe with awk, but I am not sure how to proceed; to be honest, I am totally new at this.

Please keep it simple.

Thanks

  • Forget XLSX for command line tools. It's possible but unnecessarily complicated compared to working with a CSV. So no need to mention XLSX in your question. You apparently already know how to get your database exported to a CSV so there's no need to mention that part either. You can trivially figure out how to call some tool from cron once you have the tool so no need to mention cron here either. So your question boils down to how to update a CSV in some way. Edit your question to show concise, testable sample input and expected output and your attempt at doing THAT so we can help you. Commented Feb 5, 2019 at 14:44
  • Thank you Ed, will do so in the upcoming hours. Appreciate the time you took to reply to my question. Commented Feb 5, 2019 at 16:24
  • Please add example input CSV files and an example of output. Thank you Commented Feb 5, 2019 at 17:05
  • Are you actually trying to store a formula in the csv file, or just the result of applying a formula? Commented Feb 5, 2019 at 23:12
  • either way is fine as long as it has the same effect, which is applying a formula for the same cells on the updated csv file, but I would think to have the formula stored would be more convenient. Commented Feb 6, 2019 at 7:46

1 Answer

You could use csvsql from csvkit: https://csvkit.readthedocs.io/en/latest/scripts/csvsql.html

Starting from

$ cat one.csv
2019-02-01,100
2019-02-02,131
2019-02-03,222
2019-02-04,180
2019-02-05,213

$ cat two.csv
2019-02-01,23
2019-02-02,42
2019-02-03,34
2019-02-04,35
2019-02-05,21

you could run

#!/bin/bash

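# add a header line to each file (run once; rerunning adds another header line)
sed -i  '1s/^/data,value\n/' one.csv
sed -i  '1s/^/data,value\n/' two.csv

# sum the value column of each file; tail drops the header line of the result
one=$(csvsql --query "select sum(value) as sumOne from one" one.csv | tail -n +2)

two=$(csvsql --query "select sum(value) as sumTwo from two" two.csv | tail -n +2)

# difference between the two totals
echo "$one-$two" | bc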

which prints 691.
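If installing csvkit is not an option, plain awk may be enough. The following is only a minimal sketch, not a definitive solution. It assumes header-less, comma-separated files with the date in column 1 and the value in column 2, as in one.csv and two.csv above. The function name merge_with_totals and its optional fifth argument are made up for this example. It prints the merged layout from the question (title, header, data rows, total row for each block, then a blank row and the difference), so rerunning it on fresh exports recomputes everything. With the fifth argument set to 1 it writes spreadsheet-style formulas such as =SUM(B3:B7) instead of numbers; whether a spreadsheet program evaluates those when opening a CSV depends on the program and its import settings.

```shell
#!/bin/bash
# Sketch only. Assumes header-less "date,value" CSV files that are not empty.
# usage: merge_with_totals TITLE1 FILE1 TITLE2 FILE2 [formulas]
#   formulas: 1 writes =SUM(...) formulas, anything else (default 0) writes numbers
merge_with_totals() {
    awk -F, -v OFS=, -v t1="$1" -v t2="$3" -v formulas="${5:-0}" '
        # Print one output line and count it as a spreadsheet row.
        function out(line) { print line; row++ }

        # Print the total row for block n, as a number or as a SUM formula.
        function total(n) {
            totrow[n] = row + 1
            out(formulas ? ",=SUM(B" first[n] ":B" row ")" : "," sum[n])
        }

        # First line of each input file: close the previous block, start a new one.
        FNR == 1 {
            if (n) total(n)
            n++
            out(n == 1 ? t1 : t2)
            out("date")
            first[n] = row + 1      # row number of the first data row in this block
        }

        # Data rows: copy date and value, accumulate the value.
        NF >= 2 { out($1 OFS $2); sum[n] += $2 }

        END {
            total(n)
            out("")                 # empty row before the difference
            out(formulas ? ",=B" totrow[1] "-B" totrow[2] : "," (sum[1] - sum[2]))
        }
    ' "$2" "$4"
}
```

For example, merge_with_totals C.Installs one.csv A.Installs two.csv > file.csv writes 18 rows; with the sample data, row 8 is ,846, row 16 is ,155 and row 18 is ,691. Adding 1 as a fifth argument writes ,=SUM(B3:B7), ,=SUM(B11:B15) and ,=B8-B16 in those rows instead.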

2 Comments

The issue I am encountering with this is that the csv file does not have the columns named, so am getting: UnnamedColumnWarning: Column 1 has no name. Using "b"
hi @CristianTrandafir I have edited the script to start from no header CSV files
