I need to update selected rows of a huge CSV file (20M rows) with data from another big CSV file (30K rows).
The file to be updated is 1.csv:
1120120031,55121
1120127295,55115
6135062894,55121
6135063011,55215
4136723818,55215
6134857289,55215
4430258714,55121
The file containing the updates is 2.csv (note the trailing spaces in some keys):
112012 ,55615
6135062,55414
6135063,55514
995707 ,55721
The desired output is 1_MOD.csv:
1120120031,55621
1120127295,55615
6135062894,55421
6135063011,55515
4136723818,55215
6134857289,55215
4430258714,55121
Modifications:
- if $1 of a row in 2.csv is a prefix of $1 in 1.csv (output rows 1 & 2), replace the 3rd character of $2 in 1.csv with the 3rd character of $2 from the matching 2.csv row;
- when more than one prefix matches, use the longest one (output rows 3 & 4);
- unmatched rows stay unchanged (output rows 5 to 7).
So far I have tested sed in a while loop, but that script would take about 31 days to complete. I believe there is a better way, such as reading 2.csv into an awk array and then updating 1.csv from that array, but my awk knowledge is too limited to write it.
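For reference, here is a minimal sketch of that array approach, tested only against the sample data above (the first part of the script just recreates the two sample files; the array name "fix" and the longest-prefix loop are my own choices, not anything standard). It reads 2.csv once into an array keyed by prefix, then streams 1.csv in a single pass, so it should take minutes rather than days:

```shell
# recreate the sample files from the question
printf '%s\n' '1120120031,55121' '1120127295,55115' '6135062894,55121' \
              '6135063011,55215' '4136723818,55215' '6134857289,55215' \
              '4430258714,55121' > 1.csv
printf '%s\n' '112012 ,55615' '6135062,55414' '6135063,55514' \
              '995707 ,55721' > 2.csv

awk -F, -v OFS=, '
NR==FNR {                        # first file (2.csv): build the lookup array
    gsub(/ /, "", $1)            # drop the padding spaces around the key
    fix[$1] = substr($2, 3, 1)   # remember only the 3rd character of $2
    next
}
{                                # second file (1.csv): apply the updates
    for (len = length($1); len > 0; len--) {   # try the longest prefix first
        p = substr($1, 1, len)
        if (p in fix) {
            # splice the remembered character in as the 3rd character of $2
            $2 = substr($2, 1, 2) fix[p] substr($2, 4)
            break
        }
    }
    print                        # unmatched rows fall through unchanged
}' 2.csv 1.csv > 1_MOD.csv
```

Checking the longest prefix first is what makes row 6135062894 pick up 6135062 rather than a shorter match; with 10-character keys that is at most 10 array lookups per row, which stays cheap even over 20M rows.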
Thanks