I need to update selected rows of a huge CSV file (20M rows) with data from another big CSV file (30K rows).
The file to be updated is 1.csv:
1120120031,55121
1120127295,55115
6135062894,55121
6135063011,55215
4136723818,55215
6134857289,55215
4430258714,55121
The file containing the updates is 2.csv (note the trailing spaces in some keys):
112012 ,55615
6135062,55414
6135063,55514
995707 ,55721
The desired output is 1_MOD.csv:
1120120031,55621
1120127295,55615
6135062894,55421
6135063011,55515
4136723818,55215
6134857289,55215
4430258714,55121
Modifications:
- if $1 of a row in 2.csv is a prefix of $1 in 1.csv (output rows 1 & 2), replace the 3rd character of $2 in 1.csv with the 3rd character of $2 from the matching 2.csv row;
- when more than one prefix matches, use the longest one (output rows 3 & 4);
- unmatched rows stay unchanged (output rows 5 to 7).
So far I have tested sed in a while loop, but that script would take about 31 days to complete. I believe there is a better way, such as reading 2.csv into an awk array and then updating 1.csv from that array, but my awk knowledge is too limited to write it.
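For reference, here is a minimal sketch of that array approach, tested only against the sample data above (the first part of the script just recreates the two sample files; the array name "fix" and the longest-prefix loop are my own choices, not anything standard). It reads 2.csv once into an array keyed by prefix, then streams 1.csv in a single pass, so it should take minutes rather than days:

```shell
# recreate the sample files from the question
printf '%s\n' '1120120031,55121' '1120127295,55115' '6135062894,55121' \
              '6135063011,55215' '4136723818,55215' '6134857289,55215' \
              '4430258714,55121' > 1.csv
printf '%s\n' '112012 ,55615' '6135062,55414' '6135063,55514' \
              '995707 ,55721' > 2.csv

awk -F, -v OFS=, '
NR==FNR {                        # first file (2.csv): build the lookup array
    gsub(/ /, "", $1)            # drop the padding spaces around the key
    fix[$1] = substr($2, 3, 1)   # remember only the 3rd character of $2
    next
}
{                                # second file (1.csv): apply the updates
    for (len = length($1); len > 0; len--) {   # try the longest prefix first
        p = substr($1, 1, len)
        if (p in fix) {
            # splice the remembered character in as the 3rd character of $2
            $2 = substr($2, 1, 2) fix[p] substr($2, 4)
            break
        }
    }
    print                        # unmatched rows fall through unchanged
}' 2.csv 1.csv > 1_MOD.csv
```

Checking the longest prefix first is what makes row 6135062894 pick up 6135062 rather than a shorter match; with 10-character keys that is at most 10 array lookups per row, which stays cheap even over 20M rows.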
Thanks