
I need to update selected rows of a huge CSV file (20M rows) with data from another large CSV file (30K rows).

The file to be updated is 1.csv:

1120120031,55121
1120127295,55115
6135062894,55121
6135063011,55215
4136723818,55215
6134857289,55215
4430258714,55121

The file with the updates is 2.csv:

112012 ,55615
6135062,55414
6135063,55514
995707 ,55721

The expected result is 1_MOD.csv:

1120120031,55621
1120127295,55615
6135062894,55421
6135063011,55515
4136723818,55215
6134857289,55215
4430258714,55121

Modifications:

  1. If $1 in 2.csv matches a substring (here, the leading digits) of $1 in 1.csv (rows 1 and 2), replace the 3rd character of $2 in 1.csv with the 3rd character of $2 from the matching row of 2.csv;
  2. When matching, use the longest possible match (rows 3 and 4);
  3. Unmatched rows remain unchanged (rows 5 to 7).
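
To make rule 1 concrete, here is a minimal sketch of the character replacement on a single pair of values. The helper name `replace_third` is hypothetical and only illustrates the rule; it is not part of any solution posted here.

```shell
# Hypothetical helper: take the value from 1.csv ($1) and the value from the
# matching row of 2.csv ($2), and print $1 with its 3rd character replaced
# by the 3rd character of $2.
replace_third() {
    awk -v old="$1" -v new="$2" 'BEGIN {
        # Keep characters 1-2 of old, take character 3 of new, keep the rest of old.
        print substr(old, 1, 2) substr(new, 3, 1) substr(old, 4)
    }'
}

replace_third 55121 55615   # row 1 of the example: prints 55621
```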

So far I have only managed to test sed in a while loop, but that script would take about 31 days to complete. I believe there is a better way, such as loading 2.csv into an awk array and updating 1.csv from that array, but I could not get that working because my awk knowledge is limited.

Thanks

  • When you have such big files, it is better to use Perl instead; it will probably be much faster. Commented Apr 25, 2016 at 22:06

1 Answer


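Using awk: read 2.csv first (it must be the first file on the command line, followed by 1.csv) and use its first field as a pattern anchored at the start of the first field of 1.csv.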

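BEGIN {
    # Split on commas, tolerating spaces around them (2.csv has "112012 ,55615").
    FS = " *, *";
    OFS = ",";
}
NR==FNR {
    # First file (2.csv): store the 3rd character of $2, keyed by $1.
    # Ensure there are no special characters in $1
    if ($1 ~ /^[[:digit:]]+$/)
        a[$1] = substr($2, 3, 1);
    next;
} {
    # Second file (1.csv): test each stored key against the start of $1.
    # Note: "for (n in a)" visits keys in an unspecified order, so with
    # overlapping keys the first hit is not necessarily the longest one.
    for (n in a)
        if ($1 ~ "^"n) {
            # Replace the 3rd character of $2 and keep the rest.
            $2 = substr($2, 1, 2) a[n] substr($2, 4);
            break;
        }
} 1  # the bare pattern 1 prints every record, modified or not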
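
The loop above may test up to every key from 2.csv against every row of 1.csv, and the order of `for (n in a)` is unspecified in awk, so it does not by itself guarantee the longest match asked for in the question. As an alternative, here is a minimal sketch that probes truncated copies of $1 in the array, longest first. It assumes the keys in 2.csv are plain digits that must match at the start of $1, and that 2.csv is not empty (otherwise `NR == FNR` would also hold for 1.csv). The names `update_longest_prefix`, `digit`, `min` and `max` are hypothetical.

```shell
# Hypothetical helper: update_longest_prefix LOOKUP_FILE DATA_FILE
# Prints DATA_FILE with the 3rd character of $2 replaced for matching rows.
update_longest_prefix() {
    awk '
    BEGIN { FS = " *, *"; OFS = "," }
    NR == FNR {
        # Lookup file: remember the 3rd character of $2 for each numeric key,
        # and track the shortest and longest key lengths seen.
        if ($1 ~ /^[0-9]+$/) {
            digit[$1] = substr($2, 3, 1)
            klen = length($1)
            if (min == 0 || klen < min) min = klen
            if (klen > max) max = klen
        }
        next
    }
    {
        # Data file: try prefixes of $1 from longest to shortest, so the
        # first hit is the longest matching key.
        n = length($1)
        if (n > max) n = max
        for (len = n; len >= min && len > 0; len--) {
            key = substr($1, 1, len)
            if (key in digit) {
                $2 = substr($2, 1, 2) digit[key] substr($2, 4)
                break
            }
        }
        print
    }
    ' "$1" "$2"
}
```

With the function defined in the current shell, something like `update_longest_prefix 2.csv 1.csv > 1_MOD.csv` would write the updated file. Each row then costs at most a handful of array lookups (one per candidate prefix length) rather than one regex test per key, which should matter with 20M rows and 30K keys, though it is worth timing on a sample first.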

3 Comments

You can probably break the for loop after spotting a match, which should save some processing time.
@JonathanLeffler Thanks, I added the break.
@kdhp and @JonathanLeffler, many thanks, guys. In one hour you solved what I could not solve in days. I will run the script on a few thousand rows to see how long it takes.
