Update two similar csv files with same key using pandas

Question

I have two inventory CSV files, one an updated version of the other. Compared with the old one, the new one can have new rows with keys not present in the old one, can be missing rows for keys that are no longer present, and has updated records for the keys they share:

Old file (a):

sku  nome     prezzo  qty  codice
1    uno      10      1    11111
2    due      10      1    22222
3    tre      10      1    33333
4    quattro  10      1    44444
5    cinque   10      1    55555
10   dieci    10      1    101010

New file (b):

sku  nome     prezzo  qty  codice
1    uno      20      2    11111
2    due      20      2    22222
3    tre      20      2    33333
5    cinque   20      2    55555
10   dieci    20      2    101010
11   undici   20      2    111111
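To reproduce the problem, the two tables can be loaded as DataFrames. This is a minimal sketch that assumes the data is pasted as whitespace-separated text; with the real files it would simply be `pd.read_csv(path)`. The names `a` (old) and `b` (new) match the code below, while `OLD`, `NEW`, `dropped` and `added` are illustrative names.

```python
import io

import pandas as pd

# The two inventories from the question as whitespace-separated text;
# with the real files this would be pd.read_csv(path) instead.
OLD = """sku nome prezzo qty codice
1 uno 10 1 11111
2 due 10 1 22222
3 tre 10 1 33333
4 quattro 10 1 44444
5 cinque 10 1 55555
10 dieci 10 1 101010
"""
NEW = """sku nome prezzo qty codice
1 uno 20 2 11111
2 due 20 2 22222
3 tre 20 2 33333
5 cinque 20 2 55555
10 dieci 20 2 101010
11 undici 20 2 111111
"""

a = pd.read_csv(io.StringIO(OLD), sep=r'\s+')  # old file
b = pd.read_csv(io.StringIO(NEW), sep=r'\s+')  # new (updated) file

# skus that disappeared from the new file, and skus that are new in it
dropped = a.loc[~a['sku'].isin(b['sku']), 'sku'].tolist()
added = b.loc[~b['sku'].isin(a['sku']), 'sku'].tolist()
print(dropped, added)  # [4] [11]
```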

With reindex and union I can manage to get my desired result:

import pandas as pd

# a = old file, b = new file; skus missing from b keep a's data with qty and prezzo set to 0
r = (b.set_index('sku')
      .reindex(pd.Index(a['sku']).union(pd.Index(b['sku'])))
      .combine_first(a.set_index('sku').assign(qty=0, prezzo=0))
      .reset_index())


   sku     nome  prezzo  qty  codice
0    1      uno      20    2   11111
1    2      due      20    2   22222
2    3      tre      20    2   33333
3    4  quattro       0    0   44444
4    5   cinque      20    2   55555
5   10    dieci      20    2  101010
6   11   undici      20    2  111111

Now, if the new file has the same columns plus others that aren't present in the old one, the result is correct but the columns get rearranged. How can I keep the column structure of the new file?

(Desired result when the new file has the new column structure:)

   sku     nome  prezzo  qty  codice   Acolumn     Bcolumn     
0    1      uno      20    2   11111   kkkk
1    2      due      20    2   22222               qwerty
2    3      tre      20    2   33333   mmmm
3    4  quattro       0    0   44444
4    5   cinque      20    2   55555
5   10    dieci      20    2  101010   ssss
6   11   undici      20    2  111111   1a2b3c4d

or

   sku     nome   Acolumn  prezzo  qty  codice     Bcolumn
0    1      uno    kkkkk       20    2   11111
1    2      due                20    2   22222     qwerty
2    3      tre                20    2   33333
3    4  quattro                 0    0   44444
4    5   cinque                20    2   55555
5   10    dieci                20    2  101010
6   11   undici                20    2  111111

@twindad After re-reading your question, it seems all you needed was a reindex operation! Anyway, I've tried to improve your solution.
– cs95, Commented Nov 14, 2017 at 20:04
@twindad, can you provide sample data sets with "other columns, that aren't presents in old one" and your desired data set?
– MaxU - stand with Ukraine, Commented Nov 14, 2017 at 20:28
@MaxU I've edited the question with examples. The desired dataset should keep the same column order.
– twindad, Commented Nov 14, 2017 at 20:57

cs95 · Accepted Answer · 2017-11-14 20:21:31Z


Option 1
I've tried to improve your existing solution. You can use reindex + combine_first + reindex again:

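import pandas as pd

# df1 = old file, df2 = new file, both indexed by the key column
df1 = df1.set_index('sku')
df2 = df2.set_index('sku')

# every sku from both files; skus missing from the new file get 0 everywhere
df = df2.reindex(df1.index.union(df2.index), fill_value=0)
# 'nome'/'codice' come from the old file where it has them; then restore the new file's column order
df = df1[['nome', 'codice']].combine_first(df).reindex(columns=df2.columns)

# combine_first may upcast integer columns to float; cast them back
df = df.astype({col: int for col in df.select_dtypes('float').columns})

df.reset_index()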

   sku     nome  prezzo  qty  codice
0    1      uno      20    2   11111
1    2      due      20    2   22222
2    3      tre      20    2   33333
3    4  quattro       0    0   44444
4    5   cinque      20    2   55555
5   10    dieci      20    2  101010
6   11   undici      20    2  111111
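A self-contained sketch of Option 1 on the question's sample data, to check that it reproduces the table above. It builds the frames inline rather than reading CSVs, and it reindexes the columns on `df2.columns` (the new file) so that any extra columns in the new file keep their position; `result` is an illustrative name.

```python
import pandas as pd

# Sample data from the question (old file = df1, new file = df2)
df1 = pd.DataFrame({
    'sku': [1, 2, 3, 4, 5, 10],
    'nome': ['uno', 'due', 'tre', 'quattro', 'cinque', 'dieci'],
    'prezzo': 10,
    'qty': 1,
    'codice': [11111, 22222, 33333, 44444, 55555, 101010],
}).set_index('sku')
df2 = pd.DataFrame({
    'sku': [1, 2, 3, 5, 10, 11],
    'nome': ['uno', 'due', 'tre', 'cinque', 'dieci', 'undici'],
    'prezzo': 20,
    'qty': 2,
    'codice': [11111, 22222, 33333, 55555, 101010, 111111],
}).set_index('sku')

# all skus from both files; skus missing from the new file get 0 everywhere
df = df2.reindex(df1.index.union(df2.index), fill_value=0)
# old 'nome'/'codice' take precedence; then restore the new file's column order
df = df1[['nome', 'codice']].combine_first(df).reindex(columns=df2.columns)
# undo any int -> float upcast introduced by combine_first
df = df.astype({col: int for col in df.select_dtypes('float').columns})

result = df.reset_index()
print(result)
```

Because `df1` is the caller of `combine_first`, the old file's `nome` and `codice` win for every sku it contains, not only for the dropped ones.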

Option 2
Alternatively, use replace + fillna instead of combine_first, starting from the reindexed df of the first step:

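import numpy as np

# df = df2.reindex(df1.index.union(df2.index), fill_value=0), as in Option 1
# a 0 in 'nome'/'codice' marks a sku missing from the new file: fill it from the old file
df['nome'] = df['nome'].replace(0, np.nan).fillna(df1['nome'])
df['codice'] = df['codice'].replace(0, np.nan).fillna(df1['codice']).astype(int)

df.reset_index()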

   sku     nome  prezzo  qty  codice
0    1      uno      20    2   11111
1    2      due      20    2   22222
2    3      tre      20    2   33333
3    4  quattro       0    0   44444
4    5   cinque      20    2   55555
5   10    dieci      20    2  101010
6   11   undici      20    2  111111
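A self-contained sketch of Option 2 on the same sample data, built inline instead of from the CSV files; `result` is an illustrative name. It assumes a real `nome` or `codice` is never literally 0, since 0 is used here as the "sku missing from the new file" marker.

```python
import numpy as np
import pandas as pd

# Sample data from the question (old file = df1, new file = df2)
df1 = pd.DataFrame({
    'sku': [1, 2, 3, 4, 5, 10],
    'nome': ['uno', 'due', 'tre', 'quattro', 'cinque', 'dieci'],
    'prezzo': 10,
    'qty': 1,
    'codice': [11111, 22222, 33333, 44444, 55555, 101010],
}).set_index('sku')
df2 = pd.DataFrame({
    'sku': [1, 2, 3, 5, 10, 11],
    'nome': ['uno', 'due', 'tre', 'cinque', 'dieci', 'undici'],
    'prezzo': 20,
    'qty': 2,
    'codice': [11111, 22222, 33333, 55555, 101010, 111111],
}).set_index('sku')

# all skus from both files; dropped skus get 0 in every column
df = df2.reindex(df1.index.union(df2.index), fill_value=0)
# turn the 0 placeholders back into the old file's values (aligned on sku)
df['nome'] = df['nome'].replace(0, np.nan).fillna(df1['nome'])
df['codice'] = df['codice'].replace(0, np.nan).fillna(df1['codice']).astype(int)

result = df.reset_index()
print(result)
```

Unlike Option 1, here the new file's `nome` and `codice` are kept for every sku it contains, and the old file is only consulted for the dropped skus.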

edited Nov 14, 2017 at 20:21

answered Nov 14, 2017 at 19:58

cs95



4 Comments

twindad Over a year ago

Only sku is the key. 'nome' and 'codice' can change too, just like 'prezzo' and 'qty'.

cs95 Over a year ago

@twindad yeah but based on your example, if the sku is missing, you want nome and codice preserved. That's why I've selected them for the merge. Does it make sense?

twindad Over a year ago

When the file is updated, the rows for skus that are no longer in stock are deleted from the CSV. The resulting (updated) file has to keep these rows, but with price and qty set to 0. Sku 11 is also missing (the new one, present only in the new file).

cs95 Over a year ago

@twindad My merge solution does not work, so I've replaced it with 2 alternatives.
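A different approach, not taken from the thread, that follows twindad's description in the comments: keep every row of the new file as-is (so updated `nome` and `codice` values win), append the old rows whose sku disappeared with `prezzo` and `qty` set to 0, and select the new file's columns at the end so their order is preserved. This is a minimal sketch using `pd.concat`; `old`, `new` and `gone` are illustrative names, and the `Acolumn`/`Bcolumn` values are taken from the question's desired output.

```python
import pandas as pd

# Old file (key column: sku)
old = pd.DataFrame({
    'sku': [1, 2, 3, 4, 5, 10],
    'nome': ['uno', 'due', 'tre', 'quattro', 'cinque', 'dieci'],
    'prezzo': 10,
    'qty': 1,
    'codice': [11111, 22222, 33333, 44444, 55555, 101010],
})
# New file with the two extra columns from the question's edit
new = pd.DataFrame({
    'sku': [1, 2, 3, 5, 10, 11],
    'nome': ['uno', 'due', 'tre', 'cinque', 'dieci', 'undici'],
    'prezzo': 20,
    'qty': 2,
    'codice': [11111, 22222, 33333, 55555, 101010, 111111],
    'Acolumn': ['kkkk', None, 'mmmm', None, 'ssss', '1a2b3c4d'],
    'Bcolumn': [None, 'qwerty', None, None, None, None],
})

# old rows whose sku is no longer in the new file: keep them as out of stock
gone = old[~old['sku'].isin(new['sku'])].assign(prezzo=0, qty=0)

# new rows are taken unchanged; the extra columns stay empty for re-added rows;
# selecting new.columns keeps the new file's column order
result = (
    pd.concat([new, gone], ignore_index=True)
    .sort_values('sku', ignore_index=True)[new.columns]
)
print(result)
```

Since both frames carry the integer columns, `concat` has no missing values to introduce there, so no float-to-int cast is needed afterwards.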
