I have two inventory CSV files, one of which is an updated version of the other. The new one can have new rows with keys not present in the old one, can be missing rows for keys that are no longer present, and can have updated records for keys that appear in both:

sku nome    prezzo  qty codice 
1   uno       10    1   11111
2   due       10    1   22222
3   tre       10    1   33333
4   quattro   10    1   44444
5   cinque    10    1   55555
10  dieci     10    1   101010

sku nome    prezzo  qty codice 
  1  uno        20    2  11111
  2  due        20    2  22222
  3  tre        20    2  33333
  5  cinque     20    2  55555
 10  dieci      20    2  101010
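
For reference, a minimal sketch that rebuilds the two tables above as DataFrames and lists the three kinds of keys (dropped, added, shared). The names `a` (old file) and `b` (new file) match the snippet further down; the variables `removed`, `added` and `shared` are only illustrative.

```python
import pandas as pd

# Old inventory file
a = pd.DataFrame({
    "sku": [1, 2, 3, 4, 5, 10],
    "nome": ["uno", "due", "tre", "quattro", "cinque", "dieci"],
    "prezzo": [10] * 6,
    "qty": [1] * 6,
    "codice": [11111, 22222, 33333, 44444, 55555, 101010],
})

# New (updated) inventory file
b = pd.DataFrame({
    "sku": [1, 2, 3, 5, 10, 11],
    "nome": ["uno", "due", "tre", "cinque", "dieci", "undici"],
    "prezzo": [20] * 6,
    "qty": [2] * 6,
    "codice": [11111, 22222, 33333, 55555, 101010, 111111],
})

old_keys = pd.Index(a["sku"])
new_keys = pd.Index(b["sku"])

removed = old_keys.difference(new_keys)   # keys only in the old file
added = new_keys.difference(old_keys)     # keys only in the new file
shared = old_keys.intersection(new_keys)  # keys present in both files

print(list(removed), list(added), list(shared))
```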
 11  undici     20    2  111111

With reindex and union I can manage to get my desired result:

In [52]: r = b.set_index('sku') \
    ...:       .reindex(pd.Index(a['sku']).union(pd.Index(b['sku']))) \
    ...:       .combine_first(a.set_index('sku').assign(qty=0, prezzo=0)) \
    ...:       .reset_index()


   sku     nome  prezzo  qty  codice
0    1      uno      20    2   11111
1    2      due      20    2   22222
2    3      tre      20    2   33333
3    4  quattro       0    0   44444
4    5   cinque      20    2   55555
5   10    dieci      20    2  101010
6   11   undici      20    2  111111

Now, if the new file has the same columns plus others that aren't present in the old one, the result is right but the columns come out rearranged. How can I keep the column structure of the new file?

(new file with the new column structure):

   sku     nome  prezzo  qty  codice   Acolumn     Bcolumn     
0    1      uno      20    2   11111   kkkk
1    2      due      20    2   22222               qwerty
2    3      tre      20    2   33333   mmmm
3    4  quattro       0    0   44444
4    5   cinque      20    2   55555
5   10    dieci      20    2  101010   ssss
6   11   undici      20    2  111111   1a2b3c4d

or

   sku     nome   Acolumn  prezzo  qty  codice     Bcolumn
0    1      uno    kkkkk       20    2   11111
1    2      due                20    2   22222     qwerty
2    3      tre                20    2   33333
3    4  quattro                 0    0   44444
4    5   cinque                20    2   55555
5   10    dieci                20    2  101010
6   11   undici                20    2  111111

5 Comments

  • It would be good if you attach some of your code too Commented Nov 14, 2017 at 19:59
  • Question updated Commented Nov 14, 2017 at 20:03
  • @twindad After re-reading your question, it seems all you needed was a reindex operation! Anyway, I've tried to improve your solution. Commented Nov 14, 2017 at 20:04
  • @twindad, can you provide sample data sets with "other columns, that aren't presents in old one" and your desired data set? Commented Nov 14, 2017 at 20:28
  • @MaxU I've edited the question with examples. The desired dataset should keep the same column order. Commented Nov 14, 2017 at 20:57

1 Answer

Option 1
I've tried to improve your existing solution. Here df1 is the old file and df2 the new one. You can use reindex + combine_first + reindex again:

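# sku is the key in both files
df1 = df1.set_index('sku')
df2 = df2.set_index('sku')

# add the rows whose sku is missing from df2, filled with 0
df = df2.reindex(df1.index.union(df2.index), fill_value=0)
# take nome and codice from df1 where it has them
df = df1[['nome', 'codice']].combine_first(df).reindex(columns=df1.columns)

# combine_first can upcast integer columns to float; cast them back
# (astype is used because assigning through .loc may keep the float dtype)
c = df.dtypes == 'float'
df = df.astype({col: int for col in df.columns[c]})

df.reset_index()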

   sku     nome  prezzo  qty  codice
0    1      uno      20    2   11111
1    2      due      20    2   22222
2    3      tre      20    2   33333
3    4  quattro       0    0   44444
4    5   cinque      20    2   55555
5   10    dieci      20    2  101010
6   11   undici      20    2  111111

Option 2
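Alternatively, substitute combine_first with replace + fillna, applied to df as it stands right after the first reindex:

import numpy as np

# rows added by the reindex hold 0; turn them into NaN and fill from df1
df.nome = df.nome.replace(0, np.nan).fillna(df1.nome)
df.codice = df.codice.replace(0, np.nan).fillna(df1.codice).astype(int)

df.reset_index()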

   sku     nome  prezzo  qty  codice
0    1      uno      20    2   11111
1    2      due      20    2   22222
2    3      tre      20    2   33333
3    4  quattro       0    0   44444
4    5   cinque      20    2   55555
5   10    dieci      20    2  101010
6   11   undici      20    2  111111
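
Neither option covers the follow-up case where the new file carries extra columns. One possible sketch keeps the question's reindex + combine_first chain and adds a final reindex on the new file's own columns so their order is restored. The frames `old` and `new` below are hypothetical, shortened samples (with `Acolumn` borrowed from the question's example), not the real files.

```python
import pandas as pd

# Hypothetical, shortened sample data; `old` and `new` are assumed names
old = pd.DataFrame({
    "sku": [1, 4],
    "nome": ["uno", "quattro"],
    "prezzo": [10, 10],
    "qty": [1, 1],
    "codice": [11111, 44444],
}).set_index("sku")

new = pd.DataFrame({
    "sku": [1, 11],
    "nome": ["uno", "undici"],
    "Acolumn": ["kkkk", "1a2b3c4d"],
    "prezzo": [20, 20],
    "qty": [2, 2],
    "codice": [11111, 111111],
}).set_index("sku")

result = (
    new.reindex(old.index.union(new.index))        # one row per sku from either file
    .combine_first(old.assign(qty=0, prezzo=0))    # dropped skus: old row, zeroed price/qty
    .reindex(columns=new.columns)                  # restore the new file's column order
    .astype({"prezzo": int, "qty": int, "codice": int})  # undo the float upcast
    .reset_index()
)

print(result)
```

Columns that exist only in the new file (here `Acolumn`) stay empty (NaN) for rows that survive only from the old file.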

4 Comments

I can tell you that only sku is the key. 'nome' and 'codice' can change, just like 'prezzo' and 'qty'.
@twindad yeah, but based on your example, if the sku is missing, you want nome and codice preserved. That's why I've selected them for the merge. Does it make sense?
When the file is updated, the rows whose sku is no longer in stock are deleted from the csv. The resulting (updated) file has to keep these rows, but with price and qty set to 0. Sku 11 is also missing (it is a new one, present only in the new file).
@twindad My merge solution does not work, so I've replaced it with 2 alternatives.
