Add missing columns to the dataframes from other dataframes

Question

I have a list of dataframes

dfA:

item   a     A              
A      1     2 
B      1     3         
C      0     4

dfB:

item   a     B
E      1     2
F      0     6

dfC:

item   a     C
G      1     3
H      0     4

I want to add the missing columns to each dataframe, filled with 0.

This is what I want:

dfA:

item   a     A    B    C           
A      1     2    0    0
B      1     3    0    0 
C      0     4    0    0

dfB:

item   a     A   B    C
E      1     0   2    0
F      0     0   6    0

dfC:

item   a     A   B   C
G      1     0   0   3
H      0     0   0   4

stackoverflow.com/questions/39050539/… Hope the link will help.
– BENY, Commented Sep 21, 2017 at 3:49

Kevin Lin · Accepted Answer · 2019-07-05 18:23:12Z

15

1) Take the union of each dataframe's columns.

col_list = list(set().union(dfA.columns, dfB.columns, dfC.columns))
col_list.sort()
['A', 'B', 'C', 'a']

2) Use the reindex function.

dfA2 = dfA.reindex(columns=col_list, fill_value=0)
   A  B  C  a
A  2  0  0  1
B  3  0  0  1
C  4  0  0  0

dfB2 = dfB.reindex(columns=col_list, fill_value=0)
   A  B  C  a
E  0  2  0  1
F  0  6  0  0

dfC2 = dfC.reindex(columns=col_list, fill_value=0)
   A  B  C  a
G  0  0  3  1
H  0  0  4  0

3) You can use reindex to drop, add, or duplicate columns.

dfA3 = dfA.reindex(columns=['C', 'A', 'A', 'D'], fill_value=0)
   C  A  A  D
A  0  2  2  0
B  0  3  3  0
C  0  4  4  0
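
For reference, the steps above can be put together as one self-contained sketch. It assumes `item` is the index rather than a regular column (which is what the printed outputs in this answer suggest), and the `frames` and `aligned` dictionaries are names introduced here purely for illustration.

```python
import pandas as pd

# Sample frames from the question, with `item` used as the index
# (an assumption that matches the outputs shown in this answer).
dfA = pd.DataFrame({"a": [1, 1, 0], "A": [2, 3, 4]}, index=["A", "B", "C"])
dfB = pd.DataFrame({"a": [1, 0], "B": [2, 6]}, index=["E", "F"])
dfC = pd.DataFrame({"a": [1, 0], "C": [3, 4]}, index=["G", "H"])

frames = {"dfA": dfA, "dfB": dfB, "dfC": dfC}

# Union of all column labels, sorted for a deterministic order.
col_list = sorted(set().union(*(df.columns for df in frames.values())))

# Reindex every frame against the same column list; missing columns get 0.
aligned = {
    name: df.reindex(columns=col_list, fill_value=0)
    for name, df in frames.items()
}

for name, df in aligned.items():
    print(name)
    print(df, end="\n\n")
```

Because `reindex` returns a new dataframe, the originals stay unchanged; reassign the results if the old frames are no longer needed.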


Vaishali · Accepted Answer · 2017-09-21 03:49:46Z

5

You can create a combined column list like this

col_list = (df1.append([df2,df3])).columns.tolist()

Now add the columns to each dataframe

df1 = df1.loc[:, col_list].fillna(0)
print(df1)

    A   B   C   a   item
0   2   0.0 0.0 1   A
1   3   0.0 0.0 1   B
2   4   0.0 0.0 0   C


df2 = df2.loc[:, col_list].fillna(0)
print(df2)

    A   B   C   a   item
0   0.0 2   0.0 1   E
1   0.0 6   0.0 0   F

df3 = df3.loc[:, col_list].fillna(0)
print(df3)

    A   B   C   a   item
0   0.0 0.0 3   1   G
1   0.0 0.0 4   0   H
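
On recent pandas versions the code above may not run as written: `DataFrame.append` was removed in pandas 2.0, and `.loc` raises `KeyError` when asked for labels that do not exist. A rough equivalent, sketched here with `pd.concat` and `reindex` and with the question's data assumed for `df1`, `df2` and `df3`, could look like this:

```python
import pandas as pd

# Sample data from the question, with `item` kept as a regular column.
df1 = pd.DataFrame({"item": ["A", "B", "C"], "a": [1, 1, 0], "A": [2, 3, 4]})
df2 = pd.DataFrame({"item": ["E", "F"], "a": [1, 0], "B": [2, 6]})
df3 = pd.DataFrame({"item": ["G", "H"], "a": [1, 0], "C": [3, 4]})

# pd.concat stands in for DataFrame.append; only the combined
# column labels of the result are used here.
col_list = pd.concat([df1, df2, df3]).columns.tolist()

# reindex adds the missing columns (filled with 0) instead of raising
# KeyError, so the separate fillna step is no longer needed.
df1, df2, df3 = (
    df.reindex(columns=col_list, fill_value=0) for df in (df1, df2, df3)
)

print(df2)
```

With `fill_value=0` the added columns hold integer zeros rather than the `0.0` floats that `fillna(0)` produces.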


3 Comments

Connor Dibble Over a year ago

I think this behavior is deprecated (raises KeyError) as of pandas 0.21. Use df.reindex() instead. pandas.pydata.org/pandas-docs/stable/user_guide/…

AAmes Over a year ago

Also, df.append is deprecated.

Trenton McKinney Over a year ago

This no longer works. See this answer, using .reindex, instead.

piRSquared · Accepted Answer · 2017-09-21 04:58:10Z

Option 1
Align both axes
With functools.partial

from functools import partial

(_, dfA), (dfC, dfB) = list(map(
    partial(dfC.align, fill_value=0),
    dfA.align(dfB, fill_value=0)
))

Option 1B
Align columns only

from functools import partial

(_, dfA), (dfC, dfB) = list(map(
    partial(dfC.align, fill_value=0, axis=1),
    dfA.align(dfB, fill_value=0, axis=1)
))
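
`DataFrame.align` works on one pair of frames at a time, which is why three frames need the chained `map` call above. A minimal two-frame sketch may make the building block easier to see; it assumes the question's data with `item` as the index, and `left` and `right` are illustrative names.

```python
import pandas as pd

dfA = pd.DataFrame({"a": [1, 1, 0], "A": [2, 3, 4]}, index=["A", "B", "C"])
dfB = pd.DataFrame({"a": [1, 0], "B": [2, 6]}, index=["E", "F"])

# axis=1 aligns the column labels only; the rows are left untouched.
# join="outer" (the default) keeps the union of the labels, and
# fill_value=0 fills the columns that one side was missing.
left, right = dfA.align(dfB, join="outer", axis=1, fill_value=0)

print(left)
print(right)
```

Both results share the same columns, while each keeps its own rows.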

Option 2
Align both axes
With pd.DataFrame.reindex

from functools import reduce    

lod = [dfA, dfB, dfC]
idx = reduce(pd.Index.union, (d.index for d in lod))
col = reduce(pd.Index.union, (d.columns for d in lod))
dfA, dfB, dfC = (d.reindex(index=idx, columns=col, fill_value=0) for d in lod)

Option 2B
Align columns only

lod = [dfA, dfB, dfC]
col = reduce(pd.Index.union, (d.columns for d in lod))
dfA, dfB, dfC = (d.reindex(columns=col, fill_value=0) for d in lod)
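
`pd.Index.union` typically returns the labels in sorted order, so the columns come back as `A, B, C, a, item` rather than in the order shown in the question. If the original order matters, one possible variation is to collect the labels in first-seen order instead. The helper below, `add_missing_columns`, is a hypothetical name and not part of pandas.

```python
import pandas as pd


def add_missing_columns(frames, fill_value=0):
    """Return copies of `frames` that all share the same columns.

    Columns are kept in first-seen order across the frames.
    Hypothetical helper for illustration, not a pandas function.
    """
    # dict.fromkeys de-duplicates while keeping insertion order.
    columns = list(dict.fromkeys(c for df in frames for c in df.columns))
    return [df.reindex(columns=columns, fill_value=fill_value) for df in frames]


dfA = pd.DataFrame({"item": ["A", "B", "C"], "a": [1, 1, 0], "A": [2, 3, 4]})
dfB = pd.DataFrame({"item": ["E", "F"], "a": [1, 0], "B": [2, 6]})
dfC = pd.DataFrame({"item": ["G", "H"], "a": [1, 0], "C": [3, 4]})

dfA, dfB, dfC = add_missing_columns([dfA, dfB, dfC])
print(dfB)
```

For the sample data this gives the column order `item, a, A, B, C` in every frame.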

Setup

dfA = pd.DataFrame(**{
    'columns': ['item', 'a', 'A'],
    'data': [['A', 1, 2], ['B', 1, 3], ['C', 0, 4]],
    'index': [0, 1, 2]})

dfB = pd.DataFrame(**{
    'columns': ['item', 'a', 'B'],
    'data': [['E', 1, 2], ['F', 0, 6]],
    'index': [0, 1]})

dfC = pd.DataFrame(**{
    'columns': ['item', 'a', 'C'],
    'data': [['G', 1, 3], ['H', 0, 4]],
    'index': [0, 1]})

Zero · Accepted Answer · 2017-09-21 03:50:08Z

2

One way is to use merge inside a reduce operation, putting the dataframe you want to extend first in the list of dfA, dfB, dfC.

In [1932]: reduce(lambda l,r: pd.merge(l,r,on=['item', 'a'], how='left'),
                  [dfA, dfB, dfC]).fillna(0)
Out[1932]:
  item  a  A    B    C
0    A  1  2  0.0  0.0
1    B  1  3  0.0  0.0
2    C  0  4  0.0  0.0

In [1933]: reduce(lambda l,r: pd.merge(l,r,on=['item', 'a'], how='left'), 
                  [dfB, dfA, dfC]).fillna(0)
Out[1933]:
  item  a  B    A    C
0    E  1  2  0.0  0.0
1    F  0  6  0.0  0.0

In [1934]: reduce(lambda l,r: pd.merge(l,r,on=['item', 'a'], how='left'),
                  [dfC, dfA, dfB]).fillna(0)
Out[1934]:
  item  a  C    A    B
0    G  1  3  0.0  0.0
1    H  0  4  0.0  0.0
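
The session above relies on `reduce` and `pd` already being imported. A self-contained sketch of the same idea follows, using the question's data; `merge_left` is an illustrative helper name and not something from pandas.

```python
from functools import reduce

import pandas as pd

dfA = pd.DataFrame({"item": ["A", "B", "C"], "a": [1, 1, 0], "A": [2, 3, 4]})
dfB = pd.DataFrame({"item": ["E", "F"], "a": [1, 0], "B": [2, 6]})
dfC = pd.DataFrame({"item": ["G", "H"], "a": [1, 0], "C": [3, 4]})


def merge_left(first, *others):
    """Left-merge `others` onto `first` on the shared key columns.

    Illustrative helper: rows of `first` are kept, and columns that only
    exist in `others` are added and filled with 0 where nothing matched.
    """
    merged = reduce(
        lambda left, right: pd.merge(left, right, on=["item", "a"], how="left"),
        others,
        first,
    )
    return merged.fillna(0)


outB = merge_left(dfB, dfA, dfC)
print(outB)
```

Note that this is a join rather than a pure column fill: if a row in another frame has the same `item` and `a` values, its data is pulled in instead of 0. The added columns are also floats, because the merge introduces `NaN` before `fillna(0)` runs.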


Mykola Zotko · Accepted Answer · 2022-10-02 20:07:49Z

0

You can use columns union and difference:

dfA.loc[:, dfB.columns.union(dfC.columns).difference(dfA.columns)] = 0
print(dfA)

Output:

      a  A  B  C
item            
A     1  2  0  0
B     1  3  0  0
C     0  4  0  0

You can also use a for loop:

dfs = [dfA, dfB, dfC]
all_columns = set().union(*[df.columns for df in dfs])
for df in dfs:
    df.loc[:, sorted(all_columns.difference(df.columns))] = 0
    print(df, end='\n\n')

Output:

      a  A  B  C
item            
A     1  2  0  0
B     1  3  0  0
C     0  4  0  0

      a  B  A  C
item            
E     1  2  0  0
F     0  6  0  0

      a  C  A  B
item            
G     1  3  0  0
H     0  4  0  0
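
A variation on the loop that sidesteps `.loc` altogether is plain column assignment, which modifies each dataframe in place. The sketch below assumes `item` is the index, as in the outputs above.

```python
import pandas as pd

dfA = pd.DataFrame({"a": [1, 1, 0], "A": [2, 3, 4]},
                   index=pd.Index(["A", "B", "C"], name="item"))
dfB = pd.DataFrame({"a": [1, 0], "B": [2, 6]},
                   index=pd.Index(["E", "F"], name="item"))
dfC = pd.DataFrame({"a": [1, 0], "C": [3, 4]},
                   index=pd.Index(["G", "H"], name="item"))

dfs = [dfA, dfB, dfC]
all_columns = set().union(*(df.columns for df in dfs))

for df in dfs:
    # sorted() turns the set difference into a list with a stable order.
    for col in sorted(all_columns.difference(df.columns)):
        df[col] = 0  # adds the column to this dataframe in place

print(dfB)
```

New columns are appended on the right, so each frame keeps its own columns first and the column order differs between frames.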

edited Oct 2, 2022 at 20:07

answered Oct 2, 2022 at 19:34

