Adding new columns to dataframe from other dataframe according to list of indices in other dataframe

Question

I have two dataframes, each row in dataframe A has a list of indices corresponding to entries in dataframe B and a set of other values. I want to join the two dataframes in a way so that each of the entries in B has the other values in A where the index of the entry in B is in the list of indices in the entry in A.

So far, I have found a way of extracting the rows in B for the list of indices in each row in A but only row-by-row from this answer but then I am not sure where to go from here? Also not sure if there's a better way of doing it with Pandas dynamically as the size of the list of indices can change.

import pandas as pd
import numpy as np

# Inputs
A = pd.DataFrame.from_dict({
    "indices": [[0,1],[2,3],[4,5]],
    "a1": ["a","b","c"],
    "a2": [100,200,300]
})

print(A)
>>    indices a1   a2
>> 0  [0, 1]  a  100
>> 1  [2, 3]  b  200
>> 2  [4, 5]  c  300

B = pd.DataFrame.from_dict({
    "b": [10,20,30,40,50,60]
})

print(B)
>>     b
>> 0  10
>> 1  20
>> 2  30
>> 3  40
>> 4  50
>> 5  60

# This is the desired output
out = pd.DataFrame.from_dict({
    "b": [10,20,30,40,50,60],
    "a1": ["a","a", "b", "b", "c", "c"],
    "a2": [100,100,200,200,300,300]
})

print(out)
>>      b a1   a2
>> 0  10  a  100
>> 1  20  a  100
>> 2  30  b  200
>> 3  40  b  200
>> 4  50  c  300
>> 5  60  c  300

Ankur Sinha · Accepted Answer · 2019-07-22 14:03:28Z

5

If you have pandas >=0.25, you can use explode:

C = A.explode('indices')

This gives:

  indices a1   a2
0       0  a  100
0       1  a  100
1       2  b  200
1       3  b  200
2       4  c  300
2       5  c  300

Then do:

output = pd.merge(B, C, left_index = True, right_on = 'indices')
output.index = output.indices.values    
output.drop('indices', axis = 1, inplace = True)

Final Output:

    b a1   a2
0  10  a  100
1  20  a  100
2  30  b  200
3  40  b  200
4  50  c  300
5  60  c  300

edited Jul 22, 2019 at 14:03

answered Jul 22, 2019 at 11:31

Ankur Sinha

6,6648 gold badges44 silver badges74 bronze badges

Sign up to request clarification or add additional context in comments.

8 Comments

Jeff Over a year ago

imo this is the best answer, for people using >=0.25. explode is how you would perform this operation in other frameworks (like spark, for instance). I think in general it's better to use drop over del to remove a column though, correct?

Ankur Sinha Over a year ago

Thanks Jeff! Yes, you are right. Changed it to drop :)

philMarius Over a year ago

I'm going to accept this one because of @Jeff 's comment, it's similar to how other frameworks handle such an operation.

philMarius Over a year ago

@AnkurSinha just realised, the indices are wrong, they should be [0,1,2,3,4,5] not [0,0,1,1,2,2]. Looks like the output dataframe is using C's indices not B's

Ankur Sinha Over a year ago

Or you can also use: output.index = output.indices.values before you drop the output.indices column.

|

iamklaus · Accepted Answer · 2019-07-22 11:54:34Z

3

using pd.merge

df2 = pd.DataFrame(A.set_index(['a1','a2']).indices)

df = pd.DataFrame(df2.indices.values.tolist(), index=a.index).stack().reset_index().drop('level_2', axis=1).set_index(0)

pd.merge(B,df,left_index=True, right_index=True)

Output

    b a1   a2
0  10  a  100
1  20  a  100
2  30  b  200
3  40  b  200
4  50  c  300
5  60  c  300

edited Jul 22, 2019 at 11:54

answered Jul 22, 2019 at 11:25

iamklaus

3,7682 gold badges14 silver badges21 bronze badges

Comments

Josh Friedlander · Accepted Answer · 2019-07-22 11:27:14Z

2

Here you go:

helper = A.indices.apply(pd.Series).stack().reset_index(level=1, drop=True)
A = A.reindex(helper.index).drop(columns=['indices'])
A['indices'] = helper
B = B.merge(A, left_index=True, right_on='indices').drop(columns=['indices']).reset_index(drop=True)

Result:

    b   a1  a2
0   10  a   100
1   20  a   100
2   30  b   200
3   40  b   200
4   50  c   300
5   60  c   300

answered Jul 22, 2019 at 11:27

Josh Friedlander

11.8k7 gold badges42 silver badges89 bronze badges

Comments

philMarius · Accepted Answer · 2019-07-23 09:45:06Z

You can also use melt instead of stack, but it's more complicated as you must drop columns you don't need:

import pandas as pd
import numpy as np

# Inputs
A = pd.DataFrame.from_dict({
    "indices": [[0,1],[2,3],[4,5]],
    "a1": ["a","b","c"],
    "a2": [100,200,300]
})

B = pd.DataFrame.from_dict({
    "b": [10,20,30,40,50,60]
})

AA = pd.concat([A.indices.apply(pd.Series), A], axis=1)
AA.drop(['indices'], axis=1, inplace=True)
print(AA)

   0  1 a1   a2
0  0  1  a  100
1  2  3  b  200
2  4  5  c  300

AA = AA.melt(id_vars=['a1', 'a2'], value_name='val').drop(['variable'], axis=1)
print(AA)

  a1   a2  val
0  a  100    0
1  b  200    2
2  c  300    4
3  a  100    1
4  b  200    3
5  c  300    5

pd.merge(AA.set_index(['val']), B, left_index=True, right_index=True)

Out[8]: 
  a1   a2   b
0  a  100  10
2  b  200  30
4  c  300  50
1  a  100  20
3  b  200  40
5  c  300  60

braintho · Accepted Answer · 2019-07-22 12:03:04Z

1

This solution will handle indices of varying lengths.

A = pd.DataFrame.from_dict({
    "indices": [[0,1],[2,3],[4,5]],
    "a1": ["a","b","c"],
    "a2": [100,200,300]
})
A = A.indices.apply(pd.Series) \
    .merge(A, left_index = True, right_index = True) \
    .drop(["indices"], axis = 1)\
    .melt(id_vars = ['a1', 'a2'], value_name = "index")\
    .drop("variable", axis = 1)\
    .dropna()
A = A.set_index('index')
B = pd.DataFrame.from_dict({
    "b": [10,20,30,40,50,60]
})
B
B.merge(A,left_index=True,right_index=True)

Final Output:

    b   a1  a2
0   10  a   100
1   20  a   100
2   30  b   200
3   40  b   200
4   50  c   300
5   60  c   300

answered Jul 22, 2019 at 12:03

braintho

4013 silver badges8 bronze badges

Collectives™ on Stack Overflow

Adding new columns to dataframe from other dataframe according to list of indices in other dataframe

5 Answers 5

8 Comments

Comments

Comments

Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

5 Answers 5

8 Comments

Comments

Comments

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related