How to get data from another dataframe based on key values

Question

I have two dataframes

df1

Area-ID-Series	Area-ID	MODE
BSD-1-1800	BSD-1	VM
BSD-2-1800	BSD-2	VM
BSD-3-1800	BSD-3	VM
BSE-1-2100	BSE-1	XM
BSE-1-900	BSE-1	VM
BSE-2-2100	BSE-2	XM
BSE-2-900	BSE-2	VM
BSE-3-2100	BSE-3	XM
BSE-3-900	BSE-3	VM
SMR-1-1800	SMR-1	VM

df2

Area-ID	900	1800	2100
BSD-1
BSD-2
BSD-3
BSE-1
BSE-2
BSE-3
SMR-1

df2 has a key based on df1 but without the "Series", because the "Series" transformed into columns.

I want to get MODE value for each series based on Area-ID, so my desired output is like this

df2

Area-ID	900	1800	2100
BSD-1	-	VM	-
BSD-2	-	VM	-
BSD-3	-	VM	-
BSE-1	VM	-	XM
BSE-2	VM	-	XM
BSE-3	VM	-	XM
SMR-1	-	VM	-

I have tried this code but still no idea how it should be done

df_result = pd.merge(df2,
                    df1[['Area-ID', 'MODE']],
                    on= 'Area-ID',
                    how='left')

I have tried to do this in excel with df1 as "city" the formula look like this

900 series column

=IFNA(VLOOKUP(A2&"-"&$B$1,city!A:C,3,0),"-")

1800 series column

=IFNA(VLOOKUP(A2&"-"&$C$1,city!A:C,3,0),"-")

2100 series column

=IFNA(VLOOKUP(A2&"-"&$D$1,city!A:C,3,0),"-")

it worked, but it takes a lot of time because my data is huge so I try to do it with python instead

Do you really have two data frames or do you just want to transform your df1 to different format? — TiTo
– TiTo, Commented Nov 30, 2022 at 10:32
I have two data frames, each of it is in different level with different amount of data — Algo Rithm
– Algo Rithm, Commented Nov 30, 2022 at 12:00

inquirer · Accepted Answer · 2022-12-06 15:31:48Z

I don't know merge very well. Perhaps there is a way to do based on it. Did it differently.

A 'number' column is created. To do this, the rows of the Area-ID-Series column of dataframe df1 are split() into an array by delimiter using split and expand=True is applied to get the columns. The second column is retrieved.

Further, the dataframe is grouped by df1.index, that is, it is called on each row, where the expression df2['Area-ID'] == x['Area-ID'].values[0] is a mask for indexing rows, and x[' number'].values[0] acts as a column name for df2 filtering. Explicit loc indexing is used, where the row indexes are on the left, the column name is on the right. .values[0] is used to extract the value from the list.

import pandas as pd

df1 = pd.read_csv('df1.csv', header=0)
df2 = pd.read_csv('df2.csv', header=0)

df1['number'] = df1['Area-ID-Series'].str.split('-', expand=True)[2].astype(str)


def my_func(x):
    df2.loc[df2['Area-ID'] == x['Area-ID'].values[0], x['number'].values[0]] = x['MODE'].values[0]


df1.groupby(df1.index).apply(my_func)

print(df2)

Output

  Area-ID  900 1800 2100
0   BSD-1  NaN   VM  NaN
1   BSD-2  NaN   VM  NaN
2   BSD-3  NaN   VM  NaN
3   BSE-1   VM  NaN   XM
4   BSE-2   VM  NaN   XM
5   BSE-3   VM  NaN   XM
6   SMR-1  NaN   VM  NaN

Collectives™ on Stack Overflow

How to get data from another dataframe based on key values

1 Answer 1

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Related