I have two dataframes:

df1

Area-ID-Series Area-ID MODE
BSD-1-1800 BSD-1 VM
BSD-2-1800 BSD-2 VM
BSD-3-1800 BSD-3 VM
BSE-1-2100 BSE-1 XM
BSE-1-900 BSE-1 VM
BSE-2-2100 BSE-2 XM
BSE-2-900 BSE-2 VM
BSE-3-2100 BSE-3 XM
BSE-3-900 BSE-3 VM
SMR-1-1800 SMR-1 VM

df2

Area-ID 900 1800 2100
BSD-1
BSD-2
BSD-3
BSE-1
BSE-2
BSE-3
SMR-1

df2 uses the same Area-ID key as df1 but without the "Series" part, because the series values have become columns.

I want to fill in the MODE value for each series based on Area-ID, so my desired output looks like this:

df2

Area-ID 900 1800 2100
BSD-1 - VM -
BSD-2 - VM -
BSD-3 - VM -
BSE-1 VM - XM
BSE-2 VM - XM
BSE-3 VM - XM
SMR-1 - VM -

I have tried this code, but I still have no idea how it should be done:

df_result = pd.merge(df2,
                    df1[['Area-ID', 'MODE']],
                    on= 'Area-ID',
                    how='left')

I have also done this in Excel, with df1 on a sheet named "city". The formulas look like this:

900 series column

=IFNA(VLOOKUP(A2&"-"&$B$1,city!A:C,3,0),"-") 

1800 series column

=IFNA(VLOOKUP(A2&"-"&$C$1,city!A:C,3,0),"-") 

2100 series column

=IFNA(VLOOKUP(A2&"-"&$D$1,city!A:C,3,0),"-") 

It worked, but it takes a lot of time because my data is huge, so I am trying to do it with Python instead.

  • Do you really have two data frames or do you just want to transform your df1 to different format? Commented Nov 30, 2022 at 10:32
  • I have two data frames, each of it is in different level with different amount of data Commented Nov 30, 2022 at 12:00
  • @AlgoRithm is the question still relevant? Commented Dec 2, 2022 at 15:47
  • @inquirer yes, still working on the same data Commented Dec 6, 2022 at 11:00

1 Answer

I don't know merge very well. Perhaps there is a way to do this with it, but I did it differently.

First, a 'number' column is created. The values of the Area-ID-Series column of df1 are split on the '-' delimiter with str.split, and expand=True returns the pieces as separate columns. The column at index 2 (the third piece, which holds the series) is taken.
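To make the split step concrete, here is a minimal sketch on two keys copied from df1 in the question. The variable names `keys`, `parts` and `number` are only for illustration.

```python
import pandas as pd

# Two sample keys taken from df1 in the question.
keys = pd.Series(["BSD-1-1800", "BSE-1-900"], name="Area-ID-Series")

# expand=True returns a DataFrame with one column per piece, labelled 0, 1, 2.
parts = keys.str.split("-", expand=True)

# Column 2 holds the series; it is kept as a string so that it can
# match df2's column labels, which read_csv loads as strings.
number = parts[2].astype(str)
print(number.tolist())  # ['1800', '900']
```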

Next, the dataframe is grouped by df1.index, so the function is called once per row. Inside it, the expression df2['Area-ID'] == x['Area-ID'].values[0] is a boolean mask that selects rows of df2, and x['number'].values[0] is the name of the df2 column to fill. Explicit loc indexing is used, with the row selector on the left and the column name on the right. .values[0] extracts the single value from the one-element array.
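The mask-plus-column assignment can be tried in isolation. This is a small sketch on a hypothetical two-row stand-in for df2 (built by hand here, not read from a file), writing one MODE value into one cell:

```python
import pandas as pd

# Hypothetical stand-in for df2: empty series columns with string labels.
df2 = pd.DataFrame(
    {"Area-ID": ["BSD-1", "BSE-1"], "900": [None, None], "1800": [None, None]},
    dtype=object,
)

# Boolean mask: True only for the row whose Area-ID matches.
mask = df2["Area-ID"] == "BSE-1"

# loc[rows, column] = value writes into the selected cell(s) in place.
df2.loc[mask, "900"] = "VM"
print(df2)
```

Only the BSE-1 row of the '900' column changes; every other cell stays empty.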

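import pandas as pd

# header=0 takes the first row as column labels, so df2's columns are the strings '900', '1800', '2100'
df1 = pd.read_csv('df1.csv', header=0)
df2 = pd.read_csv('df2.csv', header=0)

# The piece at index 2 of 'Area-ID-Series' is the series; keep it as a string to match df2's column labels
df1['number'] = df1['Area-ID-Series'].str.split('-', expand=True)[2].astype(str)


def my_func(x):
    # x is a one-row group of df1: write its MODE into the df2 cell at (matching Area-ID, series column)
    df2.loc[df2['Area-ID'] == x['Area-ID'].values[0], x['number'].values[0]] = x['MODE'].values[0]


# Grouping by the index yields one group per row; my_func is called only for its side effect on df2
df1.groupby(df1.index).apply(my_func)

print(df2)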

Output

  Area-ID  900 1800 2100
0   BSD-1  NaN   VM  NaN
1   BSD-2  NaN   VM  NaN
2   BSD-3  NaN   VM  NaN
3   BSE-1   VM  NaN   XM
4   BSE-2   VM  NaN   XM
5   BSE-3   VM  NaN   XM
6   SMR-1  NaN   VM  NaN
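If the data is large, a row-by-row loop may be slow. A possible alternative, not the method used above, is to reshape df1 directly with pivot and fill the gaps with "-" to match the desired output in the question. This is only a sketch: the df1 below is a hand-typed subset of the question's data, and it assumes every (Area-ID, series) pair occurs at most once, since pivot raises an error on duplicates.

```python
import pandas as pd

# Hand-typed subset of df1 from the question.
df1 = pd.DataFrame({
    "Area-ID-Series": ["BSD-1-1800", "BSE-1-2100", "BSE-1-900", "SMR-1-1800"],
    "Area-ID": ["BSD-1", "BSE-1", "BSE-1", "SMR-1"],
    "MODE": ["VM", "XM", "VM", "VM"],
})

# The series is the last piece of the key.
df1["number"] = df1["Area-ID-Series"].str.rsplit("-", n=1).str[-1]

# One row per Area-ID, one column per series, MODE as the cell value.
wide = df1.pivot(index="Area-ID", columns="number", values="MODE")

# Put the series columns in the question's order and mark missing cells with "-".
wide = wide.reindex(columns=["900", "1800", "2100"]).fillna("-").reset_index()
wide.columns.name = None

print(wide)
```

The rows come out sorted by Area-ID. If df2's own row order matters, the result could be merged back onto df2['Area-ID'] afterwards.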