0

I have 3 data frames in my code...i need to combine all 3 into a single dataframe and save it in a excel sheet. Attached the below 3 data frames, and i wanted to created the new data frame based on asset_id.

I tried as below:

import pickle
import os
frames = [a_dataframe, pandaDf, pandaDf1]
result = pd.concat(frames)

It did not work...Can someone pls help me

dataframe1: enter image description here

dataframe 2: enter image description here

Dataframe 3: enter image description here

I have used the below query...dataframe 2 and datframe 3 is not showig up

output: enter image description here

The final output after merging

final_df=pandaDf.append(pandaDf1, sort=True).merge(a_dataframe, on='asset_id')
final_df


final_df=pandaDf.append(pandaDf1, sort=True).merge(a_dataframe, on='asset_id')
final_df
final_df.head(10).to_dict()

for analysis

after merging

1
  • why did you put the pyspark tag ? Commented Aug 16, 2021 at 14:37

4 Answers 4

3

try this:

a_dataframe['property_name'] = a_dataframe['property_name'].astype(str)
pandaDf['property_name'] = pandaDf['property_name'].astype(str)

a_dataframe['property_value'] = a_dataframe['property_value'].astype(int)
pandaDf['property_value'] = pandaDf['property_value'].astype(int)

final_df = a_dataframe.merge(pandaDf, on='asset_id', how='left')
final_df = final_df.merge(pandaDf1, on='asset_id', how='left')

in one line you can try this:

final_df = pandaDf1.merge(a_dataframe.merge(pandaDf,on='asset_id'),on='asset_id')

EDIT add code by request:

import pandas as pd
  
df1 = pd.DataFrame({'id': {0: 1, 1: 2, 2: 3, 3:4},
                   'p_value': {0: "HOR", 1: "HOR", 2: "HOR", 3:"HOR"}})

df2 = pd.DataFrame({'id': {0: 1, 1: 2, 2: 3, 3:4},
                   'p_value': {0: "GRAN", 1: "GRAN", 2: "GRAN", 3:"GRAN"}})


df3 = pd.DataFrame({'id': {0: 1, 1: 2, 2: 3, 3:4},
                   'r_id': {0: "24", 1: "32", 2: "42", 3:"45"}})

final_df = df3.merge(df1.merge(df2, on='id'),on='id')
final_df

output:


id  r_id    p_value_x   p_value_y
0   1   24  HOR GRAN
1   2   32  HOR GRAN
2   3   42  HOR GRAN
3   4   45  HOR GRAN
Sign up to request clarification or add additional context in comments.

8 Comments

@user1740577i haveused the above query...but dataframes are not combined..can u pls help me
@Kanagambala, what is your problem?
can you send this three-row of three dataframe
I have attached the output of the query.given by u..can u pls help mw
i need columns as asset_id, revision_id, Horizon, Granuality
|
1

You can append the first 2 dataframes and then merge the 3rd dataframe, as follows:

final_df = df1.append(df2, sort=True).merge(df3, on='asset_id')

Result:

print(final_df)

   asset_id property_name  property_value  revision_id
0         1       HORIZON              24          314
1         1   GRANULARITY              24          314
2         2       HORIZON              48           12
3         2   GRANULARITY              24           12
4         3       HORIZON              24           24
5         3   GRANULARITY              24           24
6         4       HORIZON              24           43
7         4   GRANULARITY              48           43

Edit

To further make HORIZON and GRANULARITY as column header and put their property_value under the columns, you can do a pivot, as follows:

final_df_pivoted = final_df.pivot(index=['asset_id', 'revision_id'], columns='property_name', values='property_value').rename_axis(columns=None).reset_index()

Result:

print(final_df_pivoted)

   asset_id  revision_id  GRANULARITY  HORIZON
0         1          314           24       24
1         2           12           24       48
2         3           24           24       24
3         4           43           48       24

50 Comments

`/databricks/python/lib/python3.7/site-packages/pandas/core/frame.py:6211: FutureWarning: Sorting because non-concatenation axis is not aligned. A future version of pandas will change to not sort by default. To accept the future behavior, pass 'sort=False'. To retain the current behavior and silence the warning, pass 'sort=True'. sort=sort)
am getting above error..can u pls help me?
@Kanagambala Ok, let me look at it
@Kanagambala Edited my solution. You can try again.
asset_id property_name_x property_value_x revision_id property_name_y property_value_y @SeaBean i am getting output as above
|
1

Solution cited above by user1740577 seems to work.

I created directly the three dataframes in python and tried the solution cited and it worked. here's my code. Try to see if it works on your machine. If it doesn't, it might be the way of loading your excels into DataFrame which can cause an error.

import numpy as np
import pandas as pd
import IPython

df0 = pd.DataFrame(data=[[1, 'HORIZON', 24], 
                     [2, 'HORIZON', 48], 
                     [3, 'HORIZON', 24],
                     [4, 'HORIZON', 24]], columns=['asset_id', 'property_name', 'property_value'])


df1 = pd.DataFrame(data=[[3, 'GRANULARITY', 24],
                     [4, 'GRANULARITY', 48],
                     [1, 'GRANULARITY', 24],
                     [2, 'GRANULARITY', 24]], columns=['asset_id', 'property_name', 'property_value'])


df2 = pd.DataFrame(data=[[4, 24],
                     [2, 43],
                     [3, 314],
                     [1, 12]], columns=['asset_id','revision_id'])


print(df0)
print(df1)
print(df2)

final_df = df0.merge(df1, on='asset_id', how='left')
final_df = final_df.merge(df2, on='asset_id', how='left')

print(final_df)

IPython.embed()

1 Comment

df0 = a_dataframe, columns=['asset_id', 'property_name', 'property_value']) df1 = pandaDf, columns=['asset_id', 'property_name', 'property_value']) df2 = pandaDf1, columns=['asset_id','revision_id']) I have edited the code as below...can u pls help me
0

The answer is correct, but what your data represent or how do you want to combine the dataframes is not clear. In the first dataframe "asset_id" 1 has "property name" 'HORIZON', but in the second dataframe the same "asset_id" has "property_name" 'GRANULARITY'. Confusing. And not possible to combine the dataframes nicely and logically.

Maybe renaming the columns with distinctive names could help in giving helpful and usable answers.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.