I have two datasets:

df1:

Name        Answers Questions People-reached Reputation  
Alex Gaynor   154        44          ~1.4m     8,871 

df2:

 Project               Total-score Post     
 python                    337      93  
 django-templates          22       4  
 slug                      12       1  
 google-app-engine         8        1  
 django                    235      57  
 clang                     22       2  

Is there any way in Python (pandas or another library) to merge the two dataframes so that df2 becomes a new column in df1?

Desired output would be:

Name       Answers     Questions   People-reached    Reputation   Project-details
Alex Gaynor   154        44          ~1.4m             8,871   python 337 93  
                                                              django-templates 22 4   
                                                               slug   12  1  
                                                              google-app-engine 8 1
  • Can you show your desired output? Commented Aug 17, 2018 at 3:00
  • @andrew_reece I have added desired output Commented Aug 17, 2018 at 3:14
  • You want the entire df as a string in a new column, all in the first row of df1? Commented Aug 17, 2018 at 3:22
  • @sundance Yes, you are right: all in a new column, in the first row of df1. Commented Aug 17, 2018 at 3:24
  • Try pd.concat([df1, df2], 1) Commented Aug 17, 2018 at 3:42

2 Answers

If you need to preserve the columnar structure of the added fields, you can create a column MultiIndex.

If you just need to store the information in df2 as a column in df1, you can make a column that contains a list of df2.values.

Option 1: Preserve column structure

# first merge df1 and df2
df2.index = ["Alex Gaynor"] * len(df2)
merged = df1.merge(df2, left_on="Name", right_index=True)

# now create multi-index columns
top_lvl = df1.columns.tolist() + ["project_details"] * len(df2.columns)
bottom_lvl = [" "] * len(df1.columns) + df2.columns.tolist()
merged.columns = [top_lvl, bottom_lvl]

merged

          Name Answers Questions People-reached Reputation    project_details  \
                                                                      Project   
0  Alex Gaynor     154        44          ~1.4m      8,871             python   
0  Alex Gaynor     154        44          ~1.4m      8,871   django-templates   
0  Alex Gaynor     154        44          ~1.4m      8,871               slug   
0  Alex Gaynor     154        44          ~1.4m      8,871  google-app-engine   
0  Alex Gaynor     154        44          ~1.4m      8,871             django   
0  Alex Gaynor     154        44          ~1.4m      8,871              clang   


  Total-score Post  
0         337   93  
0          22    4  
0          12    1  
0           8    1  
0         235   57  
0          22    2  

If you really need all the df1 entries below the first row to be blank, you can just do:

merged.iloc[1:, :5] = ""
merged
          Name Answers Questions People-reached Reputation    project_details  \
                                                                      Project   
0  Alex Gaynor     154        44          ~1.4m      8,871             python   
0                                                            django-templates   
0                                                                        slug   
0                                                           google-app-engine   
0                                                                      django   
0                                                                       clang   


  Total-score Post  
0         337   93  
0          22    4  
0          12    1  
0           8    1  
0         235   57  
0          22    2  
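For anyone who wants to run Option 1 end to end, here is a minimal sketch. The `df1` and `df2` constructors are my reconstruction of the question's data (numbers with commas or `~` are kept as strings), and `keyed` and `projects` are names I made up for illustration. It keys a copy of `df2` rather than overwriting its index, and it builds the column labels with `pd.MultiIndex.from_arrays` to make the intent explicit.

```python
import pandas as pd

# Sample data reconstructed from the question (an assumption about the real frames)
df1 = pd.DataFrame({
    "Name": ["Alex Gaynor"],
    "Answers": [154],
    "Questions": [44],
    "People-reached": ["~1.4m"],
    "Reputation": ["8,871"],
})
df2 = pd.DataFrame({
    "Project": ["python", "django-templates", "slug",
                "google-app-engine", "django", "clang"],
    "Total-score": [337, 22, 12, 8, 235, 22],
    "Post": [93, 4, 1, 1, 57, 2],
})

# Key a copy of df2 by the owner's name so the original df2 keeps its index
keyed = df2.copy()
keyed.index = [df1.loc[0, "Name"]] * len(keyed)
merged = df1.merge(keyed, left_on="Name", right_index=True)

# Two-level labels: df1 columns on top with a blank second level,
# df2 columns grouped under "project_details"
top_lvl = df1.columns.tolist() + ["project_details"] * len(df2.columns)
bottom_lvl = [""] * len(df1.columns) + df2.columns.tolist()
merged.columns = pd.MultiIndex.from_arrays([top_lvl, bottom_lvl])

# Selecting the top-level label returns the df2 fields as an ordinary sub-frame
projects = merged["project_details"]
print(projects)
```

The benefit of the MultiIndex layout is that `merged["project_details"]` gives back the project fields with their own column names, so they remain usable for filtering or aggregation.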

Option 2: Just store the df2 information in a column

df1["project_details"] = [df2.values]
df1
          Name  Answers  Questions People-reached Reputation  \
0  Alex Gaynor      154         44          ~1.4m      8,871   

                                     project_details  
0  [[python, 337, 93], [django-templates, 22, 4],...  
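If you later need the nested data back as a table, the cell can be turned into a DataFrame again. The sketch below is a variation on Option 2, not exactly the line above: it stores `df2.values.tolist()` (a plain list of rows) wrapped in a Series, which I find easier to reason about than a 2-D array inside a cell. The sample frames are reconstructed from the question, and `recovered` is a name I introduced.

```python
import pandas as pd

# Sample data reconstructed from the question (an assumption about the real frames)
df1 = pd.DataFrame({
    "Name": ["Alex Gaynor"],
    "Answers": [154],
    "Questions": [44],
    "People-reached": ["~1.4m"],
    "Reputation": ["8,871"],
})
df2 = pd.DataFrame({
    "Project": ["python", "django-templates", "slug",
                "google-app-engine", "django", "clang"],
    "Total-score": [337, 22, 12, 8, 235, 22],
    "Post": [93, 4, 1, 1, 57, 2],
})

# Store all of df2 as one list of rows in the single row of df1
df1["project_details"] = pd.Series([df2.values.tolist()], index=df1.index)

# Rebuild a table from the cell when the columnar form is needed again
recovered = pd.DataFrame(df1.loc[0, "project_details"], columns=df2.columns)
print(recovered)
```

Keep in mind that a column of nested lists is opaque to most pandas operations, so this works best when the details are only carried along and unpacked later.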

You can make the dataframe into a string and add the value to the first row in a new column:

# make df into string
df_string = df2.to_string(index=False, header=False)

# make new column (None gives an object column that can hold a string)
df1["project_details"] = None

# add df_string to first row in new column
df1.iloc[0, df1.columns.get_loc('project_details')] = df_string
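Here is a self-contained version of the same idea that can be pasted and run. The sample frames are reconstructed from the question, so treat them as an assumption about the real data. It creates the new column with `None` so that no NumPy import is needed and the column starts out able to hold a string.

```python
import pandas as pd

# Sample data reconstructed from the question (an assumption about the real frames)
df1 = pd.DataFrame({
    "Name": ["Alex Gaynor"],
    "Answers": [154],
    "Questions": [44],
    "People-reached": ["~1.4m"],
    "Reputation": ["8,871"],
})
df2 = pd.DataFrame({
    "Project": ["python", "django-templates", "slug",
                "google-app-engine", "django", "clang"],
    "Total-score": [337, 22, 12, 8, 235, 22],
    "Post": [93, 4, 1, 1, 57, 2],
})

# Render df2 as plain text, one line per row, without index or header
df_string = df2.to_string(index=False, header=False)

# Create an object column, then write the text into the first row only
df1["project_details"] = None
df1.iloc[0, df1.columns.get_loc("project_details")] = df_string

print(df1.loc[0, "project_details"])
```

The stored value is a single multi-line string, so it is fine for display but would need parsing before the project numbers could be used in calculations.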
