0

I'm try to merge several dataframes in loop. My code is :

my_dict = {
        '1':{
                'providers_using':{
                        'providers':['prov1','prov2','prov3'],
                        'calls_count':[10,20,30]
                }
            },
        '2':{
                'providers_using':{
                        'providers':['prov1','prov2','prov3'],
                        'calls_count':[100,200,300]
                }
            },
        '3':{
                'providers_using':{
                        'providers':['prov1','prov2','prov3'],
                        'calls_count':[1000,2000,3000]
                }
            }

}



df_joined = pd.DataFrame(columns = ['providers','calls_count'])

for i in range(1,4):
  gw_df = pd.DataFrame.from_dict(my_dict[str(i)]['providers_using'])
  gw_df.rename(columns={"calls_count":"gw" + str(i)})
  df_joined = pd.merge(df_joined,gw_df,left_on='providers',right_on = 'providers',how='outer')

print(df_joined)

Result is:

  calls_count_x providers  calls_count_y  calls_count_x  calls_count_y
0       NaN         prov1        10             100           1000     
1       NaN         prov2        20             200           2000     
2       NaN         prov3        30             300           3000

But i want get names columns "calls_count" as gw1,gw2,gw3, etc. The count of steps in loop maybe random so it's not acceptable just do :

df_joined.columns = ['gw1','gw2','gw3']

Expected output is:

    providers    gw1            gw3           gw3
0   prov1        10             100           1000     
1   prov2        20             200           2000     
2   prov3        30             300           3000

Btw - why columns name are "calls_count_y","calls_count_x" ?

3
  • What is the expected output? Also, you have 5 columns, you cannot set 3 names only Commented Jul 26, 2021 at 14:05
  • Have you searched how to rename a single column? It seems that's the most logical option: rename column every iteration of the loop. Commented Jul 26, 2021 at 14:06
  • 1
    If you ask why the the _x and _y parts, that's in the documentation; see the "suffixes" parameter description. Commented Jul 26, 2021 at 15:44

2 Answers 2

2

Below is the trick. You can also rename the columns. The idea is to create a list of columns as per the name gw + str(i) and create a column list.

col = []
for i in range(1,4):
  col.append("gw" + str(i))
  gw_df = pd.DataFrame.from_dict(my_dict[str(i)]['providers_using'])
  df_joined = pd.merge(df_joined,gw_df,left_on='providers',right_on = 'providers',how='outer')

df_joined.columns = ['providers'] + col
print(df_joined)

Edit : As per the documentation of pandas merge, the suffixes will be _x and _y.

Sign up to request clarification or add additional context in comments.

Comments

1

Here is a way to get the expected output from the initial dictionary without looping:

(pd.concat({k: pd.DataFrame(v).T.apply(pd.Series.explode)  # unnest the lists
            for k,v in my_dict.items()})
   .reset_index()                                          # dict keys as column
   .pivot(index='providers',                               # pivot the calls count
          columns='level_0',
          values='calls_count')
   .add_prefix('gw')                                       # rename columns
   .rename_axis('', axis=1)                                # remove 'level_0' name
   .reset_index()                                          # index as column
)

output:

  providers gw1  gw2   gw3
0     prov1  10  100  1000
1     prov2  20  200  2000
2     prov3  30  300  3000

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.