2

I have multiple DataFrames and would like to build a single DataFrame that lists every column name from each of them.

For example:

# Existing Dataframes
df1 =
    df1_colA  df1_colB  df1_colC
0   1         2         3
1   4         5         6
2   7         8         9

df2 =
    df2_colA  df2_colB  df3_colC
0   10        11        12
1   13        14        15
2   16        17        18

df3 =
    df3_colA  df3_colB  df3_colC
0   30        31        32
1   33        34        35
2   36        37        38

I would like to get a DataFrame like this:

names =
     df_names   col_names
0    df1        df1_colA
1    df1        df1_colB
2    df1        df1_colC
3    df2        df2_colA
4    df2        df2_colB
5    df2        df2_colC
6    df3        df3_colA
7    df3        df3_colB
8    df3        df3_colC

Any help would be much appreciated. Thank you in advance!

4 Answers

2

If the DataFrame names can be extracted from the column names, use a list comprehension with concat to collect the column names, then add the new column in the first position using DataFrame.insert, with Series.str.extract pulling out the part of each column name before the _:

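import pandas as pd

dfs = [df1, df2, df3]
# one single-column frame of column names per DataFrame, stacked together
df = pd.concat([df.columns.to_frame(name='col_names') for df in dfs], ignore_index=True)
# the text before the last underscore becomes the DataFrame name
df.insert(0, 'df_names', df['col_names'].str.extract('^(.*)_'))
print(df)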
  df_names col_names
0      df1  df1_colA
1      df1  df1_colB
2      df1  df1_colC
3      df2  df2_colA
4      df2  df2_colB
5      df3  df3_colC
6      df3  df3_colA
7      df3  df3_colB
8      df3  df3_colC
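
For anyone who wants to run this end to end, here is a minimal sketch. It assumes one-row stand-ins for the question's frames, with df2's third column labelled df3_colC exactly as printed in the question, which is why that row is attributed to df3. Passing expand=False is an addition here, so that str.extract returns a Series rather than a one-column DataFrame.

```python
import pandas as pd

# One-row stand-ins for the frames in the question (values are illustrative).
df1 = pd.DataFrame([[1, 2, 3]], columns=["df1_colA", "df1_colB", "df1_colC"])
df2 = pd.DataFrame([[10, 11, 12]], columns=["df2_colA", "df2_colB", "df3_colC"])
df3 = pd.DataFrame([[30, 31, 32]], columns=["df3_colA", "df3_colB", "df3_colC"])
dfs = [df1, df2, df3]

# Stack the column names of every frame into one 'col_names' column.
names = pd.concat(
    [d.columns.to_frame(name="col_names") for d in dfs],
    ignore_index=True,
)

# Parse the frame name out of each column name (text before the last "_").
# expand=False yields a Series for a single capture group.
parsed = names["col_names"].str.extract(r"^(.*)_", expand=False)
names.insert(0, "df_names", parsed)
print(names)
```

Because the name is parsed from the column label, this only works when every column carries its frame's name as a prefix.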

A similar idea with a flattened list comprehension:

dfs = [df1, df2, df3]
df = pd.DataFrame({'col_names': [x for df in dfs for x in df.columns]})
df.insert(0, 'df_names', df['col_names'].str.extract('^(.*)_'))
print (df)
  df_names col_names
0      df1  df1_colA
1      df1  df1_colB
2      df1  df1_colC
3      df2  df2_colA
4      df2  df2_colB
5      df3  df3_colC
6      df3  df3_colA
7      df3  df3_colB
8      df3  df3_colC

An alternative is to create a dictionary of DataFrames and pass a dict comprehension to concat. The dict keys become the first level of a MultiIndex, so there is no need to parse the column names:

dfs = {'df1':df1, 'df2':df2, 'df3':df3}
df = (pd.concat({k:v.columns.to_frame(name='col_names') for k, v in dfs.items()})
        .droplevel(1)
        .rename_axis('df_names')
        .reset_index())

print (df)
  df_names col_names
0      df1  df1_colA
1      df1  df1_colB
2      df1  df1_colC
3      df2  df2_colA
4      df2  df2_colB
5      df2  df3_colC
6      df3  df3_colA
7      df3  df3_colB
8      df3  df3_colC
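
The dictionary approach also works when the column names say nothing about which frame they belong to. A minimal sketch, assuming a hypothetical helper name (column_catalog) and made-up frames (sales, stock) that are not part of the question:

```python
import pandas as pd

def column_catalog(frames):
    """Return one row per (frame label, column name) pair.

    `frames` maps a label to a DataFrame. The helper name is hypothetical.
    """
    stacked = pd.concat(
        {label: frame.columns.to_frame(name="col_names")
         for label, frame in frames.items()}
    )
    # Level 0 of the MultiIndex holds the dict keys, level 1 the column names.
    return (stacked.droplevel(1)
                   .rename_axis("df_names")
                   .reset_index())

# Illustrative frames whose column names carry no frame prefix.
sales = pd.DataFrame({"region": ["N"], "total": [10]})
stock = pd.DataFrame({"sku": ["a1"], "qty": [3], "bin": [7]})

catalog = column_catalog({"sales": sales, "stock": stock})
print(catalog)
```

Here the labels come from the dict keys, so they stay correct even if a column is misnamed.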

1
dfs = [df1, df2, df3]
df = pd.DataFrame({'col_names': pd.concat(dfs).columns})
df['df_names'] = df['col_names'].str.split('_').str[0]
print(df)

Output:

  col_names df_names
0  df1_colA      df1
1  df1_colB      df1
2  df1_colC      df1
3  df2_colA      df2
4  df2_colB      df2
5  df2_colC      df2
6  df3_colA      df3
7  df3_colB      df3
8  df3_colC      df3
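
One caveat worth noting: pd.concat aligns on column labels, so a label shared by two frames appears only once in the result. A small sketch, assuming one-row stand-ins for df2 and df3 with df2's third column labelled df3_colC as printed in the question:

```python
import pandas as pd

# df2 and df3 both contain a column labelled 'df3_colC'.
df2 = pd.DataFrame({"df2_colA": [10], "df2_colB": [11], "df3_colC": [12]})
df3 = pd.DataFrame({"df3_colA": [30], "df3_colB": [31], "df3_colC": [32]})
dfs = [df2, df3]

# Row-wise concat takes the union of the column labels.
merged_cols = pd.concat(dfs).columns.tolist()

# Collecting the labels frame by frame keeps every occurrence.
per_frame_cols = [col for d in dfs for col in d.columns]

print(len(merged_cols), len(per_frame_cols))
```

If every column label is unique across the frames, both lists are the same and the concat approach is safe.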


0

You can try

dfs = [df1, df2, df3]

df = (pd.DataFrame({'col_names': [df.columns.tolist() for df in dfs]})
      .explode('col_names', ignore_index=True)
      .pipe(lambda df: df.assign(df_names=df['col_names'].str.split('_').str[0])))
print(df)

  col_names df_names
0  df1_colA      df1
1  df1_colB      df1
2  df1_colC      df1
3  df2_colA      df2
4  df2_colB      df2
5  df3_colC      df3
6  df3_colA      df3
7  df3_colB      df3
8  df3_colC      df3

If the column order matters:

df.insert(0, 'df_names', df.pop('df_names'))
print(df)

  df_names col_names
0      df1  df1_colA
1      df1  df1_colB
2      df1  df1_colC
3      df2  df2_colA
4      df2  df2_colB
5      df3  df3_colC
6      df3  df3_colA
7      df3  df3_colB
8      df3  df3_colC
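
Splitting on _ and taking the first token differs from a greedy regex prefix once a column name contains more than one underscore. A short sketch with an invented column name (sales_2020_total is an assumption for illustration only):

```python
import pandas as pd

cols = pd.Series(["df1_colA", "sales_2020_total"])

# First token before the first underscore.
first_token = cols.str.split("_").str[0]

# Greedy match: everything before the last underscore.
greedy_prefix = cols.str.extract(r"^(.*)_", expand=False)

print(first_token.tolist())
print(greedy_prefix.tolist())
```

Which of the two is right depends on whether the frame names themselves may contain underscores.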

1 Comment

df = df[['df_names', 'col_names']] is probably an easier way of swapping the column order in this case.
0

One option is to append the column indexes together (DataFrame.columns is an Index), then repeat each DataFrame name once per column of that DataFrame, before creating the final DataFrame:

import numpy as np
import pandas as pd

dfs = [df1, df2, df3]

# chain the column Index objects of all frames into one Index
col_names = df1.columns.append([df.columns for df in dfs[1:]])

# number of columns per frame; len(df) would count rows instead
lengths = [len(df.columns) for df in dfs] # or [df.shape[1] for df in dfs]

# only useful if you have lots of dataframes
# else, it is just easier to write ['df1', 'df2', 'df3']
df_names = [f"df{num+1}" for num, _ in enumerate(dfs)]

df_names = np.repeat(df_names, lengths)

df = {'df_names' : df_names, 'col_names': col_names}

pd.DataFrame(df, copy = False)


  df_names col_names
0      df1  df1_colA
1      df1  df1_colB
2      df1  df1_colC
3      df2  df2_colA
4      df2  df2_colB
5      df2  df3_colC
6      df3  df3_colA
7      df3  df3_colB
8      df3  df3_colC
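
The repeat counts have to be the number of columns in each frame, not the number of rows. The two only coincide in the question because every frame is 3 by 3. A sketch with frames whose row and column counts differ (the frames and labels below are invented for illustration):

```python
import numpy as np
import pandas as pd

# Illustrative frames: 'wide' is 2 rows x 4 columns, 'tall' is 5 rows x 1 column.
wide = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6], "d": [7, 8]})
tall = pd.DataFrame({"x": [1, 2, 3, 4, 5]})
dfs = [wide, tall]
labels = ["wide", "tall"]

# Chain the column Index objects into a single Index.
col_names = dfs[0].columns.append([d.columns for d in dfs[1:]])

# shape[1] is the column count; len(d) would give the row count instead.
counts = [d.shape[1] for d in dfs]

names = pd.DataFrame({
    "df_names": np.repeat(labels, counts),
    "col_names": col_names,
})
print(names)
```

Using row counts here would produce 7 labels for 5 column names, and the DataFrame constructor would raise an error about mismatched lengths.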
