6

Here is my pandas DataFrame, which I would like to flatten. How can I do that?

The input I have

key column
1 {'health_1': 45, 'health_2': 60, 'health_3': 34, 'health_4': 60, 'name': 'Tom'}   
2 {'health_1': 28, 'health_2': 10, 'health_3': 42, 'health_4': 07, 'name': 'John'}  
3 {'health_1': 86, 'health_2': 65, 'health_3': 14, 'health_4': 52, 'name': 'Adam'}

The expected output

Each health_* field and name should become a column of its own, holding the corresponding values. The column order does not matter.

health_1  health_2  health_3  health_4  name  key
45        60        34        60        Tom   1
28        10        42        07        John  2
86        65        14        52        Adam  3
2
  • Please show the expected output. Do you want e.g. 4 rows (health_...) from each source row? Commented Dec 5, 2018 at 14:38
  • @Valdi_Bo Not sure I understood you correctly. Basically, every row has 5 columns, if that helps. Commented Dec 5, 2018 at 14:44

5 Answers

6

You can do it in one line:

df_expected = pd.concat([df, df['column'].apply(pd.Series)], axis=1).drop('column', axis=1)

Full version:

import pandas as pd

# Each cell of "column" holds one dict per row
df = pd.DataFrame({"column": [
    {'health_1': 45, 'health_2': 60, 'health_3': 34, 'health_4': 60, 'name': 'Tom'},
    {'health_1': 28, 'health_2': 10, 'health_3': 42, 'health_4': 7, 'name': 'John'},
    {'health_1': 86, 'health_2': 65, 'health_3': 14, 'health_4': 52, 'name': 'Adam'},
]})

# Expand the dicts into columns, append them to df, then drop the original dict column
df_expected = pd.concat([df, df['column'].apply(pd.Series)], axis=1).drop('column', axis=1)
print(df_expected)

DEMO: https://repl.it/repls/ButteryFrightenedFtpclient


4

This should work:

df['column'].apply(pd.Series)

Gives:

   health_1  health_2  health_3  health_4  name
0  45        60        34        60        Tom 
1  28        10        42        7         John
2  86        65        14        52        Adam
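Note that the result above does not include the `key` column from the question. A minimal sketch of one way to keep it, assuming the frame has the two columns `key` and `column` described in the question (the sample frame below is rebuilt from the question's data purely for illustration):

```python
import pandas as pd

# Sample frame rebuilt from the question's data (an assumption for illustration)
df = pd.DataFrame({
    "key": [1, 2, 3],
    "column": [
        {'health_1': 45, 'health_2': 60, 'health_3': 34, 'health_4': 60, 'name': 'Tom'},
        {'health_1': 28, 'health_2': 10, 'health_3': 42, 'health_4': 7, 'name': 'John'},
        {'health_1': 86, 'health_2': 65, 'health_3': 14, 'health_4': 52, 'name': 'Adam'},
    ],
})

# Expand each dict into its own columns; the result shares df's index
expanded = df["column"].apply(pd.Series)

# join aligns on the index, so each key stays attached to its own row
result = df[["key"]].join(expanded)
print(result)
```

Because `join` matches rows by index, this should also hold up when the index is not a plain 0..n-1 range.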


2

Try:

pd.concat([pd.DataFrame(i, index=[0]) for i in df.column], ignore_index=True)

Output:

   health_1  health_2  health_3  health_4  name
0        45        60        34        60   Tom
1        28        10        42         7  John
2        86        65        14        52  Adam


2

The solutions using apply do more work than necessary. pd.DataFrame accepts a list of dictionaries directly, which is exactly what your column Series holds. You can get that list with the tolist method:

res = pd.concat([df.key, pd.DataFrame(df.column.tolist())], axis=1)
print(res)

   key  health_1  health_2  health_3  health_4  name
0    1        45        60        34        60   Tom
1    2        28        10        42         7  John
2    3        86        65        14        52  Adam
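One caveat worth sketching: `pd.concat(..., axis=1)` aligns rows by index, and a frame built from a plain list gets a fresh 0..n-1 index. If `df` has a different index (for example after filtering), the two pieces would no longer line up and the result would contain NaN rows. The example below is a hedged variation, not part of the original answer; the index values 10, 20, 30 are made up for illustration.

```python
import pandas as pd

# Hypothetical frame whose index is not 0..n-1 (e.g. left over from filtering)
df = pd.DataFrame(
    {
        "key": [1, 2, 3],
        "column": [
            {'health_1': 45, 'health_2': 60, 'health_3': 34, 'health_4': 60, 'name': 'Tom'},
            {'health_1': 28, 'health_2': 10, 'health_3': 42, 'health_4': 7, 'name': 'John'},
            {'health_1': 86, 'health_2': 65, 'health_3': 14, 'health_4': 52, 'name': 'Adam'},
        ],
    },
    index=[10, 20, 30],
)

# Passing index=df.index keeps the new columns aligned with the original rows
flat = pd.DataFrame(df["column"].tolist(), index=df.index)
res = pd.concat([df["key"], flat], axis=1)
print(res)
```

With the default index used in the answer above, the extra `index=df.index` argument changes nothing, so it is a cheap safeguard.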


0

Not sure I understand. Isn't this the default format for a DataFrame?

import pandas as pd
df = pd.DataFrame([
{'health_1': 45, 'health_2': 60, 'health_3': 34, 'health_4': 60, 'name': 'Tom'}   ,
{'health_1': 28, 'health_2': 10, 'health_3': 42, 'health_4': 7, 'name': 'John'}  ,
{'health_1': 86, 'health_2': 65, 'health_3': 14, 'health_4': 52, 'name': 'Adam'}
])

2 Comments

Your answer shows how I want the result to look, but unfortunately my columns are nested inside each row together with their values. This is data I get from a really bad server, and I need to convert it to a pandas DataFrame.
What I presented above is what I have: a DataFrame with 2 columns, key and column. I would like to unpack the rows so that each key in a row's dict becomes a column of its own.
