
I have some data on each of the first 151 pokemon in 151 different dataframes.

    id  identifier  pokemon_id  stat_id  base_stat  local_language_id             name
36   7    Squirtle           7        1         44                  9               HP
37   7    Squirtle           7        2         48                  9           Attack
38   7    Squirtle           7        3         65                  9          Defense
39   7    Squirtle           7        4         50                  9   Special Attack
40   7    Squirtle           7        5         64                  9  Special Defense
41   7    Squirtle           7        6         43                  9            Speed


    id  identifier  pokemon_id  stat_id  base_stat  local_language_id             name
18   4  Charmander           4        1         39                  9               HP
19   4  Charmander           4        2         52                  9           Attack
20   4  Charmander           4        3         43                  9          Defense
21   4  Charmander           4        4         60                  9   Special Attack
22   4  Charmander           4        5         50                  9  Special Defense
23   4  Charmander           4        6         65                  9            Speed

What I would really like is one row per pokemon with each stat as a column of a new dataframe. Something like

id    identifier    pokemon_id   HP  Attack    ...
4     Charmander    4            39  52        ...
7     Squirtle      7            44  48        ...

Is there an easy way to do that with a pandas dataframe?

2 Answers

I believe this will do what you're after:

df.groupby(['id', 'identifier', 'name']).base_stat.first().unstack('name')
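As a minimal, self-contained sketch of the line above, the following rebuilds the Squirtle rows from the question and applies the same groupby/unstack. The variable name `wide` is only illustrative; the data values are copied from the question.

```python
import pandas as pd

# Squirtle rows copied from the question (long format: one row per stat)
df = pd.DataFrame({
    "id": [7] * 6,
    "identifier": ["Squirtle"] * 6,
    "pokemon_id": [7] * 6,
    "stat_id": [1, 2, 3, 4, 5, 6],
    "base_stat": [44, 48, 65, 50, 64, 43],
    "local_language_id": [9] * 6,
    "name": ["HP", "Attack", "Defense",
             "Special Attack", "Special Defense", "Speed"],
})

# Group on the identifying columns plus the stat name, take the single
# base_stat value in each group, then move 'name' from the index to columns.
wide = df.groupby(["id", "identifier", "name"]).base_stat.first().unstack("name")
print(wide)
```

The result is indexed by (id, identifier), with one column per stat name in alphabetical order.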

1 Comment

A group/unstack is basically the same as a pivot table.
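To make the comment concrete, here is a small sketch that runs both approaches on the Charmander rows from the question and compares the results. The names `via_groupby` and `via_pivot` are illustrative and do not appear in the original posts.

```python
import pandas as pd

# Charmander rows copied from the question
df = pd.DataFrame({
    "id": [4] * 6,
    "identifier": ["Charmander"] * 6,
    "base_stat": [39, 52, 43, 60, 50, 65],
    "name": ["HP", "Attack", "Defense",
             "Special Attack", "Special Defense", "Speed"],
})

# Approach 1: groupby + first + unstack
via_groupby = (df.groupby(["id", "identifier", "name"])["base_stat"]
                 .first()
                 .unstack("name"))

# Approach 2: pivot_table with the same keys and aggregation
via_pivot = df.pivot_table(index=["id", "identifier"],
                           columns="name",
                           values="base_stat",
                           aggfunc="first")

# Compare column labels and cell values of the two results
print(list(via_groupby.columns) == list(via_pivot.columns))
print((via_groupby.to_numpy() == via_pivot.to_numpy()).all())
```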

You can use pivot_table:

df = df.pivot_table(index=['id','identifier'], 
                    columns='name', 
                    values='base_stat', 
                    aggfunc='first')

print (df)
name           Attack  Defense  HP  Special Attack  Special Defense  Speed
id identifier                                                             
7  Squirtle        48       65  44              50               64     43

If all the DataFrames are in a list dfs, pivot each one in a list comprehension and combine the results with concat:

dfs = [df1, df2]

df = pd.concat([df.pivot_table(index=['id','identifier'], 
                               columns='name', 
                               values='base_stat', 
                               aggfunc='first') for df in dfs])
print (df)
name           Attack  Defense  HP  Special Attack  Special Defense  Speed
id identifier                                                             
7  Squirtle        48       65  44              50               64     43
4  Charmander      52       43  39              60               50     65

Finally, call reset_index and rename_axis (new in pandas 0.18.0) to flatten the index and drop the columns' name. On pandas below 0.18.0, omit rename_axis and set df.columns.name = None instead:

# Wrap the whole chain in parentheses so the method calls can continue on new lines
df = (pd.concat([df.pivot_table(index=['id','identifier'],
                                columns='name',
                                values='base_stat',
                                aggfunc='first') for df in dfs])
        .reset_index()
        .rename_axis(None, axis=1))
print (df)
   id  identifier  Attack  Defense  HP  Special Attack  Special Defense  Speed
0   7    Squirtle      48       65  44              50               64     43
1   4  Charmander      52       43  39              60               50     65
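As a variation on the approach above (not the answer's exact method), one could concatenate the long frames first and pivot once. The sketch below assumes every frame has the columns shown in the question; `make_frame` is a hypothetical helper used only to build sample data, and `stat_names` and `wide` are illustrative names. It also keeps pokemon_id in the index and restores the stat order the question asked for.

```python
import pandas as pd

stat_names = ["HP", "Attack", "Defense",
              "Special Attack", "Special Defense", "Speed"]

def make_frame(pokemon_id, identifier, base_stats):
    """Hypothetical helper: build one long-format frame like those in the question."""
    return pd.DataFrame({
        "id": [pokemon_id] * 6,
        "identifier": [identifier] * 6,
        "pokemon_id": [pokemon_id] * 6,
        "stat_id": list(range(1, 7)),
        "base_stat": base_stats,
        "local_language_id": [9] * 6,
        "name": stat_names,
    })

dfs = [
    make_frame(7, "Squirtle", [44, 48, 65, 50, 64, 43]),
    make_frame(4, "Charmander", [39, 52, 43, 60, 50, 65]),
]

# Stack all long frames, then pivot a single time.
long_df = pd.concat(dfs, ignore_index=True)
wide = (long_df.pivot_table(index=["id", "identifier", "pokemon_id"],
                            columns="name",
                            values="base_stat",
                            aggfunc="first")
               .reset_index()
               .rename_axis(None, axis=1))

# pivot_table sorts the stat columns alphabetically; reorder them explicitly.
wide = wide[["id", "identifier", "pokemon_id"] + stat_names]
print(wide)
```

Pivoting once sorts the rows by id, so Charmander (4) comes before Squirtle (7), whereas concatenating per-frame pivots preserves the order of dfs.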
