
I want to create multiple DataFrames whose names are the same as the values in one of the columns. I would like this code to work like that:

import pandas as pd

data=pd.read_csv('athlete_events.csv')


Sports = data.Sport.unique()

for S in Sports:
    name=str(S)
    name=data.loc[data['Sport']==S]
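For context, here is a minimal sketch of what the loop above actually does, using a small hypothetical DataFrame in place of athlete_events.csv. Each iteration rebinds the same variable name, first to a string and then to a DataFrame, so no variable named after the sport is created and only the last sport's rows remain after the loop.

```python
import pandas as pd

# Hypothetical stand-in for athlete_events.csv
data = pd.DataFrame({'Sport': ['Judo', 'Rowing', 'Judo', 'Swimming'],
                     'ID': [1, 2, 3, 4]})

Sports = data.Sport.unique()  # order of first appearance: Judo, Rowing, Swimming

for S in Sports:
    name = str(S)                          # binds `name` to a string...
    name = data.loc[data['Sport'] == S]    # ...then immediately rebinds it to a DataFrame

# Only one variable, `name`, exists: it holds the rows of the last sport seen.
print(name)
```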
  • What do you mean by names of dataframes? Commented Jul 31, 2018 at 18:39
  • "I would like this code to work like that:", like what? Can you show input and expected output please? Refer to minimal reproducible example Commented Jul 31, 2018 at 18:40
  • Do you mean that you would like to create an unique dataframe for each unique value in the Sport column and you would like the variable name for each dataframe to be the same as the Sport value? Commented Jul 31, 2018 at 18:43
  • johnchase Yes, exactly, that is what I want. I know I can iterate over the DataFrame with different kinds of functions, but I want to reorganize and split it so that it is easier for me to analyse. Commented Jul 31, 2018 at 18:51

2 Answers


Use a dictionary to organize your DataFrames, and groupby to split them. You can iterate through the groupby object with a dict comprehension.

Example:

>>> data
      Sport  random_data
0    soccer            0
1    soccer            3
2  football            1
3  football            1
4    soccer            4

frames = {i:dat for i, dat in data.groupby('Sport')}
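A self-contained version of the comprehension above, as a minimal sketch that rebuilds the example data by hand. Iterating over a GroupBy object yields (key, sub-DataFrame) pairs, which is why the comprehension works; passing the same pairs to dict() is an equivalent shorthand.

```python
import pandas as pd

# Toy data matching the example above
data = pd.DataFrame({'Sport': ['soccer', 'soccer', 'football', 'football', 'soccer'],
                     'random_data': [0, 3, 1, 1, 4]})

# Iterating a GroupBy yields (key, sub-DataFrame) pairs,
# so the comprehension maps each sport to its own DataFrame.
frames = {sport: group for sport, group in data.groupby('Sport')}

# Equivalent shorthand: dict() accepts the same (key, group) pairs.
frames_alt = dict(tuple(data.groupby('Sport')))

print(sorted(frames))                   # ['football', 'soccer']
print(frames['soccer'].index.tolist())  # [0, 1, 4]
```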

You can then access your frames as you would any other dictionary value:

>>> frames['soccer']
    Sport  random_data
0  soccer            0
1  soccer            3
4  soccer            4

>>> frames['football']
      Sport  random_data
2  football            1
3  football            1
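Since the goal is to make per-sport analysis easier, here is a minimal sketch (same toy data, with a hypothetical mean-per-sport summary) of looping over the dictionary instead of over separate variables. For simple aggregates you may not need to split at all: groupby can compute the same per-sport result directly on the unsplit data.

```python
import pandas as pd

data = pd.DataFrame({'Sport': ['soccer', 'soccer', 'football', 'football', 'soccer'],
                     'random_data': [0, 3, 1, 1, 4]})
frames = {sport: group for sport, group in data.groupby('Sport')}

# Per-sport analysis by looping over the dictionary's (key, DataFrame) items.
summary = {sport: frame['random_data'].mean() for sport, frame in frames.items()}

# The same aggregate computed directly, as a Series indexed by Sport.
direct = data.groupby('Sport')['random_data'].mean()

for sport in summary:
    print(sport, summary[sport], direct[sport])
```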

You can do this by modifying globals(), but that's not really advisable.

for S in Sports:
    globals()[str(S)] = data.loc[data['Sport']==S]    

Below is a self-contained example:

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'sport':['football', 'football', 'tennis'],
                           'value':[1, 2, 3]})

In [3]: df
Out[3]: 
      sport  value
0  football      1
1  football      2
2    tennis      3

In [4]: for name in df.sport.unique():
    ...:     globals()[name] = df.loc[df.sport == name]
    ...:     

In [5]: football
Out[5]: 
      sport  value
0  football      1
1  football      2

While this is a direct answer to your question, I would recommend sacul's answer: dictionaries are meant for exactly this (storing keys and values), and variable names inserted via globals() are usually not a good idea to begin with.

Imagine someone else, or yourself in the future, reading your code: all of a sudden you are using football as a pd.DataFrame that was never explicitly defined. How is anyone supposed to know what is going on?
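A further practical problem with the globals() route, shown as a minimal sketch with hypothetical data: a column value only becomes a usable variable if it is a valid Python identifier and not a keyword. A value such as 'Alpine Skiing' (it contains a space) can be stored in globals() but can never be typed as a bare variable name, and a value that matches an existing name (for example df or pd) would silently overwrite it. The sketch checks with str.isidentifier() and keyword.iskeyword() before assigning.

```python
import keyword

import pandas as pd

# Hypothetical data: 'Alpine Skiing' contains a space, so it is not a valid identifier.
df = pd.DataFrame({'sport': ['football', 'Alpine Skiing', 'tennis'],
                   'value': [1, 2, 3]})

skipped = []
for name in df.sport.unique():
    if name.isidentifier() and not keyword.iskeyword(name):
        # Creates a module-level variable such as `football`.
        # Note: a value like 'df' or 'pd' would still overwrite existing names.
        globals()[name] = df.loc[df.sport == name]
    else:
        # Assigning globals()['Alpine Skiing'] would succeed, but the frame could
        # only be reached through globals() again, never as a bare variable name.
        skipped.append(name)

print(skipped)  # ['Alpine Skiing']
```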
