Split dataframe based on column values of another dataframe in python

Question

I have the following dataframe:

Date        Country  Type  Consumption
01/01/2019  Fr       IE    186
02/01/2019  Fr       IE    131
01/01/2019  Fr       SE    115
02/01/2019  Fr       SE    141
03/01/2019  Fr       SE    158
01/01/2019  Po       DK    208
01/01/2019  Po       IE    150
02/01/2019  Po       IE    136
01/01/2019  Po       SE    210
02/01/2019  Po       SE    195
03/01/2019  Po       SE    160
01/01/2019  Hk       DK    229
01/01/2019  Hk       IE    159
02/01/2019  Hk       IE    210
01/01/2019  Hk       SE    130
02/01/2019  Hk       SE    179
03/01/2019  Hk       SE    143
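Because a commenter asks for code rather than images, here is one way to build the same table directly in pandas so it can be reproduced. This is a sketch: the variable name df3 is assumed to match the code later in the question, and Date is kept as a plain string rather than parsed.

```python
import pandas as pd

# Sample data from the question (df3 is an assumed name); Date kept as strings
df3 = pd.DataFrame({
    "Date": ["01/01/2019", "02/01/2019", "01/01/2019", "02/01/2019", "03/01/2019",
             "01/01/2019", "01/01/2019", "02/01/2019", "01/01/2019", "02/01/2019", "03/01/2019",
             "01/01/2019", "01/01/2019", "02/01/2019", "01/01/2019", "02/01/2019", "03/01/2019"],
    "Country": ["Fr"] * 5 + ["Po"] * 6 + ["Hk"] * 6,
    "Type": ["IE", "IE", "SE", "SE", "SE",
             "DK", "IE", "IE", "SE", "SE", "SE",
             "DK", "IE", "IE", "SE", "SE", "SE"],
    "Consumption": [186, 131, 115, 141, 158,
                    208, 150, 136, 210, 195, 160,
                    229, 159, 210, 130, 179, 143],
})
print(df3)
```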

I want to split it into multiple dataframes by Country and Type. For example, I want to have:

df_1:

df_2:

df_3:

df_4:

& so on ...

I created another dataframe

import pandas as pd

df = pd.DataFrame({
    "Country": ["Fr", "Po"],
    "Type": ["IE", "SE"]})

because I only want to create new dataframes based on the values in df.

I used the following code:

# Create a unique list of names
UniqueNames = pd.unique(df[['Country','Type']].values.ravel())
DataFrameDict = {elem : pd.DataFrame for elem in UniqueNames}

for key in DataFrameDict.keys():
    DataFrameDict[key] = df3[:][df3.Country == key]

But this does not do what I want: the dataframes I get contain all Type values.
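A quick look at what the first attempt builds suggests why. `.values.ravel()` flattens the Country and Type columns into one array, so the keys mix country codes with type codes, and the loop then filters on Country only. A minimal sketch using the two-row key dataframe from the question:

```python
import pandas as pd

df = pd.DataFrame({"Country": ["Fr", "Po"], "Type": ["IE", "SE"]})

# ravel() flattens the 2x2 array row by row, so both columns end up in one list
UniqueNames = pd.unique(df[["Country", "Type"]].values.ravel())
print(list(UniqueNames))  # ['Fr', 'IE', 'Po', 'SE']
```

With these keys, `df3.Country == key` matches nothing for 'IE' or 'SE', and for 'Fr' or 'Po' it keeps every Type of that country, which is the behaviour described above.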

How can I achieve this?

I also tried the following code:

d = {}
# Group by the Country and Type columns (the dataframe has no 'City' column)
for name, group in df3.groupby(['Country','Type']):
    d['group_' + str(name)] = group

But the problem is that it creates dataframes for every unique combination of Country and Type, while I only need a few combinations.

Also, the dataframe names look like d["group_('Fr', 'IE')"] and d["group_('Fr', 'SE')"].

Can I change these names to simpler ones like Fr_IE and Fr_SE? I need to run many other functions on each of these dataframes.
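One way to get both (only the wanted combinations, and short names such as Fr_IE) is to keep the groupby, skip pairs that are not in the key dataframe, and build each dictionary key from the group tuple with an f-string. This is a sketch, not the accepted answer's method; the small df3 below is a hypothetical subset of the question's data, and `wanted` is an illustrative name.

```python
import pandas as pd

# Hypothetical subset of the question's data (df3) and the key dataframe (df)
df3 = pd.DataFrame({
    "Date": ["01/01/2019", "02/01/2019", "01/01/2019", "01/01/2019", "01/01/2019", "01/01/2019"],
    "Country": ["Fr", "Fr", "Fr", "Po", "Po", "Hk"],
    "Type": ["IE", "IE", "SE", "IE", "SE", "DK"],
    "Consumption": [186, 131, 115, 150, 210, 229],
})
df = pd.DataFrame({"Country": ["Fr", "Po"], "Type": ["IE", "SE"]})

# Set of wanted (Country, Type) pairs taken from the key dataframe
wanted = set(zip(df["Country"], df["Type"]))

d = {}
for (country, typ), group in df3.groupby(["Country", "Type"]):
    if (country, typ) in wanted:
        d[f"{country}_{typ}"] = group  # simple keys such as "Fr_IE"

print(list(d))  # ['Fr_IE', 'Po_SE']
```

The results are still dictionary entries (d["Fr_IE"]) rather than separate variables, which tends to be easier when the same functions have to run on every dataframe: `for name, sub in d.items(): ...`.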

So here's a hint to get you started: you need groupby with Country + Type, so look up pd.DataFrame.groupby.
– gold_cy, commented Dec 18, 2019 at 14:00
Please paste the dataframe code, not images. If we want to reproduce your code, we have to type it in line by line.
– powerPixie, commented Dec 18, 2019 at 14:06

jsgalarraga · Accepted Answer · 2019-12-19 10:20:12Z

Convert the dataframe with the desired values into a list of tuples so you can loop through it and filter:

tuples = [tuple(x) for x in df.values]

Finally, filter the original dataframe (original_df) with each tuple in the list. Here I print each result, but you might want to do something else with it:

for mytuple in tuples:
    print(original_df[(original_df['Country'] == mytuple[0]) & (original_df['Type'] == mytuple[1])])

To keep each filtered dataframe instead of just printing it, you can store them in a list:

my_dfs = [original_df[(original_df['Country'] == mytuple[0]) & (original_df['Type'] == mytuple[1])] for mytuple in tuples]
for my_df in my_dfs:
    print(my_df)
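A list loses track of which pair each dataframe came from. A small variation (not part of this answer) stores the same boolean-filtered results in a dictionary keyed by a readable name, which also gives the Fr_IE style names asked for in the question. Sketch only; original_df below is a hypothetical subset of the question's data.

```python
import pandas as pd

# Hypothetical subset of the question's data
original_df = pd.DataFrame({
    "Country": ["Fr", "Fr", "Fr", "Po", "Po"],
    "Type": ["IE", "IE", "SE", "IE", "SE"],
    "Consumption": [186, 131, 115, 150, 210],
})
df = pd.DataFrame({"Country": ["Fr", "Po"], "Type": ["IE", "SE"]})

tuples = [tuple(x) for x in df.values]

# Same filter as the list version, but each result is stored under a
# key such as "Fr_IE" instead of a list position
my_dfs = {
    f"{country}_{typ}": original_df[(original_df["Country"] == country) & (original_df["Type"] == typ)]
    for country, typ in tuples
}
print(my_dfs["Fr_IE"])
```

Because the filter runs on original_df rather than on the key dataframe, every column, including Consumption, is kept.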

edited Dec 19, 2019 at 10:20

answered Dec 18, 2019 at 15:26

jsgalarraga

Comments

user6016731 Over a year ago

Thanks for sharing this, but I need the Consumption column.

jsgalarraga Over a year ago

Sorry about that, I forgot to rename one dataframe. Try changing the last statement to: print(df[(df['Country'] == mytuple[0]) & (df['Type'] == mytuple[1])]). Note the change from df2 to df at the start.

user6016731 Over a year ago

This does not give the desired output

jsgalarraga Over a year ago

Can you please specify the difference so I can help? From what I understand, it gives each dataframe separately with all the fields.

jsgalarraga Over a year ago

Save every dataframe in a list first, as the edit shows.

stargazer · Accepted Answer · 2019-12-19 13:27:18Z

If I understood the question correctly, and you define the key dataframe df as you did:

df = pd.DataFrame({
"Country": ["Fr", "Po"],
"Type": ["IE", "SE"]})

then you are missing the other combinations, such as ['Fr','SE'] and ['Po','IE'].
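If the intent is every combination of the listed countries and types, the pairs can be generated rather than written out by hand. A sketch using `itertools.product` (a swapped-in technique, not the one used in this answer); the lists `countries` and `types` are assumed inputs.

```python
import itertools

import pandas as pd

countries = ["Fr", "Po"]  # assumed: countries of interest
types = ["IE", "SE"]      # assumed: types of interest

# Cartesian product gives every (Country, Type) pair
df = pd.DataFrame(list(itertools.product(countries, types)), columns=["Country", "Type"])
print(df)
```

Unlike the groupby-based key frame built in this answer, the product also includes pairs that never occur in the data; filtering on such a pair returns an empty dataframe.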

I solved the problem as below. Hope this helps:

import pandas as pd

# I put your original data in a file called data.txt
# and read it into a dataframe called df_data
# (the data in the question is whitespace-separated, hence sep=r'\s+')
df_data = pd.read_csv('data.txt', sep=r'\s+')
print(df_data)

# Creating a dataframe of all selected country and type pairs
df_temp = df_data.groupby(['Country', 'Type']).size().reset_index(name='Count')
df = df_temp[df_temp['Country'].isin(['Fr', 'Po']) & df_temp['Type'].isin(['IE', 'SE'])].drop('Count', axis=1)
print(df)

# Then loop through the tuples
tuples = [tuple(x) for x in df.values]
my_dfs = [df_data[(df_data['Country'] == mytuple[0]) & (df_data['Type'] == mytuple[1])] for mytuple in tuples]

for my_df in my_dfs:
    print(my_df)
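An alternative to looping over boolean masks (not the approach above) is an inner merge with the key dataframe: it keeps only rows whose (Country, Type) pair appears in the key frame, and grouping the result then yields one dataframe per pair. A sketch with a hypothetical subset of the data:

```python
import pandas as pd

# Hypothetical subset of the question's data
df_data = pd.DataFrame({
    "Date": ["01/01/2019", "02/01/2019", "01/01/2019", "01/01/2019", "02/01/2019", "01/01/2019"],
    "Country": ["Fr", "Fr", "Fr", "Po", "Po", "Hk"],
    "Type": ["IE", "IE", "SE", "SE", "SE", "IE"],
    "Consumption": [186, 131, 115, 210, 195, 159],
})
# Key dataframe with the wanted (Country, Type) pairs
df = pd.DataFrame({"Country": ["Fr", "Po"], "Type": ["IE", "SE"]})

# Inner merge keeps only rows whose (Country, Type) pair is in df
filtered = df_data.merge(df, on=["Country", "Type"], how="inner")

# One dataframe per pair; the group key is a (Country, Type) tuple
for key, group in filtered.groupby(["Country", "Type"]):
    print(key)
    print(group)
```

Note that merge returns a new default index, so the original row labels are not preserved.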
