Splitting dataframe based on multiple column values

Question

I have a dataframe with 1M+ rows. A sample of the dataframe is shown below:

df

    ID      Type        File    
 0  123     Phone       1       
 1  122     Computer    2       
 2  126     Computer    1

I want to split this dataframe based on Type and File. If the total count of Type is 2 (Phone and Computer), total number of files is 2 (1,2), then the total number of splits will be 4.

In short, total splits is as given below:

 total_splits=len(set(df['Type']))*len(set(df['File']))

In this example, total_splits=4. Now, I want to split the dataframe df in 4 based on Type and File.

So the new dataframes should be:

df1 (having data of type=Phone and File=1)
df2 (having data of type=Computer and File=1)
df3 (having data of type=Phone and File=2)
df4 (having data of type=Computer and File=2)

The splitting should be done inside a loop.

I know we can split a dataframe based on one condition (shown below), but how do you split it based on two ?

My Code:

data = {'ID' : ['123', '122', '126'],'Type' :['Phone','Computer','Computer'],'File' : [1,2,1]}

df=pd.DataFrame(data) 

types=list(set(df['Type']))
total_splits=len(set(df['Type']))*len(set(df['File']))

cnt=1

for i in range(0,total_splits):
      for j in types:
            locals()["df"+str(cnt)] = df[df['Type'] == j]
            cnt += 1

The result of the above code gives 2 dataframes, df1 and df2. df1 will have data of Type='Phone' and df2 will have data of Type='Computer'.

But this is just half of what I want to do. Is there a way we can make 4 dataframes here based on 2 conditions ?

Note: I know I can first split on 'Type' and then split the resulting dataframe based on 'File' to get the output. However, I want to know of a more efficient way of performing the split instead of having to create multiple dataframes to get the job done.

EDIT

This is not a duplicate question as I want to split the dataframe based on multiple column values, not just one!

Possible duplicate of Splitting dataframe into multiple dataframes — M_S_N
– M_S_N, Commented Sep 10, 2019 at 19:05

Quang Hoang · Accepted Answer · 2019-09-10 19:06:52Z

1

You can make do with groupby:

dfs = {}
for k, d in df.groupby(['Type','File']):
    type, file = k

    # do want ever you want here
    # d is the dataframe corresponding with type, file
    dfs[k] = d

You can also create a mask:

df['mask'] = df['File'].eq(1) * 2 + df['Type'].eq('Phone')

Then, for example:

df[df['mask'].eq(0)]

gives you the first dataframe you want, i.e. Type==Phone and File==1, and so on.

answered Sep 10, 2019 at 19:06

Quang Hoang

151k11 gold badges64 silver badges86 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

Gary Over a year ago

I used the groupby method but ended up with 8 dataframes with a lot of duplicate dataframes generated. Code I used is: cnt=1 for k, d in df.groupby(['File','Type']): type, file = k locals()["df"+str(cnt)] = d cnt += 1

Quang Hoang Over a year ago

That means you have more categories, e.g, 3 Types. You probably want to filter them out first.

Collectives™ on Stack Overflow

Splitting dataframe based on multiple column values

1 Answer 1

2 Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

2 Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related