
I have a PySpark DataFrame and want to make several sub-DataFrames using a groupBy operation. For example, I have a DF like

       subject  relation object 
DF =      s1       p       o1
          s2       p       o2
          s3       q       o3
          s4       q       o4

and want to have a sub dataframes with same relation names like

       subject  relation object 
DF1 =      s1       p       o1
           s2       p       o2
       subject  relation object 
DF2 =      s3       q       o3
           s4       q       o4
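
For reference, here is a minimal sketch of how a DataFrame like the one above could be built (this assumes an existing SparkSession named spark; the column and value names just follow the example):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Example data matching the table above
DF = spark.createDataFrame(
    [('s1', 'p', 'o1'),
     ('s2', 'p', 'o2'),
     ('s3', 'q', 'o3'),
     ('s4', 'q', 'o4')],
    ['subject', 'relation', 'object'])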

I would appreciate it if you could share your idea of how to make sub-DataFrames using groupBy().

Thanks


1 Answer


groupBy() returns a GroupedData object that only supports aggregations, so it cannot be selected from or filtered directly. Instead, you can collect the distinct relation values and filter the original DataFrame for each one to build a list of sub-DataFrames, like this:

# Collect the distinct relation values, then filter the original DataFrame for each one
df_list = []
for row in DF.select('relation').distinct().sort('relation').collect():
    current_relation = row['relation']
    df_list.append(DF.filter(DF['relation'] == current_relation))
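
To check the result, you can display each sub-DataFrame (a quick usage sketch; the list order follows the sorted relation values):

# Each entry of df_list holds the rows for one relation value
for sub_df in df_list:
    sub_df.show()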

