
I have a PySpark DataFrame and want to make several sub-DataFrames using a groupBy operation. For example, I have a DF like

       subject  relation object 
DF =      s1       p       o1
          s2       p       o2
          s3       q       o3
          s4       q       o4

and want to have a sub dataframes with same relation names like

       subject  relation object 
DF1 =      s1       p       o1
           s2       p       o2
       subject  relation object 
DF2 =      s3       q       o3
           s4       q       o4
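
For reference, here is a minimal sketch of how a DataFrame like the one above could be built (this assumes an existing SparkSession named spark; the column and value names just follow the example):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Example data matching the table above
DF = spark.createDataFrame(
    [('s1', 'p', 'o1'),
     ('s2', 'p', 'o2'),
     ('s3', 'q', 'o3'),
     ('s4', 'q', 'o4')],
    ['subject', 'relation', 'object'])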

I would appreciate it if you could share your idea of how to make sub-DataFrames using groupBy().

Thanks


1 Answer


groupBy() returns a GroupedData object that only supports aggregations, so it cannot be selected from or filtered directly. Instead, you can collect the distinct relation values and filter the original DataFrame for each one to build a list of sub-DataFrames, like this:

# Collect the distinct relation values, then filter the original DataFrame for each one
df_list = []
for row in DF.select('relation').distinct().sort('relation').collect():
    current_relation = row['relation']
    df_list.append(DF.filter(DF['relation'] == current_relation))
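
To check the result, you can display each sub-DataFrame (a quick usage sketch; the list order follows the sorted relation values):

# Each entry of df_list holds the rows for one relation value
for sub_df in df_list:
    sub_df.show()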

