I have a DataFrame with two numeric columns, and I need to sort it row-wise rather than column-wise. Every example I can find shows how to sort a DataFrame by a column, but none shows how to sort the values within every row of a DataFrame in PySpark.

col1    col2
2       1
3       2

Expected output

col1    col2
1       2
2       3
  • You may need to provide sample data along with the desired result. This link will help you update/refine your question. Commented Nov 14, 2017 at 5:37

1 Answer

You may need a workaround to produce your desired result.

Here is an example that reorders the columns based on the values in a single row.

First, add an index column to your DataFrame so that each row can be identified.

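from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()  # reuses the active session if one exists
df = spark.createDataFrame([['index1', 3, 2, 1], ['index2', 2, 1, 3]], ['index', 'a', 'b', 'c'])
columns = [i for i in df.columns if i != 'index']  # every column except the row label
df.show()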

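from pyspark.sql.functions import col

def sort_row_df(row_to_sort):
    # Collect the single row whose values determine the column order
    row_data = df.filter(col('index') == row_to_sort).collect()[0]

    # Pair each value with its column name so that sorting orders by value
    sorted_row = sorted([[row_data[col_], col_] for col_ in columns])

    # Keep only the column names, now in ascending order of the row's values
    rearrange_col = [i[1] for i in sorted_row]

    return df.select("index", *rearrange_col)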
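
The reordering step can be tried without Spark. The following is a minimal plain-Python sketch, in which a dict stands in for the collected row (an assumption made only for illustration; a real Row is indexed by column name in the same way):

```python
# Plain-Python stand-in for the collected row 'index1': column name -> value
row_data = {"a": 3, "b": 2, "c": 1}
columns = ["a", "b", "c"]

# Pair each value with its column name; sorting the pairs orders by value first
# and falls back to the column name when two values are equal
sorted_row = sorted([row_data[c], c] for c in columns)

# Drop the values and keep the column names in their new order
rearrange_col = [name for _, name in sorted_row]
print(rearrange_col)  # ['c', 'b', 'a']
```

Passing that list to df.select reorders the columns for every row, so only the chosen row is guaranteed to end up in ascending order.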

Let's say you wish to sort based on row 'index1',

row_to_sort = 'index1'
sorted_df = sort_row_df(row_to_sort)
sorted_df.show()

To sort based on row 'index2',

row_to_sort = 'index2'
sorted_df = sort_row_df(row_to_sort)
sorted_df.show()

If you want to sort all of the data row by row, I would suggest transposing the data, sorting it, and transposing it back again. You may refer to existing answers on how to transpose a DataFrame in PySpark.
