Spark: move half of each row's data into the next row

I have a CSV file where each line has an even number of words separated by commas. I want to read the file and move the second half of each row onto a new row of its own using Spark.

Example

Input
a,b,c,d,e,f,g,h
1,2,3,4,5,6,7,8

Output

a,b,c,d
e,f,g,h
1,2,3,4
5,6,7,8

One solution is to select the first four columns and union the result with a select of the next four columns, but I don't want to use this approach.

  • you might be able to do this by enforcing a schema Commented Jun 10 at 3:10
  • This question is similar to: How to Split the row by nth delimiter in Spark Scala. If you believe it’s different, please edit the question, make it clear how it’s different and/or how the answers on that question are not helpful for your problem. Commented Jun 10 at 4:18
  • Half? What about the case where you have an odd number of elements? Commented Jun 10 at 8:02
  • @philantrovert The solution mentioned there uses union. Commented Jun 10 at 16:02
  • @Steven The line always has an even number of elements. Commented Jun 10 at 16:03

1 Answer

To split each row into two without a union, you can use flatMap on the DataFrame's underlying RDD. flatMap maps one input row to two output rows in a single pass, so there is no need to select the columns twice and merge the results afterwards. Load your data, apply a small function that splits each row at its midpoint, and flatMap flattens the results. Then convert the result back to a DataFrame for further use.

Below is the code snippet:

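from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("SplitRows").getOrCreate()

# Each input row holds one comma-separated line in a single "value" column.
data = [("a,b,c,d,e,f,g,h",), ("1,2,3,4,5,6,7,8,7,9",)]
df = spark.createDataFrame(data, ["value"])
df.show()

def split_row(row):
    # Split the line at its midpoint and return the two halves as two one-column rows.
    parts = row.value.split(',')
    midpoint = len(parts) // 2
    return [(",".join(parts[:midpoint]),), (",".join(parts[midpoint:]),)]

# flatMap flattens the two-element list returned for each row into separate rows.
split_rdd = df.rdd.flatMap(split_row)
result_df = spark.createDataFrame(split_rdd)
result_df.show()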

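The per-row logic can be checked without a Spark session. The following is a minimal plain-Python sketch of what flatMap does with a midpoint split. The names split_line and flat_map are hypothetical helpers introduced only for this illustration and are not part of PySpark. The even-length check is an added assumption based on the comment that every line has an even number of elements.

```python
from itertools import chain


def split_line(line):
    """Split a comma-separated line at its midpoint into two lines."""
    parts = line.split(",")
    # Assumption: input lines always have an even number of elements,
    # so an odd count is treated as bad input rather than split unevenly.
    if len(parts) % 2 != 0:
        raise ValueError("expected an even number of elements: " + line)
    midpoint = len(parts) // 2
    return [",".join(parts[:midpoint]), ",".join(parts[midpoint:])]


def flat_map(func, items):
    """Apply func to every item and flatten the returned lists by one level."""
    return list(chain.from_iterable(func(item) for item in items))


lines = ["a,b,c,d,e,f,g,h", "1,2,3,4,5,6,7,8"]
result = flat_map(split_line, lines)
for row in result:
    print(row)
```

Each input line yields a two-element list, and flattening those lists keeps the halves in order, so the first half of a line is always followed directly by its second half.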
