1

Does anyone know a good way in Scala to explode a row into multiple rows based on a range from two columns?

For example, for the input dataframe:

start_ip_int | end_ip_int | country | city
100          | 105        | USA     | Boston

The expected output dataframe is:

start_ip_int | end_ip_int | country | city   | ip
100          | 105        | USA     | Boston | 100
100          | 105        | USA     | Boston | 101
100          | 105        | USA     | Boston | 102
100          | 105        | USA     | Boston | 103
100          | 105        | USA     | Boston | 104
100          | 105        | USA     | Boston | 105

So here one row got split into 6 rows based on the range of columns start_ip_int and end_ip_int.

1 Answer 1

4

If you're on Spark 2.4+, use sequence with the IP integer range as arguments to generate an ArrayType column, followed by explode-ing it:

val df = Seq((100, 105, "USA", "Boston")).
  toDF("start_ip_int", "end_ip_int", "country", "city")

df.withColumn("ip", explode(sequence($"start_ip_int", $"end_ip_int"))).show
// +------------+----------+-------+------+---+                                    
// |start_ip_int|end_ip_int|country|  city| ip|
// +------------+----------+-------+------+---+
// |         100|       105|    USA|Boston|100|
// |         100|       105|    USA|Boston|101|
// |         100|       105|    USA|Boston|102|
// |         100|       105|    USA|Boston|103|
// |         100|       105|    USA|Boston|104|
// |         100|       105|    USA|Boston|105|
// +------------+----------+-------+------+---+

For older Spark version, consider creating a simple UDF to mimic the sequence function:

val rangeSequence = udf{ (lower: Int, upper: Int) =>
  Seq.iterate(lower, upper - lower + 1)(_ + 1)
}

// Applying the UDF, followed by `explode`
df.withColumn("ip", explode(rangeSequence($"start_ip_int", $"end_ip_int")))
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.