I am having trouble assembling a vector from a DataFrame in Scala because some columns contain null values. Unfortunately, VectorAssembler cannot handle null values.

I could replace or fill the DataFrame's null values and then create a dense vector, but that is not what I want.
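For reference, the fill-then-assemble workaround might look roughly like the sketch below. The two-column frame, its column names, and the local SparkSession are assumptions made only for illustration. The fill value 0.0 is likewise arbitrary.

```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession

// Assumption: a local session for illustration; in spark-shell `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("fill-then-assemble").getOrCreate()
import spark.implicits._

// Hypothetical frame: `hour` is null in the first row.
val df = Seq((0.0, Option.empty[Double]), (1.0, Some(2.0))).toDF("id", "hour")

// Replace nulls in numeric columns with 0.0, then assemble as usual.
val filled = df.na.fill(0.0)
val assembled = new VectorAssembler()
  .setInputCols(Array("id", "hour"))
  .setOutputCol("features")
  .transform(filled)
```

After the fill, the missing `hour` is indistinguishable from a genuine 0.0, which is exactly the drawback I want to avoid.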

So I thought about converting my DataFrame rows to sparse vectors instead. How can I achieve this? I have not found a VectorAssembler option that produces a sparse vector.

EDIT: I do not actually need null entries in the sparse vector, but a missing entry should not be stored as 0 or any other placeholder value, as it would be in a dense vector.

Do you have any suggestions?

1 Answer

You could do it manually like this:

import org.apache.spark.SparkException
import org.apache.spark.ml.linalg.{Vector, Vectors}
import scala.collection.mutable.ArrayBuilder

// Assumes `spark` is an active SparkSession, as in spark-shell.
// Named Record rather than Row to avoid shadowing org.apache.spark.sql.Row.
case class Record(a: Double, b: Option[Double], c: Double, d: Vector, e: Double)

val dataset = spark.createDataFrame(
  Seq(Record(0.0, None, 3.0, Vectors.dense(4.0, 5.0, 0.5), 7.0),
    Record(1.0, Some(2.0), 3.0, Vectors.dense(4.0, 5.0, 0.5), 7.0))
).toDF("id", "hour", "mobile", "userFeatures", "clicked")

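val sparseVectorRDD = dataset.rdd.map { row =>
  val indices = ArrayBuilder.make[Int]
  val values = ArrayBuilder.make[Double]
  var cur = 0 // next free position in the output vector
  row.toSeq.foreach {
    case v: Double =>
      // Scalar column: store the value at the current position.
      indices += cur
      values += v
      cur += 1
    case vec: Vector =>
      // Vector column: copy its active entries, offset by the current position.
      vec.foreachActive { case (i, v) =>
        indices += cur + i
        values += v
      }
      cur += vec.size
    case null =>
      // Null column (assumed scalar): reserve the position but store nothing.
      cur += 1
    case o =>
      throw new SparkException(s"$o of type ${o.getClass.getName} is not supported.")
  }
  Vectors.sparse(cur, indices.result(), values.result())
}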

You can then convert the result back to a DataFrame if needed. Because Row values are not type-checked at compile time, you have to match on each value's runtime type yourself and cast where necessary.
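A minimal sketch of the conversion back to a DataFrame could look like this. It assumes a local SparkSession and uses a small hand-built stand-in for `sparseVectorRDD` so that it runs on its own. The column name `features` is also just an illustrative choice.

```scala
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.sql.SparkSession

// Assumption: a local session for illustration; in spark-shell `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("sparse-vector-sketch").getOrCreate()
import spark.implicits._

// Stand-in for the RDD built above: index 1 is left unset in the first vector,
// mirroring a row whose second column was null.
val sparseVectorRDD = spark.sparkContext.parallelize(Seq(
  Vectors.sparse(7, Array(0, 2, 3, 4, 5, 6), Array(0.0, 3.0, 4.0, 5.0, 0.5, 7.0)),
  Vectors.sparse(7, Array(0, 1, 2, 3, 4, 5, 6), Array(1.0, 2.0, 3.0, 4.0, 5.0, 0.5, 7.0))
))

// Wrap each vector in a Tuple1 so that toDF can derive a one-column schema.
val features = sparseVectorRDD.map(v => Tuple1(v)).toDF("features")
features.printSchema()
features.show(truncate = false)
```

Keep in mind that reading an unset index of a sparse vector (for example via `apply` or `toArray`) still yields 0.0. The distinction between "missing" and "zero" lives only in the stored indices.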
