
I am trying to write a function that adds two org.apache.spark.ml.linalg.Vector values, i.e. two sparse vectors.

Such a vector could look like the following:

(28,[1,2,3,4,7,11,12,13,14,15,17,20,22,23,24,25],[0.13028398104008743,0.23648605632753023,0.7094581689825907,0.13028398104008743,0.23648605632753023,0.0,0.14218861229025295,0.3580566057240087,0.14218861229025295,0.13028398104008743,0.26056796208017485,0.0,0.14218861229025295,0.06514199052004371,0.13028398104008743,0.23648605632753023])

For example:

def add_vectors(x: org.apache.spark.ml.linalg.Vector, y: org.apache.spark.ml.linalg.Vector): org.apache.spark.ml.linalg.Vector = {
  ??? // to be implemented
}

Let's look at a use case

val x = Vectors.sparse(2, Array(0), Array(1.0)) // [1.0, 0.0]
val y = Vectors.sparse(2, Array(1), Array(1.0)) // [0.0, 1.0]

I want the output to be:

Vectors.sparse(2, Array(0, 1), Array(1.0, 1.0))

Here's another case, where both vectors share the same index:

val x = Vectors.sparse(2, Array(1), Array(1.0)) // [0.0, 1.0]
val y = Vectors.sparse(2, Array(1), Array(1.0)) // [0.0, 1.0]

The output should be:

Vectors.sparse(2, Array(1), Array(2.0))
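Both examples can also be met without densifying. Assuming strictly increasing indices (the layout Spark's SparseVector uses), the two index arrays can be walked together like the merge step of merge sort, adding the values wherever an index appears in both. A minimal pure-Scala sketch, assuming the inputs are the indices/values arrays of two vectors of the same size; `SparseAdd` and `mergeSparse` are hypothetical names, not Spark API:

```scala
// Hypothetical helper (not part of Spark): adds two sparse vectors given as
// parallel (indices, values) arrays with strictly increasing indices.
object SparseAdd {
  def mergeSparse(
      xi: Array[Int], xv: Array[Double],
      yi: Array[Int], yv: Array[Double]): (Array[Int], Array[Double]) = {
    val outI = Array.newBuilder[Int]
    val outV = Array.newBuilder[Double]
    var a = 0 // cursor into x
    var b = 0 // cursor into y
    // Walk both index arrays in order, always taking the smaller index next.
    while (a < xi.length || b < yi.length) {
      if (b >= yi.length || (a < xi.length && xi(a) < yi(b))) {
        // Index only present in x (or y is exhausted).
        outI += xi(a); outV += xv(a); a += 1
      } else if (a >= xi.length || yi(b) < xi(a)) {
        // Index only present in y (or x is exhausted).
        outI += yi(b); outV += yv(b); b += 1
      } else {
        // Same index in both vectors: add the two values.
        outI += xi(a); outV += xv(a) + yv(b); a += 1; b += 1
      }
    }
    (outI.result(), outV.result())
  }
}
```

To use it on Spark vectors, you could pass the `indices` and `values` of `x.toSparse` and `y.toSparse` and wrap the result with `Vectors.sparse(x.size, indices, values)`. The work is proportional to the number of non-zeros rather than to the vector size. A sum that cancels out to exactly 0.0 is kept as an explicit zero.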

I've realized this is harder than it seems. One possible solution I found, in the question "Addition of two RDD[mllib.linalg.Vector]'s", is to convert the vectors to Breeze vectors, add them in Breeze, and convert the result back to a Spark vector. So I tried implementing this:

def add_vectors(x: org.apache.spark.ml.linalg.Vector, y: org.apache.spark.ml.linalg.Vector) = {

  val dense_x = x.toDense
  val dense_y = y.toDense

  val bv1 = new DenseVector(dense_x.toArray)
  val bv2 = new DenseVector(dense_y.toArray)

  val vectout = Vectors.dense((bv1 + bv2).toArray)
  vectout
}

However, this gave me an error on the last line:

val vectout = Vectors.dense((bv1 + bv2).toArray)

The error is "Cannot resolve the overloaded method 'dense'". Why is this error occurring, and how can I fix it?
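One plausible cause, assuming the imports are not shown above: `DenseVector` resolves to Spark's own org.apache.spark.ml.linalg.DenseVector, which has no `+` method. Then `bv1 + bv2` fails to type-check, and the failure can surface as an unresolvable call to the overloaded `Vectors.dense`. A sketch that aliases Breeze's class to avoid the name clash, assuming Spark ML (and Breeze, which it depends on) are on the classpath; `AddViaBreeze` is a hypothetical wrapper name:

```scala
import breeze.linalg.{DenseVector => BDV}
import org.apache.spark.ml.linalg.{Vector, Vectors}

object AddViaBreeze {
  // Adds two Spark ml vectors by going through Breeze dense vectors.
  def add_vectors(x: Vector, y: Vector): Vector = {
    require(x.size == y.size, s"size mismatch: ${x.size} vs ${y.size}")
    val bv1 = new BDV(x.toArray) // Breeze DenseVector[Double], not Spark's DenseVector
    val bv2 = new BDV(y.toArray)
    // (bv1 + bv2).toArray is an Array[Double], which selects the
    // Vectors.dense(values: Array[Double]) overload.
    Vectors.dense((bv1 + bv2).toArray)
  }
}
```

Note that the result is a dense vector even when both inputs are sparse.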

1 Answer

To answer my own question, I had to think about how sparse vectors are represented. A sparse vector is built from three arguments: the number of dimensions, an array of indices, and an array of the values at those indices. For example:

import org.apache.spark.ml.linalg.{Vector, Vectors}

val indices: Array[Int] = Array(1, 2)
val norms: Array[Double] = Array(0.5, 0.3)
val num_int = 4
val vector: Vector = Vectors.sparse(num_int, indices, norms)

If I converted this SparseVector to an Array I would get the following.

code:

val choiced_array = vector.toArray

choiced_array.foreach(element => print(element + " "))

Output:

0.0 0.5 0.3 0.0

This is the dense representation of the vector. Once both vectors are converted to arrays, you can add them element-wise:

val add: Array[Double] = vector.toArray.zip(vector_2.toArray).map { case (a, b) => a + b }

The result is a new array holding the element-wise sum. Next, to build the new sparse vector, create an indices array like the one used in the construction above:

var i = -1
val new_indices_pre = add.map((element: Double) => {
  i = i + 1
  if (element != 0.0) i // keep the position of a non-zero sum
  else -1               // mark zero positions for removal
})

Then filter out the -1 entries, which mark positions where the sum is zero:

val new_indices = new_indices_pre.filter(element => element != -1)

Also filter the zero values out of the array holding the sum of the two vectors, so the values line up with the indices:

val final_add = add.filter(element => element != 0.0)

Finally, build the new sparse vector:

Vectors.sparse(num_int,new_indices,final_add)
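The steps above can be collected into one function. A sketch, assuming Spark ML is on the classpath, with `zipWithIndex` standing in for the mutable counter so indices and values are selected in one pass; `AddViaArrays` is a hypothetical wrapper name, and the size check is an addition not present in the steps above (`zip` would otherwise silently truncate to the shorter array):

```scala
import org.apache.spark.ml.linalg.{Vector, Vectors}

object AddViaArrays {
  // Densify, add element-wise, then keep only the non-zero entries.
  def add_vectors(x: Vector, y: Vector): Vector = {
    require(x.size == y.size, s"size mismatch: ${x.size} vs ${y.size}")
    val sum = x.toArray.zip(y.toArray).map { case (a, b) => a + b }
    // Pair each sum with its position and drop the zeros (negative sums are kept).
    val nonZero = sum.zipWithIndex.filter { case (v, _) => v != 0.0 }
    val indices = nonZero.map(_._2)
    val values = nonZero.map(_._1)
    Vectors.sparse(x.size, indices, values)
  }
}
```

A shorter variant is `Vectors.dense(sum).toSparse`, since Spark's `toSparse` drops explicit zeros. Either way a dense array of the full size is allocated, which can matter for very high-dimensional vectors.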