I have a PySpark DataFrame in which one column (features) holds sparse vectors. For example:
+------------------+-----+
|          features|label|
+------------------+-----+
| (4823,[87],[0.0])|  0.0|
| (4823,[31],[2.0])|  0.0|
|(4823,[159],[0.0])|  1.0|
|  (4823,[1],[7.0])|  0.0|
|(4823,[15],[27.0])|  0.0|
+------------------+-----+
I would like to extend the features column by adding one more feature to each vector, for example:
+-------------------+-----+
|           features|label|
+-------------------+-----+
|  (4824,[87],[0.0])|  0.0|
|  (4824,[31],[2.0])|  0.0|
| (4824,[159],[0.0])|  1.0|
|   (4824,[1],[7.0])|  0.0|
|(4824,[4824],[7.0])|  0.0|
+-------------------+-----+
Is there a way to do this without converting each SparseVector to a dense vector and then repacking it as sparse with the new feature?
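
To make the goal concrete, here is a minimal sketch of the per-row operation I have in mind, written on plain (size, indices, values) triples rather than on Spark objects. The helper name append_feature is hypothetical. A SparseVector exposes the same three pieces as .size, .indices and .values, so I assume the same logic could be wrapped in a UDF that rebuilds a SparseVector from them, but I have not verified that this is the best approach.

```python
def append_feature(size, indices, values, new_value):
    """Append one feature to a sparse vector given as (size, indices, values).

    The new feature takes index `size` (the first free slot), so the existing
    indices stay sorted and nothing is converted to a dense array.
    """
    new_indices = list(indices)
    new_values = list(values)
    # Zeros are implicit in a sparse vector, so only store a non-zero entry.
    if new_value != 0.0:
        new_indices.append(size)
        new_values.append(float(new_value))
    return size + 1, new_indices, new_values


# Last input row of the example: (4823, [15], [27.0]), plus a new feature of 7.0
print(append_feature(4823, [15], [27.0], 7.0))
# -> (4824, [15, 4823], [27.0, 7.0])
```

Since the indices are zero-based, the appended feature lands at index 4823 of the new size-4824 vector, and the existing entries are left untouched.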