I am using a VectorAssembler to transform a DataFrame:
var stringAssembler = new VectorAssembler().setInputCols(encodedstringColumns).setOutputCol("stringFeatures")
df = stringAssembler.transform(df)
var stringVectorSize = df.select("stringFeatures").head.size  // this is the line that gives 1
var stringPca = new PCA().setInputCol("stringFeatures").setOutputCol("pcaStringFeatures").setK(stringVectorSize).fit(output)
Here stringVectorSize tells PCA how many components to keep (the value passed to setK). I am trying to get the size of the sparse vector that the VectorAssembler outputs, but my code gives size = 1, which is wrong. What is the right way to get the size of a sparse vector stored in a DataFrame column?
To put it plainly, given this DataFrame:
+-------------+------------+-------------+------------+---+-----------+---------------+-----------------+--------------------+
|PetalLengthCm|PetalWidthCm|SepalLengthCm|SepalWidthCm| Id|    Species|Species_Encoded|       Id_Encoded|      stringFeatures|
+-------------+------------+-------------+------------+---+-----------+---------------+-----------------+--------------------+
|          1.4|         0.2|          5.1|         3.5|  1|Iris-setosa|  (2,[0],[1.0])| (149,[91],[1.0])|(151,[91,149],[1....|
|          1.4|         0.2|          4.9|         3.0|  2|Iris-setosa|  (2,[0],[1.0])|(149,[119],[1.0])|(151,[119,149],[1...|
|          1.3|         0.2|          4.7|         3.2|  3|Iris-setosa|  (2,[0],[1.0])|(149,[140],[1.0])|(151,[140,149],[1...|
For the above DataFrame, I want to extract the size of the stringFeatures sparse vector (which is 151).
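
A minimal sketch of one way this might be done, assuming the column holds `org.apache.spark.ml.linalg.Vector` values. `head` returns a `Row`, so calling `.size` on it counts the row's fields (one, since a single column was selected) rather than the vector's length. Pulling the vector out of the row first and then asking for its size should give the expected number. The object name `VectorSizeSketch`, the helper `vectorSizeOf`, and the two-row stand-in DataFrame are hypothetical and only there for illustration.

```scala
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.sql.{DataFrame, SparkSession}

object VectorSizeSketch {
  // Hypothetical helper: reads the first row of `colName` and returns
  // the length of the vector stored there.
  def vectorSizeOf(df: DataFrame, colName: String): Int =
    df.select(colName).head.getAs[Vector](0).size

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("vector-size-sketch")
      .getOrCreate()

    // Stand-in for the assembled column; the real DataFrame would come
    // from stringAssembler.transform(df).
    val df = spark.createDataFrame(Seq(
      (1, Vectors.sparse(151, Array(91, 149), Array(1.0, 1.0))),
      (2, Vectors.sparse(151, Array(119, 149), Array(1.0, 1.0)))
    )).toDF("Id", "stringFeatures")

    // Row.size is the number of fields in the row, not the vector length.
    println(df.select("stringFeatures").head.size)   // 1

    // Extract the vector from the row, then take its size.
    println(vectorSizeOf(df, "stringFeatures"))      // 151

    spark.stop()
  }
}
```

With the original code this would amount to replacing `.head.size` with `.head.getAs[Vector](0).size`, under the assumption that every row's vector has the same length, which is what VectorAssembler produces.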