I am trying to add elements to a mutable Scala list, as shown below. I read the values from a DataFrame row by row, extract the value of the column named `_title`, and add it to the list. But when the `for` loop completes, the list is still empty. This is the code:
```scala
import scala.collection.mutable.ListBuffer
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.{col, explode}
// The $"..." column syntax needs `import spark.implicits._` (automatic in spark-shell)

val flatK = dfR.withColumn("UserValue", explode(col("UserValue")))
var colListA = new ListBuffer[String]()
// var colSet : List[String] = List()
for (i <- 0 until Integer.parseInt(dfR.count().toString)) {
  flatK.filter($"columnIndex" === i).foreach { r =>
    val columnName = r.getAs[Row]("UserValue").getAs[String]("_title")
    // println(columnName)
    colListA.append(columnName)
  }
}
```
The commented-out `println(columnName)` actually prints the values I want to put in my list.
My dataframe dfR looks like this:
```
+--------------------------------------------------------------+-----------+
|UserValue                                                     |columnIndex|
+--------------------------------------------------------------+-----------+
|[, last_mod_date, 2009-01-14T13:40:53]                        |0          |
|[, object_string, SOLIDS]                                     |0          |
|[, last_mod_date, 2009-01-13T22:58:30]                        |1          |
|[, object_string, TORSO]                                      |1          |
```
When I do

```scala
colListA += "elements"
colListA += "adds"
```

I can see the elements added, but not inside that `foreach` loop. Can anyone tell me what I should try? Basically, I expect `colListA` to be populated with `last_mod_date` and `object_string`.
`foreach` is not executed in the driver (where your buffer exists) but on the executors, each of which gets its own local copy of the buffer. Each copy is modified, but the results are never sent back to the driver, so the driver's buffer stays empty. This is a common beginner mistake that comes from not understanding Spark's architecture; I would recommend reading a little about how Spark works and what its use cases are.

You can use `collect` to bring all the values to your driver (which looks like what you want). Be warned that if the `DF` is big, you may run out of driver memory: Spark is intended for working with amounts of data that won't fit on one machine. But if you are working in local mode just for debugging, then you are done.
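To see why the driver's buffer stays empty without needing a cluster, here is a plain-Scala approximation of what happens to a task closure: it is serialized together with the buffer it captures, and the deserialized copy is what actually runs. This is a sketch, not Spark's real code path; `ClosureCopyDemo`, `roundTrip` and `run` are hypothetical names, and it assumes Scala 2.12 or later (where lambdas are serializable).

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import scala.collection.mutable.ListBuffer

object ClosureCopyDemo {
  // Serialize and deserialize a value, roughly what happens when a task
  // closure travels from the driver to an executor.
  def roundTrip[T](value: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(value) finally out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  // Returns (size of the driver's buffer, size of the "executor" copy).
  def run(): (Int, Int) = {
    val colListA = new ListBuffer[String]()
    // The closure captures colListA, like the foreach body in the question.
    val appendTitle: String => Int = { name =>
      colListA.append(name)
      colListA.size
    }
    // The "executor" receives a deserialized copy of the closure AND of colListA.
    val shipped = roundTrip(appendTitle)
    val copySizes = Seq("last_mod_date", "object_string").map(shipped)
    (colListA.size, copySizes.last)
  }

  def main(args: Array[String]): Unit = {
    val (driverSize, executorSize) = run()
    // prints "driver buffer: 0, executor copy: 2"
    println(s"driver buffer: $driverSize, executor copy: $executorSize")
  }
}
```

The appends land in the copy, which is discarded, so the original buffer never changes, which matches the symptom in the question.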
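A minimal sketch of the `collect` approach, assuming `dfR` has the schema implied by the question: an array-of-structs `UserValue` column whose structs carry a string `_title` field, plus a `columnIndex` column. The names `CollectTitles` and `titles` are hypothetical.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, explode}
import scala.collection.mutable.ListBuffer

object CollectTitles {
  // Gathers every UserValue._title on the driver, ordered by columnIndex.
  // Order among rows sharing the same columnIndex is not guaranteed.
  def titles(dfR: DataFrame): ListBuffer[String] = {
    val flatK = dfR.withColumn("UserValue", explode(col("UserValue")))
    val colListA = new ListBuffer[String]()
    flatK
      .orderBy(col("columnIndex"))                   // same grouping order as the per-index loop
      .select(col("UserValue._title"))               // keep only the nested field we need
      .collect()                                     // Array[Row] materialized in driver memory
      .foreach(r => colListA.append(r.getString(0))) // plain Scala foreach: runs on the driver
    colListA
  }
}
```

A single `collect` also replaces the question's loop, which launched one Spark job (a `filter` followed by `foreach`) per loop iteration.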