I have a DataFrame like this:
Name City Name_index City_index
Ali lhr 2.0 0.0
abc swl 0.0 2.0
xyz khi 1.0 1.0
I want to drop the columns whose names don't contain the string "index".
The expected output is:
Name_index City_index
2.0 0.0
0.0 2.0
1.0 1.0
I have tried this:
val cols = newDF.columns
val regex = """^((?!_indexed).)*$""".r
val selection = cols.filter(s => regex.findFirstIn(s).isDefined)
cols.diff(selection)
val res = newDF.select(selection.head, selection.tail: _*)
res.show()
But I am getting this instead:
Name City
Ali lhr
abc swl
xyz khi
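A likely reason for this result: a negative-lookahead pattern of that shape matches names that do not contain the given text, so selection ends up holding the columns to drop, and the result of cols.diff(selection) is never assigned to anything. Below is a minimal sketch of the opposite filter, which keeps only the names that contain the marker. It is plain Scala so it can be checked without a Spark session. The object name KeepIndexColumns, the helper indexColumns, and the marker "_index" are assumptions made for illustration, taken from the column names in the example above.

```scala
object KeepIndexColumns {
  // Hypothetical helper: keeps only the column names that contain the marker.
  // The default marker "_index" is an assumption based on the example columns.
  def indexColumns(columns: Seq[String], marker: String = "_index"): Seq[String] =
    columns.filter(_.contains(marker))

  def main(args: Array[String]): Unit = {
    // Column names copied from the example DataFrame in the question.
    val cols = Seq("Name", "City", "Name_index", "City_index")
    val selection = indexColumns(cols)
    println(selection.mkString(" ")) // prints "Name_index City_index"
  }
}
```

With a real DataFrame, the same filter could be applied to newDF.columns.toSeq, and the resulting names passed to newDF.select(selection.head, selection.tail: _*) as in the attempt above. That call assumes at least one column matches, since selection.head fails on an empty list.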