I want to create a new column from a string column using Python/PySpark. The string should be split on the separator " ", except when letters are followed by a number (e.g. "NA 611" must stay together), and a trailing ";" should be removed if it exists.
Inputs:
"511 520 NA 611;"
"322 GA 620"
"3 321;"
"334 344"
Expected output:
+Column | +new column
"511 520 NA 611;" | [511,520,NA 611]
"322 GA 620" | [322,GA 620]
"3 321;" | [3,321]
"334 344" | [334,344]
What I tried:
from pyspark.sql.functions import col, split

data = data.withColumn(
    "newcolumn",
    split(col("column"), r"\s"))
But I get an empty string at the end of the array, as shown here, and I want to delete it if it exists:
+Column | +new column
"511 520 NA 611;" | [511,520,NA,611;]
"322 GA 620" | [322,GA,620]
"3 321;" | [3,321;]
"334 344" | [334,344]
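
One possible approach, sketched here in plain Python so the regular expressions can be checked without a Spark session: first strip the trailing ";", then split on whitespace only when it is preceded by a digit. The function name split_tokens is hypothetical, and the rule "split only after a digit" is an assumption inferred from the expected output above.

```python
import re


def split_tokens(value: str) -> list:
    """Split on whitespace that follows a digit, after dropping a trailing ';'.

    Assumption: a space is a separator only when the character before it
    is a digit, so "NA 611" and "GA 620" stay together.
    """
    # Remove a trailing ";" (and any whitespace after it), then trim the ends.
    cleaned = re.sub(r";\s*$", "", value).strip()
    # The lookbehind (?<=\d) requires a digit just before the whitespace.
    return re.split(r"(?<=\d)\s+", cleaned)


for s in ["511 520 NA 611;", "322 GA 620", "3 321;", "334 344"]:
    print(repr(s), "|", split_tokens(s))
```

The same two patterns could probably be used in PySpark as split(regexp_replace(col("column"), r";\s*$", ""), r"(?<=\d)\s+"), since Spark uses Java regular expressions, which also support lookbehind. That expression is untested here.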