How can I add a column with a sequence value starting from a specific number in a PySpark DataFrame?
Current Dataset:
Col1 Col2 Flag
Val1 Val2 F
Val3 Val4 T
But I want the dataset to look like this:
Col1 Col2 Flag New_Col
Val1 Val2 F 11F
Val3 Val4 T 12T
I'm using the code below (Python):
from pyspark.sql import functions as F
from pyspark.sql import types as T

seq = 10

def fn_increment_id(flag):
    # Increment the global counter and prepend it to the flag value
    global seq
    seq += 1
    return str(seq) + flag

if __name__ == "__main__":
    df = spark.loadFromMapRDB("path/to/table")
    my_udf = F.UserDefinedFunction(fn_increment_id, T.StringType())
    df = df.withColumn("New_Col", my_udf("Flag"))
    df.show(10)
But I end up with this result:
Received Dataset:
Col1 Col2 Flag New_Col
Val1 Val2 F 11F
Val3 Val4 T 11T
So the sequence was incremented only once and the same value was reused for every row. How can I increment it for each row? Thanks in advance.
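
For reference, this sketch shows the kind of output I'm after; row_number over a window is just one approach I've been considering, not something I'm sure is right. The ordering by Col1, the starting value of 10, and the local SparkSession setup are all assumptions on my part:

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()

# Hypothetical stand-in for the real table loaded from MapR-DB
df = spark.createDataFrame(
    [("Val1", "Val2", "F"), ("Val3", "Val4", "T")],
    ["Col1", "Col2", "Flag"],
)

start = 10  # the sequence should continue from this number

# Assign a per-row sequence number; ordering by Col1 is an assumption
w = Window.orderBy("Col1")
df = df.withColumn(
    "New_Col",
    F.concat((F.row_number().over(w) + start).cast("string"), F.col("Flag")),
)
df.show()

This produces 11F and 12T, but I don't know whether a window without partitioning is acceptable for a large table, or whether there is a better way to keep a running sequence.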