I'm new to PySpark. I want to add a new column that assigns each row a chunk number, and then partition the DataFrame by that column.
import math
from pyspark.sql import functions as F

coun = df.count()
if coun <= 20000:
    chunksize = 2                        # number of chunks
    rowsperchunk = math.ceil(coun / 2)   # rows in each chunk
else:
    chunksize = math.ceil(coun / 20000)
    rowsperchunk = 20000

for i in range(1, chunksize + 1):
    # attempt: tag a block of rows with the chunk number
    chunk_df = df.limit(rowsperchunk).withColumn('chunk', F.lit(i))
The problem: in the for loop above, limit() always draws the same first rowsperchunk rows of the DataFrame, so each pass only ever assigns one value to those rows instead of tagging successive blocks of rows.

Example: my DataFrame has 100k rows, so chunksize will be 5 and rowsperchunk will be 20,000. The first 20,000 rows should get the value 1 in the new column, the next 20,000 rows should get the value 2, and so on up through chunksize. Then I want to partition based on the new column.
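To make the goal concrete, here is a minimal sketch of the result I'm after, using row_number over a window as one possible approach. The SparkSession setup, the spark.range stand-in for my real DataFrame, and the /tmp/chunked_output path are only illustrative assumptions:

import math
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()   # illustrative session
df = spark.range(100_000)                    # stand-in for my real DataFrame

coun = df.count()
if coun <= 20000:
    chunksize = 2
    rowsperchunk = math.ceil(coun / 2)
else:
    chunksize = math.ceil(coun / 20000)
    rowsperchunk = 20000

# Number every row globally, then derive the chunk id by integer division.
# Note: a window with no partitionBy pulls all rows through one partition,
# which is fine at this scale but a caveat for very large data.
w = Window.orderBy(F.monotonically_increasing_id())
df_chunked = (
    df.withColumn("rn", F.row_number().over(w))
      .withColumn("chunk", F.floor((F.col("rn") - 1) / rowsperchunk) + 1)
      .drop("rn")
)

# Repartition in memory by the new column, or partition the output files on disk.
df_chunked = df_chunked.repartition("chunk")
df_chunked.write.partitionBy("chunk").parquet("/tmp/chunked_output")  # illustrative path

With 100k rows this would give chunk values 1 through 5 with 20,000 rows each, and the write would produce one directory per chunk value.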