Let's say I have the following simple Spark DataFrame:

    threshold = ?

    ID      percentage
    B101    0.3
    B101    0.3
    B202    0.18
    B303    0.25
As you can see above, I have to get the threshold value based on the ID column. For example, if ID == B101, the threshold value becomes threshold = 0.3. If ID == B202, then the threshold gets a new value and becomes threshold = 0.18. The same logic works for the rest. I have thousands of values like this, and I would like to do it in a simple way.
I tried this:
    threshold = df.first()['ID']
But that only returns the first row; I think there should be a loop to go over all the values.
Can anyone help with this in PySpark?
To clarify: I want the threshold value for one particular ID mentioned in the DataFrame. For each particular ID, the threshold has only one value.