I have the following dataset in PySpark:

| Id | Sub |
|---|---|
| 1 | Mat |
| 1 | Phy |
| 1 | Sci |
| 2 | Bio |
| 2 | Phy |
| 2 | Sci |

I want to create a DataFrame similar to the one below:

| Id | Sub | HaMat |
|---|---|---|
| 1 | Mat | 1 |
| 1 | Phy | 1 |
| 1 | Sci | 1 |
| 2 | Bio | 0 |
| 2 | Phy | 0 |
| 2 | Sci | 0 |

How do I do this in PySpark? I tried the following:

    def hasMath(studentID, df):
        return df.filter((col('Id') == studentID) & (col('Sub') == 'Mat')).count()

    df = df.withColumn("hasMath", hasMath(F.col('Id'), df1))

But this doesn't seem to work. Is there a better way to achieve this?