I am very new to PySpark. I have a requirement where I need to get one column, say 'id', from one MySQL table, and for each id I need to get the 'HOST' value, which is a column in another MySQL table. I have completed the first part: I get the ids using the code below.
import pyspark.sql.functions as F

criteria_df = read_data_from_table(criteria_tbl)
datasource_df = read_data_from_table(data_source_tbl)

# For each account id in the criteria table, select the matching host from the datasource table
for row in criteria_df.collect():
    account_id = row["account_id"]
    criteria_name = row["criteria"]
    datasource_df = datasource_df.select(F.col('host')).where(F.col('id') == account_id)
    datasource_df.show()
But when I try to get the host value for each id, I don't get any values back.
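One likely cause is that the loop assigns the filtered result back to `datasource_df`, so every iteration after the first filters an already-filtered DataFrame (only the rows for the previous id survive) instead of the full table. The sketch below is a plain-Python analogue, not PySpark: the two tables are modelled as lists of dicts with hypothetical sample data, and `buggy_lookup` / `fixed_lookup` are illustrative names. It shows how overwriting the source narrows the result, and how indexing the source once (a join-style lookup) avoids it.

```python
# Hypothetical sample data standing in for the two MySQL tables.
criteria_rows = [
    {"account_id": 1, "criteria": "a"},
    {"account_id": 2, "criteria": "b"},
]
datasource_rows = [
    {"id": 1, "host": "host-1"},
    {"id": 2, "host": "host-2"},
    {"id": 3, "host": "host-3"},
]


def buggy_lookup(criteria, datasource):
    """Mimics the loop in the question: the filtered result overwrites the source."""
    results = []
    for row in criteria:
        # Rebinding `datasource` means the next iteration filters the
        # already-filtered rows, not the full table.
        datasource = [r for r in datasource if r["id"] == row["account_id"]]
        results.append([r["host"] for r in datasource])
    return results


def fixed_lookup(criteria, datasource):
    """Join-style lookup: index the source table once and never overwrite it."""
    host_by_id = {r["id"]: r["host"] for r in datasource}
    # .get() yields None for ids with no matching host, like a left join.
    return [(row["account_id"], host_by_id.get(row["account_id"])) for row in criteria]


print(buggy_lookup(criteria_rows, datasource_rows))  # → [['host-1'], []]
print(fixed_lookup(criteria_rows, datasource_rows))  # → [(1, 'host-1'), (2, 'host-2')]
```

In PySpark itself, the usual equivalent is a single join rather than a driver-side loop over `collect()`, for example `criteria_df.join(datasource_df, criteria_df["account_id"] == datasource_df["id"], "left").select("account_id", "host")`, where `"left"` keeps accounts that have no matching host (with a null `host`).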