I have a Python class with methods like the following:
class Features():
    def __init__(self, json):
        self.json = json

    def email_name_match(self, name):
        # some code
        return result
The PySpark DataFrame I have right now looks like this:
+--------+---------+
|raw_json|firstName|
+--------+---------+
|        |         |
|        |         |
+--------+---------+
I am trying to use the email_name_match method on this DataFrame to create a new column: the "raw_json" column should be passed to initialize a Features object, and "firstName" should be passed as the "name" parameter of email_name_match.
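In plain Python, the per-row logic I want is roughly this (just a sketch; the row values below are made up to illustrate the call shape, and the JSON content is only a guess at what "raw_json" holds):

import json

# made-up example values for one row
raw_json_value = '{"email": "jdoe@example.com"}'  # value of the "raw_json" column
first_name_value = 'John'                         # value of the "firstName" column

features = Features(json.loads(raw_json_value))
match_result = features.email_name_match(first_name_value)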
I did the following:
email_name_match_udf = F.udf(lambda j: Features(json.loads(j)).email_name_match())

df = avtk_gold.withColumn(
    'firstname_email_match',
    F.udf(lambda j: Features(json.loads(j)).email_name_match(col("firstName")))("raw_json")
)
But it's not working; it throws this error:

AttributeError: 'NoneType' object has no attribute '_jvm'
What should I do to fix this? The desired DataFrame looks like this:
+--------+---------+----------------+
|raw_json|firstName|name_email_match|
+--------+---------+----------------+
|        |         |                |
|        |         |                |
+--------+---------+----------------+
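Should I instead define a UDF that takes both columns, so that I don't need col() inside the lambda at all? This is the direction I was considering — an untested sketch, assuming "raw_json" always holds valid JSON and email_name_match returns a string (StringType is just my guess at the return type):

import json
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

# untested sketch: pass both columns into the UDF instead of
# referencing col("firstName") inside the lambda
def _email_name_match(raw_json, first_name):
    return Features(json.loads(raw_json)).email_name_match(first_name)

email_name_match_udf = F.udf(_email_name_match, StringType())

df = avtk_gold.withColumn(
    'name_email_match',
    email_name_match_udf(F.col('raw_json'), F.col('firstName'))
)

Is that the right direction, or is there a better way to use this class method on a DataFrame?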