I have a Python class with methods like the one below:
class Features():
    def __init__(self, json):
        self.json = json

    def get_email(self):
        email = self.json.get('fields', {}).get('email', None)
        return email
I am trying to use the get_email method in a PySpark DataFrame to create a new column from another column, "raw_json", which contains JSON strings:
df = data.withColumn('email', (F.udf(lambda j: Features.get_email(json.loads(j)), t.StringType()))('raw_json'))
The desired PySpark DataFrame looks like this:
+--------+-----+
|raw_json|email|
+--------+-----+
|        |     |
|        |     |
+--------+-----+
But I am getting an error saying:
TypeError: unbound method get_email() must be called with Features instance as first argument (got dict instance instead)
How can I achieve this?
I have seen a similar question asked before but it was not resolved.
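For reference, here is a minimal sketch outside Spark. It assumes the intent is to build a Features instance from each parsed JSON string and then call get_email on that instance, not on the class itself. The helper name `extract_email` and the sample input are hypothetical and not part of the original code.

```python
import json


class Features:
    def __init__(self, json):
        self.json = json

    def get_email(self):
        # Returns None when 'fields' or 'email' is missing.
        return self.json.get('fields', {}).get('email', None)


def extract_email(raw):
    # Hypothetical helper: parse the JSON string, build a Features
    # instance, then call get_email on that instance.
    return Features(json.loads(raw)).get_email()


# Made-up sample input, for illustration only.
print(extract_email('{"fields": {"email": "a@example.com"}}'))
```

If this is indeed the cause, wrapping a function like this with F.udf(extract_email, t.StringType()) in place of the lambda may avoid the TypeError, since get_email would then receive a Features instance and not a dict.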