I have a pandas implementation of this question here. I want to implement it using PySpark for a Spark environment.
I have 2 CSV files. The first CSV has a keyword column and a corresponding lookupid column, which I converted into two lists in pure Python:
keyword = ['IT Manager', 'Sales Manager', 'IT Analyst', 'Store Manager']
lookupid = ['##10##','##13##','##12##','##13##']
The second CSV file has a title column with the sample data below:
current_title
I have been working here as a store manager since after I passed from college
I am sales manager and primarily work in the ASEAN region. My primary rolw is to bring new customers.
I initially joined as a IT analyst and because of my sheer drive and dedication, I was promoted to IT manager position within 3 years
I want to do a case-insensitive find-and-replace using regular expressions (the titles in the text are lowercase, e.g. "store manager") and return the output below:
current_title
I have been working here as a ##13## since after I passed from college
I am ##13## and primarily work in the ASEAN region. My primary rolw is to bring new customers.
I initially joined as a ##12## and because of my sheer drive and dedication, I was promoted to ##10## position within 3 years
How can I do this using PySpark? Please suggest an approach, for example a UDF on a PySpark DataFrame that replaces all the values using regex.
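One possible sketch, assuming the two lists from the question and a DataFrame `df` with a `current_title` column: the replacement logic itself is plain Python `re`, which can then be wrapped as a PySpark UDF. The `replace_titles` name and the word-boundary/`IGNORECASE` pattern choices are my assumptions, not from the question.

```python
import re

# Lookup data from the first CSV (as given in the question).
keyword = ['IT Manager', 'Sales Manager', 'IT Analyst', 'Store Manager']
lookupid = ['##10##', '##13##', '##12##', '##13##']

# Precompile one case-insensitive pattern per keyword. Word boundaries
# (\b) keep "IT Manager" from matching inside a longer token, and
# re.escape guards against regex metacharacters in future keywords.
patterns = [(re.compile(r'\b' + re.escape(kw) + r'\b', re.IGNORECASE), lid)
            for kw, lid in zip(keyword, lookupid)]

def replace_titles(text):
    # UDF body: apply each pattern in turn; pass nulls through unchanged.
    if text is None:
        return None
    for pat, lid in patterns:
        text = pat.sub(lid, text)
    return text

# Wrapping as a PySpark UDF (requires an active Spark session):
# from pyspark.sql import functions as F
# from pyspark.sql.types import StringType
# replace_udf = F.udf(replace_titles, StringType())
# df = df.withColumn('current_title', replace_udf('current_title'))
```

An alternative that avoids a Python UDF (usually faster, since it stays in the JVM) would be to chain `pyspark.sql.functions.regexp_replace` calls, one per keyword, using the `(?i)` inline flag for case-insensitive matching.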