I have the following code:

from pyspark.sql import functions as F

L = {'L1': ['us']}
#df1 = df1.withColumnRenamed("name","OriginalCompanyName")
for key, vals in L.items():
    # regex pattern for extracting vals; '\\b' becomes \b once Spark SQL unescapes the string literal
    pat = r'\\b(%s)\\b' % '|'.join(vals)

    # join the loc array into one string and extract every match of capture group 1
    col1 = F.expr("regexp_extract_all(array_join(loc, ' '), '%s')" % pat)

    # Mask the rows with null when there are no matches
    df1 = df1.withColumn(key, F.when(F.size(col1) == 0, None).otherwise(col1))
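The loop body can be traced in plain Python to see what happens to each row. This is a sketch using Python's re module rather than Spark, assuming that Java's regex engine (which Spark uses) behaves like Python's for this simple word-boundary pattern; the extract helper and the rows list are hypothetical stand-ins for the loc column:

```python
import re

# Plain-Python model of the question's loop (illustration only, no Spark).
# In Spark the pattern is written r'\\b...' because it sits inside a SQL
# string literal that Spark unescapes once; plain Python needs only r'\b'.
L = {'L1': ['us']}

# Hypothetical stand-in for the loc column of the data sample.
rows = [
    ["this is ,us, better life"],
    ["no one is, in charge"],
    ["I am, very far, from us"],
    [],
]

def extract(loc, vals):
    """Mimic regexp_extract_all(array_join(loc, ' '), pat) plus the null mask."""
    pat = r'\b(%s)\b' % '|'.join(vals)
    text = ' '.join(loc)                  # array_join: an empty array becomes ''
    matches = re.findall(pat, text)       # one capture group -> list of group-1 strings
    return matches if matches else None   # size == 0 -> null

results = {key: [extract(loc, vals) for loc in rows] for key, vals in L.items()}
print(results)  # {'L1': [['us'], None, ['us'], None]}
```

The empty row comes back as None rather than disappearing: withColumn never removes rows, it only sets the new column to null when there is no match.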

It extracts us from the column loc: the key column holds us when there is a match and null otherwise. Some rows of loc are empty lists ([]), and for those rows I also want us in the key column. Changing L = {'L1': ['us']} to L = {'L1': ['us', '[]']} doesn't work.
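Adding '[]' to vals cannot help, because the brackets are only how an empty array is displayed. After array_join there is no "[]" text to match, just an empty string. A minimal plain-Python check (not Spark; the variable names are hypothetical, and re.escape is added here only so '[]' is read literally, whereas the original code joins vals without escaping):

```python
import re

# What the pattern actually sees for an empty loc array.
vals = ['us', '[]']
empty_loc = []

text = ' '.join(empty_loc)  # like array_join([], ' ') -> ''
# re.escape keeps '[' and ']' literal; unescaped they would be regex syntax.
pat = r'\b(%s)\b' % '|'.join(re.escape(v) for v in vals)
matches = re.findall(pat, text)
print(repr(text), matches)  # '' []
```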

For some reason, this code actually eliminates rows where loc is empty. How can I modify it?

Hint: rows with an empty loc can be found with the following code:

df1 = df1.withColumn('empty_country', F.when(F.size('loc') == 0, 'us'))

Data sample:

loc
["this is ,us, better life"]
["no one is, in charge"]
["I am, very far, from us"]
[]

Expected output:

loc                             L1
["this is ,us, better life"]    ["us"]
["no one is, in charge"]        null
["I am, very far, from us"]     ["us"]
[]                              ["us"]

Answer

Make this change to the last line in the for loop:

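df1 = df1.withColumn(
    key,
    F.when((F.size(col1) == 0) & (F.size('loc') != 0), None)  # no match in a non-empty loc -> null
     .when(F.size('loc') == 0, F.array(F.lit('us')))          # empty loc -> ["us"]
     .otherwise(col1),                                        # otherwise keep the extracted matches
)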

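PS: regexp_extract_all returns an array, which is why the us fallback is wrapped in F.array(...): every branch of the when/otherwise chain has to produce the same type (array<string>).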
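The three branches of the answer can be checked against the expected output with a plain-Python model. This is a sketch with Python's re module, not Spark; fill_key, its default parameter, and the rows list are hypothetical names used only for illustration:

```python
import re

# Hypothetical stand-in for the loc column of the data sample.
rows = [
    ["this is ,us, better life"],
    ["no one is, in charge"],
    ["I am, very far, from us"],
    [],
]

def fill_key(loc, vals, default='us'):
    """Mirror the answer's when/when/otherwise chain for a single row."""
    matches = re.findall(r'\b(%s)\b' % '|'.join(vals), ' '.join(loc))
    if len(matches) == 0 and len(loc) != 0:
        return None        # no match in a non-empty loc -> null
    if len(loc) == 0:
        return [default]   # empty loc -> F.array(F.lit('us'))
    return matches         # otherwise keep the extracted array

output = [fill_key(loc, ['us']) for loc in rows]
print(output)  # [['us'], None, ['us'], ['us']]
```

Each branch returns either None or a list, which mirrors why the Spark version wraps 'us' in an array instead of returning a bare string.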
