
I have two datasets. The first contains raw values that must be replaced with the acceptable values given in the second. If no matching acceptable value is found in the second dataset, the raw value should be left as it is.

The first looks like this:

SOURCE_ID TITLE
1 Emaar Beachfront
2 EmaarBeachfront
3 emaar beachfront
4 dubai hills estate
5 Dubai Hills
6 Nad Al Sheba
7 Nadalsheba
8 dubai hills residences
9 The Cove Ru
10 Homes

The second looks like this:

ID TITLE
1 Emaar Beachfront
2 Dubai Hills
3 Nad Al Sheba
4 The Cove

In the end, my dataset should look like this:

SOURCE_ID TITLE
1 Emaar Beachfront
2 Emaar Beachfront
3 Emaar Beachfront
4 Dubai Hills
5 Dubai Hills
6 Nad Al Sheba
7 Nad Al Sheba
8 Dubai Hills
9 The Cove
10 Homes

I thought this might be possible with regex, but I am not sure.

1 Answer


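One solution is to remove the spaces and lowercase both sides, then check whether an acceptable value occurs as a substring of the raw value: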

first = ["Emaar Beachfront",
"EmaarBeachfront",
"emaar beachfront",
"dubai hills estate",
"Dubai Hills",
"Nad Al Sheba",
"Nadalsheba",
"dubai hills residences",
"The Cove Ru",
"Homes"]

second = [
"Emaar Beachfront",
"Dubai Hills",
"Nad Al Sheba",
"The Cove"
]

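# Comparison keys for the acceptable values: no spaces, all lowercase
second_transformed = [item.replace(" ", "").lower() for item in second]

out = []

for item in first:
    # Normalize the raw value the same way
    item_transformed = item.replace(" ", "").lower()
    item_found = False
    for second_item, second_item_transformed in zip(second, second_transformed):
        # Substring test, so "dubaihillsestate" still matches "dubaihills"
        if second_item_transformed in item_transformed:
            out.append(second_item)
            item_found = True
            break
    # No acceptable value matched: keep the raw value unchanged
    if not item_found:
        out.append(item)

print(out)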
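
The same idea can be wrapped in a reusable function that keeps the SOURCE_ID next to each title. This is only a sketch under a few assumptions: the names `normalize` and `replace_titles` are made up for illustration, the rows are plain `(SOURCE_ID, TITLE)` tuples rather than a real dataset object, and `re.sub` is used to drop every non-alphanumeric character (not just spaces), which is stricter than the code above. Longer acceptable titles are tried first, so if one acceptable title is contained in another, the more specific one should win.

```python
import re


def normalize(text):
    # Lowercase and drop everything except letters and digits, so that
    # "Emaar Beachfront" and "EmaarBeachfront" produce the same key.
    return re.sub(r"[^a-z0-9]", "", text.lower())


def replace_titles(rows, accepted):
    # Pair each acceptable title with its key; longest keys come first
    # so that a more specific title is preferred over a shorter one.
    keyed = sorted(
        ((normalize(title), title) for title in accepted),
        key=lambda pair: len(pair[0]),
        reverse=True,
    )
    result = []
    for source_id, title in rows:
        key = normalize(title)
        # First acceptable title whose key is a substring of the raw key;
        # fall back to the raw title when nothing matches.
        match = next((t for k, t in keyed if k and k in key), title)
        result.append((source_id, match))
    return result


rows = [
    (1, "Emaar Beachfront"),
    (2, "EmaarBeachfront"),
    (3, "emaar beachfront"),
    (4, "dubai hills estate"),
    (5, "Dubai Hills"),
    (6, "Nad Al Sheba"),
    (7, "Nadalsheba"),
    (8, "dubai hills residences"),
    (9, "The Cove Ru"),
    (10, "Homes"),
]
accepted = ["Emaar Beachfront", "Dubai Hills", "Nad Al Sheba", "The Cove"]

cleaned = replace_titles(rows, accepted)
for source_id, title in cleaned:
    print(source_id, title)
```

Note that substring matching only handles differences in case, spacing, and extra words; a misspelled raw value would stay unchanged.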