Python pandas replace based on partial match with list item

Question

I have a large three-column dataframe of this form:

Ref    Colourref      Shaperef      
5      red 12         square 15
9      14 blue        (circle14,2)  
10     6 orange 12    18 square
12     pink1,7        [oval] [40]
14     [green]        (rectsq#12,6)
...

And a long list with entries like this:

li = [
    'oval 60 [oval] [40]', 
    '(circle14,2) circ', 
    'square 20', 
    '126 18 square 921#',
]

I want to replace the entries in the Shaperef column of the df with a value from the list if the full Shaperef string matches any part of any list item. If there is no match, the entry is not changed.

Desired output:

Ref    Colourref      Shaperef      
5      red 12         square 15
9      14 blue        (circle14,2) circ  
10     6 orange 12    126 18 square 921#
12     pink1,7        oval 60 [oval] [40]
14     [green]        (rectsq#12,6)
...

So refs 9, 10, 12 are updated as there is a partial match with a list item. Refs 5, 14 stay as they are.

Alex · Accepted Answer · 2021-05-16 15:39:58Z


If Shaperef and every entry in li are strings, you can write a function and apply it over Shaperef to do the replacement:

def f(row_val, seq):
    for item in seq:
        if row_val in item:
            return item
    return row_val

Then:

# read in your example
import pandas as pd
from io import StringIO

s = """Ref    Colourref      Shaperef      
5      red 12         square 15
9      14 blue        (circle14,2)  
10     6 orange 12    18 square
12     pink1,7        [oval] [40]
14     [green]        (rectsq#12,6)
"""
li = [
    "oval 60 [oval] [40]",
    "(circle14,2) circ",
    "square 20",
    "126 18 square 921#",
]
df = pd.read_csv(StringIO(s), sep=r"\s\s+", engine="python")

# Apply the function here:
df["Shaperef"] = df["Shaperef"].apply(lambda v: f(v, li))
#    Ref    Colourref             Shaperef
# 0    5       red 12            square 15
# 1    9      14 blue    (circle14,2) circ
# 2   10  6 orange 12   126 18 square 921#
# 3   12      pink1,7  oval 60 [oval] [40]
# 4   14      [green]        (rectsq#12,6)

This might not be very fast: in the worst case it performs len(df) * len(li) substring checks.
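If Shaperef contains many repeated values, one way to cut the work is to run the lookup once per distinct value and map the results back. This is only a minimal sketch, not part of the original answer: it assumes every Shaperef value is a string (no NaN), reuses the function f from above, and the small DataFrame with a repeated value is made up for illustration.

```python
import pandas as pd


def f(row_val, seq):
    # Return the first list item that contains row_val, else row_val unchanged.
    for item in seq:
        if row_val in item:
            return item
    return row_val


li = ["oval 60 [oval] [40]", "(circle14,2) circ", "square 20", "126 18 square 921#"]

# Hypothetical data with a repeated Shaperef value (assumed to be all strings).
df = pd.DataFrame(
    {
        "Ref": [5, 9, 10, 11],
        "Shaperef": ["square 15", "(circle14,2)", "18 square", "18 square"],
    }
)

# Scan the list once per distinct value instead of once per row.
lookup = {val: f(val, li) for val in df["Shaperef"].unique()}
df["Shaperef"] = df["Shaperef"].map(lookup)
```

With this approach the worst case is the number of distinct Shaperef values times len(li), so it only helps when values actually repeat.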

edited May 16, 2021 at 15:39

answered May 16, 2021 at 12:32


6 Comments

lordmurf Over a year ago

Thanks. This works great with my example data, but when applied to the actual df I get this: "initial_value must be str or None, not DataFrame". Any idea what might be causing that?

Alex Over a year ago

That's a StringIO error. What is the line that's causing this error?

lordmurf Over a year ago

df2 = pd.read_csv(StringIO(df), sep=r'\s\s+', engine='python') where df is the original df table

Alex Over a year ago

You can't create a new DataFrame like that. That part of the example was only for reading in your example data. The only lines you should need are df["Shaperef"] = df["Shaperef"].apply(lambda v: f(v, li)) and the function f

Alex Over a year ago

The final line of the function f, return row_val, just says that if there wasn't a match, the value is left unchanged. You can change that to return row_val + ", !!NO MATCH!!" to flag unmatched entries instead.
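A minimal sketch of that change, for illustration only. The name f_flag and the marker parameter are hypothetical and not from the answer, and the values are assumed to be strings.

```python
def f_flag(row_val, seq, marker=", !!NO MATCH!!"):
    # Hypothetical variant of f: same lookup, but unmatched values get a marker appended.
    for item in seq:
        if row_val in item:
            return item
    return row_val + marker


li = ["oval 60 [oval] [40]", "(circle14,2) circ", "square 20", "126 18 square 921#"]

matched = f_flag("18 square", li)    # contained in the last list item
unmatched = f_flag("square 15", li)  # not contained in any list item
```

It would be applied the same way as f, for example df["Shaperef"].apply(lambda v: f_flag(v, li)).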
