Python: concatenate multiple columns if the next column's value is not already in the previous column

Question

I have sample data like this:

col1                    col2        col3
PYTHON RD              APT 3         NaN
STACK AVE APT 2-3    APT 2-3         NaN
OVER ST 1/2         UNIT 1/2    UNIT 1/2
FLOW RD                  NaN         NaN
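
For anyone reproducing this, the sample above can be built roughly as follows. The variable name `df` is an assumption taken from the code used later in the thread, and the missing cells are represented with `np.nan`.

```python
import numpy as np
import pandas as pd

# Sample data from the table above; missing cells are NaN.
df = pd.DataFrame(
    {
        "col1": ["PYTHON RD", "STACK AVE APT 2-3", "OVER ST 1/2", "FLOW RD"],
        "col2": ["APT 3", "APT 2-3", "UNIT 1/2", np.nan],
        "col3": [np.nan, np.nan, "UNIT 1/2", np.nan],
    }
)
print(df)
```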

I want to create a new field:

col1                    col2        col3               COMBINED
PYTHON RD              APT 3         NaN        PYTHON RD APT 3
STACK AVE APT 2-3    APT 2-3         NaN      STACK AVE APT 2-3
OVER ST 1/2         UNIT 1/2    UNIT 1/2   OVER ST 1/2 UNIT 1/2
FLOW RD                  NaN         NaN                FLOW RD

I tried:

columns = ["col1", "col2", "col3"]
COMBINED = ''
for col in columns:
    df[col] = df[col].fillna("")
    COMBINED = COMBINED + df[col].str.strip() + ' '
    df['COMBINED'] = COMBINED.str.strip()

The code above combines the columns, but the second row ends up with a duplicate: STACK AVE APT 2-3 APT 2-3.

Any suggestions?

Andrej Kesely · Accepted Answer · 2021-04-16 22:04:41Z

1

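# Keep col1 as-is when col2 is already contained in it; otherwise append col2.
df["COMBINED"] = (
    df[["col1", "col2"]]
    .fillna("")
    .apply(
        lambda x: x.loc["col1"]
        if x.loc["col2"] in x.loc["col1"]
        else x.loc["col1"] + " " + x.loc["col2"],
        axis=1,
    )
)
print(df[["col1", "col2", "COMBINED"]])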

Prints:

                col1      col2              COMBINED
0          PYTHON RD     APT 3       PYTHON RD APT 3
1  STACK AVE APT 2-3   APT 2-3     STACK AVE APT 2-3
2        OVER ST 1/2  UNIT 1/2  OVER ST 1/2 UNIT 1/2
3            FLOW RD       NaN               FLOW RD

EDIT: For many columns:

def combine(x):
    out = []
    for word in x:
        if word and not any(word in w for w in out):
            out.append(word)
    return " ".join(out)


columns = ["col1", "col2", "col3"]
df["COMBINED"] = df[columns].fillna("").apply(combine, axis=1)
print(df)

Prints:

                col1      col2      col3              COMBINED
0          PYTHON RD     APT 3       NaN       PYTHON RD APT 3
1  STACK AVE APT 2-3   APT 2-3       NaN     STACK AVE APT 2-3
2        OVER ST 1/2  UNIT 1/2  UNIT 1/2  OVER ST 1/2 UNIT 1/2
3            FLOW RD       NaN       NaN               FLOW RD
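
One thing to be aware of: `word in w` is a plain substring test, so a value such as "APT 2" would be treated as already present in "MAIN ST APT 22". If that matters for your data, a variant that compares whole words might look like the sketch below. The function name `combine_tokens` and the "MAIN ST APT 22" row are made up for illustration, and whether such a value should be kept is an assumption about the desired behaviour.

```python
import numpy as np
import pandas as pd


def combine_tokens(row):
    """Join row values, skipping a value whose words already occur,
    contiguously and in order, inside a value kept earlier."""
    kept = []  # one list of words per value kept so far
    for value in row:
        words = str(value).split()
        n = len(words)
        if n == 0:
            continue  # empty string left by fillna("")
        seen = any(
            words == prev[i:i + n]
            for prev in kept
            for i in range(len(prev) - n + 1)
        )
        if not seen:
            kept.append(words)
    return " ".join(" ".join(words) for words in kept)


# The first two rows come from the question; the last row is hypothetical.
df = pd.DataFrame(
    {
        "col1": ["STACK AVE APT 2-3", "OVER ST 1/2", "MAIN ST APT 22"],
        "col2": ["APT 2-3", "UNIT 1/2", "APT 2"],
        "col3": [np.nan, "UNIT 1/2", np.nan],
    }
)
columns = ["col1", "col2", "col3"]
df["COMBINED"] = df[columns].fillna("").apply(combine_tokens, axis=1)
print(df)
```

Because the values are split and re-joined, runs of whitespace inside a value are collapsed to single spaces as a side effect.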

edited Apr 16, 2021 at 22:04

answered Apr 16, 2021 at 21:47

Andrej Kesely


1 Comment

Peter Chen Over a year ago

If I have more than 2 columns, like col1, col2, col3, it is difficult to modify the code each time.

Alex G · Accepted Answer · 2021-04-16 22:02:39Z

1

Not sure if this covers all your cases:

def combine(row):
    row = row.fillna("")
    result = row["col1"]
    for col in ["col2", "col3"]:
        if row[col] not in result:
            result += " " + row[col]
    return result
    
df["COMBINED"] = df.apply(combine, axis=1)
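
If the column list changes often, the same idea can take the columns as an argument instead of hard-coding them. This is only a sketch: `combine_columns` is a hypothetical name, and it relies on `DataFrame.apply` forwarding `args` to the function.

```python
import numpy as np
import pandas as pd


def combine_columns(row, columns):
    """Append each column's value unless it already occurs in the text built so far."""
    result = ""
    for col in columns:
        value = row[col]
        if pd.isna(value):
            continue  # skip missing cells
        value = str(value).strip()
        if value and value not in result:
            result = f"{result} {value}".strip()
    return result


# Sample data from the question.
df = pd.DataFrame(
    {
        "col1": ["PYTHON RD", "STACK AVE APT 2-3", "OVER ST 1/2", "FLOW RD"],
        "col2": ["APT 3", "APT 2-3", "UNIT 1/2", np.nan],
        "col3": [np.nan, np.nan, "UNIT 1/2", np.nan],
    }
)
columns = ["col1", "col2", "col3"]
df["COMBINED"] = df.apply(combine_columns, axis=1, args=(columns,))
print(df)
```

Like the version above, this checks each value against the whole string built so far, so it is still a substring match rather than a word-by-word comparison.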

answered Apr 16, 2021 at 22:02

Alex G


wwnde · Accepted Answer · 2021-04-16 22:22:47Z

1

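Let's try using unique and join. Note that unique only drops exact duplicates, so a value that is merely contained in an earlier column (row 1 below) is still appended:

df['col4'] = df.fillna('').apply(lambda x: ",".join(x.unique()).strip(','), axis=1)

Prints:

                col1      col2      col3                       col4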
0          PYTHON RD     APT 3       NaN            PYTHON RD,APT 3
1  STACK AVE APT 2-3   APT 2-3       NaN  STACK AVE APT 2-3,APT 2-3
2        OVER ST 1/2  UNIT 1/2  UNIT 1/2       OVER ST 1/2,UNIT 1/2
3            FLOW RD       NaN       NaN                    FLOW RD
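
Row 1 in the output above still repeats APT 2-3 because `unique` compares whole strings. A small sketch of the difference, using that row's values (the variable names are illustrative only):

```python
import pandas as pd

row = pd.Series(["STACK AVE APT 2-3", "APT 2-3", ""])

# unique() removes exact duplicates only, so both non-empty values survive.
exact = [v for v in row.unique() if v]

# A containment check drops a value already found inside an earlier one.
contained = [
    v
    for i, v in enumerate(exact)
    if not any(v in earlier for earlier in exact[:i])
]

print(",".join(exact))
print(",".join(contained))
```

So this approach fits when repeated columns are identical, while the containment check is what the question's expected output needs.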

answered Apr 16, 2021 at 22:22

wwnde

