I have sample data like this:

col1                    col2        col3
PYTHON RD              APT 3         NaN
STACK AVE APT 2-3    APT 2-3         NaN
OVER ST 1/2         UNIT 1/2    UNIT 1/2
FLOW RD                  NaN         NaN
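
For anyone who wants to reproduce this, here is a minimal sketch of the frame above. It assumes the empty cells are real NaN values (not the string "NaN") and that the other cells are plain strings.

```python
import numpy as np
import pandas as pd

# Sample data copied from the table above; missing cells are NaN.
df = pd.DataFrame(
    {
        "col1": ["PYTHON RD", "STACK AVE APT 2-3", "OVER ST 1/2", "FLOW RD"],
        "col2": ["APT 3", "APT 2-3", "UNIT 1/2", np.nan],
        "col3": [np.nan, np.nan, "UNIT 1/2", np.nan],
    }
)
print(df)
```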

I want to create a new field:

col1                    col2        col3               COMBINED
PYTHON RD              APT 3         NaN        PYTHON RD APT 3
STACK AVE APT 2-3    APT 2-3         NaN      STACK AVE APT 2-3
OVER ST 1/2         UNIT 1/2    UNIT 1/2   OVER ST 1/2 UNIT 1/2
FLOW RD                  NaN         NaN                FLOW RD

I tried:

columns = ["col1", "col2", "col3"]
COMBINED = ''
for col in columns:
    df[col] = df[col].fillna("")
    COMBINED = COMBINED + df[col].str.strip() + ' '
    df['COMBINED'] = COMBINED.str.strip()

The code above combines the columns, but it leaves a duplicate in the second row: STACK AVE APT 2-3 APT 2-3.

Any suggestions?

3 Answers

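# Keep col1 as-is when col2 is already contained in it; otherwise append col2.
df["COMBINED"] = (
    df[["col1", "col2"]]
    .fillna("")
    .apply(
        lambda x: x.loc["col1"]
        if x.loc["col2"] in x.loc["col1"]
        else x.loc["col1"] + " " + x.loc["col2"],
        axis=1,
    )
)
print(df[["col1", "col2", "COMBINED"]])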

Prints:

                col1      col2              COMBINED
0          PYTHON RD     APT 3       PYTHON RD APT 3
1  STACK AVE APT 2-3   APT 2-3     STACK AVE APT 2-3
2        OVER ST 1/2  UNIT 1/2  OVER ST 1/2 UNIT 1/2
3            FLOW RD       NaN               FLOW RD

EDIT: For many columns:

def combine(x):
    out = []
    for word in x:
        if word and not any(word in w for w in out):
            out.append(word)
    return " ".join(out)


columns = ["col1", "col2", "col3"]
df["COMBINED"] = df[columns].fillna("").apply(combine, axis=1)
print(df)

Prints:

                col1      col2      col3              COMBINED
0          PYTHON RD     APT 3       NaN       PYTHON RD APT 3
1  STACK AVE APT 2-3   APT 2-3       NaN     STACK AVE APT 2-3
2        OVER ST 1/2  UNIT 1/2  UNIT 1/2  OVER ST 1/2 UNIT 1/2
3            FLOW RD       NaN       NaN               FLOW RD
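
One caveat with `word in w`: it is a plain substring test, so a value such as "APT 2" would be treated as already present in "MAIN ST APT 21". If that matters for your data, a token-based check is one option. The following is only a sketch: `contains_tokens` and `combine_tokens` are hypothetical helper names, the "MAIN ST" address is made up, and the sketch assumes values are separated by single whitespace-delimited tokens.

```python
def contains_tokens(haystack: str, needle: str) -> bool:
    """Return True if needle's tokens appear as a contiguous run in haystack."""
    h, n = haystack.split(), needle.split()
    return any(h[i:i + len(n)] == n for i in range(len(h) - len(n) + 1))


def combine_tokens(values):
    """Join values, skipping empties and values already covered by a kept value."""
    out = []
    for value in values:
        if value and not any(contains_tokens(kept, value) for kept in out):
            out.append(value)
    return " ".join(out)


# Row 1 of the sample data: the repeated apartment is still dropped.
print(combine_tokens(["STACK AVE APT 2-3", "APT 2-3", ""]))
# Hypothetical row: "APT 2" is a substring of "APT 21" but not a matching token run.
print(combine_tokens(["MAIN ST APT 21", "APT 2", ""]))
```

It plugs into the same call as above: `df[columns].fillna("").apply(combine_tokens, axis=1)`.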

1 Comment

If I have more than two columns, such as col1, col2, and col3, it is difficult to modify the code each time.

Not sure if this covers all your cases:

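def combine(row):
    # Treat missing values as empty strings so the "in" test works.
    row = row.fillna("")
    result = row["col1"]
    for col in ["col2", "col3"]:
        # Append only values not already contained in the running result;
        # "" is contained in every string, so empty cells are skipped.
        if row[col] not in result:
            result += " " + row[col]
    return result

df["COMBINED"] = df.apply(combine, axis=1)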
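
If the set of columns changes often, the same idea can take the column list as a parameter instead of hardcoding it. This is a rough sketch, not a tested drop-in: the `columns` parameter and the trailing `strip()` (for rows where the first column is missing) are my additions.

```python
import numpy as np
import pandas as pd


def combine(row, columns):
    """Append each value from `columns` unless the running result already contains it."""
    values = row[columns].fillna("")
    result = values.iloc[0]
    for value in values.iloc[1:]:
        if value not in result:  # "" is in every string, so empty cells are skipped
            result += " " + value
    return result.strip()


df = pd.DataFrame(
    {
        "col1": ["PYTHON RD", "STACK AVE APT 2-3", "OVER ST 1/2", "FLOW RD"],
        "col2": ["APT 3", "APT 2-3", "UNIT 1/2", np.nan],
        "col3": [np.nan, np.nan, "UNIT 1/2", np.nan],
    }
)

columns = ["col1", "col2", "col3"]
df["COMBINED"] = df.apply(lambda row: combine(row, columns), axis=1)
print(df)
```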

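Let's try playing with unique and join. Note that unique only drops exact duplicates within a row, so a value that is merely contained in another one (row 1) is kept: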

df['col4'] = df.fillna('').apply(lambda X: ",".join(X.unique()).strip(','), axis=1)

                col1      col2      col3                       col4
0          PYTHON RD     APT 3       NaN            PYTHON RD,APT 3
1  STACK AVE APT 2-3   APT 2-3       NaN  STACK AVE APT 2-3,APT 2-3
2        OVER ST 1/2  UNIT 1/2  UNIT 1/2       OVER ST 1/2,UNIT 1/2
3            FLOW RD       NaN       NaN                    FLOW RD
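
Stripping commas only cleans up the ends of the string. If a missing value can sit between two filled columns, the empty string left by fillna would still produce a doubled separator. A possible variation, sketched below, filters out empty values before joining. The second row ("MAIN ST", "UNIT 4") is hypothetical data added only to show that case.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "col1": ["PYTHON RD", "MAIN ST"],  # "MAIN ST" row is made up for illustration
        "col2": ["APT 3", np.nan],
        "col3": [np.nan, "UNIT 4"],
    }
)

# Drop empty strings before joining so no separator is emitted for them.
df["col4"] = df.fillna("").apply(
    lambda row: ",".join(value for value in row.unique() if value), axis=1
)
print(df["col4"].tolist())
```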
