Creating another column in pandas based on a pre-existing column

Question

I have a third column in my data frame, and I want to create a fourth column that looks almost the same, except that it has no double quotes and each ID in the list gets a 'user/' prefix. A cell sometimes holds a single ID rather than a list of IDs (as shown in the example DF).

original

col1   col2     col3
01      01     "ID278, ID289"
02      02     "ID275"

desired

col1   col2     col3                col4
01      01     "ID278, ID289"     user/ID278, user/ID289
02      02     "ID275"            user/ID275

I added a method that deals with rows that are empty, whether they hold np.nan or empty strings ''.
– BeRT2me, Commented Jul 19, 2022 at 5:07

BeRT2me · Accepted Answer · 2022-07-19 05:04:54Z

1

Given:

   col1  col2            col3
0   1.0   1.0  "ID278, ID289"
1   2.0   2.0         "ID275"
2   2.0   1.0             NaN

Doing:

df['col4'] = (df.col3.str.strip('"')  # Remove " from both ends.
                     .str.split(', ') # Split into lists on ', '.
                     .apply(lambda x: ['user/' + i for i in x if i] # Apply this list comprehension,
                                       if isinstance(x, list)  # If it's a list.
                                       else x)
                     .str.join(', ')) # Join them back together.
print(df)

Output:

   col1  col2            col3                    col4
0   1.0   1.0  "ID278, ID289"  user/ID278, user/ID289
1   2.0   2.0         "ID275"              user/ID275
2   2.0   1.0             NaN                     NaN
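
For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch of the same chain. The DataFrame construction, the extra empty-string row, and the helper name `add_prefix` are assumptions added for illustration; they are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample frame mirroring the "Given" table, plus one empty-string row
# (the extra row is an assumption, added to exercise the empty case).
df = pd.DataFrame({
    "col1": [1.0, 2.0, 2.0, 3.0],
    "col2": [1.0, 2.0, 1.0, 3.0],
    "col3": ['"ID278, ID289"', '"ID275"', np.nan, ""],
})

def add_prefix(ids):
    """Prefix each non-empty ID in a list; pass non-list values (such as NaN) through."""
    if isinstance(ids, list):
        return ["user/" + i for i in ids if i]
    return ids

df["col4"] = (
    df["col3"].str.strip('"')   # remove the surrounding double quotes
    .str.split(", ")            # split each string into a list of IDs
    .apply(add_prefix)          # prefix the IDs, leave NaN alone
    .str.join(", ")             # join each list back into one string
)
print(df)
```

The `if i` filter drops the empty piece that splitting `''` produces, so an empty string should stay an empty string while NaN stays NaN.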

edited Jul 19, 2022 at 5:04

answered Jul 19, 2022 at 4:52


Haoliang · Accepted Answer · 2022-07-19 02:59:55Z

0

Here's a solution that takes both the double quotes and ID lists into account:

# remove the double quotes
df['col4'] = df['col3'].str.strip('"')
# split the string, add prefix user/, and then join
df['col4'] = df['col4'].apply(lambda x: ', '.join(f"user/{userId}" for userId in x.split(', ')))

answered Jul 19, 2022 at 2:59


1 Comment

youtube Over a year ago

I also forgot to mention there are some rows that are empty in col 3, would you please revise the code for this aspect?
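
One possible way to adapt the split-and-join idea to the empty rows raised in this comment is sketched below. It is not taken from the answer itself: the helper name `prefix_ids`, its `prefix` parameter, and the sample data are hypothetical. Calling `.split` on a NaN cell would fail because NaN is a float, so the sketch checks the type first.

```python
import numpy as np
import pandas as pd

def prefix_ids(value, prefix="user/"):
    """Add `prefix` to every ID in a quoted, comma-separated string.

    Missing values (NaN) and empty strings are returned unchanged.
    """
    if not isinstance(value, str) or not value.strip('"'):
        return value
    ids = value.strip('"').split(", ")
    return ", ".join(f"{prefix}{user_id}" for user_id in ids)

# Hypothetical sample data: two populated rows, one NaN, one empty string.
df = pd.DataFrame({"col3": ['"ID278, ID289"', '"ID275"', np.nan, ""]})
df["col4"] = df["col3"].apply(prefix_ids)
print(df)
```

Returning the value unchanged keeps NaN as NaN in col4; returning `""` instead would be an equally reasonable choice if empty strings are preferred downstream.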

Ales19 · Accepted Answer · 2022-07-19 04:46:46Z

0

You can use the .apply() method with a custom function:

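def add_user_prefix(x):
    # NaN is a truthy float, so `not x` alone would miss it; check the type too.
    if not isinstance(x, str) or not x:
        return ""

    # Strip any surrounding double quotes, then split into individual IDs.
    elements = x.strip('"').split(", ")
    out = []

    for i in elements:
        out.append(f"user/{i}")

    return ", ".join(out)

df["col4"] = df.col3.apply(add_user_prefix)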

That returns:

col1  col2  col3          col4
1     1     ID278, ID289  user/ID278, user/ID289
2     2     ID275         user/ID275
3     3

edited Jul 19, 2022 at 4:46

answered Jul 19, 2022 at 2:42


2 Comments

youtube Over a year ago

I also forgot to mention there are some rows that are empty in col 3, would you please revise the code for this aspect?

Ales19 Over a year ago

I updated the function so that empty rows now return an empty string in the new column.
