string split multiple columns if true else [Python]

Question

I'm transitioning from R to Python. One package that I relied on heavily was data.table, and I am struggling to replicate the following in pandas or plain Python.

Update: included dummy data - thank you @cmaher for the suggestion

import pandas as pd
d = {'id': [1, 2, 3], 'x1': ['1_a', '1_b', 'NX']}
df = pd.DataFrame(data=d)
df

# R solution
library(data.table)
library(stringr)

df <- data.table(id = c(1,2,3), x1=c('1_a', '1_b', 'NX'))

df[str_detect(x1, '\\d') & !str_detect(x1, 'NX'), c("x2", "x3") := tstrsplit(x1, "_", fixed=TRUE)][!str_detect(x1, '\\d'), 'x3' := x1]

df
> df
   id  x1 x2 x3
1:  1 1_a  1  a
2:  2 1_b  1  b
3:  3  NX NA NX

# python-pandas attempt
df['x2'], df['x2'] = df['x1'].apply(
    lambda x: df['x1'].str.split('_', 1).str if (df['x1'].str.contains('\\d')) & 
    ~(df['x1'].str.contains('NX')) else df['x1'])

Please read how to make good reproducible pandas examples. Questions such as this one are much more constructive if they include sample data and desired output, rather than just a code chunk to translate.
– cmaher, Commented Mar 23, 2018 at 18:10
Do you want the string to be separated by underscore, or do you want to extract the number part of the string into x2 and the string part into x3?
– Vaishali, Commented Mar 23, 2018 at 20:32
Split by underscore, mainly to do what you mentioned: x2 = number and x3 = string.
– user2340706, Commented Mar 23, 2018 at 20:51

migjimen · Accepted Answer · 2018-03-23 22:38:16Z

1

As I see in your comments, your intent is to put the numbers in x2 and the strings in x3. The following code may fit your requirements, using the 're' module:

import pandas as pd
import re
d = {'id': [1, 2, 3], 'x1': ['1_a', '1_b', 'NX']}
df = pd.DataFrame(data=d)
print(df)

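def findPattern(pattern, string):
    # Return the first substring matching pattern, or None if there is no match
    m = re.search(pattern, string)
    if m:
        return m.group()
    else:
        return None

# x2 gets the first run of digits, x3 the first run of letters
df['x2'] = df.x1.apply(lambda x: findPattern(r"\d+", x))
df['x3'] = df.x1.apply(lambda x: findPattern(r"[a-zA-Z]+", x))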

print(df)

The output:

   id   x1    x2  x3
0   1  1_a     1   a
1   2  1_b     1   b
2   3   NX  None  NX
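The same regex idea can probably be written without `apply`, assuming pandas' `Series.str.extract` is acceptable here. This is only a sketch on the dummy data from the question; the patterns are the ones used above, wrapped in a capture group because `str.extract` requires one.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3], 'x1': ['1_a', '1_b', 'NX']})

# expand=False returns a Series when the pattern has a single capture group;
# rows with no match get a missing value instead of None
df['x2'] = df['x1'].str.extract(r'(\d+)', expand=False)
df['x3'] = df['x1'].str.extract(r'([a-zA-Z]+)', expand=False)

print(df)
```

Like the `apply` version, this takes the first run of digits and the first run of letters, so it does not depend on the underscore being present.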

answered Mar 23, 2018 at 22:38

migjimen


It_is_Chris · Accepted Answer · 2018-03-23 22:19:10Z

1

So are you looking for something like this?

import pandas as pd
import numpy as np

df = pd.DataFrame({'id': [1, 2, 3], 'x1': ['1_a', '1_b', 'NX']})
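# Split on the first underscore; expand=True returns one column per piece
# (recent pandas versions need n as a keyword and no longer allow unpacking .str)
df[['x2', 'x3']] = df['x1'].str.split('_', n=1, expand=True)
# Rows without an underscore have no second piece: fall back to the original string
df.loc[df['x3'].isnull(), 'x3'] = df['x1']
# In those same rows the first piece is the whole string, so mark it as missing
df['x2'] = df['x2'].where(df['x2'] != df['x1'], np.nan)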
df

out:

   id   x1   x2  x3
0   1  1_a    1   a
1   2  1_b    1   b
2   3   NX  NaN  NX

edited Mar 23, 2018 at 22:19

answered Mar 23, 2018 at 19:53

It_is_Chris


2 Comments

user2340706 Over a year ago

Sorry, the NA is R's equivalent of 'missing data'.

It_is_Chris Over a year ago

@user2340706 this should work for you. It splits each string in df['x1'] on '_'. The default for df['x3'] is df['x1'], and df['x2'] is null if there is no '_' on which to split.
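The logic described in this comment could also be sketched with an explicit boolean mask, which is closer in spirit to the conditional update in the R code. This is a minimal sketch on the dummy data, not the answer's own code; `has_sep` and `parts` are names introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3], 'x1': ['1_a', '1_b', 'NX']})

# True for rows that actually contain the separator
has_sep = df['x1'].str.contains('_', regex=False)

# One column per piece; rows without '_' have a missing second piece
parts = df['x1'].str.split('_', n=1, expand=True)

# Keep the first piece only where a split happened, otherwise leave it missing
df['x2'] = parts[0].where(has_sep)
# Keep the second piece where a split happened, otherwise fall back to x1
df['x3'] = parts[1].where(has_sep, df['x1'])

print(df)
```

Note that `parts[1]` only exists if at least one row contains an underscore, so real data with no separators at all would need a guard.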
