2

hello I have a df such as

COL1 
NW_011625257.1_0
NW_011623521.1_1
NW_011623521.3_1
NW_011623521.4_1
NW_011623521.1
JZSA01007324.1_2
scaffold_1463_2
scaffold_1463

and I would like to cut by the last '_' and get

COL1              COL2
NW_011625257.1    0
NW_011623521.1    1
NW_011623521.3    1
NW_011623521.4    1
NW_011623521.1    NaN 
JZSA01007324.1    2
scaffold_1463     2
scaffold_1463     NaN

So far i tried :

df[['COL1','COL2']] = df.COL1.str.split(r'_(?!.*_)', expand=True)

instead I get this kind of output:

COL1             COL2
NW_011625257.1   0
NW_011623521.1   1
NW_011623521.3   1
NW_011623521.4   1
NW               011623521.1
JZSA01007324.1   2
scaffold_1463    2
scaffold         1463

here is an exemple of what I want to select

enter image description here

18
  • So what is the problem? NaN? What do you expect? Commented Oct 5, 2020 at 16:01
  • @WiktorStribiżew see the last row, it cuts at scaffold_1463 -> scaffold 1463 instead of adding a NaN Commented Oct 5, 2020 at 16:03
  • Try r"(?<=\.\d)_(?!.*_)" Commented Oct 5, 2020 at 16:06
  • @WiktorStribiżew it only match the _0 see here regex101.com/r/BP3Jnj/1 Commented Oct 5, 2020 at 16:08
  • 2
    I found ! (?<=[0-9])_works Commented Oct 5, 2020 at 16:43

1 Answer 1

2

You can use

df[['COL1','COL2']] = df.COL1.str.split(r"(?<=\d)_(?=\d+$)", expand=True)

See the regex demo

Pattern details:

  • (?<=\d) - there must a digit right before the current location
  • _ - an underscore
  • (?=\d+$) - there must be 1+ digits and end of string immediately to the right of the current location.

Pandas test:

df = pd.DataFrame({'COL1': ['NW_011625257.1_0','NW_011623521.1_1','NW_011623521.3_1','NW_011623521.4_1','NW_011623521.1','JZSA01007324.1_2','scaffold_1463_2','scaffold_1463']})
>>> df[['COL2','COL3']] = df.COL1.str.split(r"(?<=\d)_(?=\d+$)", expand=True)
>>> df
               COL1            COL2  COL3
0  NW_011625257.1_0  NW_011625257.1     0
1  NW_011623521.1_1  NW_011623521.1     1
2  NW_011623521.3_1  NW_011623521.3     1
3  NW_011623521.4_1  NW_011623521.4     1
4    NW_011623521.1  NW_011623521.1  None
5  JZSA01007324.1_2  JZSA01007324.1     2
6   scaffold_1463_2   scaffold_1463     2
7     scaffold_1463   scaffold_1463  None
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.