Add a new column using regex pandas

Question

Hello, I have a DataFrame such as:

COL1 COL2
A g1
B g1.t1
C transcript_id "g1.t1"; gene_id "g1"
D g2
E g2.t1
F transcript_id "g2.t1"; gene_id "g2"
G transcript_id "g2.t1"; gene_id "g2"

and I would like to add a new column, COL3, containing only the g value (e.g. g1 or g2) for each row.

The expected result is:

COL1 COL2                               COL3
A g1                                    g1
B g1.t1                                 g1
C transcript_id "g1.t1"; gene_id "g1"   g1
D g2                                    g2
E g2.t1                                 g2
F transcript_id "g2.t1"; gene_id "g2"   g2
G transcript_id "g2.t1"; gene_id "g2"   g2

I thought I could use something like re.sub?

I tried:

table[COL3]= re.sub(r'(?<=transcript_id )*.+(?<=gene_id ")','',table[COL2])

Quang Hoang · Accepted Answer · 2020-06-30 12:44:52Z

Is this what you are after?

df['COL3'] = df.COL2.str.extract(r'(g\d+)', expand=False)

Output:

  COL1                                 COL2 COL3
0    A                                   g1   g1
1    B                                g1.t1   g1
2    C  transcript_id "g1.t1"; gene_id "g1"   g1
3    D                                   g2   g2
4    E                                g2.t1   g2
5    F  transcript_id "g2.t1"; gene_id "g2"   g2
6    G  transcript_id "g2.t1"; gene_id "g2"   g2
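
For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the sample in the question, and it assumes every gene ID has the form `g` followed by digits, which is all the sample shows. If your real IDs look different, the pattern would need adjusting.

```python
import pandas as pd

# Sample data rebuilt by hand from the question (an assumption about
# how the original table was constructed).
df = pd.DataFrame({
    "COL1": list("ABCDEFG"),
    "COL2": [
        "g1",
        "g1.t1",
        'transcript_id "g1.t1"; gene_id "g1"',
        "g2",
        "g2.t1",
        'transcript_id "g2.t1"; gene_id "g2"',
        'transcript_id "g2.t1"; gene_id "g2"',
    ],
})

# Raw string so that \d reaches the regex engine unchanged.
# str.extract keeps the first match of the capture group in each row;
# expand=False returns a Series instead of a one-column DataFrame.
df["COL3"] = df["COL2"].str.extract(r"(g\d+)", expand=False)

print(df)
```

Note that `re.sub` works on a single string, so passing it a whole column such as `table[COL2]` raises a `TypeError`. The `.str` accessor applies the pattern to each row instead. Rows with no match would get `NaN` in COL3.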
