Hello, I have a df such as:
COL1 COL2
A g1
B g1.t1
C transcript_id "g1.t1"; gene_id "g1"
D g2
E g2.t1
F transcript_id "g2.t1"; gene_id "g2"
G transcript_id "g2.t1"; gene_id "g2"
and I would like to add a new COL3 that contains only the gene value (g1, g2, ...) for each row.
Here I should get:
COL1 COL2 COL3
A g1 g1
B g1.t1 g1
C transcript_id "g1.t1"; gene_id "g1" g1
D g2 g2
E g2.t1 g2
F transcript_id "g2.t1"; gene_id "g2" g2
G transcript_id "g2.t1"; gene_id "g2" g2
I thought I could use something like re.sub?
I tried:
table[COL3]= re.sub(r'(?<=transcript_id )*.+(?<=gene_id ")','',table[COL2])
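For reference, here is a minimal sketch of the result I am after. It assumes that `table` is a pandas DataFrame and that every gene value has the form `g` followed by digits (that pattern is my assumption from the sample data, not a general rule). As far as I understand, `re.sub` expects a single string rather than a whole column, so the sketch uses the vectorised `.str.extract` accessor instead:

```python
import pandas as pd

# Sample data rebuilt from the table in the question.
table = pd.DataFrame({
    "COL1": list("ABCDEFG"),
    "COL2": [
        "g1",
        "g1.t1",
        'transcript_id "g1.t1"; gene_id "g1"',
        "g2",
        "g2.t1",
        'transcript_id "g2.t1"; gene_id "g2"',
        'transcript_id "g2.t1"; gene_id "g2"',
    ],
})

# Assumption: a gene value is "g" followed by one or more digits.
# str.extract returns the first match of the capture group per row;
# expand=False gives back a Series instead of a one-column DataFrame.
table["COL3"] = table["COL2"].str.extract(r"(g\d+)", expand=False)

print(table)
```

Rows where the pattern does not match would get NaN in COL3, so the regex would need adjusting if the real gene names look different.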