Extract columns from string

Question

I have a pandas df column containing the following strings:

0    Future(conId=462009617, symbol='CGB', lastTradeDateOrContractMonth='20211220', multiplier='1000', currency='CAD', localSymbol='CGBZ21', tradingClass='CGB')
1    Stock(conId=80268543, symbol='IJPA', exchange='AEB', currency='EUR', localSymbol='IJPA', tradingClass='IJPA')
2    Stock(conId=153454120, symbol='EMIM', exchange='AEB', currency='EUR', localSymbol='EMIM', tradingClass='EMIM')

I would like to extract data from strings and organize it as columns. As you can see, not all rows contain the same data and they are not in the same order. I only need some of the columns; this is the expected output:

     Type      conId symbol  localSymbol
0  Future  462009617    CGB       CGBZ21
1   Stock   80268543   IJPA         IJPA
2   Stock  153454120   EMIM         EMIM

I tried str.extract but couldn't get the result I want.

Any ideas on how to achieve it? Thanks

Shivam Roy · Accepted Answer · 2021-09-10 16:52:09Z

You could try this using string methods. Assuming that the strings are stored in a column named 'main_col':

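# Everything before the first "(" is the type
df["Type"] = df.main_col.str.split("(", expand=True)[0]
# Take the text after "key=" up to the next comma; strip the quotes around string values
df["conId"] = df.main_col.str.partition("conId=")[2].str.partition(",")[0]
df["symbol"] = df.main_col.str.partition("symbol=")[2].str.partition(",")[0].str.strip("'")
df["localSymbol"] = df.main_col.str.partition("localSymbol=")[2].str.partition(",")[0].str.strip("'")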
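
The same partition idea can be wrapped in a small helper so that each field is a single call. This is only a sketch run on the three sample strings from the question; `field` is a hypothetical helper name, and it additionally strips the single quotes (and a trailing parenthesis) that partition would otherwise leave around string values such as 'CGB'.

```python
import pandas as pd

# Sample strings copied from the question; "main_col" is the column name assumed in this answer.
df = pd.DataFrame({"main_col": [
    "Future(conId=462009617, symbol='CGB', lastTradeDateOrContractMonth='20211220', multiplier='1000', currency='CAD', localSymbol='CGBZ21', tradingClass='CGB')",
    "Stock(conId=80268543, symbol='IJPA', exchange='AEB', currency='EUR', localSymbol='IJPA', tradingClass='IJPA')",
    "Stock(conId=153454120, symbol='EMIM', exchange='AEB', currency='EUR', localSymbol='EMIM', tradingClass='EMIM')",
]})

def field(series, key):
    """Hypothetical helper: text between 'key=' and the next comma, quotes stripped."""
    # str.partition returns a DataFrame with columns 0, 1, 2; column 2 is the text after the separator.
    after_key = series.str.partition(key + "=")[2]
    # Indexing gave a Series again, so .str is needed a second time.
    value = after_key.str.partition(",")[0]
    # Drop surrounding quotes, and the closing parenthesis if the field was the last one.
    return value.str.strip("')")

out = pd.DataFrame({
    "Type": df["main_col"].str.partition("(")[0],
    "conId": field(df["main_col"], "conId"),
    "symbol": field(df["main_col"], "symbol"),
    "localSymbol": field(df["main_col"], "localSymbol"),
})
print(out)
```

Note that conId comes back as a string here; `out["conId"].astype("int64")` would convert it if numbers are needed.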

edited Sep 10, 2021 at 16:52

answered Sep 10, 2021 at 14:04

6 Comments

younggotti Over a year ago

Thanks for your answer. The first line of code raises the following ValueError: Length of values (2) does not match length of index (10). Do you have any idea why?

Shivam Roy Over a year ago

Hi, apologies. I missed a parameter: without it, split returns a Series of lists, so [0] picked the list from the first row instead of the first piece of every row. Please add the parameter expand = True to the split method and it should work fine. I have edited the answer.

younggotti Over a year ago

Thanks. Now the first command works fine, but I get an error on the second line: 'Series' object has no attribute 'partition'. I'm interested in your code because it looks easier to read than Nikolaos's.

Shivam Roy Over a year ago

Hi, I apologize again, I missed another small detail. Since I'm partitioning twice to get the substring, I should have used .str again, because indexing the first result produces an intermediate Series. I have made the changes and tested the code on a sample, so the edited code should work for you.

younggotti Over a year ago

Great, now it works and it's very easy to read

Nikolaos Chatzis · Accepted Answer · 2021-09-10 14:36:41Z

One solution using pandas.Series.str.extract (as you tried using it):

>>> df
                                                                                                                                                           col
0  Future(conId=462009617, symbol='CGB', lastTradeDateOrContractMonth='20211220', multiplier='1000', currency='CAD', localSymbol='CGBZ21', tradingClass='CGB')
1  Stock(conId=80268543, symbol='IJPA', exchange='AEB', currency='EUR', localSymbol='IJPA', tradingClass='IJPA')                                              
2  Stock(conId=153454120, symbol='EMIM', exchange='AEB', currency='EUR', localSymbol='EMIM', tradingClass='EMIM')

>>> df.col.str.extract(r"^(?P<Type>Future|Stock).*conId=(?P<conId>\d+).*symbol='(?P<symbol>[A-Z]+)'.*localSymbol='(?P<localSymbol>[A-Z0-9]+)'")
     Type      conId symbol localSymbol
0  Future  462009617    CGB      CGBZ21
1   Stock   80268543   IJPA        IJPA
2   Stock  153454120   EMIM        EMIM

In the above, I assume that:

Type is either Future or Stock
conId consists of digits
symbol consists of uppercase letters
localSymbol consists of digits and uppercase letters
conId, symbol and localSymbol appear in that order in each string

You may want to adapt the pattern to better fit your needs.
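
If the fields might not always appear in the same order, one possible adaptation is to run a separate str.extract per field instead of a single combined pattern. The sketch below is an assumption-laden variant, not a drop-in replacement: the `patterns` dict and its regexes are illustrative, and it accepts any quoted value rather than only uppercase letters and digits.

```python
import pandas as pd

# Sample strings copied from the question, stored in a column named "col" as above.
df = pd.DataFrame({"col": [
    "Future(conId=462009617, symbol='CGB', lastTradeDateOrContractMonth='20211220', multiplier='1000', currency='CAD', localSymbol='CGBZ21', tradingClass='CGB')",
    "Stock(conId=80268543, symbol='IJPA', exchange='AEB', currency='EUR', localSymbol='IJPA', tradingClass='IJPA')",
    "Stock(conId=153454120, symbol='EMIM', exchange='AEB', currency='EUR', localSymbol='EMIM', tradingClass='EMIM')",
]})

# One pattern per output column; each is searched on its own,
# so the position of a field inside the string does not matter.
patterns = {
    "Type": r"^(\w+)\(",                      # word before the opening parenthesis
    "conId": r"\bconId=(\d+)",                # unquoted digits
    "symbol": r"\bsymbol='([^']*)'",          # quoted value; does not match inside localSymbol (capital S)
    "localSymbol": r"\blocalSymbol='([^']*)'",
}

# expand=False with a single capture group returns a Series per field.
out = pd.DataFrame({
    name: df["col"].str.extract(pattern, expand=False)
    for name, pattern in patterns.items()
})

# Safe here because every sample row has a conId; rows without one would be NaN.
out["conId"] = out["conId"].astype("int64")
print(out)
```

A field that is missing from a row comes back as NaN rather than making the whole row fail to match, which is the main trade-off against the single-pattern version.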
