
This is the posdf DataFrame:

      tradingsymbol
0     XYZ2061820500PE
1     XYZ20JUN21000PE
2     ABC20JUN100CE
3     ABC20JUN102.5PE
4     ABC20JUN92.5PE
4     XYZ20JUNFUT

I am using this to extract ABC and XYZ into a column:

posdf['symbol'] = posdf['tradingsymbol'].str.extract(r'^(\D+)', expand=False)

I cannot figure out a general way to extract the following columns:

     strike    type   Expiry
0    20500     PE     20618
1    21000     PE     20JUN
2    100       CE     20JUN
3    102.5     PE     20JUN
4    92.5      PE     20JUN
4    NA        FUT    20JUN

Edit

Type is at least 2 and at most 3 characters. Expiry is always 5 characters, and it can also take forms such as 20O18, 20N18 or 20D18.

2nd Edit

Added rows where type is 3 characters, based on Sammy's comment.

  • Are type and Expiry always 2 and 5 characters long? Commented Jun 11, 2020 at 11:30
  • I think you are on to something, Stef; probably try it out and see if your idea holds. Commented Jun 11, 2020 at 11:32
  • Type can be min 2, max 3. Expiry is always 5 chars. Should have mentioned. Commented Jun 11, 2020 at 11:32
  • I'd suggest, @Sid, that you add rows where type can be three characters long; your current dataframe fits only two characters for type. Commented Jun 11, 2020 at 11:38

3 Answers


Use Series.str.extract with a regex pattern; each named group becomes a column name:

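import pandas as pd

# df is the question's posdf; each named group becomes a column of df1
df1 = df['tradingsymbol'].str.extract(
    r'(?P<expiry>\d{5}|\d{2}\w{3})(?P<strike>\d+(?:\.\d+)?)?(?P<type>\w+)')
df1 = df1[['strike', 'type', 'expiry']]  # reorder to match the desired output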

Result:

# print(df1)
  strike type expiry
0  20500   PE  20618
1  21000   PE  20JUN
2    100   CE  20JUN
3  102.5   PE  20JUN
4   92.5   PE  20JUN
4    NaN  FUT  20JUN

You can test the regex here.
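A self-contained sketch of this answer, assuming the question's data is rebuilt by hand. The last row (XYZ20O1820500PE) is a hypothetical symbol, not from the question, added to exercise the 20O18 expiry form from the Edit; the `\d{2}\w{3}` alternative is what accepts it. Converting strike with pd.to_numeric is an addition of mine, not part of the original answer: str.extract always returns strings (NaN where the optional group did not match).

```python
import pandas as pd

# The question's data, plus one hypothetical row using the "20O18" expiry form
posdf = pd.DataFrame({'tradingsymbol': [
    'XYZ2061820500PE', 'XYZ20JUN21000PE', 'ABC20JUN100CE',
    'ABC20JUN102.5PE', 'ABC20JUN92.5PE', 'XYZ20JUNFUT',
    'XYZ20O1820500PE',  # hypothetical, not from the question
]})

# Named groups become the column names of the extracted DataFrame
pattern = r'(?P<expiry>\d{5}|\d{2}\w{3})(?P<strike>\d+(?:\.\d+)?)?(?P<type>\w+)'
parts = posdf['tradingsymbol'].str.extract(pattern)

# strike comes back as strings (NaN where absent); convert it to floats
parts['strike'] = pd.to_numeric(parts['strike'])

result = posdf.join(parts[['strike', 'type', 'expiry']])
print(result)
```

Keeping strike numeric makes later filtering or sorting by strike behave numerically rather than lexically.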


If the strike is always numeric, you can do:

posdf[['Symbol','Expiry','Strike','Type']] = posdf['tradingsymbol'].str.extract(r'^(\D+)(.{5})([0-9.]*)([a-zA-Z]{2,3})', expand=True)

Result:

     tradingsymbol Symbol Expiry Strike Type
0  XYZ2061820500PE    XYZ  20618  20500   PE
1  XYZ20JUN21000PE    XYZ  20JUN  21000   PE
2    ABC20JUN100CE    ABC  20JUN    100   CE
3  ABC20JUN102.5PE    ABC  20JUN  102.5   PE
4   ABC20JUN92.5PE    ABC  20JUN   92.5   PE
4      XYZ20JUNFUT    XYZ  20JUN         FUT
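Because `[0-9.]*` can match the empty string, the FUT row gets an empty string for Strike rather than NaN (the blank cell above). A minimal sketch, assuming you want a numeric column with NaN for futures, is to follow the extraction with pd.to_numeric and errors='coerce'; that follow-up step is my addition, not part of this answer.

```python
import pandas as pd

posdf = pd.DataFrame({'tradingsymbol': [
    'XYZ2061820500PE', 'XYZ20JUN21000PE', 'ABC20JUN100CE',
    'ABC20JUN102.5PE', 'ABC20JUN92.5PE', 'XYZ20JUNFUT',
]})

cols = ['Symbol', 'Expiry', 'Strike', 'Type']
posdf[cols] = posdf['tradingsymbol'].str.extract(
    r'^(\D+)(.{5})([0-9.]*)([a-zA-Z]{2,3})', expand=True)

# FUT rows have Strike == '' here; errors='coerce' turns anything
# non-numeric (including '') into NaN and the rest into floats
posdf['Strike'] = pd.to_numeric(posdf['Strike'], errors='coerce')
print(posdf)
```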


Bit of a hack, since it relies on fixed positions (a 3-character symbol followed by a 5-character expiry):

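res = (df.assign(
           # positions 3-7: assumes a 3-character symbol before the 5-character expiry
           Expiry = df.tradingsymbol.str[3:8],
           # the capturing group keeps the letters, so element [1] is the type
           type = df.tradingsymbol.str[8:].str.split("([a-zA-Z]+)").str[1],
           # without a group the letters are dropped; element [0] is the strike
           strike = df.tradingsymbol.str[8:].str.split("[a-zA-Z]+").str[0],
       )
      )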

res


   tradingsymbol    Expiry  type    strike
0   XYZ2061820500PE 20618   PE      20500
1   XYZ20JUN21000PE 20JUN   PE      21000
2   ABC20JUN100CE   20JUN   CE      100
3   ABC20JUN102.5PE 20JUN   PE      102.5
4   ABC20JUN92.5PE  20JUN   PE      92.5
4   XYZ20JUNFUT     20JUN   FUT 
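The hack depends on how regex splitting treats capturing groups: pandas' str.split treats a multi-character pattern as a regular expression, and like Python's re.split it keeps the text matched by a capturing group in the result. A plain-Python sketch of that behaviour, using the question's symbols after `str[8:]` has removed the symbol and expiry:

```python
import re

tails = ["20500PE", "102.5PE", "FUT"]  # tradingsymbol[8:] for three of the rows

# With a capturing group, the matched letters stay in the result,
# so element [1] is the type
with_group = [re.split(r"([a-zA-Z]+)", s) for s in tails]

# Without a group the letters are dropped, so element [0] is the strike ('' if none)
without_group = [re.split(r"[a-zA-Z]+", s) for s in tails]

print(with_group)     # [['20500', 'PE', ''], ['102.5', 'PE', ''], ['', 'FUT', '']]
print(without_group)  # [['20500', ''], ['102.5', ''], ['', '']]
```

As with the second answer, the FUT row ends up with an empty-string strike rather than NaN.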
