
I would like to extract values from a list based on a certain pattern.

**Example:**
    ticker=['HF (NYSE) (81%);BPO (NEW YORK)]']

**Expected Output:**
    Tickercode-HF;BPO
    StockCode-NYSE;NEW YORK
    Relevancescore-81;0

**My code:**

    import re

    Tickercode=[x for x in ticker if re.match(r'[\w\.-]+[\w\.-]+', x)]
    Stockcode=[x for x in ticker if re.match(r'[\w\.-]+(%)+[\w\.-]+', x)]
    Relevancescore=[x for x in ticker if re.match(r'[\w\.-]+(%)+[\w\.-]+', x)]

**My output:**
    ['HF (NYSE) (81%);BPO (NEW YORK)]']
    []
    []

But I am getting the wrong output. Please help me resolve the issue.

Thanks

  • I see no error in your output. Commented Jan 22, 2017 at 2:53
  • I'm not getting an error, but I am getting the wrong output. Commented Jan 22, 2017 at 3:07

2 Answers


First, each item of ticker contains multiple records separated by semicolons, so I recommend normalizing ticker by splitting it into one record per item. Then iterate over the strings and extract the info using the pattern '(\w+) \(([\w ]+)\)( \(([\d]+)%\))?'.

    import re

    ticker=['HF (NYSE) (81%);BPO (NEW YORK)]']
    ticker=[y for x in ticker for y in x.split(';')]

    Tickercode=[]
    Stockcode=[]
    Relevancescore=[]

    for s in ticker:
        m = re.search(r'(\w+) \(([\w ]+)\)( \(([\d]+)%\))?', s)
        Tickercode.append(m.group(1))
        Stockcode.append(m.group(2))
        Relevancescore.append(m.group(4))

    print(Tickercode)
    print(Stockcode)
    print(Relevancescore)

Output:

    ['HF', 'BPO']
    ['NYSE', 'NEW YORK']
    ['81', None]
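To get exactly the format the question asks for, the three lists still need joining with ';', and the missing score (None) has to become 0. A minimal sketch that post-processes the lists from the output above; substituting the string '0' for None is an assumption taken from the question's expected output.

```python
# The three lists as produced by the loop above for the question's sample input
Tickercode = ['HF', 'BPO']
Stockcode = ['NYSE', 'NEW YORK']
Relevancescore = ['81', None]

# Records without a "(NN%)" part yield None; substitute '0' as in the expected output
Relevancescore = ['0' if r is None else r for r in Relevancescore]

print('Tickercode-' + ';'.join(Tickercode))      # Tickercode-HF;BPO
print('StockCode-' + ';'.join(Stockcode))        # StockCode-NYSE;NEW YORK
print('Relevancescore-' + ';'.join(Relevancescore))  # Relevancescore-81;0
```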

Update:

Use re.search instead of re.match: re.match only matches the pattern at the start of the string. Your input has leading whitespace after each semicolon, which is why it failed.
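A small sketch of the difference, using one record as it looks after splitting the commenter's input on ';' (the leading space is left over from "; "). The variable names here are illustrative only.

```python
import re

pattern = r'(\w+) \(([\w ]+)\)'
record = ' BAC (LSE) (92%)'  # note the leading space left over from "; " splitting

# re.match only tries position 0, where there is a space, so \w+ cannot match
print(re.match(pattern, record))   # None

# re.search scans forward and finds the match starting at "BAC"
m = re.search(pattern, record)
print(m.group(1), m.group(2))      # BAC LSE
```

Stripping each record first (s.strip()) would also let re.match work, but re.search avoids depending on the exact spacing.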

You can add this check inside the loop, right after the re.search call, to print any string that doesn't match and skip it:

    if m is None:
        print('%s cannot be matched' % s)
        continue
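With the check in place, a sketch of the full loop run on the input from the first comment below (the one that raised AttributeError). This is the same pattern as above; only the None guard is added.

```python
import re

# Input from the comment that previously raised AttributeError
ticker = ['BAC (NYSE) (92%); BAC (LSE) (92%); 8648 (TSE) (92%); VNTV (NYSE) (92%); JPM (NYSE) (90%); JPM (LSE) (90%)]']
# Split every item on ';' so each string holds one record
ticker = [y for x in ticker for y in x.split(';')]

Tickercode = []
Stockcode = []
Relevancescore = []

for s in ticker:
    # re.search (not re.match) tolerates the leading space after each ';'
    m = re.search(r'(\w+) \(([\w ]+)\)( \(([\d]+)%\))?', s)
    if m is None:
        # Report and skip records that don't fit the pattern instead of crashing
        print('%s cannot be matched' % s)
        continue
    Tickercode.append(m.group(1))
    Stockcode.append(m.group(2))
    Relevancescore.append(m.group(4))

print(Tickercode)      # ['BAC', 'BAC', '8648', 'VNTV', 'JPM', 'JPM']
print(Stockcode)       # ['NYSE', 'LSE', 'TSE', 'NYSE', 'NYSE', 'LSE']
print(Relevancescore)  # ['92', '92', '92', '92', '90', '90']
```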

Comments:

It throws an error for this input: ['BAC (NYSE) (92%); BAC (LSE) (92%); 8648 (TSE) (92%); VNTV (NYSE) (92%); JPM (NYSE) (90%); JPM (LSE) (90%)]']. The error I am getting is: AttributeError: 'NoneType' object has no attribute 'group'
It's failing again for the same input. I'm wondering why it's still failing.
Yes, it works. I tried changing it a little: m = re.search(r'(\w+) (\w+)((([\d]+)%))?', s). How about this case? ['INTEL CORP (82%)', ' AUTODESK INC (54%)', ' ACCENTURE PLC (54%)', ' PEPSICO INC (53%)', ' VASCULAR SOLUTIONS INC (51%)'] I am not able to extract the % value.
( and ) need to be escaped.
How do I do that? I am only getting None as output: m = re.search(r'(\w+)\ (\w+)((([\d]+)%))?', s)
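For the 'INTEL CORP (82%)' case, a sketch with the parentheses escaped as \( and \) so they match literal characters instead of forming a group. The pattern is an assumption: it treats everything before the "(NN%)" part as the company name, which may not hold for other inputs.

```python
import re

names = ['INTEL CORP (82%)', ' AUTODESK INC (54%)', ' ACCENTURE PLC (54%)',
         ' PEPSICO INC (53%)', ' VASCULAR SOLUTIONS INC (51%)']

# ^\s* skips leading spaces; (.+?) lazily captures the name;
# \( and \) match literal parentheses around the digits and '%'
pattern = re.compile(r'^\s*(.+?)\s*\((\d+)%\)')

companies = []
scores = []
for s in names:
    m = pattern.search(s)
    if m is None:
        continue  # skip entries without a "(NN%)" part
    companies.append(m.group(1))
    scores.append(m.group(2))

print(companies)  # ['INTEL CORP', 'AUTODESK INC', 'ACCENTURE PLC', 'PEPSICO INC', 'VASCULAR SOLUTIONS INC']
print(scores)     # ['82', '54', '54', '53', '51']
```

The original attempt used (\w+) (\w+), which only covers two-word names and has unescaped parentheses around the percentage, so the % part was treated as an optional group and never matched.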

The problem with your code is that you're building up each of your lists from the input itself. You're telling it, "make a list of the input if the input matches my regular expression." Also, re.match() only matches at the beginning of a string, so the only regex that matches is the one for the ticker symbol itself.
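A short sketch of that difference, using the regex from the question: the filtering comprehension returns the whole input string, while keeping the match object and calling .group() returns only the matched text.

```python
import re

ticker = ['HF (NYSE) (81%);BPO (NEW YORK)]']
pat = r'[\w\.-]+[\w\.-]+'  # the ticker regex from the question

# A filtering comprehension keeps the *whole* input string whenever the regex matches
filtered = [x for x in ticker if re.match(pat, x)]
print(filtered)   # ['HF (NYSE) (81%);BPO (NEW YORK)]']

# To collect only the matched text, keep the match object and call .group()
extracted = [m.group(0) for m in (re.match(pat, x) for x in ticker) if m]
print(extracted)  # ['HF']
```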

I've reorganized your code a bit below to show how it can work.

  • Use re.compile() so the regex doesn't have to be compiled each time
  • Use re.search() so you can find your embedded patterns
  • Use match.group(1) to get only the captured part of the match, not the whole input string
  • Break up your input so you're only handling one group at a time

    #!/usr/bin/env python
    
    import re
    
    # Example:
    ticker=['HF (NYSE) (81%);BPO (NEW YORK)]']
    
    # **Expected Output:**
    # Tickercode-HF;BPO
    # StockCode-NYSE;NEW YORK
    # Relevancescore-81;0
    
    tickercode=[]
    stockcode=[]
    relevancescore=[]
    
    # ^\s* skips spaces after ';'; ticker symbols may contain digits (e.g. 8648)
    ticker_re = re.compile(r'^\s*([A-Z0-9]+)')
    stock_re = re.compile(r'\(([\w ]+)\)')
    relevance_re = re.compile(r'\((\d+)%\)')
    
    for tick in ticker:
        for stockinfo in tick.split(";"):
            ticker_match = ticker_re.search(stockinfo)
            stock_match = stock_re.search(stockinfo)
            relevance_match = relevance_re.search(stockinfo)
    
            ticker_code = ticker_match.group(1) if ticker_match else ''
            stock_code = stock_match.group(1) if stock_match else ''
            relevance_score = relevance_match.group(1) if relevance_match else '0'
    
            tickercode.append(ticker_code)
            stockcode.append(stock_code)
            relevancescore.append(relevance_score)
    
    print('Tickercode-' + ';'.join(tickercode))
    print('StockCode-' + ';'.join(stockcode))
    print('Relevancescore-' + ';'.join(relevancescore))
    

Comments:

['BAC (NYSE) (92%); BAC (LSE) (92%); 8648 (TSE) (92%); VNTV (NYSE) (92%); JPM (NYSE) (90%); JPM (LSE) (90%)]'] When I run it on the ticker list above, I get the wrong output. The output is: BAC,;;;
@Mho - That's because your input now has spaces after the semicolons. I fixed the ticker regex accordingly.
