Extract values based on a pattern in a list python

Question

I would like to extract values from the strings in a list based on a certain pattern.

**Example:**
ticker=['HF (NYSE) (81%);BPO (NEW YORK)]']

**Expected Output:**
Tickercode-HF;BPO
StockCode-NYSE;NEW YORK
Relevancescore-81;0

**My code**:
Tickercode=[x for x in ticker if re.match(r'[\w\.-]+[\w\.-]+', x)]
Stockcode=[x for x in ticker if re.match(r'[\w\.-]+(%)+[\w\.-]+', x)]
Relevancescore=[x for x in ticker if re.match(r'[\w\.-]+(%)+[\w\.-]+', x)]

**My output:**
['HF (NYSE) (81%);BPO (NEW YORK)]']
[]
[]

But I am getting the wrong output. Please help me resolve the issue.

Thanks

I see no error in your output.

– Chris, Commented Jan 22, 2017 at 2:53
I am not getting an error, but I am getting the wrong output.

– Mho, Commented Jan 22, 2017 at 3:07

gzc · Accepted Answer · 2017-01-22 06:16:57Z

3

First, each item of ticker contains multiple records separated by semicolons, so I recommend normalizing ticker so that each item holds a single record. Then iterate over the strings and extract the fields using the pattern '(\w+) \(([\w ]+)\)( \(([\d]+)%\))?'.

import re

ticker=['HF (NYSE) (81%);BPO (NEW YORK)]']
ticker=[y for x in ticker for y in x.split(';')]

Tickercode=[]
Stockcode=[]
Relevancescore=[]

for s in ticker:
    m = re.search(r'(\w+) \(([\w ]+)\)( \(([\d]+)%\))?', s)
    Tickercode.append(m.group(1))
    Stockcode.append(m.group(2))
    Relevancescore.append(m.group(4))

print(Tickercode)
print(Stockcode)
print(Relevancescore)

Output:

['HF', 'BPO']
['NYSE', 'NEW YORK']
['81', None]

Update:

Use re.search instead of re.match. re.match only matches the pattern at the start of the string, and your input has leading whitespace after each semicolon, which caused the match to fail.
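The difference can be seen on a single record. This is a minimal sketch, assuming a record with a leading space such as the ones produced by splitting the commenter's input on ';'. The variable names are illustrative only.

```python
import re

pattern = r'(\w+) \(([\w ]+)\)'
record = ' BAC (LSE) (92%)'  # leading space, as left behind by split(';')

# re.match anchors at position 0, where the space is not a word character.
match_result = re.match(pattern, record)

# re.search scans forward and finds the first position where the pattern fits.
search_result = re.search(pattern, record)

print(match_result)            # None
print(search_result.group(1))  # BAC
```

Stripping each record with record.strip() before matching would be another way to get the same result with re.match.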

You can add this check inside the loop, right after the re.search call, to print any string that doesn't match:

    if m is None:
        print('%s cannot be matched' % s)
        continue
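Put together, the loop with the guard might look like the sketch below. It reuses the pattern from this answer and the longer input posted in the comments. The list names records, tickercode, stockcode and relevancescore are chosen here for illustration.

```python
import re

ticker = ['BAC (NYSE) (92%); BAC (LSE) (92%); 8648 (TSE) (92%); '
          'VNTV (NYSE) (92%); JPM (NYSE) (90%); JPM (LSE) (90%)]']

# One record per item, as in the normalization step above.
records = [y for x in ticker for y in x.split(';')]

pattern = re.compile(r'(\w+) \(([\w ]+)\)( \(([\d]+)%\))?')
tickercode, stockcode, relevancescore = [], [], []

for s in records:
    m = pattern.search(s)
    if m is None:
        # Report the record and skip it instead of raising AttributeError.
        print('%s cannot be matched' % s)
        continue
    tickercode.append(m.group(1))
    stockcode.append(m.group(2))
    relevancescore.append(m.group(4))

print(tickercode)      # ['BAC', 'BAC', '8648', 'VNTV', 'JPM', 'JPM']
print(stockcode)       # ['NYSE', 'LSE', 'TSE', 'NYSE', 'NYSE', 'LSE']
print(relevancescore)  # ['92', '92', '92', '92', '90', '90']
```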

edited Jan 22, 2017 at 6:16

answered Jan 22, 2017 at 3:53

gzc


8 Comments

Mho Over a year ago

It throws an error for this input: ['BAC (NYSE) (92%); BAC (LSE) (92%); 8648 (TSE) (92%); VNTV (NYSE) (92%); JPM (NYSE) (90%); JPM (LSE) (90%)]']. The error I am getting is: AttributeError: 'NoneType' object has no attribute 'group'

Mho Over a year ago

It's failing again for the same input. I am wondering why it's failing again.

Mho Over a year ago

Yeah, it works. I tried to change it a little bit: m = re.search(r'(\w+) (\w+)((([\d]+)%))?', s). How about this case? ['INTEL CORP (82%)', ' AUTODESK INC (54%)', ' ACCENTURE PLC (54%)', ' PEPSICO INC (53%)', ' VASCULAR SOLUTIONS INC (51%)'] I am not able to extract the % value.

gzc Over a year ago

( and ) need to be escaped.

Mho Over a year ago

How do I do that? I am only getting None as output: m = re.search(r'(\w+)\ (\w+)((([\d]+)%))?', s)
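For the company-name input in the comments above, one possible approach is sketched below. The pattern is an assumption made for illustration and is not taken from the answer: it escapes the literal parentheses as \( and \) and uses a lazy group so that names with any number of words are captured.

```python
import re

companies = ['INTEL CORP (82%)', ' AUTODESK INC (54%)', ' ACCENTURE PLC (54%)',
             ' PEPSICO INC (53%)', ' VASCULAR SOLUTIONS INC (51%)']

# \( and \) match literal parentheses; unescaped ( ) only create groups.
# (.+?) lazily captures the name up to the optional space before "(NN%)".
pattern = re.compile(r'(.+?)\s*\((\d+)%\)')

names, scores = [], []
for s in companies:
    m = pattern.search(s.strip())  # strip the leading space first
    if m is None:
        print('%s cannot be matched' % s)
        continue
    names.append(m.group(1))
    scores.append(m.group(2))

print(names)
# ['INTEL CORP', 'AUTODESK INC', 'ACCENTURE PLC', 'PEPSICO INC', 'VASCULAR SOLUTIONS INC']
print(scores)
# ['82', '54', '54', '53', '51']
```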


unpythonic · Accepted Answer · 2017-01-22 05:52:17Z

0

The problem with your code is that you're building up each of your lists from the input. You're telling it, "make a list of the input if the input matches my regular expression". The re.match() only matches against the beginning of a string, so the only regex that matches is the one that matches against the ticker symbol itself.
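A small sketch of that difference, using the input from the question: filtering with re.match keeps the whole input string, while reading the match's group keeps only the captured text. The names filtered and extracted, and the capturing pattern in the second expression, are illustrative assumptions.

```python
import re

ticker = ['HF (NYSE) (81%);BPO (NEW YORK)]']

# Filtering: the comprehension yields x itself whenever the pattern
# matches at the start of x, so the whole input string comes back.
filtered = [x for x in ticker if re.match(r'[\w\.-]+[\w\.-]+', x)]

# Extracting: keep only the text captured by the group.
matches = (re.match(r'([\w\.-]+)', x) for x in ticker)
extracted = [m.group(1) for m in matches if m]

print(filtered)   # ['HF (NYSE) (81%);BPO (NEW YORK)]']
print(extracted)  # ['HF']
```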

I've reorganized your code a bit below to show how it can work.

- Use re.compile() so the regex doesn't have to be created each time.
- Use re.search() so you can find your embedded patterns.
- Use match.group(1) to get the matching part of the query, not the whole of the input.
- Break up your input so you're only handling one group at a time.

#!/usr/bin/env python

import re

# Example:
ticker=['HF (NYSE) (81%);BPO (NEW YORK)]']

# **Expected Output:**
# Tickercode-HF;BPO
# StockCode-NYSE;NEW YORK
# Relevancescore-81;0

tickercode=[]
stockcode=[]
relevancescore=[]

ticker_re = re.compile(r'^\s*([A-Z]+)')
stock_re = re.compile(r'\(([\w ]+)\)')
relevance_re = re.compile(r'\((\d+)%\)')

for tick in ticker:
    for stockinfo in tick.split(";"):
        ticker_match = ticker_re.search(stockinfo)
        stock_match = stock_re.search(stockinfo)
        relevance_match = relevance_re.search(stockinfo)

        ticker_code = ticker_match.group(1) if ticker_match else ''
        stock_code = stock_match.group(1) if stock_match else ''
        relevance_score = relevance_match.group(1) if relevance_match else '0'

        tickercode.append(ticker_code)
        stockcode.append(stock_code)
        relevancescore.append(relevance_score)

# print() calls work on both Python 2 and Python 3
print('Tickercode-' + ';'.join(tickercode))
print('StockCode-' + ';'.join(stockcode))
print('Relevancescore-' + ';'.join(relevancescore))

edited Jan 22, 2017 at 5:52

answered Jan 22, 2017 at 3:55

unpythonic


2 Comments

Mho Over a year ago

['BAC (NYSE) (92%); BAC (LSE) (92%); 8648 (TSE) (92%); VNTV (NYSE) (92%); JPM (NYSE) (90%); JPM (LSE) (90%)]'] When I run it for the above ticker list, I get the wrong output. The output is: BAC,;;;

unpythonic Over a year ago

@Mho - Because now your input has spaces after the semi-colons. Fixed the ticker regex accordingly.
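One caveat worth checking against that input: the ticker pattern in the answer accepts only uppercase letters, so a numeric ticker such as 8648 yields an empty string. The sketch below compares it with a relaxed pattern. The relaxed pattern and the helper first_group are assumptions added for illustration, not part of the answer.

```python
import re

strict_re = re.compile(r'^\s*([A-Z]+)')      # pattern from the answer
relaxed_re = re.compile(r'^\s*([A-Z0-9]+)')  # assumption: also allow digits

records = 'BAC (NYSE) (92%); 8648 (TSE) (92%)'.split(';')

def first_group(pattern, text):
    """Return the first captured group, or '' when there is no match."""
    m = pattern.search(text)
    return m.group(1) if m else ''

strict = [first_group(strict_re, r) for r in records]
relaxed = [first_group(relaxed_re, r) for r in records]

print(strict)   # ['BAC', '']
print(relaxed)  # ['BAC', '8648']
```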
