3

I have a list like this:

key = ['*', '(DATE*', '*', '*', '*)', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '(GPE*', '*)', '*', '*', '*', '(DATE)', '*'] 

I want to process it as follows:

  1. Iterate through the list.

  2. When I find an entry that starts with '(' but does not end with ')',

  3. replace it, the '*' entries after it, and the closing '*)' entry with the label taken from the opening entry (the opening entry with '(' and '*' stripped).

  4. If an entry is fully enclosed in '()', it should just be stripped: for example, the second-to-last element '(DATE)' should become DATE.

For example, the 2nd entry '(DATE*' is followed by '*', '*', '*)', so all four of these entries should be replaced with DATE.

The output should be:

key = ['*', 'DATE', 'DATE', 'DATE', 'DATE', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', 'GPE', 'GPE', '*', '*', '*', 'DATE', '*'] 
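The rules above amount to a small state machine: remember the label while inside a span, and emit it for every entry until one ending in ')' closes the span. A minimal sketch, assuming spans are never nested and that the label is the text after '(' with any trailing '*' or ')' removed (the function name `expand_spans` is hypothetical):

```python
def expand_spans(tokens):
    """Return a new list where each '(LABEL* ... *)' span and each '(LABEL)' entry becomes LABEL."""
    result = []
    label = None  # label of the span we are currently inside, or None outside a span
    for tok in tokens:
        if tok.startswith('('):
            # opening entry such as '(DATE*' or '(DATE)': drop '(' and trailing '*' / ')'
            label = tok[1:].rstrip('*)')
            result.append(label)
            if tok.endswith(')'):
                label = None  # single-entry span like '(DATE)' closes immediately
        elif label is not None:
            result.append(label)
            if tok.endswith(')'):
                label = None  # '*)' closes the current span
        else:
            result.append(tok)  # entry outside any span is kept as-is
    return result


key = ['*', '(DATE*', '*', '*', '*)', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '(GPE*', '*)', '*', '*', '*', '(DATE)', '*']
print(expand_spans(key))
```

Building a new list instead of overwriting `key` keeps the original input intact and avoids index bookkeeping while iterating.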
0

6 Answers

3
**Nothing but some regex and while loops**
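import re

key = ['*', '(DATE*', '*', '*', '*)', '*', '*', '*', '*', '*', '*', '*', '*', '*',
       '*', '*', '*', '*', '*', '*', '*', '*', '*', '(GPE*', '*)', '*', '*', '*', '(DATE)', '*']
val = 0
while val < len(key):
    value = key[val]
    if re.findall(r'\(', value):                # entry opens a span, e.g. '(DATE*' or '(DATE)'
        value = re.findall(r'\w+', value)[0]    # extract the label, e.g. 'DATE'
        # overwrite entries until one containing ')' closes the span
        while val < len(key) and not re.findall(r'\)', key[val]):
            key[val] = value
            val += 1
        if val < len(key):
            key[val] = value                    # overwrite the closing entry too
    val += 1
print(key)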

output - ['*', 'DATE', 'DATE', 'DATE', 'DATE', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', 'GPE', 'GPE', '*', '*', '*', 'DATE', '*']

1

I know it's not very Pythonic, but you can try this one:

key = ['*', '(DATE*', '*', '*', '*)', '*', '*', '*', '*', '*', '*',
   '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '(GPE*', '*)',
   '*', '*', '*', '(DATE)', '*']

for i in key:
    if i.startswith('(') and not (i.endswith(')')):
        a = key[key.index(i)+1:]
        for j in a:
            if j.endswith(')'):
                a = a[:a.index(j)+1]
                break
        for l in range(key.index(i), key.index(i)+len(a)+1):
            key[l] = i.strip('(').strip('*')
    elif i.startswith('(') and i.endswith(')'):
        key[key.index(i)] = i.strip('(').strip(')')

print(key)

It gives this output:

['*', 'DATE', 'DATE', 'DATE', 'DATE', '*', '*', '*', '*', '*', '*', 
 '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', 'GPE', 
'GPE', '*', '*', '*', 'DATE', '*']

1 Comment

Glad to hear :)
1
key = ['*', '(DATE*', '*', '*', '*)', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '(GPE*', '*)', '*', '*', '*', '(DATE)', '*']
outKeys = []
isFound = False
for k in key:
    if k.startswith("(") and k.endswith(")"):
        k = k[k.find("(")+1:k.find(")")]
    elif k.startswith("("):
        k = k[k.find("(")+1:k.find("*")]
        isFound = k
    elif k.endswith(")"):
        k = isFound
        isFound = False
    elif isFound:
        k = isFound
    outKeys.append(k)
print(outKeys)

This gives the output:

['*', 'DATE', 'DATE', 'DATE', 'DATE', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', 'GPE', 'GPE', '*', '*', '*', 'DATE', '*']

1

I suggest this easily readable solution. I defined a separate list, newKey, to avoid modifying a list while iterating over its own elements:

key = ['*', '(DATE*', '*', '*', '*)', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '(GPE*', '*)', '*', '*', '*', '(DATE)', '*']


newKey = []
next_x = None

for x in key:
    if x.startswith('(') and x.endswith(')'):
        newKey.append(x.strip('()*'))
    elif x.startswith('('):
        newKey.append(x.strip('(*'))
        next_x = x.strip('(*')
    elif x.endswith(')'):
        newKey.append(next_x.strip('*)'))
        next_x = None
    elif next_x is not None:
        newKey.append(next_x)
    else:
        newKey.append(x)  

key = newKey[:]

print(key)
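A note on `key = newKey[:]`: this makes the name `key` point to a fresh copy, which is fine here, but any other reference to the original list (for example, a list passed into a function) still sees the old contents. Slice assignment, `key[:] = newKey`, replaces the contents in place instead. A small illustration with a shortened, hypothetical list:

```python
key = ['*', '(DATE)', '*']
alias = key                   # a second reference to the same list object

new_key = ['*', 'DATE', '*']

key = new_key[:]              # rebinding: `key` now names a different list...
print(alias)                  # ...so the alias still holds the old contents

key = alias
key[:] = new_key              # slice assignment replaces the contents in place
print(alias)                  # the alias now sees the update
```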

1

You can use the code below:

current_entry = None
for i, k in enumerate(key):
    if k.startswith('(') and k.endswith(')'):
        key[i] = k.strip('(').strip(')')
        continue
    if k.startswith('(') and not k.endswith(')'):
        current_entry = k.strip('(').strip('*')
    if current_entry:
        key[i] = current_entry
    if k.endswith(')'):
        current_entry = None
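Because this version overwrites `key` in place, it can be wrapped in a function that mutates the list passed to it. A sketch using the same logic, with the hypothetical name `label_spans_inplace`, run on the question's list:

```python
def label_spans_inplace(key):
    """Overwrite each '(LABEL* ... *)' span and each '(LABEL)' entry in `key` with LABEL."""
    current_entry = None  # label of the open span, or None outside a span
    for i, k in enumerate(key):
        if k.startswith('(') and k.endswith(')'):
            key[i] = k.strip('(').strip(')')        # single-entry span, e.g. '(DATE)'
            continue
        if k.startswith('('):
            current_entry = k.strip('(').strip('*')  # opening entry, e.g. '(DATE*'
        if current_entry:
            key[i] = current_entry
        if k.endswith(')'):
            current_entry = None                     # '*)' closes the span
    # no return value: like list.sort(), the caller's list is modified directly


key = ['*', '(DATE*', '*', '*', '*)', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '(GPE*', '*)', '*', '*', '*', '(DATE)', '*']
label_spans_inplace(key)
print(key)
```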

0

This can be done with a single regex substitution:

import re

string = ' '.join(['*', '(DATE*', '*', '*', '*)', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '(GPE*', '*)', '*', '*', '*', '(DATE)', '*'])
# repeat each span's label once per '*' inside the match (at least once, for '(DATE)')
result = re.sub(r'\((.*?)\)',
                lambda m: ' '.join([m.group(1).replace('*', '').strip()] * max(1, m.group(0).count('*'))),
                string).split(' ')
print(result)
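The regex version repeats the label once per '*', which matches the number of entries only because every entry in a span carries exactly one '*'. Counting the spaces in the match (one fewer than the number of joined entries) does not depend on that. A sketch under the assumptions that no entry itself contains a space and every span has a non-empty label; the function name and the two-'*' entry in the usage line are hypothetical:

```python
import re


def expand_spans_regex(tokens):
    """Regex variant: repeat each span's label once per entry inside the parentheses."""
    string = ' '.join(tokens)  # assumes no entry contains a space

    def repl(m):
        label = m.group(1).split()[0].strip('*')  # first entry of the span, minus '*'
        n_entries = m.group(0).count(' ') + 1     # entries covered by this match
        return ' '.join([label] * n_entries)

    return re.sub(r'\((.*?)\)', repl, string).split(' ')


key = ['*', '(DATE*', '*', '*', '*)', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '(GPE*', '*)', '*', '*', '*', '(DATE)', '*']
print(expand_spans_regex(key))
print(expand_spans_regex(['(DATE*', '**', '*)']))  # hypothetical entry with two '*'
```

With the '*'-counting lambda, the last call would yield four DATE entries for three input entries; counting entries keeps the list length unchanged.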
