I have a complicated string and would like to extract multiple substrings from it.

The string consists of a set of items separated by commas. Each item has an identifier (id-n) followed by a pair of words enclosed in brackets. I want only the word inside the brackets that has a number attached to its end (e.g. 'This-1'). The number indicates the position the word should take when the words are arranged after extraction.

#Example of what the individual items look like
id1(attr1, is-2) #The number 2 here indicates word 'is' should be in position 2
id2(attr2, This-1) #The number 1 here indicates word 'This' should be in position 1
id3(attr3, an-3) #The number 3 here indicates word 'an' should be in position 3
id4(attr4, example-4) #The number 4 here indicates word 'example' should be in position 4
id5(attr5, example-4) #This is a duplicate of the word 'example'

#Example of string - this is how the string with the items looks like
string = "id1(attr1, is-1), id2(attr2, This-2), id3(attr3, an-3), id4(attr4, example-4), id5(atttr5, example-4)"

#This is how the result should look after extraction
result = 'This is an example'

Is there an easier way to do this? Regex doesn't work for me.

  • I can't make sense of your example. Could you try describing it a different way? Commented Jun 12, 2013 at 4:00
  • @DaoWen - Sorry the string itself is a little complicated. It's difficult to describe it. Commented Jun 12, 2013 at 4:19
  • What governs the reordering of items? Commented Jun 12, 2013 at 4:19
  • @IgnacioVazquez-Abrams - the number attached to the end of the second word in the brackets. Commented Jun 12, 2013 at 4:20
  • What happened to the other example? Commented Jun 12, 2013 at 4:20

3 Answers

2

A trivial/naive approach:

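>>> from collections import defaultdict
>>> s = "id1(attr1, is-1), id2(attr2, This-2), id3(attr3, an-3), id4(attr4, example-4), id5(atttr5, example-4)"
>>> z = [x.split(',')[1].strip().strip(')') for x in s.split('),')]
>>> d = defaultdict(list)  # position -> list of words at that position
>>> for i in z:
...    b = i.split('-')
...    d[b[1]].append(b[0])
...
>>> ' '.join(' '.join(d[t]) for t in sorted(d.keys(), key=int))
'is This an example example'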

You have duplicated positions for example in your sample string, which is why example is repeated in the output.

However, your sample does not match your requirements either; this result follows your description, with the words arranged according to their position indicators.

Now, if you want to get rid of duplicates:

>>> ' '.join(e for t in sorted(d.keys(), key=int) for e in set(d[t]))
'is This an example'
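
One caveat: set() does not preserve insertion order, so if two different words ever shared a position, their relative order could vary. Here is a minimal sketch of an order-preserving alternative, assuming Python 3.7+ (where dicts keep insertion order) and the same item layout as the sample string. The name join_by_position is hypothetical, not something from the question.

```python
from collections import defaultdict


def join_by_position(s):
    """Join the 'word-N' tokens of s ordered by N, dropping repeats per position."""
    by_pos = defaultdict(dict)  # position -> {word: None}, used as an ordered set
    for item in s.split('),'):
        # second field of each item, e.g. ' is-1' or ' example-4)'
        token = item.split(',')[1].strip().strip(')')
        word, pos = token.rsplit('-', 1)
        by_pos[int(pos)][word] = None
    return ' '.join(w for p in sorted(by_pos) for w in by_pos[p])


s = "id1(attr1, is-1), id2(attr2, This-2), id3(attr3, an-3), id4(attr4, example-4), id5(atttr5, example-4)"
print(join_by_position(s))  # is This an example
```

Storing the positions as int keys also means they sort numerically without needing key=int.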

3 Comments

example is repeated, though, which is not what OP wants.
well if you want to be technical, his example isn't actually what he wants either since the words are switched around.
I would take that as a typo. But the requirement about same position is stated quite clearly, though.
2

Why not regex? This works.

In [43]: import re

In [44]: s = "id1(attr1, is-2), id2(attr2, This-1), id3(attr3, an-3), id4(attr4, example-4), id5(atttr5, example-4)"

In [45]: z = [(int(m.group(2)), m.group(1)) for m in re.finditer(r'(\w+)-(\d+)\)', s)]

In [46]: [x for y, x in sorted(set(z))]
Out[46]: ['This', 'is', 'an', 'example']
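
For reuse, the same idea can be wrapped in a function that returns the joined string the question asks for. This is only a sketch, assuming positions are non-negative integers and each wanted token is immediately followed by a closing bracket; extract_sentence is a hypothetical name. Positions are compared as integers here, so position 10 sorts after position 9 (as strings, '10' would sort before '9').

```python
import re


def extract_sentence(s):
    """Return the 'word-N)' tokens of s joined in order of N, without duplicates."""
    # The set collapses identical (position, word) pairs such as the repeated 'example-4'.
    pairs = {(int(pos), word) for word, pos in re.findall(r'(\w+)-(\d+)\)', s)}
    return ' '.join(word for _, word in sorted(pairs))


s = "id1(attr1, is-2), id2(attr2, This-1), id3(attr3, an-3), id4(attr4, example-4), id5(atttr5, example-4)"
print(extract_sentence(s))  # This is an example
```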

2 Comments

You failed to detect that example is repeated (same position of 4) and only one of them should be kept.
OK, I have revised borrowing set from Burahn's answer.
1

OK, how about this:

sample = ("id1(attr1, is-2), id2(attr2, This-1), "
          "id3(attr3, an-3), id4(attr4, example-4), id5(atttr5, example-4)")


def make_cryssie_happy(s):
    words = {}  # maps position (int) -> word
    ll = s.split(',')[1::2]
    # we only want items like This-1, an-3, etc.

    for item in ll:
        tt = item.replace(')', '').strip()
        (word, pos) = tt.split('-')
        words[int(pos)] = word
        # there can only be one word at a particular position;
        # using a dict with the positions as keys
        # is an alternative to using sets

    res = [words[i] for i in sorted(words)]
    # sort the keys numerically (don't rely on dict ordering)
    # and list the dict's values in that order

    # return a nice string
    return ' '.join(res)


print(make_cryssie_happy(sample))
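
Note that the dict silently keeps the last word seen for a position. If two different words at the same position should count as an error rather than overwrite each other, a variant could check before assigning. This is a sketch under that assumption; join_strict is a hypothetical name, and it expects the same "idN(attrN, word-N)" layout as the sample.

```python
def join_strict(s):
    """Like the dict approach, but reject conflicting words at one position."""
    words = {}  # position (int) -> word
    for item in s.split(',')[1::2]:
        word, pos = item.replace(')', '').strip().rsplit('-', 1)
        pos = int(pos)
        # setdefault returns the word already stored at pos, if there is one
        if words.setdefault(pos, word) != word:
            raise ValueError('position %d has both %r and %r'
                             % (pos, words[pos], word))
    return ' '.join(words[p] for p in sorted(words))


sample = ("id1(attr1, is-2), id2(attr2, This-1), "
          "id3(attr3, an-3), id4(attr4, example-4), id5(atttr5, example-4)")
print(join_strict(sample))  # This is an example
```

Exact duplicates such as the repeated example-4 still pass, since the stored word matches.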
