I want to loop through a Python list and group items that match a pattern.

The list is, in fact, a list of dicts with various properties which can be divided into 3 types. I will call these As, Bs and Cs.

The pattern I am looking for is each A-type dict together with the nearest C dict before it and the two nearest B dicts before that C. Each A and B dict should only exist in one group.

Example:

Original List (data): [A1, B1, B2, B3, C1, A2, A3, B4, B5, B6, B7, C2, B8, C3, A4]

Desired Result: [[B2,B3,C1,A2], [B7,B8,C3,A4]]

Conditions:

As you can see from the example, an A should be ignored if there are no previous Bs and Cs (e.g. A1) or if there is another A before these Bs and Cs (e.g. A3). There may also be rogue Cs, which can be ignored as well (e.g. C2).

What I have Tried:

groups = []

# Extract indices for all A elements
As = [i for i, item in enumerate(data) if item['Class'] == "A"]

# Loop through the A's
for a in As:

    # Ensure the A isn't too close to the start of the list to have sufficient prev elements
    if a > 2:

        e = [data[a]]

        # For each prev item, walking backwards and including index 0
        for index in range(a - 1, -1, -1):

            # Get the item
            item = data[index]

            if len(e) > 3:
                break  # Exit once there are 4 items in the list
            elif len(e) > 1:
                searching = "B"  # After the C is found go to B's
            else:
                searching = "C"  # Start by searching for the C

            if item['Class'] == "A":  # If another A is found before the list is filled end the search
                break
            elif item['Class'] == searching:
                e.append(item)

        # Keep only complete groups, reversed into list order [B, B, C, A]
        if len(e) == 4:
            groups.append(e[::-1])

This works but feels like really terrible code! Any better solution suggestions would be appreciated.

  • You should read up on finite state machines and write one to recognize your pattern. That should be an easy way to solve it. Or maybe this: en.wikipedia.org/wiki/Approximate_string_matching Commented Aug 21, 2018 at 10:19
  • Do you have to use a list comprehension for some reason? If not, there are some easy solutions. Commented Aug 21, 2018 at 10:34
  • Each item in the list is a dict with a key-value pair showing its type (A/B/C). I used a list comprehension to find the indices of any A's in the original list, but if there is a different way of achieving the same result I have no tie to this method at all. Commented Aug 21, 2018 at 10:42
  • I have updated this to reflect a (terrible) working solution I have made Commented Aug 21, 2018 at 11:01
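
The finite state machine suggestion above could be sketched roughly as follows. This is only a sketch under assumptions, not the commenter's code: it makes a single forward pass, remembers the Bs seen since the last A, and treats the most recent C that has at least two such Bs before it as a pending group that the next A closes. The function name group_patterns and the 'Name' key in the sample data are hypothetical; only the 'Class' key comes from the question.

```python
def group_patterns(data):
    """Return [B, B, C, A] groups found in one forward pass over data."""
    groups = []
    recent_bs = []    # Bs seen since the last A
    pending = None    # [B, B, C] waiting for an A, or None

    for item in data:
        cls = item['Class']
        if cls == 'B':
            recent_bs.append(item)
        elif cls == 'C':
            # The newest C always replaces an older one; it only
            # counts if two Bs came before it since the last A.
            if len(recent_bs) >= 2:
                pending = recent_bs[-2:] + [item]
            else:
                pending = None
        elif cls == 'A':
            if pending is not None:
                groups.append(pending + [item])
            # An A always resets the state, so no B is reused.
            recent_bs = []
            pending = None
    return groups


# Sample data mirroring the question's example ('Name' is a hypothetical key).
names = ['A1', 'B1', 'B2', 'B3', 'C1', 'A2', 'A3', 'B4',
         'B5', 'B6', 'B7', 'C2', 'B8', 'C3', 'A4']
example = [{'Class': n[0], 'Name': n} for n in names]

result = [[d['Name'] for d in g] for g in group_patterns(example)]
print(result)
# [['B2', 'B3', 'C1', 'A2'], ['B7', 'B8', 'C3', 'A4']]
```

Because the state is reset on every A, A1 and A3 produce no group, and the rogue C2 is simply overwritten by C3 before A4 arrives.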

1 Answer

I'd use a regex in your case:

import re

# Convert the list to a string of class letters,
# for example 'ABBBCAABBBBCBCA'
string_repr = ''.join(item['Class'] for item in data)

# Compile the pattern we are looking for
pattern = re.compile(r'BC*B+C+B*A')

# Find the position of the last character (the A) of each match in string_repr
positions = [match.end() - 1 for match in pattern.finditer(string_repr)]

# The string positions line up with the indices of the data list
your_As = [data[i] for i in positions]
4 Comments

It produced BBBCA and BCBCA. Both match.
This only finds the A's, but not the corresponding B's and C's to do the full grouping as requested in the question.
Thanks, this part works for finding the A's but as @HannesOvrén commented does not find the B's and C's to go with them. I have changed the Regex to this: (B)C*(B)C*(C)B*(A). It works for this example, do you think it is sufficient?
@user2071737 I think (B)C*(B)C*B*(C)B*(A) is a better choice.
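
One way to turn the capture-group idea from these comments into full groups, sketched under assumptions: match.start(n) gives the string index of capture group n, and because the class string has one character per item, that index is also an index into data. The helper name regex_groups and the 'Name' key in the sample data are hypothetical. The sketch uses (B)C*(B)C*(C)B*(A); on the question's example list, the variant with the extra B* matches starting at B1 and captures B1, B2 for the first group rather than B2, B3, so it is worth checking either pattern against your own data.

```python
import re


def regex_groups(data, pattern):
    """Build [B, B, C, A] groups from the capture groups of each match."""
    class_string = ''.join(item['Class'] for item in data)
    groups = []
    for match in re.finditer(pattern, class_string):
        # One character per item, so each capture group's start
        # position in the string is also its index in data.
        indices = [match.start(n) for n in range(1, match.re.groups + 1)]
        groups.append([data[i] for i in indices])
    return groups


# Sample data mirroring the question's example ('Name' is a hypothetical key).
names = ['A1', 'B1', 'B2', 'B3', 'C1', 'A2', 'A3', 'B4',
         'B5', 'B6', 'B7', 'C2', 'B8', 'C3', 'A4']
sample = [{'Class': n[0], 'Name': n} for n in names]

found = regex_groups(sample, r'(B)C*(B)C*(C)B*(A)')
found_names = [[d['Name'] for d in g] for g in found]
print(found_names)
# [['B2', 'B3', 'C1', 'A2'], ['B7', 'B8', 'C3', 'A4']]
```

The pattern reads like the backward scan in the question: from the A, skip any Bs to reach the nearest C, then skip any Cs before each of the two Bs.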
