
I have a list of strings containing file names, such as:

file_names = ['filei.txt','filej.txt','filek.txt','file2i.txt','file2j.txt','file2k.txt','file3i.txt','file3j.txt','file3k.txt']

I then remove the .txt extension using:

import os

extension = os.path.commonprefix([n[::-1] for n in file_names])[::-1]

file_names_strip = [n[:-len(extension)] for n in file_names]

And then return the last character of each string in the list file_names_strip:

h = [n[-1:] for n in file_names_strip]

Which gives h = ['i', 'j', 'k', 'i', 'j', 'k', 'i', 'j', 'k']

How can I test for a pattern of strings in h? If i, j, k occur sequentially it should return True, and False otherwise. I need this because not all file names are formatted like the ones in file_names.

So:

test_ijk_pattern(h)  # should return True

no_pattern = ['1','2','3','1','2','3','1','2','3']

test_ijk_pattern(no_pattern)  # should return False

  • possible duplicate of Check for presence of a sublist in Python. Commented Nov 8, 2013 at 13:47
  • @Doorknob, thanks for the link, I must have missed that question. That answer gives me the correct output. Is there any way to achieve this without stripping the extension and taking the last character, i.e. so that test_ijk_pattern(file_names) returns True? Commented Nov 8, 2013 at 13:58

2 Answers


Here's how I would attack this:

def patternFinder(h):
    # Takes a list and returns the repeating pattern as a list if found,
    # otherwise returns an empty list

    if h[0] in h[1:]:
        rptIndex = h[1:].index(h[0]) + 1  # Index of the second instance of the first element
    else:
        print("This list has no pattern")
        return []

    if len(h) % rptIndex != 0:
        h = h[:-(len(h) % rptIndex)]  # Drop extra entries at the end which would break the next step

    # Divide h into sublists which should all hold the same pattern
    subLists = [h[i:i+rptIndex] for i in range(0, len(h), rptIndex)]

    hasPattern = True  # Assume the list has a pattern
    numReps = 0  # Number of times the pattern appears

    for subList in subLists:
        if subList != subLists[0]:
            hasPattern = False
        else:
            numReps += 1

    if hasPattern and numReps != 1:
        pattern = subLists[0]  # The first sublist is the pattern (not its first element)
        return pattern
    else:
        print("This list has no pattern")
        return []

Assumptions that this makes:

  • The pattern starts at the first element of the list
  • Incomplete patterns at the end aren't important ([1,2,3,1,2,3,1,2] will be reported as having 2 instances of [1,2,3])
  • h has at least 2 entries
  • There are no extra elements between repetitions of the pattern

If you're fine with these assumptions, this will work for you. Hope this helps!
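
The assumptions above can be checked with a small sketch. find_repeating_unit is a hypothetical helper name, not part of the answer's code; it follows the same idea (locate the second occurrence of the first element, then compare equal-sized chunks) and deliberately ignores an incomplete chunk at the end.

```python
def find_repeating_unit(items):
    """Return the repeating unit of `items`, or [] if none is found.

    Hypothetical helper for illustration. Same assumptions as listed above:
    the pattern starts at index 0, a partial repeat at the end is ignored,
    and at least two full repeats are required.
    """
    if len(items) < 2 or items[0] not in items[1:]:
        return []
    # Distance to the second occurrence of the first element = candidate period.
    period = items.index(items[0], 1)
    # Drop a partial chunk at the end, if any.
    full_len = len(items) - len(items) % period
    chunks = [items[i:i + period] for i in range(0, full_len, period)]
    if len(chunks) >= 2 and all(chunk == chunks[0] for chunk in chunks):
        return chunks[0]
    return []


h = ['i', 'j', 'k', 'i', 'j', 'k', 'i', 'j', 'k']
print(find_repeating_unit(h))                         # ['i', 'j', 'k']
print(find_repeating_unit([1, 2, 3, 1, 2, 3, 1, 2]))  # [1, 2, 3]
print(find_repeating_unit(['a', 'b', 'c']))           # []
```

Note that this detects any repeating unit, so ['1','2','3','1','2','3'] also counts as a pattern. To test specifically for i, j, k, compare the returned list with ['i', 'j', 'k'].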


You could use regex.

import re

def test_pattern(pattern, mylist):
    joined = "".join(mylist)
    print(pattern)
    print(mylist)
    print(joined)
    # True only if the joined string is the pattern repeated one or more times and nothing else
    return re.match(r'(%s)+$' % pattern, joined) is not None

print(test_pattern("ijk", ["i", "j", "k", "i", "j", "k"]))
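
A slightly more defensive variant of the same idea, as a sketch: matches_repeated is a hypothetical name, re.escape guards against regex metacharacters in the pattern, and re.fullmatch (Python 3.4+) replaces the trailing $. It still assumes that joining the list elements is unambiguous, which holds when each element is a single character.

```python
import re

def matches_repeated(pattern, items):
    """True if "".join(items) is one or more back-to-back copies of `pattern`."""
    regex = r'(?:%s)+' % re.escape(pattern)
    return re.fullmatch(regex, "".join(items)) is not None

print(matches_repeated("ijk", ["i", "j", "k", "i", "j", "k"]))  # True
print(matches_repeated("ijk", ["1", "2", "3", "1", "2", "3"]))  # False
print(matches_repeated("ijk", ["i", "j", "k", "i", "j"]))       # False
```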

You can also do it without stripping the extension or extracting the last letters. I updated the regular expression so that it works. One problem was that I had written the variable name inside the pattern string, so it searched for the literal text "mypattern"; using %s substitutes the actual pattern. I hope this solution suits you.

import re

myfiles = ["ai.txt", "aj.txt", "ak.txt", "bi.txt", "bj.txt", "bk.txt"]
mypattern = ["i", "j", "k"]

# pattern as a list e.g. ["i", "j", "k"]
def test_pattern(pattern, filenames):
    mypattern = "[" + r"\.[a-zA-Z0-9]*".join(pattern) + r"\.[a-zA-Z0-9]*]*"
    # Intended to match an "i" followed by a dot and an extension, then "j." and an extension, then "k." and an extension
    # (change it a bit if your file names contain other characters).
    # Caveat: the outer [ ... ]* is parsed as a character class, not a group, so the match is looser than intended.
    print(mypattern)
    print("".join(filenames))
    # True if the pattern is found anywhere in the joined file names
    return re.search(mypattern, "".join(filenames)) is not None

print(test_pattern(mypattern, myfiles))

Output:

[i\.[a-zA-Z0-9]*j\.[a-zA-Z0-9]*k\.[a-zA-Z0-9]*]*
ai.txtaj.txtak.txtbi.txtbj.txtbk.txt
True
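
Since the regular expression above is loose, here is an alternative sketch that works on the file names directly, without regex. has_ijk_pattern is a hypothetical name, and the strictness is an assumption: it requires the list to consist of whole i, j, k cycles, so an incomplete cycle at the end returns False. It uses os.path.splitext to drop each extension and itertools.cycle to repeat the expected letters.

```python
import os
from itertools import cycle

def has_ijk_pattern(file_names, pattern=("i", "j", "k")):
    """True if the last character of each file stem cycles through `pattern`.

    Hypothetical helper; assumes the list must hold only complete cycles.
    """
    if not file_names or len(file_names) % len(pattern) != 0:
        return False
    # Drop the extension, whatever it is, keeping only the stem.
    stems = [os.path.splitext(name)[0] for name in file_names]
    return all(stem.endswith(expected)
               for stem, expected in zip(stems, cycle(pattern)))

file_names = ['filei.txt', 'filej.txt', 'filek.txt',
              'file2i.txt', 'file2j.txt', 'file2k.txt']
print(has_ijk_pattern(file_names))                                # True
print(has_ijk_pattern(['file1.txt', 'file2.txt', 'file3.txt']))   # False
```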

3 Comments

I keep getting a syntax error - is it supposed to be mypattern = "\w"+'\w\.'.join(pattern) +"\.\w"?
after join(pattern) I forgot a +
the pattern is not correct yet, I'll keep you updated as soon as I figure it out!
