I have the following Python list:

['chhattisgarh_2015_aa.csv', 'chhattisgarh_2016_aa.csv', 'daman_and_diu_2000_aa.csv', 'daman_and_diu_2001_aa.csv', 'daman_and_diu_2002_aa.csv']

How do I separate it into 2 lists:

['chhattisgarh_2015_aa.csv', 'chhattisgarh_2016_aa.csv'] and ['daman_and_diu_2000_aa.csv', 'daman_and_diu_2001_aa.csv', 'daman_and_diu_2002_aa.csv']

The lists are split based on the words preceding the year (e.g. 2000).

I know I should use regex in Python but I'm not sure how to do it. Also, the solution needs to be extensible and must not depend on the actual names, e.g. chhattisgarh.

  • Thanks @RoryDaulton, the elements are strings. Updated my question to reflect that. Commented Jun 19, 2016 at 22:56
  • Could you do it based on the text before the first _, e.g. using name.partition("_")[0] to compare titles? This wouldn't work if you had titles like 'foo_bar_2000' vs 'foo_foo_2000', though. Commented Jun 19, 2016 at 22:57
  • That doesn't work, since different list elements can have a different number of _s. Commented Jun 19, 2016 at 22:58
  • Are you sure the year contains the first numeric character in each list element? Commented Jun 19, 2016 at 22:59
  • Yes, the year contains the first and only numeric characters in each list element. Commented Jun 19, 2016 at 22:59
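The last two comments point to a simpler rule than splitting at the first underscore: cut each name at its first digit. Here is a small sketch comparing the two rules on the names from the partition comment ('foo_bar_2000' vs 'foo_foo_2000'). The helper names are made up for illustration, and treating a name without digits as its own key is an assumption.

```python
names = ["foo_bar_2000", "foo_foo_2000"]

def prefix_before_first_underscore(name):
    # The comment's idea: keep only the text before the first "_"
    return name.partition("_")[0]

def prefix_before_first_digit(name):
    # Keep everything before the first digit, relying on the asker's
    # statement that the year holds the first numeric character
    for i, c in enumerate(name):
        if c.isdigit():
            return name[:i]
    return name  # assumption: a name without digits is its own key

print([prefix_before_first_underscore(n) for n in names])  # ['foo', 'foo']
print([prefix_before_first_digit(n) for n in names])       # ['foo_bar_', 'foo_foo_']
```

The first rule lumps both names into one group. Cutting at the first digit keeps them apart, however many underscores the prefix contains.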

3 Answers

You can use itertools.groupby here:

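import itertools
import re

# Avoid naming this "list": that would shadow the built-in list()
files = ['chhattisgarh_2015_aa.csv', 'chhattisgarh_2016_aa.csv',
         'daman_and_diu_2000_aa.csv', 'daman_and_diu_2001_aa.csv',
         'daman_and_diu_2002_aa.csv']

def prefix(name):
    # Text before the "_YYYY" year part, e.g. 'daman_and_diu'
    return re.match(r'(.+)_\d{4}', name).group(1)

# groupby only merges adjacent items with the same key, so sort first
grouped = itertools.groupby(sorted(files, key=prefix), key=prefix)

for key, values in grouped:
    print(key)
    print(list(values))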

The regex (.+)_\d{4} matches a group of at least one character (which is what we group by) followed by an underscore and 4 digits.
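Sorting matters because `itertools.groupby` starts a new group every time the key changes. A quick sketch on a deliberately shuffled subset of the question's data shows what happens without the sort (the `prefix` helper is just for illustration):

```python
import itertools
import re

# Deliberately unsorted: the two chhattisgarh files are not adjacent
files = ['chhattisgarh_2015_aa.csv', 'daman_and_diu_2000_aa.csv',
         'chhattisgarh_2016_aa.csv']

def prefix(name):
    # Text before the "_YYYY" year part
    return re.match(r'(.+)_\d{4}', name).group(1)

unsorted_groups = [(k, list(v)) for k, v in itertools.groupby(files, key=prefix)]
sorted_groups = [(k, list(v))
                 for k, v in itertools.groupby(sorted(files, key=prefix), key=prefix)]

for k, v in unsorted_groups:
    print(k, v)
# chhattisgarh ['chhattisgarh_2015_aa.csv']
# daman_and_diu ['daman_and_diu_2000_aa.csv']
# chhattisgarh ['chhattisgarh_2016_aa.csv']

for k, v in sorted_groups:
    print(k, v)
# chhattisgarh ['chhattisgarh_2015_aa.csv', 'chhattisgarh_2016_aa.csv']
# daman_and_diu ['daman_and_diu_2000_aa.csv']
```

Without sorting, 'chhattisgarh' shows up as two separate groups. Sorting by the same key function guarantees that equal prefixes are adjacent.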

Here is one way to get a dictionary where, for each "name" key, the value is a list of the strings starting with that name, in the order they appear in the original list. This does not use regex; in fact it uses no modules at all. You can easily modify it to make a function, remove the trailing underscore from each name, check for various errors in the data list, get the resulting lists out of the dictionary, and so on.

If you allow other modules, or allow changes in the order, I'm sure there are other ways.

a = ['chhattisgarh_2015_aa.csv', 'chhattisgarh_2016_aa.csv',
     'daman_and_diu_2000_aa.csv', 'daman_and_diu_2001_aa.csv',
     'daman_and_diu_2002_aa.csv']

names_dict = {}
for item in a:
    # Find the first numeric character in the item
    for i, c in enumerate(item):
        if c.isdigit():
            break
    # Store the string in the dictionary according to its preceding characters
    name = item[:i]
    names_dict.setdefault(name, []).append(item)

print(names_dict)

The result of this code (prettified, as printed by Python 3.7+, where dicts keep insertion order) is:

{'chhattisgarh_': [
    'chhattisgarh_2015_aa.csv', 'chhattisgarh_2016_aa.csv'],
 'daman_and_diu_': [
    'daman_and_diu_2000_aa.csv', 'daman_and_diu_2001_aa.csv',
    'daman_and_diu_2002_aa.csv']
}
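The modifications mentioned at the start of this answer could look roughly like this. This is a sketch only: `group_by_prefix` is a made-up name, and raising `ValueError` for a name with no digit is just one possible choice of error handling.

```python
def group_by_prefix(items):
    """Group names by the text before their first digit (hypothetical helper).

    Returns a dict of prefix -> list of names, in first-seen order;
    raises ValueError for a name that contains no digit at all.
    """
    groups = {}  # plain dicts keep insertion order in Python 3.7+
    for item in items:
        for i, c in enumerate(item):
            if c.isdigit():
                break
        else:
            # The loop never hit "break": no digit, so no year to split on
            raise ValueError(f"no digit in {item!r}")
        name = item[:i].rstrip("_")  # 'chhattisgarh_' -> 'chhattisgarh'
        groups.setdefault(name, []).append(item)
    return groups


a = ['chhattisgarh_2015_aa.csv', 'chhattisgarh_2016_aa.csv',
     'daman_and_diu_2000_aa.csv', 'daman_and_diu_2001_aa.csv',
     'daman_and_diu_2002_aa.csv']

groups = group_by_prefix(a)
print(list(groups))  # ['chhattisgarh', 'daman_and_diu']
first, second = groups.values()  # the two lists the question asks for
print(first)   # ['chhattisgarh_2015_aa.csv', 'chhattisgarh_2016_aa.csv']
```

The `for ... else` clause handles the case the original loop leaves open: without it, a name with no digits would silently use everything but its last character as the key.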

Another option is to combine a regular expression with a dictionary:

import re
from collections import defaultdict

files = ["chhattisgarh_2015_aa.csv", "chhattisgarh_2016_aa.csv",
         "daman_and_diu_2000_aa.csv", "daman_and_diu_2001_aa.csv",
         "daman_and_diu_2002_aa.csv"]

groupedFiles = defaultdict(list)
for fileName in files:
    # Everything before the 4-digit year, e.g. 'chhattisgarh_'
    pattern = re.findall(r"(.*)\d{4}", fileName)[0]
    groupedFiles[pattern].append(fileName)

print(dict(groupedFiles))

{'chhattisgarh_': ['chhattisgarh_2015_aa.csv',
                   'chhattisgarh_2016_aa.csv'],
 'daman_and_diu_': ['daman_and_diu_2000_aa.csv',
                    'daman_and_diu_2001_aa.csv',
                    'daman_and_diu_2002_aa.csv']}
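Note that `re.findall(...)[0]` raises `IndexError` for a file name with no four-digit year. Below is a variant that sets such names aside instead. It assumes the year always appears as `_YYYY_`; `YEAR_PATTERN`, `group_files` and the named group `prefix` are illustrative names, not part of the answer above.

```python
import re
from collections import defaultdict

# Assumption: the year always appears as "_YYYY_" right after the prefix
YEAR_PATTERN = re.compile(r"(?P<prefix>.+?)_\d{4}_")

def group_files(files):
    """Hypothetical helper: group by prefix, set aside names without a year."""
    grouped = defaultdict(list)
    unmatched = []
    for file_name in files:
        match = YEAR_PATTERN.match(file_name)
        if match:
            grouped[match.group("prefix")].append(file_name)
        else:
            unmatched.append(file_name)  # no "_YYYY_" part: keep, don't crash
    return dict(grouped), unmatched


files = ["chhattisgarh_2015_aa.csv", "chhattisgarh_2016_aa.csv",
         "daman_and_diu_2000_aa.csv", "notes.txt"]
grouped, unmatched = group_files(files)
print(list(grouped))  # ['chhattisgarh', 'daman_and_diu']
print(unmatched)      # ['notes.txt']
```

The non-greedy `.+?` stops at the first `_YYYY_`, and the captured prefix comes back without the trailing underscore.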
