Here is one way to get a dictionary in which each "name" key maps to a list of the strings that start with that name, keeping the order of the original list. This does not use regex and in fact uses no modules at all. You can easily modify it to make a function, remove the trailing underscore from each name, check for various errors in the data list, get the resulting lists out of the dictionary, and so on.
If you allow other modules, or allow changes in the order, I'm sure there are other ways.

    a = ['chhattisgarh_2015_aa.csv', 'chhattisgarh_2016_aa.csv',
         'daman_and_diu_2000_aa.csv', 'daman_and_diu_2001_aa.csv',
         'daman_and_diu_2002_aa.csv']
    names_dict = {}
    for item in a:
        # Find the first numeric character in the item
        for i, c in enumerate(item):
            if c.isdigit():
                break
        # Store the string in the dictionary according to its preceding characters
        name = item[:i]
        if name in names_dict:
            names_dict[name].append(item)
        else:
            names_dict[name] = [item]
    print(names_dict)

The result of this code (prettified) is:

    {'daman_and_diu_': [
        'daman_and_diu_2000_aa.csv', 'daman_and_diu_2001_aa.csv',
        'daman_and_diu_2002_aa.csv'],
     'chhattisgarh_': [
        'chhattisgarh_2015_aa.csv', 'chhattisgarh_2016_aa.csv']
    }

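As the first paragraph suggests, the loop is easy to turn into a function. The sketch below is one possible version under my own assumptions: the function name `group_by_name` is hypothetical, trailing underscores are stripped with `str.rstrip`, a file name containing no digit raises `ValueError` (the original loop would silently produce a wrong prefix in that case), and `dict.setdefault` replaces the if/else. Like the original, it uses no modules.

```python
def group_by_name(filenames):
    """Group file names by the text before their first digit.

    Hypothetical helper: returns a dict mapping each name (trailing
    underscores removed) to the file names starting with it, in their
    original order.
    """
    groups = {}
    for item in filenames:
        # Locate the first digit; the else branch runs only when the
        # loop finishes without hitting ``break`` (i.e. no digit found).
        for i, c in enumerate(item):
            if c.isdigit():
                break
        else:
            raise ValueError(f"no digit found in {item!r}")
        # Drop the trailing underscore(s) from the name part.
        name = item[:i].rstrip("_")
        # Create the list on first sight of a name, then append.
        groups.setdefault(name, []).append(item)
    return groups


a = ['chhattisgarh_2015_aa.csv', 'chhattisgarh_2016_aa.csv',
     'daman_and_diu_2000_aa.csv', 'daman_and_diu_2001_aa.csv',
     'daman_and_diu_2002_aa.csv']
groups = group_by_name(a)
print(groups['daman_and_diu'])
print(list(groups.values()))
```

On Python 3.7 and later, dicts keep insertion order, so `list(groups.values())` returns the lists in the order their names first appear in `a`.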
…_? Like using `name.partition("_")[0]` to compare titles? This wouldn't work if you had titles like 'foo_bar_2000' vs 'foo_foo_2000', though.
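To make the `partition` objection concrete, here is a small comparison. The helper `prefix_before_digit` is a hypothetical name of mine that mirrors the digit-scanning idea of the loop above; splitting at the first underscore keeps only the first word, so the two distinct titles end up with the same key, while cutting at the first digit keeps them apart.

```python
names = ['foo_bar_2000', 'foo_foo_2000']

# Splitting at the first underscore keeps only the first word,
# so two different titles collapse into the same key.
by_partition = [n.partition("_")[0] for n in names]
print(by_partition)  # ['foo', 'foo']


def prefix_before_digit(s):
    """Hypothetical helper: return the part of s before its first digit
    (the whole string if it contains no digit)."""
    for i, c in enumerate(s):
        if c.isdigit():
            return s[:i]
    return s


by_digit = [prefix_before_digit(n) for n in names]
print(by_digit)  # ['foo_bar_', 'foo_foo_']
```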