The context
I have a string of mixed MP3 information that I need to match against a pattern made of arbitrary strings and tokens. It works like this:
- The program shows the user a given string:
the Beatles_Abbey_Road-SomeWord-1969
- The user enters a pattern to help the program parse the string:
the %Artist_%Album-SomeWord-%Year
- Then I'd like to show the results of the matches (this is the part I need help with):
2 possible matches found:
[1] {'Artist': 'Beatles', 'Album':'Abbey_Road', 'Year':1969}
[2] {'Artist': 'Beatles_Abbey', 'Album':'Road', 'Year':1969}
The problem
As an example, let's say the pattern is the artist name followed by the title (delimiter: '-').
Example 1:
>>> import re
>>> artist = 'Bob Marley'
>>> title = 'Concrete Jungle'
>>> re.findall(r'(.+)-(.+)', '%s-%s' % (artist,title))
[('Bob Marley', 'Concrete Jungle')]
So far, so good. But...
I have no control over the delimiter used and no guarantee that it doesn't appear in the tags themselves, so trickier cases exist:
Example 2:
>>> artist = 'Bob-Marley'
>>> title = 'Roots-Rock-Reggae'
>>> re.findall(r'(.+)-(.+)', '%s-%s' % (artist,title))
[('Bob-Marley-Roots-Rock', 'Reggae')]
As expected, it doesn't work in that case: the greedy (.+) grabs as much as it can, so only one of the possible splits is returned.
How can I generate all possible artist/title combinations?
[('Bob', 'Marley-Roots-Rock-Reggae'),
('Bob-Marley', 'Roots-Rock-Reggae'),
('Bob-Marley-Roots', 'Rock-Reggae'),
('Bob-Marley-Roots-Rock', 'Reggae')]
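For this fixed two-field case, the list above can be produced without a regex by cutting the string at each occurrence of the delimiter in turn. A minimal sketch, where all_splits is a hypothetical helper name of mine and empty fields are rejected to mirror (.+):

```python
def all_splits(text, delimiter='-'):
    """Return every (left, right) pair obtained by cutting text at one delimiter."""
    results = []
    start = 0
    while True:
        index = text.find(delimiter, start)
        if index == -1:
            break
        left = text[:index]
        right = text[index + len(delimiter):]
        if left and right:  # like (.+), both parts must be non-empty
            results.append((left, right))
        start = index + 1  # continue searching after this occurrence
    return results


print(all_splits('Bob-Marley-Roots-Rock-Reggae'))
```

This only covers one delimiter between two tags, so it does not address the general case.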
Are regexes the right tool for that job?
Please keep in mind that the number of tags to match and the delimiters between them are not fixed but user-defined (so the regex has to be built dynamically).
I tried experimenting with greedy vs. minimal matching and lookahead assertions, with no success.
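To make the requirement concrete, here is a rough sketch of the behaviour I am after, written as a plain backtracking matcher rather than a single regex. It assumes tokens look like %Name (letters only) and that everything else in the pattern is literal text; parse_pattern and all_matches are hypothetical names, and the only regex involved is the one that finds the tokens.

```python
import re

TOKEN = re.compile(r'%([A-Za-z]+)')  # assumed token syntax: % followed by letters


def parse_pattern(pattern):
    """Split a user pattern into ('lit', text) and ('tok', name) parts."""
    parts, pos = [], 0
    for m in TOKEN.finditer(pattern):
        if m.start() > pos:
            parts.append(('lit', pattern[pos:m.start()]))
        parts.append(('tok', m.group(1)))
        pos = m.end()
    if pos < len(pattern):
        parts.append(('lit', pattern[pos:]))
    return parts


def all_matches(parts, text):
    """Yield every dict of token values for which parts consume text exactly."""
    if not parts:
        if not text:
            yield {}
        return
    kind, value = parts[0]
    if kind == 'lit':
        # A literal must appear verbatim at the current position.
        if text.startswith(value):
            for rest in all_matches(parts[1:], text[len(value):]):
                yield rest
    else:
        # A token may take any non-empty prefix; try each length in turn.
        for end in range(1, len(text) + 1):
            for rest in all_matches(parts[1:], text[end:]):
                found = {value: text[:end]}
                found.update(rest)
                yield found


parts = parse_pattern('the %Artist_%Album-SomeWord-%Year')
for match in all_matches(parts, 'the Beatles_Abbey_Road-SomeWord-1969'):
    print(match)
```

On the string from the context section this yields the two matches listed there, except that the year comes back as the string '1969' rather than a number. The search tries every prefix for every token, so it is only meant for short strings such as file names.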
Thanks for your help