I have a list of compounds like this:
ex = ['CrO3', 'Cr8O21', 'NbCrO4']
And I would like to get the elements and numbers separately. Something like this:
['Cr','O',3]
['Cr',8,'O',21]
['Nb','Cr','O',4]
However, this HAS to be a general process, since these will not always be the compounds I am working with. I think this can be accomplished using a regex and re.split(), but I am having trouble finding the right regular expression to get what I want.
Here is what I have right now:
import re

# elements to split by
split_elements = ['Cr','Nb','O']

def split(compound, split_elements):
    separated = []
    # build one alternation out of the element symbols
    splitstr = ")|(?=".join([str(elem) for elem in split_elements])
    splitstr = '('+splitstr+')'
    # splitstr will end up like this:
    # (Cr)|(?=Nb)|(?=O)
    result = list(filter(None,re.split(splitstr, compound)))
    separated.append(result)
    return separated

for item in ex:
    print(split(item, split_elements))
# Output
# [['Cr', 'O3']]
# [['Cr', '8O21']]
# [['Nb', 'Cr', 'O4']]
As you can see, the numbers are still attached, and I'm not sure why. I've searched for a similar issue, but I can't find any (and what I have right now is already the result of furious googling).
Does anyone have any solutions or suggestions?
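One possible direction, offered as a minimal sketch rather than a definitive solution: the split pattern above only matches at element symbols, so nothing ever separates a digit run from the symbol next to it. Instead of splitting, the tokens can be matched directly with re.findall. This assumes every element symbol is one uppercase letter optionally followed by lowercase letters, and that counts are plain runs of digits; parentheses, charges, and fractional counts are not handled. The names TOKEN and split_compound are made up for illustration.

```python
import re

# Assumption: a symbol is one uppercase letter plus optional lowercase letters,
# and a count is a plain run of digits.
TOKEN = re.compile(r"[A-Z][a-z]*|\d+")

def split_compound(compound):
    """Return element symbols as str and counts as int, in formula order."""
    return [int(tok) if tok.isdigit() else tok for tok in TOKEN.findall(compound)]

ex = ['CrO3', 'Cr8O21', 'NbCrO4']
for item in ex:
    print(split_compound(item))
# ['Cr', 'O', 3]
# ['Cr', 8, 'O', 21]
# ['Nb', 'Cr', 'O', 4]
```

Because the pattern describes the shape of a symbol rather than listing specific elements, no split_elements list is needed, which is what should make it general.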