I have a list of strings I've scraped, and I'd like to chunk the strings into groups and then reshape them into columnar data. However, not every field title is present in every group (for example, only the last group below has a Sub Categories entry).
My list is called complist and looks like this:
[u'Intake Received Date:',
u'9/11/2012',
u'Intake ID:',
u'CA00325127',
u'Allegation Category:',
u'Infection Control',
u'Investigation Finding:',
u'Substantiated',
u'Intake Received Date:',
u'5/14/2012',
u'Intake ID:',
u'CA00310421',
u'Allegation Category:',
u'Quality of Care/Treatment',
u'Investigation Finding:',
u'Substantiated',
u'Intake Received Date:',
u'8/15/2011',
u'Intake ID:',
u'CA00279396',
u'Allegation Category:',
u'Quality of Care/Treatment',
u'Sub Categories:',
u'Screening',
u'Investigation Finding:',
u'Unsubstantiated',]
And my goal is to make it look like this:
'Intake Received Date', 'Intake ID', 'Allegation Category', 'Sub Categories', 'Investigation Finding'
'9/11/2012', 'CA00325127', 'Infection Control', '', 'Substantiated'
'5/14/2012', 'CA00310421', 'Quality of Care/Treatment', '', 'Substantiated'
'8/15/2011', 'CA00279396', 'Quality of Care/Treatment', 'Screening', 'Unsubstantiated'
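For the target layout above, one possible approach skips grouping entirely and builds the records in a single pass over (title, value) pairs. This is a minimal sketch, not the only way to do it. It assumes titles and values strictly alternate, every title ends with ':', and every record begins with 'Intake Received Date:'. The names COLUMNS, parse_records and to_rows are hypothetical, the column order is hardcoded from the desired output, and the code targets Python 3 (which still accepts the u'' literals).

```python
# Hypothetical column order, copied from the desired output in the question.
COLUMNS = ['Intake Received Date', 'Intake ID', 'Allegation Category',
           'Sub Categories', 'Investigation Finding']


def parse_records(items, start_title='Intake Received Date'):
    """Turn a flat [title, value, title, value, ...] list into a list of dicts.

    Assumes titles and values strictly alternate, every title ends with ':',
    and the list begins with `start_title` (otherwise records[-1] fails).
    """
    records = []
    for title, value in zip(items[0::2], items[1::2]):
        key = title.rstrip(':')
        if key == start_title:
            records.append({})  # this title marks the start of a new record
        records[-1][key] = value
    return records


def to_rows(records, columns=COLUMNS):
    """Header row plus one row per record; absent fields become ''."""
    return [list(columns)] + [[rec.get(col, '') for col in columns]
                              for rec in records]


# A shortened version of complist, used only for illustration.
sample = [u'Intake Received Date:', u'9/11/2012',
          u'Intake ID:', u'CA00325127',
          u'Allegation Category:', u'Infection Control',
          u'Investigation Finding:', u'Substantiated',
          u'Intake Received Date:', u'8/15/2011',
          u'Intake ID:', u'CA00279396',
          u'Allegation Category:', u'Quality of Care/Treatment',
          u'Sub Categories:', u'Screening',
          u'Investigation Finding:', u'Unsubstantiated']

rows = to_rows(parse_records(sample))
for row in rows:
    print(row)
```

Because missing fields are filled by `dict.get(col, '')`, a record without a Sub Categories entry simply gets an empty cell in that column.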
The first thing I did was break the list into chunks, using the starting element Intake Received Date as the delimiter:
import re
from itertools import groupby

compgroup = []
for k, g in groupby(complist, key=lambda x: re.search(r'Intake Received Date', x)):
    if not k:
        compgroup.append(list(g))
# groupby dropped the 'Intake Received Date:' delimiters, so insert one back at the
# beginning of each chunk (with the colon, to match the other titles):
for c in compgroup:
    c.insert(0, u'Intake Received Date:')
# Create a list of dicts mapping each title (colon stripped) to the data element after it:
dic = []
for c in compgroup:
    dic.append(dict((t.rstrip(':'), v) for t, v in zip(*[iter(c)]*2)))
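The `zip(*[iter(c)]*2)` idiom in the last loop works because both arguments passed to `zip` are the same iterator, so `zip` pulls from it alternately and produces consecutive (title, value) pairs. A small illustration, assuming Python 3; `chunk` is a hypothetical two-field chunk taken from the data above:

```python
chunk = [u'Intake Received Date:', u'9/11/2012', u'Intake ID:', u'CA00325127']

# [iter(chunk)] * 2 is a list holding the SAME iterator twice, so zip
# takes element 0 and 1 for the first tuple, 2 and 3 for the next, and so on.
same = list(zip(*[iter(chunk)] * 2))

# Equivalent, more explicit spelling of the same idea.
it = iter(chunk)
pairs = list(zip(it, it))

# Strip the trailing ':' so the keys match the desired column headers.
record = {title.rstrip(':'): value for title, value in pairs}
print(pairs)
print(record)
```

One caveat of this idiom: if a chunk has an odd number of elements, `zip` silently drops the last one, so a title with no value would disappear without an error.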
The next step would be to convert the list of dicts into columnar data, but at this point I feel my approach is overly complicated and that I must be missing something more elegant. I'd appreciate any guidance.
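For the step of turning a list of dicts into columnar data, the standard library's `csv.DictWriter` may be enough: it writes one column per name in `fieldnames`, and `restval=''` fills in any field a record lacks. A sketch assuming Python 3 and records whose keys have already had the trailing ':' stripped; the two records and the `fieldnames` order below are illustrative, taken from the question's data and desired output.

```python
import csv
import io

# Illustrative records in the shape of the question's `dic` list (colons stripped).
records = [
    {'Intake Received Date': '9/11/2012', 'Intake ID': 'CA00325127',
     'Allegation Category': 'Infection Control',
     'Investigation Finding': 'Substantiated'},
    {'Intake Received Date': '8/15/2011', 'Intake ID': 'CA00279396',
     'Allegation Category': 'Quality of Care/Treatment',
     'Sub Categories': 'Screening',
     'Investigation Finding': 'Unsubstantiated'},
]

# Column order taken from the desired output.
fieldnames = ['Intake Received Date', 'Intake ID', 'Allegation Category',
              'Sub Categories', 'Investigation Finding']

buf = io.StringIO()  # use open('out.csv', 'w', newline='') to write a real file
writer = csv.DictWriter(buf, fieldnames=fieldnames, restval='')
writer.writeheader()
writer.writerows(records)  # missing keys are written as restval ('')
print(buf.getvalue())
```

Note that `DictWriter` raises `ValueError` if a record contains a key not listed in `fieldnames` (the default `extrasaction='raise'`), which can be a useful check that no unexpected titles were scraped.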
Are the titles ['Intake Received Date', 'Intake ID', 'Allegation Category', 'Sub Categories', 'Investigation Finding'] always present exactly like that? Is the number of data elements between those titles fixed?