I have a text file that I'm trying to convert to an Excel file in Python 3. Each text file contains a series of accounts. One text file looks like this, for example:
PRODUCE_NAME: abc
PRODUCE_NUMBER: 12345
DATE: 12/1/13
PRODUCE_NAME: efg
PRODUCE_NUMBER: 987
DATE: 2/16/16
TIME: 12:54:00
PRODUCE_NAME: xyz
PRODUCE_NUMBER: 0046
DATE: 7/15/10
COLOR: blue.
I would like the Excel file to look like this (image).
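The target layout can be sketched in plain Python, assuming it is one row per account and one column per label, with a blank where an account lacks a field. The `accounts` list below is typed in by hand from the sample file and is only an illustration of the shape, not output from any code in this post.

```python
# Hypothetical hand-written records for the three sample accounts
accounts = [
    {"PRODUCE_NAME": "abc", "PRODUCE_NUMBER": "12345", "DATE": "12/1/13"},
    {"PRODUCE_NAME": "efg", "PRODUCE_NUMBER": "987", "DATE": "2/16/16", "TIME": "12:54:00"},
    {"PRODUCE_NAME": "xyz", "PRODUCE_NUMBER": "0046", "DATE": "7/15/10", "COLOR": "blue"},
]

# Column order: each label in order of first appearance
columns = []
for account in accounts:
    for label in account:
        if label not in columns:
            columns.append(label)

# One row per account; an empty string where the account lacks a field
rows = [[account.get(label, "") for label in columns] for account in accounts]

# Print a tab-separated preview of the sheet
print("\t".join(columns))
for row in rows:
    print("\t".join(row))
```

Each inner list in `rows` corresponds to one spreadsheet row, and `columns` is the header row.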
Some code:

```python
import re

# Open the text file and read it into one string
with open("Comp_file_1.txt", "r", encoding="windows-1252") as op_file:
    text_file = op_file.read()

col_list_start, col_list_end, col_list_group = [], [], []
index4 = []

# Locate each "CAP_WORD:" label and record its position and text
for mj in re.finditer(r"[A-Z]\w+(:)", text_file):
    col_list_start.append(mj.start(0))
    col_list_end.append(mj.end(0))
    col_list_group.append(mj.group())

# Find the last "." (end of file), append it as the final boundary,
# then delete index 0 so col_list_start[m] marks where value m ends
location = -1
endline = len(text_file)  # fallback if the file has no "."
while True:
    # Advance location by 1.
    location = text_file.find(".", location + 1)
    # Break if not found.
    if location == -1:
        break
    endline = location
col_list_start.append(int(endline))
del col_list_start[0]

# Cut out the values between labels: abc, 12345, 12/1/13, ...
# (strip() removes the surrounding spaces and newlines)
for m in range(len(col_list_end)):
    index4.append(text_file[col_list_end[m]:col_list_start[m]].strip())

# Group the values by label
group_excel_list = {}
for k, v in zip(col_list_group, index4):
    group_excel_list.setdefault(k, []).append(v)
```
The grouped data (group_excel_list) looks like this:
key : value
{"PRODUCE_NAME:" : [abc, efg, xyz]}
{"PRODUCE_NUMBER:" : [12345, 987, 0046]}
{"DATE:" : [12/1/13, 2/16/16, 7/15/10]}
{"TIME:" : [12:54:00]}
{"COLOR:" : [blue]}
```python
import pandas as pd

# Build the DataFrame from the grouped dictionary
df = pd.DataFrame(data=[group_excel_list], columns=col_list_group)
# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter("Comp_file_1" + ".xlsx", engine="xlsxwriter")
# Convert the dataframe to an XlsxWriter Excel object.
df.to_excel(writer, sheet_name="Sheet1")
# Close the Pandas Excel writer and output the Excel file
# (writer.save() was removed in pandas 2.0; close() also writes the file).
writer.close()
```
I'm getting just one row in the DataFrame:
Header: PRODUCE_NAME: PRODUCE_NUMBER: DATE:
Row 0: [abc, efg, xyz] [12345, 987, 0046] [12/1/13, 2/16/16, 7/15/10]
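The single row is consistent with how the DataFrame is built: `[group_excel_list]` is a list holding one dict, and pandas reads each dict in a list as one record, so every cell ends up containing a whole list. A minimal sketch of one possible regrouping follows. It assumes every account begins with a `PRODUCE_NAME:` label (the `FIRST_LABEL` constant is a hypothetical name introduced here), and the two parallel lists are typed in by hand from the sample file rather than produced by the extraction code.

```python
# Parallel lists of labels and values, in file order
# (typed in by hand from the sample file)
col_list_group = ["PRODUCE_NAME:", "PRODUCE_NUMBER:", "DATE:",
                  "PRODUCE_NAME:", "PRODUCE_NUMBER:", "DATE:", "TIME:",
                  "PRODUCE_NAME:", "PRODUCE_NUMBER:", "DATE:", "COLOR:"]
index4 = ["abc", "12345", "12/1/13",
          "efg", "987", "2/16/16", "12:54:00",
          "xyz", "0046", "7/15/10", "blue"]

# Assumption: this label marks the start of each account
FIRST_LABEL = "PRODUCE_NAME:"

# Build one dict per account instead of one list per label
records = []
for label, value in zip(col_list_group, index4):
    if label == FIRST_LABEL or not records:
        records.append({})  # start a new account
    records[-1][label.rstrip(":")] = value

for record in records:
    print(record)
```

A list of dicts like `records` can be passed directly to `pd.DataFrame(records)`, which produces one row per dict and fills fields missing from an account with NaN.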
Whatever help you can give would be appreciated.