Hi, I have a dataframe df with headers like this:
DATE COL1 COL2 ... COL10
date1 a b
... ... ... ...
and so on
Each row holds a date followed by a number of columns, each of which either contains some text or is empty.
From this I want to create a new dataframe df2 with one row for each non-blank 'cell' in the original dataframe, consisting of that row's date and the text from that cell. For the example above, df2 would be:
df2=
DATE COL
date1 a
date1 b
In pseudocode, what I want to achieve is:
df2 = empty dataframe
for row in df:
    for column in row (other than DATE):
        if cell is not empty:
            append to df2 a row consisting of the date for that row and the value in that cell
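For reference, here is a minimal literal translation of that pseudocode into Python. It assumes the date column is called DATE (as in the example headers) and that blank cells are NaN/None; the toy data is made up to mirror the example. It collects rows in a plain list and builds df2 once at the end, because growing a DataFrame one row at a time copies it on every append. It is still a Python-level loop, so it shows the logic rather than a fast solution for 300k+ rows.

```python
import pandas as pd

# Made-up toy data in the same wide layout as the question; None marks a blank cell.
df = pd.DataFrame({
    "DATE": ["date1", "date2"],
    "COL1": ["a", "c"],
    "COL2": ["b", None],
})

rows = []  # collect (date, value) pairs in a list first
for record in df.to_dict("records"):
    date = record["DATE"]
    for column, value in record.items():
        if column == "DATE":
            continue  # the date itself is not a data cell
        if pd.notna(value) and value != "":
            rows.append((date, value))

# Build the output frame once, instead of appending row by row.
df2 = pd.DataFrame(rows, columns=["DATE", "COL"])
print(df2)
```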
So far I have
import pandas as pd
df = pd.read_csv("data2.csv")
output_df = pd.DataFrame(columns=['Date', 'Col'])
That is, I have read in df and created the new, empty dataframe that I intend to populate.
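A small sketch of what "empty" looks like after reading a CSV: by default, pd.read_csv parses empty fields as NaN, so blank cells can be detected with isna()/notna() rather than by comparing against "". The CSV text below is a hypothetical stand-in, since data2.csv itself is not shown.

```python
import io
import pandas as pd

# Hypothetical stand-in for data2.csv (the real file is not shown).
csv_text = "DATE,COL1,COL2,COL3\n2020-01-01,a,b,\n2020-01-02,,c,d\n"
df = pd.read_csv(io.StringIO(csv_text))

# Empty fields come back as NaN under read_csv's default NA handling.
print(df.isna())
```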
Now I am stuck: from what I have read, iterrows() is inefficient and considered bad practice, and df has more than 300k rows.
Any suggestions on how I can do this, please?
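One possible vectorized approach (a sketch, not the only way) is to unpivot the frame with DataFrame.melt, which produces one row per (date, original column) pair without a Python-level loop, and then drop the NaN cells. This assumes the date column is named DATE and that blanks are NaN; the toy data is made up to mirror the example.

```python
import pandas as pd

# Made-up toy data in the same wide layout as the question; None marks a blank cell.
df = pd.DataFrame({
    "DATE": ["date1", "date2"],
    "COL1": ["a", "c"],
    "COL2": ["b", None],
})

# Unpivot: one row per (DATE, original column) pair.
long_df = df.melt(id_vars="DATE", value_name="COL")

# Drop the blank cells and keep only the date and the cell text.
df2 = long_df.dropna(subset=["COL"])[["DATE", "COL"]]

# melt orders rows column by column; a stable sort groups them by date
# while keeping the original column order within each date.
df2 = df2.sort_values("DATE", kind="stable").reset_index(drop=True)
print(df2)
```

If the blank cells are empty strings rather than NaN, they would need to be turned into NaN first (for example with df.replace("", float("nan"))) so that dropna removes them. The final sort is optional; skip it if the column-by-column order is acceptable.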