I am trying to read in a folder of CSV files, process them one by one to remove duplicates, and then add them to a master dataframe, which will finally be output to a CSV. So far I have this:
    import pandas as pd
    import os
    import sys

    output = pd.DataFrame(columns=['col1', 'col2'])
    for root, dirs, files in os.walk("sourcefolder", topdown=False):
        for name in files:
            data = pd.read_csv(os.path.join(root, name), usecols=[1], skiprows=1)
            output.append(data)
    output.to_csv("output.csv", index=False, encoding='utf8')
But my output CSV is empty apart from the column names. Does anyone have any idea where I am going wrong?
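DataFrame.append returns a new DataFrame instead of modifying the original in place, so you need to assign the result back: output = output.append(data)
Alternatively, start with output = [], append each DataFrame to that list, and then call pd.concat(output) once at the end.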
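The list-then-concat suggestion can be sketched as follows. This is a minimal illustration, not a drop-in fix: the function name combine_csvs is hypothetical, it assumes every CSV in the folder shares the same header row, and it reads all columns rather than reproducing the usecols/skiprows settings from the question. It also avoids DataFrame.append entirely, which was deprecated and then removed in pandas 2.0.

```python
import os

import pandas as pd


def combine_csvs(source_folder):
    """Read every CSV under source_folder, drop duplicate rows in each, and stack them.

    Hypothetical helper for illustration; assumes all files share the same columns.
    """
    frames = []  # a plain Python list: list.append DOES modify it in place
    for root, dirs, files in os.walk(source_folder, topdown=False):
        for name in sorted(files):
            if not name.lower().endswith(".csv"):
                continue  # assumption: anything that is not a .csv should be skipped
            data = pd.read_csv(os.path.join(root, name))
            # Remove duplicate rows within this file before collecting it
            frames.append(data.drop_duplicates())
    if not frames:
        return pd.DataFrame()  # no CSV files found, so return an empty frame
    # One concat at the end is cheaper than growing a DataFrame inside the loop
    return pd.concat(frames, ignore_index=True)
```

Calling combine_csvs("sourcefolder").to_csv("output.csv", index=False, encoding="utf8") would then write the combined result. Note that drop_duplicates here only removes duplicates within each file; if duplicates across files also matter, calling drop_duplicates once more on the concatenated frame would be needed.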