I have a series of DataFrames that look like this:
   a    b    r
1  43  630  587

   d   b   c
1  34  30  87
I want to create a new dataframe which looks like:
a   b    r    d   c
43  630  587  0   0
0   30   0    34  87
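To make the target concrete, here is a minimal sketch that rebuilds the two example frames by hand and combines them. The names `df1`, `df2` and `combined` are made up for illustration; the values are the ones shown above. It assumes that missing cells should become integer zeros.

```python
import pandas as pd

# Hand-built copies of the two example frames (index 1, as in the printout).
df1 = pd.DataFrame({"a": [43], "b": [630], "r": [587]}, index=[1])
df2 = pd.DataFrame({"d": [34], "b": [30], "c": [87]}, index=[1])

# Stack the rows; columns missing from one frame are filled with NaN.
combined = pd.concat([df1, df2], axis=0, sort=False, ignore_index=True)

# Replace NaN with 0 and go back to integers.
combined = combined.fillna(0).astype(int)
print(combined)
```

With unique column names in each frame this is the shape I am after: one row per input frame and the union of all columns.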
I have used the code:
appended_data= pd.concat(appended_data, axis=0)
where the list appended_data contains the individual DataFrames as elements.
When I used this earlier with another dataset it did not throw any error, but with the new dataset it raises ValueError: Plan shapes are not aligned
Note: the solutions I found online were to remove duplicate columns and to update pandas. The earlier dataset also had duplicate columns and worked fine, and I have already updated pandas.
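For anyone who wants to rule duplicate column labels in or out, the check I have in mind looks roughly like the sketch below. Whether duplicates are actually the cause for my new dataset is an assumption, not something I have confirmed. The frames `frame_a` and `frame_b` and the helper `drop_duplicate_columns` are hypothetical and only for illustration.

```python
import pandas as pd

def drop_duplicate_columns(df):
    """Keep only the first occurrence of each column label (hypothetical helper)."""
    return df.loc[:, ~df.columns.duplicated()]

# Toy frames: frame_a deliberately has the label "a" twice.
frame_a = pd.DataFrame([[1, 2, 3]], columns=["a", "b", "a"])
frame_b = pd.DataFrame([[4, 5]], columns=["b", "c"])
frames = [frame_a, frame_b]

# Report which frames carry repeated column labels.
for i, frame in enumerate(frames):
    if not frame.columns.is_unique:
        dupes = frame.columns[frame.columns.duplicated()].tolist()
        print(f"frame {i} has duplicate columns: {dupes}")

# After de-duplication the frames can be stacked on the union of their columns.
cleaned = [drop_duplicate_columns(f) for f in frames]
result = pd.concat(cleaned, axis=0, sort=False, ignore_index=True)
print(result)
```

Keeping only the first occurrence silently discards the later values, so this is only appropriate if the repeated labels are not supposed to be summed or otherwise merged.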
Full code:
import os
import pandas as pd

dir_list = [benign_freq_dir, malign_freq_dir]
appended_data = []
for l in dir_list:
    for root, dirs, files in os.walk(l):
        #print(root)
        for name in files:
            file = open(root+"/"+name,'r')
            print(name)
            print("\n")
            df = pd.read_csv(file,header=None,error_bad_lines=False) #In windows and python3 always pass file object not the path directly in pd.read_csv
            #print(df)
            df = df.rename(columns={0: 'col'})
            #print(df)
            # Split each line into (name, value), then transpose so the names become the header row
            df = pd.DataFrame(df.col.str.split(' ',1).tolist(), columns = ['col1','col2']).T.reset_index(drop=True)
            df = df.rename(columns=df.iloc[0]).drop(df.index[0])
            print(df)
            appended_data.append(df)
            # df is the same object that was appended, so the class column is added in place
            if l==benign_freq_dir:
                df['class']=0
            else:
                df['class']=1
#for l in appended_data:
#    print(l)
#    print(type(l))
appended_data = pd.concat(appended_data, axis=0, sort=False)
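One alternative I am considering is to skip the per-file DataFrame entirely and build one dict per file, so that repeated names inside a file can never turn into duplicate column labels. The sketch below is only an illustration under assumptions: each line is assumed to have the form `<name> <value>` with an integer value, repeated names are summed, the in-memory `io.StringIO` objects stand in for the real files, and `parse_freq_file` is a hypothetical helper.

```python
import io
import pandas as pd

def parse_freq_file(handle, label):
    """Turn lines of the form '<name> <value>' into one dict per file."""
    row = {}
    for line in handle:
        parts = line.strip().split(" ", 1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        name, value = parts
        # Summing repeated names keeps every key unique (an assumption about the desired behavior).
        row[name] = row.get(name, 0) + int(value)
    row["class"] = label
    return row

# Stand-ins for one benign and one malign file, using the example values.
benign = io.StringIO("a 43\nb 630\nr 587\n")
malign = io.StringIO("d 34\nb 30\nc 87\n")

rows = [parse_freq_file(benign, 0), parse_freq_file(malign, 1)]

# A list of dicts becomes one row per file; missing names become NaN, then 0.
table = pd.DataFrame(rows).fillna(0).astype(int)
print(table)
```

This avoids pd.concat altogether, at the cost of deciding up front how repeated names in one file should be combined.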
Edit: output for:
for dfx in appended_data:
    print(dfx.head(2).to_dict())

