I have been trying to crack this for a while, but I am stuck now. This is my code:
import pandas as pd

# df is the main dataframe loaded from the CSV file
l = list()
column_name = [col for col in df.columns if 'SalesPerson' in col]
filtereddf = pd.DataFrame(columns=[
    'Item', 'SerialNo', 'Location',
    'SalesPerson01', 'SalesPerson02', 'SalesPerson03',
    'SalesPerson04', 'SalesPerson05', 'SalesPerson06',
    'PredictedSales01', 'PredictedSales02', 'PredictedSales03',
    'PredictedSales04', 'PredictedSales05', 'PredictedSales06'])
for i, r in df.iterrows():
    if len(r['Name'].split(';')) > 1:
        # Several names in the Name column: check each one against every SalesPerson column
        for x in r['Name'].split(';'):
            for y in column_name:
                if x in r[y]:
                    number_is = y[-2:]
                    filtereddf.at[i, 'SerialNo'] = r['SerialNo']
                    filtereddf.at[i, 'Location'] = r['Location']
                    filtereddf.at[i, y] = r[y]
                    filtereddf.at[i, 'Item'] = r['Item']
                    filtereddf.at[i, f'PredictedSales{number_is}'] = r[f'PredictedSales{number_is}']
                    # The statement below prints the values correctly, but I want to filter the values and use them in a dataframe
                    # print(r['SerialNo'], r['Location'], r[f'SalesPerson{number_is}'], r[f'PredictedSales{number_is}'], r['Definition'])
                    l.append(filtereddf)
    else:
        # Single name in the Name column
        for y in column_name:
            if r['Name'] in r[y]:
                number_is = y[-2:]
                filtereddf.at[i, 'SerialNo'] = r['SerialNo']
                filtereddf.at[i, 'Location'] = r['Location']
                filtereddf.at[i, y] = r[y]
                filtereddf.at[i, 'Item'] = r['Item']
                filtereddf.at[i, f'PredictedSales{number_is}'] = r[f'PredictedSales{number_is}']
                # The statement below prints the values correctly, but I want to filter the values and use them in a dataframe
                # print(r['SerialNo'], r['Location'], r[f'SalesPerson{number_is}'], r[f'PredictedSales{number_is}'], r['Definition'])
                l.append(filtereddf)
finaldf = pd.concat(l, ignore_index=True)
It eventually throws an error:
MemoryError: Unable to allocate 9.18 GiB for an array with shape (1, 1231543895) and data type object
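One thing that may contribute to the size of that allocation: l.append(filtereddf) stores the same, still-growing dataframe once per match, so pd.concat ends up stacking many copies of it. Below is a minimal sketch of an alternative that collects one plain dict per input row and builds the dataframe a single time at the end. The function name filter_rows is hypothetical, and the sketch assumes an exact match between a name and a SalesPersonNN value (the loop above uses a substring test instead).

```python
import pandas as pd

def filter_rows(df):
    """Hypothetical helper: keep only the SalesPersonNN / PredictedSalesNN
    values whose sales person is listed in the row's Name column."""
    sp_cols = [c for c in df.columns if 'SalesPerson' in c]
    records = []
    for _, r in df.iterrows():
        wanted = set(r['Name'].split(';'))
        # Identifying columns are always copied over
        rec = {'Item': r['Item'], 'Name': r['Name'],
               'SerialNo': r['SerialNo'], 'Location': r['Location']}
        for col in sp_cols:
            if r[col] in wanted:  # assumption: exact match, not substring
                nn = col[-2:]
                rec[col] = r[col]
                rec[f'PredictedSales{nn}'] = r[f'PredictedSales{nn}']
        records.append(rec)
    # Build the frame once; keys missing from a dict become NaN,
    # and reindex restores the original column order
    return pd.DataFrame(records, index=df.index).reindex(columns=df.columns)
```

This still iterates row by row, so it is not the fastest option, but memory use stays proportional to the size of the input.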
Basically, I want to extract each SalesPersonNN and the corresponding PredictedSalesNN from the main dataframe df, but only where that sales person appears in the Name column of the same row.
A sample of the dataset (the actual CSV file has almost 100k entries):
Item Name SerialNo Location SalesPerson01 SalesPerson02 SalesPerson03 SalesPerson04 SalesPerson05 SalesPerson06 PredictedSales01 PredictedSales02 PredictedSales03 PredictedSales04 PredictedSales05 PredictedSales06
0 TV Joe;Mary;Philip 11111 NY Tom Julie Joe Sara Mary Philip 90 80 30 98 99 100
1 WashingMachine Mike 22222 NJ Tom Julie Joe Mike Mary Philip 80 70 40 74 88 42
2 Dishwasher Tony;Sue 33333 NC Margaret Tony William Brian Sue Bert 58 49 39 59 78 89
3 Microwave Bill;Jeff;Mary 44444 PA Elmo Bill Jeff Mary Chris Kevin 80 70 90 56 92 59
4 Printer Keith;Joe 55555 DE Keith Clark Ed Matt Martha Joe 87 94 59 48 74 89
And I want the output dataframe to look like this:
Item Name SerialNo Location SalesPerson01 SalesPerson02 SalesPerson03 SalesPerson04 SalesPerson05 SalesPerson06 PredictedSales01 PredictedSales02 PredictedSales03 PredictedSales04 PredictedSales05 PredictedSales06
0 TV Joe;Mary;Philip 11111 NY NaN NaN Joe NaN Mary Philip NaN NaN 30.0 NaN 99.0 100.0
1 WashingMachine Mike 22222 NJ NaN NaN NaN Mike NaN NaN NaN NaN NaN 74.0 NaN NaN
2 Dishwasher Tony;Sue 33333 NC NaN Tony NaN NaN Sue NaN NaN 49.0 NaN NaN 78.0 NaN
3 Microwave Bill;Jeff;Mary 44444 PA NaN Bill Jeff Mary NaN NaN NaN 70.0 90.0 56.0 NaN NaN
4 Printer Keith;Joe 55555 DE Keith NaN NaN NaN NaN Joe 87.0 NaN NaN NaN NaN 89.0
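The transformation shown in this table could be sketched without dataframe.at by building a boolean mask and applying DataFrame.where. This is only a sketch under assumptions: the function name keep_named_salespeople is hypothetical, names are matched exactly (not as substrings), and every SalesPersonNN column is assumed to have a PredictedSalesNN partner with the same two-digit suffix.

```python
import pandas as pd

def keep_named_salespeople(df):
    """Hypothetical helper: blank out (NaN) every SalesPersonNN and
    PredictedSalesNN value whose sales person is not listed in Name."""
    sp_cols = [c for c in df.columns if c.startswith('SalesPerson')]
    # Assumption: matching PredictedSales column shares the 2-digit suffix
    ps_cols = ['PredictedSales' + c[-2:] for c in sp_cols]

    # One list of wanted names per row, e.g. ['Joe', 'Mary', 'Philip']
    wanted_per_row = df['Name'].str.split(';')

    # mask is True where the SalesPersonNN value is one of the row's names
    mask = pd.DataFrame(
        [[person in wanted for person in row]
         for row, wanted in zip(df[sp_cols].to_numpy(), wanted_per_row)],
        index=df.index, columns=sp_cols)

    out = df.copy()
    out[sp_cols] = df[sp_cols].where(mask)
    # Same mask, applied positionally to the PredictedSales columns
    out[ps_cols] = df[ps_cols].where(mask.to_numpy())
    return out
```

Because the PredictedSalesNN columns receive NaN, they come back as floats, which matches the 30.0 / 99.0 style values in the table above.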
I am not sure whether my approach using dataframe.at is correct. Any pointers on what I can use to efficiently keep only those column values that match the values in the Name column would be appreciated.
You can use pd.read_clipboard to put the sample nicely into a dataframe.
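If the clipboard is not available, the same whitespace-separated text can be parsed from a string instead. A small sketch using the first two sample rows; the variable name sample is just for illustration. pd.read_clipboard hands the text to pd.read_csv with a whitespace separator by default, so reading from io.StringIO with sep=r'\s+' should give the same result.

```python
import io
import pandas as pd

# First two rows of the sample above; the header has one field fewer than
# the data rows, so pandas uses the leading 0, 1, ... as the index
sample = """\
Item Name SerialNo Location SalesPerson01 SalesPerson02 SalesPerson03 SalesPerson04 SalesPerson05 SalesPerson06 PredictedSales01 PredictedSales02 PredictedSales03 PredictedSales04 PredictedSales05 PredictedSales06
0 TV Joe;Mary;Philip 11111 NY Tom Julie Joe Sara Mary Philip 90 80 30 98 99 100
1 WashingMachine Mike 22222 NJ Tom Julie Joe Mike Mary Philip 80 70 40 74 88 42
"""

df = pd.read_csv(io.StringIO(sample), sep=r'\s+')
```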