I have been trying to crack this for a while, but I am stuck now. This is my code:

import pandas as pd

# df is the main dataframe (loaded from the CSV file)
l=list()
column_name=[col for col in df.columns if 'SalesPerson' in col]
filtereddf=pd.DataFrame(columns=['Item','SerialNo','Location','SalesPerson01','SalesPerson02','SalesPerson03','SalesPerson04','SalesPerson05','SalesPerson06','PredictedSales01','PredictedSales02','PredictedSales03','PredictedSales04','PredictedSales05','PredictedSales06'])
for i,r in df.iterrows():
    if len(r['Name'].split(';'))>1:
        for x in r['Name'].split(';'):
            for y in column_name:
                if x in r[y]:
                    number_is=y[-2:]
                    filtereddf.at[i,'SerialNo']=r['SerialNo']
                    filtereddf.at[i,'Location']=r['Location']
                    filtereddf.at[i,y]=r[y]
                    filtereddf.at[i,'Item']=r['Item']
                    filtereddf.at[i,f'PredictedSales{number_is}']=r[f'PredictedSales{number_is}']
                    # The statement below prints the values correctly, but I want to filter the values and use them in a dataframe
                    # print(r['SerialNo'],r['Location'],r[f'SalesPerson{number_is}'],r[f'PredictedSales{number_is}'],r['Definition'])
                    l.append(filtereddf)
    else:
        for y in column_name:
            if r['Name'] in r[y]:
                number_is=y[-2:]
                filtereddf.at[i,'SerialNo']=r['SerialNo']
                filtereddf.at[i,'Location']=r['Location']
                filtereddf.at[i,y]=r[y]
                filtereddf.at[i,'Item']=r['Item']
                filtereddf.at[i,f'PredictedSales{number_is}']=r[f'PredictedSales{number_is}']
                # The statement below prints the values correctly, but I want to filter the values and use them in a dataframe
                # print(r['SerialNo'],r['Location'],r[f'SalesPerson{number_is}'],r[f'PredictedSales{number_is}'],r['Definition'])
                l.append(filtereddf)
finaldf=pd.concat(l,ignore_index=True)

It eventually throws an error:

MemoryError: Unable to allocate 9.18 GiB for an array with shape (1, 1231543895) and data type object

Basically, I want to extract each SalesPersonNN and the corresponding PredictedSalesNN from the main dataframe df.

A sample of the dataset (the actual CSV file has almost 100k entries):

   Item            Name             SerialNo  Location  SalesPerson01  SalesPerson02  SalesPerson03  SalesPerson04  SalesPerson05  SalesPerson06  PredictedSales01  PredictedSales02  PredictedSales03  PredictedSales04  PredictedSales05  PredictedSales06
0  TV              Joe;Mary;Philip  11111     NY        Tom            Julie          Joe            Sara           Mary           Philip         90                80                30                98                99                100
1  WashingMachine  Mike             22222     NJ        Tom            Julie          Joe            Mike           Mary           Philip         80                70                40                74                88                42
2  Dishwasher      Tony;Sue         33333     NC        Margaret       Tony           William        Brian          Sue            Bert           58                49                39                59                78                89
3  Microwave       Bill;Jeff;Mary   44444     PA        Elmo           Bill           Jeff           Mary           Chris          Kevin          80                70                90                56                92                59
4  Printer         Keith;Joe        55555     DE        Keith          Clark          Ed             Matt           Martha         Joe            87                94                59                48                74                89

And I want the output dataframe to look like

   Item            Name             SerialNo  Location  SalesPerson01  SalesPerson02  SalesPerson03  SalesPerson04  SalesPerson05  SalesPerson06  PredictedSales01  PredictedSales02  PredictedSales03  PredictedSales04  PredictedSales05  PredictedSales06
0  TV              Joe;Mary;Philip  11111     NY        NaN            NaN            Joe            NaN            Mary           Philip         NaN               NaN               30.0              NaN               99.0              100.0
1  WashingMachine  Mike             22222     NJ        NaN            NaN            NaN            Mike           NaN            NaN            NaN               NaN               NaN               74.0              NaN               NaN
2  Dishwasher      Tony;Sue         33333     NC        NaN            Tony           NaN            NaN            Sue            NaN            NaN               49.0              NaN               NaN               78.0              NaN
3  Microwave       Bill;Jeff;Mary   44444     PA        NaN            Bill           Jeff           Mary           NaN            NaN            NaN               70.0              90.0              56.0              NaN               NaN
4  Printer         Keith;Joe        55555     DE        Keith          NaN            NaN            NaN            NaN            Joe            87.0              NaN               NaN               NaN               NaN               89.0

I am not sure whether my approach using DataFrame.at is correct. Any pointers on how to efficiently keep only those column values that match a value in the Name column would be appreciated.

  • Will you add the sample and expected output dataframes as text, not as images? It's impossible to copy the text from the image. Commented Dec 22, 2021 at 23:08
  • @richardec - Sorry about that. Tried to paste as text, but the formatting is hard to read Commented Dec 22, 2021 at 23:21
  • It's actually perfect. As long as there aren't any spaces in any cells/columns, I can copy it and use pd.read_clipboard to put it nicely in a dataframe. Commented Dec 22, 2021 at 23:23

1 Answer

I would recommend reshaping the dataframe from wide format (one column per salesperson slot) to long format (one row per salesperson). You can reshape your dataset using melt:

# Assumption: the identifying columns must be in the index for the steps below
df = df.set_index(['Item', 'SerialNo', 'Location'])

df_person = df.loc[:,'SalesPerson01':'SalesPerson06']
df_sales = df.loc[:,'PredictedSales01':'PredictedSales06']
# Both melts emit rows in the same order, so the values line up row by row
df_person = df_person.melt(ignore_index=False, value_name='SalesPerson')[['SalesPerson']]
PredictedSales = df_sales.melt(ignore_index=False, value_name='PredictedSales')[['PredictedSales']]
df_person['PredictedSales'] = PredictedSales

index_cols = ['Item','SerialNo', 'Location', 'SalesPerson']
df_person = df_person.reset_index().sort_values(index_cols).set_index(index_cols)

df_person will look like this:

Item            SerialNo    Location    SalesPerson PredictedSales
TV              11111       NY          Joe         30
                                        Julie       80
                                        Mary        99
                                        Philip      100
                                        Sara        98
                                        Tom         90
WashingMachine  22222       NJ          Joe         40
                                        Julie       70
                                        Mary        88
                                        Mike        74
                                        Philip      42
                                        Tom         80
...             ...         ...         ...         ...
Printer         55555       DE          Clark       94
                                        Ed          59
                                        Joe         89
                                        Keith       87
                                        Martha      74
                                        Matt        48
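
To see why the two melt calls can be paired by position, here is a minimal sketch on a made-up two-row frame. The frame `toy` and the column names `person_col` and `sales_col` are assumptions for illustration only. The check in the middle confirms that row k of both melted frames refers to the same slot suffix before the values are combined.

```python
import pandas as pd

# Hypothetical two-row frame, indexed by the identifying column.
toy = pd.DataFrame(
    {
        "Item": ["TV", "Printer"],
        "SalesPerson01": ["Tom", "Keith"],
        "SalesPerson02": ["Julie", "Clark"],
        "PredictedSales01": [90, 87],
        "PredictedSales02": [80, 94],
    }
).set_index("Item")

# Melt each group of columns separately, keeping the original index.
people = toy.filter(like="SalesPerson").melt(
    ignore_index=False, var_name="person_col", value_name="SalesPerson"
)
sales = toy.filter(like="PredictedSales").melt(
    ignore_index=False, var_name="sales_col", value_name="PredictedSales"
)

# Both melts walk the columns in the same order, so the slot suffixes
# ("01", "02", ...) should agree row by row.
same_slot = (
    people["person_col"].str[-2:].to_numpy()
    == sales["sales_col"].str[-2:].to_numpy()
)
assert same_slot.all()

# Combine by position, which is safe only because of the check above.
long_toy = people[["SalesPerson"]].assign(
    PredictedSales=sales["PredictedSales"].to_numpy()
)
print(long_toy)
```

If the two column groups were ever listed in a different order, the assertion would fail instead of silently pairing a person with the wrong sales figure.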

Now you only want the rows for the names listed in your 'Name' column. Therefore, we create a separate dataframe using explode:

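# Split the ';'-separated names into lists first: explode only expands list-like cells.
# df is assumed to be indexed by Item, SerialNo and Location here.
df_names = df['Name'].str.split(';').explode().to_frame('SalesPerson')
df_names = df_names.reset_index().set_index(['Item','SerialNo', 'Location', 'SalesPerson'])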

df_names will look something like this:

Item            SerialNo    Location    SalesPerson
TV              11111       NY          Joe
                                        Mary
                                        Philip
WashingMachine  22222       NJ          Mike
Dishwasher      33333       NC          Tony
                                        Sue
Microwave       44444       PA          Bill
                                        Jeff
                                        Mary
Printer         55555       DE          Keith
                                        Joe
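
One thing to keep in mind is that explode only expands list-like cells, so a ';'-separated string has to be split into a list first. A minimal sketch, assuming a hypothetical two-row frame `names` that mimics the Name column:

```python
import pandas as pd

# Hypothetical frame; Name holds ';'-separated strings as in the question.
names = pd.DataFrame(
    {"Item": ["TV", "WashingMachine"], "Name": ["Joe;Mary;Philip", "Mike"]}
).set_index("Item")

exploded = (
    names["Name"]
    .str.split(";")          # "Joe;Mary;Philip" -> ["Joe", "Mary", "Philip"]
    .explode()               # one row per list element, index repeated
    .to_frame("SalesPerson")
)
print(exploded)
```

Calling explode on the raw strings would leave them unchanged, because a plain string is treated as a single scalar value.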

Now you can simply merge your dataframes:

df_names.merge(df_person, left_index=True, right_index=True)

The merged result contains only the salespeople listed in Name, each with their PredictedSales.
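
For reference, the whole approach can be sketched end to end on the five sample rows from the question. This is only a sketch under some assumptions: the names `indexed`, `people`, `wanted`, `matched` and `wide` are made up here, the slot suffix is kept in an extra `slot` column (the steps above drop it), and the final unstack back to one column per slot is an optional extra step to get closer to the layout asked for in the question.

```python
import pandas as pd

columns = (
    ["Item", "Name", "SerialNo", "Location"]
    + [f"SalesPerson{n:02d}" for n in range(1, 7)]
    + [f"PredictedSales{n:02d}" for n in range(1, 7)]
)
rows = [
    ["TV", "Joe;Mary;Philip", 11111, "NY",
     "Tom", "Julie", "Joe", "Sara", "Mary", "Philip", 90, 80, 30, 98, 99, 100],
    ["WashingMachine", "Mike", 22222, "NJ",
     "Tom", "Julie", "Joe", "Mike", "Mary", "Philip", 80, 70, 40, 74, 88, 42],
    ["Dishwasher", "Tony;Sue", 33333, "NC",
     "Margaret", "Tony", "William", "Brian", "Sue", "Bert", 58, 49, 39, 59, 78, 89],
    ["Microwave", "Bill;Jeff;Mary", 44444, "PA",
     "Elmo", "Bill", "Jeff", "Mary", "Chris", "Kevin", 80, 70, 90, 56, 92, 59],
    ["Printer", "Keith;Joe", 55555, "DE",
     "Keith", "Clark", "Ed", "Matt", "Martha", "Joe", 87, 94, 59, 48, 74, 89],
]
df = pd.DataFrame(rows, columns=columns)

id_cols = ["Item", "SerialNo", "Location"]
indexed = df.set_index(id_cols)

# Long format: one row per (item, slot), pairing person and sales by position.
people = indexed.filter(like="SalesPerson").melt(
    ignore_index=False, var_name="slot", value_name="SalesPerson"
)
sales = indexed.filter(like="PredictedSales").melt(
    ignore_index=False, value_name="PredictedSales"
)
people["slot"] = people["slot"].str[-2:]  # "SalesPerson03" -> "03"
people["PredictedSales"] = sales["PredictedSales"].to_numpy()  # same row order

# One row per name listed in the ';'-separated Name column.
wanted = (
    indexed["Name"].str.split(";").explode().rename("SalesPerson").reset_index()
)

# Keep only the (item, salesperson) pairs that appear in Name.
matched = wanted.merge(people.reset_index(), on=id_cols + ["SalesPerson"])

# Optional: spread the matches back out to one column per slot.
wide = (
    matched.set_index(id_cols + ["slot"])[["SalesPerson", "PredictedSales"]]
    .unstack("slot")
)
wide.columns = [f"{name}{slot}" for name, slot in wide.columns]
print(wide)
```

Note that the merge matches names exactly, whereas the loop in the question uses a substring test (`x in r[y]`), so the two can differ if one name is contained in another. Because nothing here loops over rows or appends a growing frame to a list, memory use should stay roughly proportional to the size of the input.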

Hopefully this will run without errors. Please let me know 😀
