So I have two rather large Excel files that I have converted into two dataframes (df for the current week and df2 for the previous week). A total of 128 rows are identical in both dataframes, so I've created a new variable:

onlyWon = df.loc[df['Sales stage'] == "Won"]

Thereafter, I am trying to create a new dataframe that only contains the values in df2 that match the Sales number in the onlyWon dataframe. For example, if I were to do this with only one item the code would be:

df2.loc[df2['Sales No'] == "B3M-RB-03"]

This works for a single sales number, but when I try, for example, to iterate over the onlyWon dataframe and append the data to a new dataframe, I run into problems.

Example of how I want it to work:

DF2:

+------------------+----------+-------------+-----------+
|     Customer     | Sales No | Sales Stage | Deal Size |
+------------------+----------+-------------+-----------+
| Stackoverflow    | A1       | Identified  |       100 |
| Guido van Rossum | B2       | Lost        |      1000 |
+------------------+----------+-------------+-----------+

OnlyWon:

+---------------+----------+-------------+-----------+
|   Customer    | Sales No | Sales Stage | Deal Size |
+---------------+----------+-------------+-----------+
| Stackoverflow | A1       | WON         |       100 |
+---------------+----------+-------------+-----------+

New dataframe:

+---------------+----------+-------------+-----------+
|   Customer    | Sales No | Sales Stage | Deal Size |
+---------------+----------+-------------+-----------+
| Stackoverflow | A1       | Identified  |       100 |
+---------------+----------+-------------+-----------+

What I tried to do

Declaring a new empty dataframe (df3) that has the same headers as df2.

Creating a list out of all the 'Sales No':

onlyWonSales = []
for salesNo in onlyWon['Sales No']:
    onlyWonSales.append(salesNo)

Then looping over the list and appending to the new dataframe:

for item in onlyWonSales:
    df3 = df3.append(df2.loc[df2['Sales No'] == item])

This adds a lot of duplicates and doesn't work, even though it doesn't raise any errors (the onlyWonSales list has around 1000 entries, while df3 ends up with around 4000 rows).

3 Comments
  • can you post the error? Commented Apr 23, 2020 at 19:53
  • @komatiraju032, what I tried to do was to create a list of all the sales numbers in the onlyWon dataframe by doing: ` onlyWonSales = [] for SalesNo in onlyWon['Sales No']: onlyWonSales.append(SalesNo) ` This works and adds all the sales numbers to a list (I get 1000 when doing len(onlyWonSales)). Then I try to do: ` for item in onlyWonSales: df3 = df3.append(df2.loc[df2['Sales No'] == item]) ` which causes a lot of duplicates to be added (around 4000 rows). Commented Apr 23, 2020 at 19:54
  • @komatiraju032 I've updated my post to include what I did with better formatting. Commented Apr 23, 2020 at 20:07

2 Answers

1

Like this:

In [150]: new = pd.merge(df2, onlyWon, on=['Sales No'], suffixes=('', '_y'))

In [153]: new.drop(list(new.filter(regex='_y$')), axis=1, inplace=True)

In [154]: new
Out[154]: 
        Customer Sales No Sales Stage  Deal Size
0  Stackoverflow       A1  Identified        100
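
A variant of the merge above, offered as a sketch rather than as the answerer's exact method: merging against only the `Sales No` column of `onlyWon` means no suffixed columns are created, so nothing has to be dropped afterwards. The two frames below simply reproduce the sample tables from the question, and `won_keys` is a name introduced here for illustration.

```python
import pandas as pd

# Sample data copied from the tables in the question
df2 = pd.DataFrame({
    "Customer": ["Stackoverflow", "Guido van Rossum"],
    "Sales No": ["A1", "B2"],
    "Sales Stage": ["Identified", "Lost"],
    "Deal Size": [100, 1000],
})
onlyWon = pd.DataFrame({
    "Customer": ["Stackoverflow"],
    "Sales No": ["A1"],
    "Sales Stage": ["WON"],
    "Deal Size": [100],
})

# Keep only the key column, de-duplicated, so the merge adds no extra columns
won_keys = onlyWon[["Sales No"]].drop_duplicates()

# Inner merge keeps the df2 rows whose 'Sales No' also appears in onlyWon
new = df2.merge(won_keys, on="Sales No", how="inner")
print(new)
```

Because `won_keys` holds each sales number once, the merge cannot multiply rows on the `onlyWon` side; any repeats left in `new` would come from `df2` itself.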

1 Comment

Just run new = new.drop_duplicates() and check the shape.
0

Keep onlyWon as it is, then filter df2 with a query:

 onlyWon = df.loc[df['Sales stage'] == "Won"]

 sales_no_won = onlyWon['Sales No']
 results = df2.query('`Sales No` in @sales_no_won').copy()
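
The same filter can also be written with `Series.isin`, which avoids the backtick-quoted column name inside a query string. This is a minimal sketch on the sample rows from the question; the one-element `sales_no_won` series is a stand-in for `onlyWon['Sales No']`.

```python
import pandas as pd

# Sample data copied from the DF2 table in the question
df2 = pd.DataFrame({
    "Customer": ["Stackoverflow", "Guido van Rossum"],
    "Sales No": ["A1", "B2"],
    "Sales Stage": ["Identified", "Lost"],
    "Deal Size": [100, 1000],
})

# Stand-in for onlyWon['Sales No'] (assumed here to hold just "A1")
sales_no_won = pd.Series(["A1"], name="Sales No")

# Boolean mask: True for df2 rows whose 'Sales No' is among the won ones
results = df2[df2["Sales No"].isin(sales_no_won)].copy()
print(results)
```

The `.copy()` mirrors the answer above, so that later changes to `results` do not trigger chained-assignment warnings against `df2`.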

4 Comments

That produces a key error: raise KeyError(f"{not_found} not in index")
Right, it was missing the suffixes, sorry
Unfortunately that didn't work... Still has a lot of duplicated 'Sales No' in the Results dataframe. The OnlyWon dataframe has only unique values in the Sales No columns, so it should only capture those who have the matching Sales No in the DF2.
Sorry to hear that. Try the edited answer that uses the query method; I think it is cleaner for this case.
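
If the result still contains repeated `Sales No` values, as the comments above describe, one possible cause is that `df2` itself holds several rows per sales number, since a filter keeps every matching row. The sketch below checks for that and then keeps one row per sales number. The sample data, including the repeated `A1` row and its `Qualified` stage, is hypothetical, and keeping the first occurrence is only an assumption about which row is wanted.

```python
import pandas as pd

# Hypothetical df2 in which sales number "A1" appears twice
df2 = pd.DataFrame({
    "Customer": ["Stackoverflow", "Stackoverflow", "Guido van Rossum"],
    "Sales No": ["A1", "A1", "B2"],
    "Sales Stage": ["Identified", "Qualified", "Lost"],
    "Deal Size": [100, 100, 1000],
})
sales_no_won = ["A1"]  # stand-in for onlyWon['Sales No']

# Every df2 row with a matching sales number is kept, repeats included
matches = df2[df2["Sales No"].isin(sales_no_won)]

# Show all rows whose sales number occurs more than once among the matches
repeated = matches[matches["Sales No"].duplicated(keep=False)]
print(repeated)

# Keep one row per sales number (first occurrence, by assumption)
deduped = matches.drop_duplicates(subset="Sales No", keep="first")
print(deduped)
```

Note that a plain `drop_duplicates()` without `subset` only removes rows that are identical in every column, so it would leave both `A1` rows in this example.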
