0

I have a pandas dataframe (posted as a screenshot, not reproduced here) and would like to remove the duplicate rows, where the same matchup appears twice with the teams swapped.

For example, Atlanta Falcons/Jacksonville Jaguars also appears as Jacksonville Jaguars/Atlanta Falcons.

What is the best way to do so?

Thanks!

1
  • Hello! It'll help people to help you if you post your data in a reproducible format, not as a screenshot or image. There is a great post on Stack Overflow about How to make good reproducible pandas examples that you should check out and then edit your post based off of it. Commented Nov 30, 2021 at 2:10

2 Answers

3

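This code should do the trick:

import numpy as np

# order-independent key: team_a is the alphabetically smaller name, team_b the larger
df["team_a"] = np.minimum(df["team1"], df["team2"])
df["team_b"] = np.maximum(df["team1"], df["team2"])

# keep the first row of each matchup per season and week, then drop the helper columns
df.drop_duplicates(["season", "week", "team_a", "team_b"], inplace=True)
df.drop(columns=["team_a", "team_b"], inplace=True)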

Before doing this, please check your data: when team1 and team2 are inverted, the columns team1_score and team2_score are not inverted with them, so the scores may be confusing after you remove one of the rows.
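One way to run that check is sketched below. It rewrites every row into a canonical order, moving the scores together with the teams, and then flags mirrored rows whose scores still disagree. The sample rows, the score values, and the names `swap`, `canon`, `deduped` and `conflicts` are hypothetical; only the column names come from the question, and the real data may need different handling.

```python
import numpy as np
import pandas as pd

# Hypothetical sample. Row 2 mirrors row 0 with the scores swapped as well;
# row 3 mirrors row 1 but its scores were NOT swapped.
df = pd.DataFrame({
    "season": [2021, 2021, 2021, 2021],
    "week": [12, 12, 12, 12],
    "team1": ["Atlanta Falcons", "Buffalo Bills", "Jacksonville Jaguars", "Miami Dolphins"],
    "team2": ["Jacksonville Jaguars", "Miami Dolphins", "Atlanta Falcons", "Buffalo Bills"],
    "team1_score": [21, 30, 14, 30],
    "team2_score": [14, 17, 21, 17],
})

keys = ["season", "week", "team1", "team2"]

# True where the row is stored with the alphabetically larger team first.
swap = (df["team1"] > df["team2"]).to_numpy()

# Canonical copy: swap names and scores together so each score stays with its team.
canon = df.copy()
canon["team1"] = np.where(swap, df["team2"], df["team1"])
canon["team2"] = np.where(swap, df["team1"], df["team2"])
canon["team1_score"] = np.where(swap, df["team2_score"], df["team1_score"])
canon["team2_score"] = np.where(swap, df["team1_score"], df["team2_score"])

# Same matchup, but the rows are not identical: the scores need a manual look.
same_matchup = canon.duplicated(keys, keep=False)
identical_row = canon.duplicated(keep=False)
conflicts = canon[same_matchup & ~identical_row]

# One row per matchup; only trustworthy for matchups not listed in conflicts.
deduped = canon.drop_duplicates(keys)

print(conflicts)
print(deduped)
```

With this sample, `conflicts` holds the two Buffalo Bills/Miami Dolphins rows, because their scores do not line up once the teams are put in the same order.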


0

Because OP did not provide a reproducible dataset, here is a small one:

import pandas as pd

# dataset where, within each week, the 1st and 5th observations are the same matchup (A vs F) with the teams swapped:
df = pd.DataFrame({
    "season": [2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021],
    "week": [12, 12, 12, 12, 12, 13, 13, 13, 13, 13],
    "team1": ["A", "B", "C", "D", "F", "A", "B", "C", "D", "F"],
    "team2": ["F", "G", "H", "I", "A", "F", "G", "H", "I", "A"]
})

df
    season  week    team1   team2
0     2021    12        A       F
1     2021    12        B       G
2     2021    12        C       H
3     2021    12        D       I
4     2021    12        F       A
5     2021    13        A       F
6     2021    13        B       G
7     2021    13        C       H
8     2021    13        D       I
9     2021    13        F       A

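# solution: keep the rows whose team1 does not contain the first team2 value ("F");
# in this sample every mirrored row has "F" as team1
df[[df["team1"].str.contains(c) == False for c in df["team2"].tolist()][0]]
    season  week    team1   team2
0     2021    12        A       F
1     2021    12        B       G
2     2021    12        C       H
3     2021    12        D       I
5     2021    13        A       F
6     2021    13        B       G
7     2021    13        C       H
8     2021    13        D       I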
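The mask above only tests team1 against the first value of team2, so it depends on how this sample happens to be laid out. A more general sketch, assuming a duplicate means "same season, same week, same two teams in either order", builds an order-independent key per row. The names `pair`, `is_repeat` and `result` are made up for illustration.

```python
import pandas as pd

# Same sample data as in the answer above.
df = pd.DataFrame({
    "season": [2021] * 10,
    "week": [12, 12, 12, 12, 12, 13, 13, 13, 13, 13],
    "team1": ["A", "B", "C", "D", "F", "A", "B", "C", "D", "F"],
    "team2": ["F", "G", "H", "I", "A", "F", "G", "H", "I", "A"],
})

# Order-independent key: ("A", "F") for both A-vs-F and F-vs-A.
pair = pd.Series(
    [tuple(sorted(p)) for p in zip(df["team1"], df["team2"])],
    index=df.index,
)

# A row is a repeat if the same pair already appeared in the same season and week.
is_repeat = df.assign(pair=pair).duplicated(["season", "week", "pair"])
result = df[~is_repeat]

print(result)
```

This keeps the first occurrence of each matchup and preserves the original index labels; chain `.reset_index(drop=True)` if a fresh 0..n-1 index is wanted.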

4 Comments

Sorry for not posting a reproducible dataset. This is read in as a df using pd.read_csv for all NFL data. I tried running just the last line and it doesn't seem to do the trick?
I don't know what to tell you. I just edited my code example to reflect multiple weeks (12 and 13) and my code still works.
Great, thank you!
No problem! If my answer worked for you please mark it with the checkmark (if not, no worries).
