
The master dataframe holds the players and statistics for a specific match. It has 34 columns and a variable number of rows.
The "Player" column contains full names.

Player Goals Assists
Dominic Calvert-Lewin 1 1
Beto 2 0
Jarrad Branthwaite 0 1
Jack Harrison 0 0

The snippet dataframe is created automatically and lists only the players with a referee note (yellow cards, red cards). It has 3 columns. The problem is that "First Name" is either a full first name, an initial, or blank.

First Name  Last Name      Cause
D           Calvert-Lewin  Foul
            Beto           Time Wasting
Jack        Harrison

What I want to achieve:
Match "First Name" (probably using startswith) together with "Last Name" (using contains) against "Player" in the master df.
If both columns match, add a column with the full names to the snippet df.
Expected dataframe:

Player Cause
Dominic Calvert-Lewin Foul
Beto Time Wasting
Jack Harrison

So far I only have one-to-one matching on the last name:

pat1 = '('+'|'.join(Snippet['Last Name'])+')'
Master["Yellow"] = Master['Player'].str.extract(pat1)[0].map(Snippet.set_index('Last Name')['Cause'].to_dict()).fillna('')
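One caveat with building the pattern by joining raw last names: any name containing a regex metacharacter (a dot or parenthesis, for example) would change the meaning of the pattern. A minimal sketch of the same one-to-one matching with each name escaped first is below. The lowercase `master` and `snippet` frames are stand-ins built from the sample rows above, not the real 34-column data.

```python
import re
import pandas as pd

# Stand-in frames built from the sample rows in the question (assumption).
master = pd.DataFrame(
    {"Player": ["Dominic Calvert-Lewin", "Beto", "Jarrad Branthwaite", "Jack Harrison"]}
)
snippet = pd.DataFrame(
    {
        "Last Name": ["Calvert-Lewin", "Beto", "Harrison"],
        "Cause": ["Foul", "Time Wasting", None],
    }
)

# Escape every last name so it is matched literally inside the alternation.
pattern = "(" + "|".join(re.escape(name) for name in snippet["Last Name"]) + ")"

# Look up the cause by the extracted last name; unmatched players get "".
cause_by_last_name = snippet.set_index("Last Name")["Cause"].to_dict()
master["Yellow"] = (
    master["Player"].str.extract(pattern)[0].map(cause_by_last_name).fillna("")
)
print(master)
```

This still matches on the last name only, so it cannot tell apart two players who share a last name.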
  • Sorry, I can't follow the description of the problem at all. It certainly doesn't help that it's full of sentence fragments rather than complete sentences. To be clear: are the "master dataframe" and the "snippet dataframe" both inputs to the problem? For the exact shown inputs, exactly what should the output be? As is, I can't even tell whether you expect the result to be a DataFrame, or just what. Commented Apr 19, 2024 at 14:38
  • Are there players with the same last name in your data? If not, just match the last name. Commented Apr 19, 2024 at 14:49
  • This problem doesn't have a generic solution, as explained in the link below. You'd have to tell us what assumptions you are willing to make about the data. kalzumeus.com/2010/06/17/… Commented Apr 19, 2024 at 14:50
  • @Joooeey Yes, there are often duplicates in last names. Commented Apr 19, 2024 at 14:54
  • Okay, can you at least assume that no last names in your data contain spaces, that all the names are written from left to right (e.g. no Arabic script), and that the last name is never written before the first name(s)? What other assumptions can you make? Commented Apr 19, 2024 at 15:11

1 Answer


If you have these two dataframes:

df_master

                  Player  Goals  Assists
0  Dominic Calvert-Lewin      1        1
1                   Beto      2        0
2     Jarrad Branthwaite      0        1
3          Jack Harrison      0        0


df_snippet

  First Name      Last Name         Cause
0          D  Calvert-Lewin          Foul
1        NaN           Beto  Time Wasting
2       Jack       Harrison           NaN
3      Hello          World           NaN

Then you can do:

df_snippet["First Name"] = df_snippet["First Name"].fillna("")

out = []
for _, row in df_snippet.iterrows():
    m1 = df_master["Player"].str.startswith(row["First Name"])
    m2 = df_master["Player"].str.endswith(row["Last Name"])

    m = m1 & m2

    if m.any():
        out.append(df_master.loc[m.idxmax(), "Player"])
    else:
        out.append(None)

df_snippet["Player"] = out
print(df_snippet)

Prints:

  First Name      Last Name         Cause                 Player
0          D  Calvert-Lewin          Foul  Dominic Calvert-Lewin
1                      Beto  Time Wasting                   Beto
2       Jack       Harrison           NaN          Jack Harrison
3      Hello          World           NaN                   None
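Note that `idxmax()` picks the first matching row, so if two players share a last name and the snippet only has an initial (or nothing) for the first name, the loop silently takes whichever comes first. A possible variant, sketched below, returns a name only when exactly one player matches and also requires the last name to be a whole word at the end of the full name. The helper `find_player` is a hypothetical name introduced here, and the frames are rebuilt from the sample data above.

```python
import pandas as pd

# Sample frames rebuilt from the data shown above.
df_master = pd.DataFrame(
    {"Player": ["Dominic Calvert-Lewin", "Beto", "Jarrad Branthwaite", "Jack Harrison"]}
)
df_snippet = pd.DataFrame(
    {
        "First Name": ["D", None, "Jack", "Hello"],
        "Last Name": ["Calvert-Lewin", "Beto", "Harrison", "World"],
        "Cause": ["Foul", "Time Wasting", None, None],
    }
)


def find_player(first, last, players):
    """Return the full name if exactly one player matches, else None."""
    # A missing first name (None/NaN) is treated as an empty prefix.
    first = first if isinstance(first, str) else ""
    # The last name must be the whole name or be preceded by a space.
    ends = (players == last) | players.str.endswith(" " + last)
    starts = players.str.startswith(first)
    matches = players[starts & ends]
    # Zero matches or several candidates: leave the decision to the caller.
    return matches.iloc[0] if len(matches) == 1 else None


df_snippet["Player"] = [
    find_player(first, last, df_master["Player"])
    for first, last in zip(df_snippet["First Name"], df_snippet["Last Name"])
]
print(df_snippet)
```

Rows left as `None` can then be reviewed by hand, which seems safer than guessing when last names are duplicated.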