Can I combine contains and startswith to match two columns from one dataframe against another dataframe's master column?

Question

The master dataframe holds the players and statistics for a specific match. It has 34 columns and a variable number of rows.
The "Player" column contains full names.

Player	Goals	Assists
Dominic Calvert-Lewin	1	1
Beto	2	0
Jarrad Branthwaite	0	1
Jack Harrison	0	0

The snippet dataframe is created automatically and lists only players with a referee note (yellow cards, red cards). It has 3 columns. The problem is that "First Name" is either a full first name, an initial, or blank.

First Name	Last Name	Cause
D	Calvert-Lewin	Foul
	Beto	Time Wasting
Jack	Harrison

What I want to achieve:
Match "First Name" (probably using startswith) together with "Last Name" (using contains) against "Player" in the master df.
If both columns match, add a column with the full name to the snippet df.
Expected dataframe:

Player	Cause
Dominic Calvert-Lewin	Foul
Beto	Time Wasting
Jack Harrison

So far I have only managed one-to-one matching on the last name:

pat1 = '('+'|'.join(Snippet['Last Name'])+')'
Master["Yellow"] = Master['Player'].str.extract(pat1)[0].map(Snippet.set_index('Last Name')['Cause'].to_dict()).fillna('')

Sorry, I can't follow the description of the problem at all. It certainly doesn't help that it's full of sentence fragments rather than complete sentences. To be clear: are the "master dataframe" and the "snippet dataframe" both inputs to the problem? For the exact shown inputs, exactly what should the output be? As is, I can't even tell whether you expect the result to be a DataFrame, or just what.
– Karl Knechtel, Commented Apr 19, 2024 at 14:38
Are there players with the same last name in your data? If not, just match the last name.
– Joooeey, Commented Apr 19, 2024 at 14:49
This problem doesn't have a generic solution, as explained in the link below. You'd have to tell us what assumptions you are willing to make about the data. kalzumeus.com/2010/06/17/…
– Joooeey, Commented Apr 19, 2024 at 14:50
Okay, can you at least assume that no last names in your data contain spaces, that all the names are written from left to right (e.g. no Arabic script), and that the last name is never written before the first name(s)? What other assumptions can you make?
– Joooeey, Commented Apr 19, 2024 at 15:11

Andrej Kesely · Accepted Answer · 2024-04-19 22:44:59Z

If you have these two dataframes:

df_master

                  Player  Goals  Assists
0  Dominic Calvert-Lewin      1        1
1                   Beto      2        0
2     Jarrad Branthwaite      0        1
3          Jack Harrison      0        0


df_snippet

  First Name      Last Name         Cause
0          D  Calvert-Lewin          Foul
1        NaN           Beto  Time Wasting
2       Jack       Harrison           NaN
3      Hello          World           NaN

Then you can do:

df_snippet["First Name"] = df_snippet["First Name"].fillna("")

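out = []
# Look up each snippet row in the master "Player" column.
for _, row in df_snippet.iterrows():
    # An empty first name matches every player, so only the last name decides.
    m1 = df_master["Player"].str.startswith(row["First Name"])
    m2 = df_master["Player"].str.endswith(row["Last Name"])

    m = m1 & m2

    if m.any():
        # idxmax() gives the index of the first True, i.e. the first matching player.
        out.append(df_master.loc[m.idxmax(), "Player"])
    else:
        out.append(None)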

df_snippet["Player"] = out
print(df_snippet)

Prints:

  First Name      Last Name         Cause                 Player
0          D  Calvert-Lewin          Foul  Dominic Calvert-Lewin
1                      Beto  Time Wasting                   Beto
2       Jack       Harrison           NaN          Jack Harrison
3      Hello          World           NaN                   None
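
For anyone who wants to run this end to end, here is a minimal self-contained sketch that rebuilds the two sample frames and reduces the result to the Player/Cause layout from the question. The helper name `find_player` is hypothetical, and the rule of accepting a match only when exactly one player fits is my own assumption (the loop above takes the first match instead), so adjust it to whatever your data allows.

```python
import pandas as pd

# Sample data copied from the question and the frames shown above.
df_master = pd.DataFrame({
    "Player": ["Dominic Calvert-Lewin", "Beto", "Jarrad Branthwaite", "Jack Harrison"],
    "Goals": [1, 2, 0, 0],
    "Assists": [1, 0, 1, 0],
})
df_snippet = pd.DataFrame({
    "First Name": ["D", None, "Jack", "Hello"],
    "Last Name": ["Calvert-Lewin", "Beto", "Harrison", "World"],
    "Cause": ["Foul", "Time Wasting", None, None],
})


def find_player(first, last, players):
    """Return the one player whose name starts with `first` and ends with `last`.

    Hypothetical helper. Assumption: if no player or more than one player
    fits, the row is left unresolved and None is returned.
    """
    first = "" if pd.isna(first) else first  # a blank first name matches everyone
    mask = players.str.startswith(first) & players.str.endswith(last)
    matches = players[mask]
    return matches.iloc[0] if len(matches) == 1 else None


df_snippet["Player"] = [
    find_player(first, last, df_master["Player"])
    for first, last in zip(df_snippet["First Name"], df_snippet["Last Name"])
]

# Keep only resolved rows, in the layout the question asks for.
result = (
    df_snippet.dropna(subset=["Player"])[["Player", "Cause"]]
    .reset_index(drop=True)
)
print(result)
```

Returning None for ambiguous rows makes duplicate last names (raised in the comments) show up as unresolved entries rather than silently picking the first player.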
