2

I have the following DF:

| Fecha      | Partido                 | Equipo  |  xG  |  xGA |
|------------|-------------------------|---------|------|------|
| 2022-05-01 | América - Cruz Azul 0:0 | América | 1.53 | 0.45 |
| 2022-05-01 | Leon - América 2:0      | América | 1.70 | 0.35 |

I want to create three new columns based on the Partido column where the first team goes to a new column named Home, the second team to a column named Visitor and the score to a column named Score.

Desired Output:

| Fecha      | Partido                 | Equipo  |  xG  |  xGA | Home    | Away       | Score |
|------------|-------------------------|---------|------|------|-------- |------------|-------|
| 2022-05-01 | América - Cruz Azul 0:0 | América | 1.53 | 0.45 | América | Cruz Azul  | 0:0   |
| 2022-05-01 | Leon - América 2:0      | América | 1.70 | 0.35 | Leon    | América    | 2:0   |

I have tried splitting with delimiter but since some teams have two words in their names it doesn't work.

3 Answers 3

3

It is quite simple using str.extract and a regex:

regex = r'([^-]+)\s*-\s*([^-]+) (\d+:\d+)'
df[['Home', 'Away', 'Score']] = df['Partido'].str.extract(regex)

output:

        Fecha                  Partido   Equipo    xG   xGA      Home       Away Score
0  2022-05-01  América - Cruz Azul 0:0  América  1.53  0.45  América   Cruz Azul   0:0
1  2022-05-01       Leon - América 2:0  América  1.70  0.35     Leon     América   2:0

regex demo

If you don't want to modify the original DataFrame, you can also use named capturing groups to directly set the column names:

regex = r'(?P<Home>[^-]+)\s*-\s*(?P<Away>[^-]+) (?P<Score>\d+:\d+)'
df2 = df['Partido'].str.extract(regex)

#        Home       Away Score
# 0  América   Cruz Azul   0:0
# 1     Leon     América   2:0

# OR
df2 = df.join(df['Partido'].str.extract(regex))

# same a first output
Sign up to request clarification or add additional context in comments.

1 Comment

Thanks! I will check out the regex documentation.
3

You might use rsplit to get teams and score then split to get home and away following way

import pandas as pd
df = pd.DataFrame({'Partido':['América - Cruz Azul 0:0','Leon - América 2:0']})
df[['Teams', 'Score']] = df.Partido.str.rsplit(' ', 1, expand=True)
df[['Home', 'Away']] = df.pop('Teams').str.split(' - ', expand=True)
print(df)

output

                   Partido Score     Home       Away
0  América - Cruz Azul 0:0   0:0  América  Cruz Azul
1       Leon - América 2:0   2:0     Leon    América

Note: I simplified input df for brevity sake. 1 in rsplit mean split once, i.e. at rightmost space. .pop does remove and return selected column, I use it as Teams is only temporary.

Comments

1

One option with a split (regex):


regex = r"(\d+:\d+)|\s+-\s+"
outcome = (df.Partido
             .str.split(regex, expand = True)
             .dropna(how ='all', axis = 1)
             .iloc[:, :-1]
           )

df[['Home', 'Away', 'Score']] = outcome

df

        Fecha                  Partido   Equipo    xG   xGA     Home        Away Score
0  2022-05-01  América - Cruz Azul 0:0  América  1.53  0.45  América  Cruz Azul    0:0
1  2022-05-01       Leon - América 2:0  América  1.70  0.35     Leon    América    2:0

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.