
I've seen other posts about this, but I'm running into an issue when I try to follow the solutions. I am trying to split a column of scores (stored as strings) that look like this:

1-0
2-3
0-3
...

The code I'm trying to use:

df[['Home G', 'Away G']] = df['Score'].str.split('-', expand=True)

Error I am getting:

ValueError: Columns must be same length as key

Every game has a score, though, so shouldn't the column lengths match up? One thought I had is that the 0s are producing some weird None values or something like that.

  • I have tried your code, defining df as df = pd.DataFrame({'Score': ['1-0', '2-3', '0-3']}), and it works for me. Commented Oct 27, 2020 at 14:55
  • Perhaps one of the rows doesn't have a '-' character? Try the solutions in this post. Commented Oct 27, 2020 at 14:57
  • Make sure df[~df['Score'].str.contains('-')] is an empty DataFrame Commented Oct 27, 2020 at 15:02
  • @CollinHeist I think not having a "-" character should not be an issue. See, for example: df = pd.DataFrame({'Score': ['1-0', '2-3', '0-3', np.nan, '32', 3]}) and then df.Score.str.split('-', expand=True) (which returns 2 columns). But having multiple "-" characters could be problematic if you don't specify how many splits to make. Commented Oct 27, 2020 at 15:10
  • @LorenaGil That would require me to manually type in the scores of every single game as the season goes on, and that's not a practical option in my case given the time and space it would require. Commented Oct 27, 2020 at 16:41
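The shape mismatch discussed in these comments can be reproduced with a small sketch (hypothetical sample data, not the asker's actual frame). It illustrates that str.split(expand=True) returns as many columns as the longest split in the column, so one row with an extra hyphen, or no row with an ASCII hyphen at all, breaks the two-column assignment:

```python
import numpy as np
import pandas as pd

# A missing value or a hyphen-free string still yields 2 columns,
# as long as at least one other row contains exactly one '-'
ok = pd.DataFrame({'Score': ['1-0', '2-3', np.nan, '32']})
ok_parts = ok['Score'].str.split('-', expand=True)

# One row with two hyphens widens the result to 3 columns
extra = pd.DataFrame({'Score': ['1-0', '2-3', '1-2-3']})
extra_parts = extra['Score'].str.split('-', expand=True)

# If no row contains an ASCII hyphen (e.g. every score uses an en dash '–'),
# nothing is split and the result has only 1 column
dashes = pd.DataFrame({'Score': ['1–0', '2–3', '0–3']})
dash_parts = dashes['Score'].str.split('-', expand=True)

print(ok_parts.shape, extra_parts.shape, dash_parts.shape)  # (4, 2) (3, 3) (3, 1)

# Either mismatch raises the ValueError from the question
for frame, parts in [(extra, extra_parts), (dashes, dash_parts)]:
    try:
        frame[['Home G', 'Away G']] = parts
    except ValueError as exc:
        print('ValueError:', exc)
```

The last case would be consistent with the asker's later observation that splitting leaves each score as a single-element list.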

3 Answers


This most likely happens if you have more than 1 possible split in a string. For example, perhaps you have a value somewhere like:

"1-2-3"

So, the expansion in this case would return 3 columns, but you would be trying to assign them to 2 columns ('Home G', 'Away G').

To fix it, explicitly limit the number of splits performed on each string to 1 with the n argument, as explained in the pandas documentation:

df[['Home G', 'Away G']] = df['Score'].str.split(pat='-', n=1, expand=True)

By default, n=-1, which means "split as many times as possible". By setting it to 1, you only split once.
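A short sketch of the effect of n, using hypothetical data with one malformed score:

```python
import pandas as pd

# Hypothetical data: one malformed score contains two hyphens
df = pd.DataFrame({'Score': ['1-0', '2-3', '1-2-3']})

# Default n=-1 splits at every '-', producing 3 columns
print(df['Score'].str.split('-', expand=True).shape)  # (3, 3)

# n=1 splits at the first '-' only, so the result has at most 2 columns
df[['Home G', 'Away G']] = df['Score'].str.split(pat='-', n=1, expand=True)
print(df)
```

Note that n=1 only removes the error; the malformed row still ends up with '2-3' in Away G. It also cannot help if no row contains an ASCII hyphen, because then the split produces a single column.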

EDIT

An alternative solution, if you are unsure of the number or type of hyphens or other symbols, is to extract with regex the two groups of numbers from each string. For example:

df[['Home G', 'Away G']] = pd.DataFrame(df['Score'].str.findall("([0-9]+)").tolist(), index=df.index)

So, for data that looks like

0   12‒0
1   2–3
2   0–3

You will end up with a df like

    Score   Home G  Away G
0   12‒0    12      0
1   2–3     2       3
2   0–3     0       3
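A related option (swapped in here, not part of the original answer) is Series.str.extract with exactly two capture groups, which always returns exactly two columns and yields NaN for rows that don't match, instead of a wider frame. A minimal sketch on the same sample data:

```python
import pandas as pd

# Sample scores from above, mixing a figure dash (‒) and en dashes (–)
df = pd.DataFrame({'Score': ['12‒0', '2–3', '0–3']})

# Two capture groups always yield exactly two columns; \D+ matches any run of
# non-digit characters, so the separator can be '-', '–', '‒', ' - ', etc.
goals = df['Score'].str.extract(r'^\s*(\d+)\D+(\d+)\s*$')

# Convert the captured strings to numbers before assigning
df[['Home G', 'Away G']] = goals.apply(pd.to_numeric)
print(df)
```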

8 Comments

This still resulted in the same error. I can see all the values of the column, the table isn't very big, and all of the scores are in the same format and length so I'm not sure what else may be causing the issue.
@kr419 so when you just apply df['Score'].str.split("-", n=1), does every list returned only have 2 elements?
When I just apply df['Score'].str.split("-", n=1) it returns the score just like in the DF except each is now a list like this: [1-0] [2-3] [0-3]
df.Score.dtype says Object. type(df.Score[0]) says String
@kr419 is it possible that your "hyphen" is actually an en-dash (–) or a figure-dash (‒)? Those are different symbols and would not be picked up by splitting on the common hyphen ("-"). Perhaps that's why the option you suggested below with just taking the 1st and 3rd element of the string works.
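If the separator does turn out to be a dash look-alike, one way to handle it is to normalize those characters to an ASCII hyphen before splitting. A minimal sketch; the set of dash characters listed is an assumption about what might appear in the data:

```python
import pandas as pd

# Hypothetical data: en dash (U+2013), figure dash (U+2012) and an ASCII hyphen
df = pd.DataFrame({'Score': ['1–0', '2‒3', '0-3']})

# Replace common dash look-alikes with '-' before splitting:
# U+2012 figure dash, U+2013 en dash, U+2014 em dash, U+2212 minus sign
normalized = df['Score'].str.replace('[\u2012\u2013\u2014\u2212]', '-', regex=True)

df[['Home G', 'Away G']] = normalized.str.split('-', n=1, expand=True).astype(int)
print(df)
```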

Seems like your data needs some cleaning. If I were you, I would run some checks to see where the problem lies. You will most likely find rows that have either too many -s or no -s at all. I would run the following:

# Count the '-' characters in each score
df['check'] = [len(i) for i in df['Score'].str.findall(r'(-)')]
# Show the rows that do not contain exactly one '-'
df[df['check'] != 1]

The code counts the number of - characters in each row and flags any row where that count isn't 1. Hopefully this helps pinpoint your issue.
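If every row gets flagged even though each score visibly contains a dash, the separator itself may be a different character. A sketch (hypothetical data) that does the same count with Series.str.count and then prints the Unicode name of every non-digit character in the column, using the standard-library unicodedata module:

```python
import unicodedata

import pandas as pd

# Hypothetical data: the first two scores use an en dash, the last an ASCII hyphen
df = pd.DataFrame({'Score': ['1–0', '2–3', '0-3']})

# Vectorized version of the check above: count ASCII hyphens per row
df['check'] = df['Score'].str.count('-')
print(df[df['check'] != 1])

# Collect every non-digit character in the column and print its Unicode name
separators = {ch for score in df['Score'].dropna() for ch in score if not ch.isdigit()}
for ch in sorted(separators):
    print(repr(ch), unicodedata.name(ch, 'UNKNOWN'))
```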

1 Comment

When I run this, it returns all 57 rows again. I can see all 57 rows, though, and none of them are missing the -.

Got it working using this:

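df['Home G'] = 0
df['Away G'] = 0
for index, row in df.iterrows():
    # Assumes single-digit scores: character 0 is the home score, character 2 the away score
    # df.at avoids the chained assignment df['Home G'][index] = ..., which may not write back to df
    df.at[index, 'Home G'] = int(row['Score'][0])
    df.at[index, 'Away G'] = int(row['Score'][2])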

Though I'm sure there is still a better way to do it.
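A vectorized equivalent of this loop, sketched on hypothetical data, uses the .str accessor's positional indexing instead of iterrows. Like the loop, it assumes every score is exactly one digit, a separator, and one digit:

```python
import pandas as pd

# Hypothetical data mixing en dashes and an ASCII hyphen
df = pd.DataFrame({'Score': ['1–0', '2-3', '0–3']})

# Take character 0 and character 2 of every string at once, then convert to int
df['Home G'] = df['Score'].str[0].astype(int)
df['Away G'] = df['Score'].str[2].astype(int)
print(df)
```

This breaks on a two-digit score such as "12-0"; the findall-based alternative in the first answer does not depend on score length.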

1 Comment

Please see the alternative solution I added to my answer, inspired by this solution and avoiding having to deal with the hyphens or relying on score length.
