35

I am scraping data from a few websites and using pandas to modify it.

It worked well on the first few chunks of data, but later I got this error message:

Traceback (most recent call last):
  File "/home/web/.local/lib/python2.7/site-packages/pandas/core/frame.py", line 2326, in __setitem__
    self._setitem_array(key, value)
  File "/home/web/.local/lib/python2.7/site-packages/pandas/core/frame.py", line 2350, in _setitem_array
    raise ValueError('Columns must be same length as key')
ValueError: Columns must be same length as key

My code is here:

df2 = pd.DataFrame(datatable, columns = cols)
df2[['STATUS_ID_1','STATUS_ID_2']] = df2['STATUS'].str.split(n=1, expand=True)

My data looks like below:

                  STATUS
2       Landed   8:33 AM
3       Landed   9:37 AM
..         ...       ...
316    Delayed   5:00 PM
341    Delayed   4:32 PM
..         ...       ...
397    Delayed   5:23 PM
..         ...       ...

[240 rows x 2 columns]
4 Answers

28

You need to modify the solution slightly, because the split sometimes returns two columns and sometimes only one:

df2 = pd.DataFrame({'STATUS':['Estimated 3:17 PM','Delayed 3:00 PM']})


df3 = df2['STATUS'].str.split(n=1, expand=True)
df3.columns = ['STATUS_ID{}'.format(x+1) for x in df3.columns]
print (df3)
  STATUS_ID1 STATUS_ID2
0  Estimated    3:17 PM
1    Delayed    3:00 PM

df2 = df2.join(df3)
print (df2)
              STATUS STATUS_ID1 STATUS_ID2
0  Estimated 3:17 PM  Estimated    3:17 PM
1    Delayed 3:00 PM    Delayed    3:00 PM

Another possible input: none of the values contain whitespace, and the solution still works:

df2 = pd.DataFrame({'STATUS':['Canceled','Canceled']})

and the solution returns:

print (df2)
     STATUS STATUS_ID1
0  Canceled   Canceled
1  Canceled   Canceled

All together:

df3 = df2['STATUS'].str.split(n=1, expand=True)
df3.columns = ['STATUS_ID{}'.format(x+1) for x in df3.columns]
df2 = df2.join(df3)
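
If later code expects both status columns to exist even when no value contains a space, one option is to pad the split result with reindex before joining. This is only a sketch: the helper name split_status and its parameters are made up for illustration, and the padded column is filled with NaN.

```python
import pandas as pd


def split_status(df, col="STATUS", names=("STATUS_ID_1", "STATUS_ID_2")):
    """Split df[col] on the first whitespace run and always return len(names) new columns.

    Hypothetical helper, not part of pandas.
    """
    parts = df[col].str.split(n=len(names) - 1, expand=True)
    # Pad with NaN columns when the split produced fewer columns than expected.
    parts = parts.reindex(columns=range(len(names)))
    parts.columns = list(names)
    return df.join(parts)


two_parts = split_status(pd.DataFrame({"STATUS": ["Landed   8:33 AM", "Delayed   5:00 PM"]}))
one_part = split_status(pd.DataFrame({"STATUS": ["Canceled", "Canceled"]}))
```

With this variant, a chunk that contains only values like 'Canceled' still gets a STATUS_ID_2 column, so code that reads that column does not need a special case.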

7

To solve this error, check the shape of the object you're trying to assign to the df columns (using np.shape). The second (or the last) dimension must match the number of columns you're assigning to. For example, if you try to assign a 2-column numpy array to 3 columns, you'll see this error.
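
A minimal sketch of that check, assuming a recent pandas version. The names cols, vals and caught are made up for the example:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [0, 1]})
cols = ["B", "C", "D"]              # three target columns
vals = np.array([[2, 3], [4, 5]])   # but only two columns of values

# The last dimension of the value has to equal the number of target columns.
n_value_cols = np.shape(vals)[-1]
shapes_match = n_value_cols == len(cols)

caught = None
try:
    df[cols] = vals                 # mismatched widths
except ValueError as exc:
    caught = exc
```

Comparing np.shape(vals)[-1] with len(cols) before assigning tells you which side is off, which the error message alone does not.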

A general workaround (for case 1 and case 2 below) is to cast the object you're trying to assign to a DataFrame and join() it to df, i.e. instead of (1), use (2).

df[cols] = vals   # (1)
df = df.join(vals) if isinstance(vals, pd.DataFrame) else df.join(pd.DataFrame(vals))  # (2)
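
For illustration, (2) could be wrapped in a small function like the one below. The name attach and the prefix parameter are assumptions made for this sketch; the prefix is only there so the joined columns get string names that are unlikely to collide with existing ones.

```python
import numpy as np
import pandas as pd


def attach(df, vals, prefix="new_"):
    """Join vals to df as extra columns, whatever width vals has (hypothetical helper)."""
    if not isinstance(vals, pd.DataFrame):
        # Reuse df's index so the rows line up when joining.
        vals = pd.DataFrame(vals, index=df.index)
    return df.join(vals.add_prefix(prefix))


df = pd.DataFrame({"A": [0, 1]})
wide = attach(df, np.array([[2, 3], [4, 5]]))   # two extra columns
narrow = attach(df, [[2], [4]])                 # one extra column
```

Because join takes whatever columns the right-hand frame has, there is no key list whose length could disagree with the values.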

If you're trying to replace values in an existing column and got this error (case 3(a) below), convert the object to a list and assign it.

df[cols] = vals.values.tolist()

If you have duplicate columns (case 3(b) below), then there's no easy fix. You'll have to make the dimensions match manually.



This error occurs in 3 cases:

Case 1: When you try to assign a list-like object (e.g. lists, tuples, sets, numpy arrays, and pandas Series) to a list of DataFrame column(s) as new arrays (see footnote 1), but the number of columns doesn't match the second (or last) dimension (found using np.shape) of the list-like object. The following reproduces this error:

df = pd.DataFrame({'A': [0, 1]})
cols, vals = ['B'], [[2], [4, 5]]
df[cols] = vals # number of columns is 1 but the list has shape (2,)

Note that if the columns are not given as a list, pandas Series, numpy array or pandas Index, this error won't occur. So the following doesn't reproduce the error:

df[('B',)] = vals # the column is given as a tuple

One interesting edge case occurs when the list-like object is multi-dimensional (but not a numpy array). In that case, under the hood, the object is first cast to a pandas DataFrame and then checked to see whether its last dimension matches the number of columns. This produces the following behavior:

# the error occurs below because pd.DataFrame(vals1) has shape (2, 2) and len(['B']) != 2
vals1 = [[[2], [3]], [[4], [5]]]
df[cols] = vals1

# no error below because pd.DataFrame(vals2) has shape (2, 1) and len(['B']) == 1
vals2 = [[[[2], [3]]], [[[4], [5]]]]
df[cols] = vals2

Case 2: When you try to assign a DataFrame to a list (or pandas Series or numpy array or pandas Index) of columns but the respective numbers of columns don't match. This case is what caused the error in the OP. Each of the following reproduces the error:

df = pd.DataFrame({'A': [0, 1]})
df[['B']] = pd.DataFrame([[2, 3], [4]]) # a 2-column df is being assigned to a single column

df[['B', 'C']] = pd.DataFrame([[2], [4]]) # a single-column df is being assigned to 2 columns
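
Applied to the situation in the question, a sketch might look like this, assuming a chunk in which no STATUS value contains whitespace. The variable names parts and message are made up for the example:

```python
import pandas as pd

df = pd.DataFrame({"STATUS": ["Canceled", "Canceled"]})

# With no whitespace in any value, the split yields a single column.
parts = df["STATUS"].str.split(n=1, expand=True)
n_split_cols = parts.shape[1]

message = None
try:
    # Two target columns, but parts only has one.
    df[["STATUS_ID_1", "STATUS_ID_2"]] = parts
except ValueError as exc:
    message = str(exc)
```

This is why the code in the question can work on the first few chunks and fail later: the width of the split result depends on the data in each chunk.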

Case 3: When you try to replace the values of existing column(s) with a DataFrame (or a list-like object) whose number of columns doesn't match the number of columns it's replacing. Each of the following reproduces the error:

# case 3(a)
df1 = pd.DataFrame({'A': [0, 1]})
df1['A'] = pd.DataFrame([[2, 3], [4, 5]]) # df1 has a single column named 'A' but a 2-column df is being assigned

# case 3(b): duplicate column names matter too
df2 = pd.DataFrame([[0, 1], [2, 3]], columns=['A','A'])
df2['A'] = pd.DataFrame([[2], [4]]) # df2 has 2 columns named 'A' but a single column df is being assigned
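
One way to spot both variants of case 3 before assigning is to count how many existing columns carry the label and compare that with the width of the value. The helper name target_width is an assumption made for this sketch:

```python
import pandas as pd


def target_width(df, key):
    """Number of existing columns in df labelled key (hypothetical helper)."""
    return int((df.columns == key).sum())


df1 = pd.DataFrame({"A": [0, 1]})
df2 = pd.DataFrame([[0, 1], [2, 3]], columns=["A", "A"])

value_a = pd.DataFrame([[2, 3], [4, 5]])   # 2 columns, as in case 3(a)
value_b = pd.DataFrame([[2], [4]])         # 1 column, as in case 3(b)

mismatch_a = target_width(df1, "A") != value_a.shape[1]
mismatch_b = target_width(df2, "A") != value_b.shape[1]
```

Both flags come out True here, which corresponds to the two failing assignments above.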

1: df.loc[:, cols] = vals may overwrite data in place, so it won't produce this error but will create columns of NaN values.

1

I stumbled upon this error while trying to modify an empty DataFrame like this:

data["code"] = data.apply(lambda r: "DE" if r["code"] == "D" else r["code"], axis=1)

In order to fix it I added this precondition:

if not data.empty:
   ...
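
A fuller sketch of that guard follows. Depending on the pandas version, apply(..., axis=1) on an empty DataFrame may return a DataFrame rather than a Series, so the sketch simply skips the assignment in that case. The function name normalize_code is made up for the example:

```python
import pandas as pd


def normalize_code(data):
    """Replace the code "D" with "DE", leaving empty frames untouched (hypothetical helper)."""
    if data.empty:
        return data.copy()
    out = data.copy()
    out["code"] = out.apply(
        lambda r: "DE" if r["code"] == "D" else r["code"], axis=1
    )
    return out


filled = normalize_code(pd.DataFrame({"code": ["D", "F"]}))
empty = normalize_code(pd.DataFrame(columns=["code"]))
```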

1 Comment

It's strange that I cannot reproduce this with a simple case, but the error has been thrown in my production code.

0

This also happened to me when the value to assign was a sparse matrix. The shape looked fine when printed, but the type was not right. I had to call toarray() to convert it into a numpy array.
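
A minimal sketch of that conversion, assuming the sparse matrix comes from scipy.sparse (this answer does not say which library was used) and a recent pandas version:

```python
import numpy as np
import pandas as pd
from scipy import sparse

df = pd.DataFrame({"A": [0, 1]})
mat = sparse.csr_matrix([[1, 0], [0, 2]])   # assumed to be a scipy sparse matrix

# Convert to a dense numpy array before assigning to the new columns.
dense = mat.toarray()
df[["B", "C"]] = dense
```

Note that toarray() materializes the full matrix in memory, so this is only practical when the dense form is small enough.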
