11

I have a pandas DataFrame that I'm creating row by row (I know, I know, it's not Pandorable/Pythonic...). I'm accessing elements using .loc like so:

output.loc[row_id, col_id]

and I'd like to set this value to an empty list, [].

output.loc[row_id, col_id] = []

Unfortunately, I get an error saying the sizes of my keys and values do not match (pandas thinks I'm trying to set several values from an iterable, not store the iterable itself).

Is there a way to do this?

Thanks!

0

3 Answers

11

You can use df.at (DataFrame.at) instead:

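import numpy as np
import pandas as pd

df = pd.DataFrame()
df['B'] = [1, 2, 3]
df['A'] = None  # object-dtype column, so a cell can hold any Python object
df.at[1, 'A'] = np.array([1, 2, 3])  # .at sets exactly one cell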

When you use df.loc, pandas assumes you are addressing a set of rows. So if you assign an array through df.loc, pandas tries to match each element of the array with a corresponding element selected by df.loc, hence the error. df.at, by contrast, always addresses a single cell.
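Applied to the empty-list case from the question, a minimal sketch might look like the following. The string index labels and column names are made up for illustration, and it assumes the target column has object dtype (which assigning None gives you here).

```python
import pandas as pd

# Hypothetical frame; the labels "x", "y", "z" are illustrative only.
df = pd.DataFrame({"B": [1, 2, 3]}, index=["x", "y", "z"])

# Assumption: the column that will hold lists has object dtype.
df["A"] = None

# .at addresses exactly one cell, so the list is stored as a single value
# instead of being matched element-by-element against a selection.
df.at["y", "A"] = []
```

The other cells in column A stay None; only the cell at row "y" holds the empty list.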


8

You need to make sure of two things:

  1. there is precisely one entry for that loc,
  2. the column has dtype object (actually, on testing this seems not to be an issue).

A hacky way to do this is to use a Series with []:

In [11]: df = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])

In [12]: df.loc[[0], 'A'] = pd.Series([[]])

In [13]: df
Out[13]:
    A  B
0  []  2
1   3  4

pandas doesn't really want you to use [] as elements, because it's usually not very efficient and it makes aggregations more complicated (and un-cythonisable).


In general you don't want to build up DataFrames cell-by-cell; there is (almost?) always a better way.
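One common alternative, sketched here under assumptions (the column names row_id and tags are hypothetical, not from the question), is to collect the rows as plain Python dicts and construct the DataFrame once at the end:

```python
import pandas as pd

# Collect each row as a dict; list-valued cells are just ordinary values here.
rows = []
for row_id in range(3):
    rows.append({"row_id": row_id, "tags": []})  # a fresh empty list per row

# Build the frame in one step; the list-valued column ends up with object dtype.
output = pd.DataFrame(rows).set_index("row_id")
```

Whether this fits depends on the use case; it only helps when the rows can be gathered before the frame is needed.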

10 Comments

Hmm, interesting. Why'd you put 0 in a list in "df.loc[[0], 'A']"?
So assignment is done to a Series rather than a single entry/value, that's the "trick"/hack here!
"In general you don't want to build up DataFrames cell-by-cell, there is (almost?) always a better way." That depends on your use case. I use a DataFrame as the fundamental data structure in an app I've written, and it sets DataFrame values cell-by-cell all the time. This is not an issue in an interactive app, since processing time is not the bottleneck there.
@Thriveth "In general" :)
Isn't it a bit of an unnecessary hindrance that the only solution is "hacky"? Not criticising the answer, more pandas itself. I get the feeling there is often a logic of "we would never need to do it this way, so no one else should either" when pandas is involved. Yet there absolutely are reasons why someone would need to do things this way; the fact that someone is asking a question with several positive votes is proof of that.
4

The answer by MishaTeplitskiy works when the index label is 0. More generally, if you want to assign an array x to an element of a DataFrame df with row r and column c, you can use:

df.loc[[r], c] = pd.Series([x], index = [r])
