2

How can I add a list or a NumPy array as a column to a Dask dataframe? When I try the regular pandas syntax, df['x'] = x, I get the error TypeError: Column assignment doesn't support type list.

2 Answers

2

You can add a pandas series:

df["new_col"] = pd.Series(my_list, index=index_matching_df_index)

The index matters because dask uses it to align the new values with the correct partition. The number of rows in each partition of a dask dataframe is not always known, so you cannot assign by position.

1 Comment

Yep, I actually solved the issue by casting the list into a dask array, and it worked flawlessly, but I think I'm going to roll back the switch from pandas to dask.
-1

I finally solved it by casting the list into a dask array with dask.array.from_array(), which I think is the most direct way.

1 Comment

This works if you know exactly where the partition boundaries of the dask dataframe fall, but you couldn't use it if you had previously changed the partitions, e.g. df = dask.dataframe.from_pandas(pd.DataFrame({'A': np.random.random(size=1000)}), npartitions=4); df = df[df.A > 0.3]. Now you can't assign by position directly. Much of the dask.dataframe design is structured around working with unknown partition sizes.
