How can I add a list or a NumPy array as a column to a Dask dataframe? When I try the regular pandas syntax, df['x'] = x, I get the error TypeError: Column assignment doesn't support type list.
2 Answers
You can add a pandas series:
df["new_col"] = pd.Series(my_list, index=index_matching_df_index)
The index matters because dask uses it to work out which partition each value belongs to. The size of each partition in a dask dataframe is not always known, so you cannot assign by position.
1 Comment
Ghost
Yep, I actually solved the issue by just casting the list into a dask array, and then it worked flawlessly, but I think I'm going to roll back the switch from pandas to dask.
I finally solved it by casting the list into a dask array with dask.array.from_array(), which I think is the most direct way.
1 Comment
Michael Delgado
This works if you know the exact locations of the dask dataframe partitions, but you couldn't use it if you had previously changed the partitions, e.g.
df = dask.dataframe.from_pandas(pd.DataFrame({'A': np.random.random(size=1000)}), npartitions=4); df = df[df.A > 0.3]. Now you can't assign based on index directly. Much of the dask.dataframe design is structured around working with unknown partition sizes.