
How would I go about renaming the index on a dask dataframe? I tried it like so:

    df.index.name = 'foo'

but rechecking df.index.name shows that it is still whatever it was previously.

2 Answers

6

This does not seem like an efficient way to do it, so I wouldn't be surprised if there is something more direct.

Say d.index.name starts off as 'foo':

    def f(df, name):
        # runs on each pandas partition and sets its index name
        df.index.name = name
        return df

    d.map_partitions(f, 'pow')

The dataframe returned by map_partitions now has an index name of 'pow'. If this is done with the threaded scheduler, I think you also change the index name of d in-place (in which case you don't really need the output of map_partitions).


3 Comments

Adding: this strategy can also be applied to rename a Dask Series, just by removing the .index from f function.
This seems off to me. This generates dask delayed tasks for something that should obviously be immediate. github.com/dask/dask/issues/4950
In dask-world, when to use compute() is up to the user. It may be best to combine with other operations.
4

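A bit late, but the following works:

    import dask.dataframe as dd
    import pandas as pd
    # set_index must name an existing column; the index starts out as "s"
    df = pd.DataFrame().assign(s=[1, 2], o=[3, 4], p=[5, 6]).set_index("s")
    ddf = dd.from_pandas(df, npartitions=2)
    # rename the index and assign it back to the dask dataframe
    ddf.index = ddf.index.rename("si2")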

I hope this can help someone else out!

1 Comment

Just like for the OP, this didn't actually change the name of the index when I tried it.
