I'm new to Pandas and am working with a multi-index data set of the form (made from groupby):

Name 
    Year 
        Month 
             Day 
                DataA   DataB   SpeciesName   SpeciesValue
                  A       B         Name1        Value1
                  A       B         Name2        Value2
                  A       B         Name3        Value3

Within every group (a unique combination of Name, Year, Month, Day), only the final two columns differ from row to row; the rest of the columns are identical. I want to collapse each group into a single row, in which each SpeciesName value becomes a column title and the matching SpeciesValue value becomes the entry. For instance, the result for the group above should be:

Name 
    Year 
        Month 
             Day 
                DataA     DataB     Name1     Name2     Name3 
                  A         B       Value1    Value2    Value3

How would I go about this? Should I iterate through the dataframe or groupby object and build a new dataframe with the structure I want, or is there a better way?

  • maybe you can try df.set_index('SpeciesName').unstack('SpeciesName') Commented Aug 7, 2017 at 21:01
  • Blake, is your row index a MultiIndex, or your column index? Commented Aug 8, 2017 at 6:46
  • @ScottBoston the rows are multiindexed Commented Aug 8, 2017 at 17:40

1 Answer

Okay, use set_index and unstack, then reset_index:

import pandas as pd

df = pd.DataFrame({'Name':['Blake']*3,'Year':[2017]*3,
                  'Month':[1]*3,
                  'Day':[15]*3,
                  'DataA':['A']*3,
                  'DataB':['B']*3,
                  'SpeciesName':['Name1','Name2','Name3'],
                  'SpeciesValue':['Value1','Value2','Value3']})

df = df.set_index(['Name','Year','Month','Day'])

df

Sample input dataframe:

                     DataA DataB SpeciesName SpeciesValue
Name  Year Month Day                                     
Blake 2017 1     15      A     B       Name1       Value1
                 15      A     B       Name2       Value2
                 15      A     B       Name3       Value3

Now, let's reshape the dataframe:

df_out = df.set_index(['DataA','DataB','SpeciesName'],append=True)['SpeciesValue']\
  .unstack()\
  .reset_index(level=[-1,-2])

print(df_out)

Output:

SpeciesName          DataA DataB   Name1   Name2   Name3
Name  Year Month Day                                    
Blake 2017 1     15      A     B  Value1  Value2  Value3
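
If the SpeciesName label above the columns is not wanted, one possible follow-up is to clear the name of the column axis after reshaping. The sketch below is only an illustration under that assumption: it rebuilds the same sample frame, the variable name `wide` is made up for this example, and `rename_axis(None, axis=1)` is one of several ways to drop the label (assigning `wide.columns.name = None` should have the same effect).

```python
import pandas as pd

# Rebuild the sample input used above (same data, indexed by Name/Year/Month/Day).
df = pd.DataFrame({
    'Name': ['Blake'] * 3, 'Year': [2017] * 3, 'Month': [1] * 3, 'Day': [15] * 3,
    'DataA': ['A'] * 3, 'DataB': ['B'] * 3,
    'SpeciesName': ['Name1', 'Name2', 'Name3'],
    'SpeciesValue': ['Value1', 'Value2', 'Value3'],
}).set_index(['Name', 'Year', 'Month', 'Day'])

wide = (
    # Move the constant columns and SpeciesName into the row index,
    # leaving SpeciesValue as a Series.
    df.set_index(['DataA', 'DataB', 'SpeciesName'], append=True)['SpeciesValue']
      # Pivot the SpeciesName level into columns.
      .unstack('SpeciesName')
      # Turn DataA and DataB back into ordinary columns.
      .reset_index(level=['DataA', 'DataB'])
      # Assumption: the 'SpeciesName' label over the columns is unwanted, so clear it.
      .rename_axis(None, axis=1)
)

print(wide)
```

Referring to the levels by name rather than by position (`level=[-1,-2]`) is a readability choice here; both forms select the same two levels in this example.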

1 Comment

Thanks. I needed the data not to be under the SpeciesName label as it is in your output. However, your answer got me looking at some Pandas functions I had previously missed, and I was able to use them. I'll post what I did and you can let me know what you think. Thanks again for your help!
