
I have a dataframe with columns ['ID', 'Date', 'Value']. Because of the way the data I am sourcing comes in, each ID appears many times, once per date with its own value (and some dates repeat for the same ID). For instance, the frame comes in as:

ID  Date    Value
a   1/1/17  2
a   1/2/17  3
a   1/3/17  4
b   1/1/17  5
b   1/2/17  6
b   1/2/17  7

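I have made an empty frame with the unique dates as the index and the unique IDs as the columns:

import pandas as pd

ID = list(set(df['ID']))
DATE = list(set(df['Date']))
newdf = pd.DataFrame(columns=ID, index=DATE).sort_index()  # DataFrame.sort() was removed from pandas; sort_index() sorts the dates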
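A minimal, self-contained version of this setup might look like the sketch below. It assumes the sample rows above are the whole frame and keeps the dates as plain strings, so they sort lexicographically (fine for these three dates, but real data would usually be parsed with pd.to_datetime first).

```python
import pandas as pd

# Sample data typed in from the table in the question; dates kept as strings.
df = pd.DataFrame({
    'ID':    ['a', 'a', 'a', 'b', 'b', 'b'],
    'Date':  ['1/1/17', '1/2/17', '1/3/17', '1/1/17', '1/2/17', '1/2/17'],
    'Value': [2, 3, 4, 5, 6, 7],
})

# Empty target frame: one row per unique date, one column per unique ID.
# sorted() gives a stable order instead of the arbitrary order of a set.
ID = sorted(set(df['ID']))
DATE = sorted(set(df['Date']))
newdf = pd.DataFrame(columns=ID, index=DATE)

print(newdf)  # every cell is NaN until it is filled
```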

I now want to pull each Value from df and place it in newdf at the matching position, i.e. newdf.loc[DATE, ID], but I can't figure out how to do that without some onerous for loops. Is there a better way?

1 Answer


We can use combine_first + pivot_table:

newdf.combine_first(pd.pivot_table(df,index='Date',columns='ID',values='Value',aggfunc='sum'))
Out[442]: 
          a     b
1/1/17  2.0   5.0
1/2/17  3.0  13.0
1/3/17  4.0   NaN
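A self-contained sketch of the same approach, with the sample data typed in from the question (dates kept as strings, an assumption). pivot_table reshapes the long frame into one column per ID, and aggfunc='sum' is what turns the two b rows on 1/2/17 into 13; if duplicates should be averaged instead, aggfunc='mean' would be the switch.

```python
import pandas as pd

df = pd.DataFrame({
    'ID':    ['a', 'a', 'a', 'b', 'b', 'b'],
    'Date':  ['1/1/17', '1/2/17', '1/3/17', '1/1/17', '1/2/17', '1/2/17'],
    'Value': [2, 3, 4, 5, 6, 7],
})
# Empty target frame built as in the question.
newdf = pd.DataFrame(columns=sorted(df['ID'].unique()),
                     index=sorted(df['Date'].unique()))

# Long -> wide: rows are dates, columns are IDs; duplicate (Date, ID)
# pairs are summed, so b on 1/2/17 becomes 6 + 7 = 13.
wide = pd.pivot_table(df, index='Date', columns='ID', values='Value', aggfunc='sum')

# Fill the NaN cells of newdf with the matching cells of wide.
result = newdf.combine_first(wide)
print(result)
```

No explicit loop is needed: the alignment on index and column labels does the "lookup" that the for loops would otherwise do.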

4 Comments

This works brilliantly! I think everything else I can get by merging, but this is perfect. Thank you!
@MichaelSchweitzer YW~ :-) happy coding
Nice answer, Wen. But why use combine_first? It looks like we get the desired result from pivot_table() alone.
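@andrew_reece I think that's just a coincidence here. Since he wants to look up the values for newdf, it all depends on how newdf was created: combine_first keeps newdf's existing values and the union of both frames' index and columns, whereas pivot_table() alone only has the dates and IDs that appear in df.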
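To see where the two differ, here is a small sketch using a hypothetical newdf that has an extra date ('1/4/17'), an extra ID ('c'), and one pre-filled cell (99), none of which come from the question's data. pivot_table alone only knows the labels present in df, while combine_first keeps newdf's labels and its existing value.

```python
import pandas as pd

df = pd.DataFrame({
    'ID':    ['a', 'a', 'a', 'b', 'b', 'b'],
    'Date':  ['1/1/17', '1/2/17', '1/3/17', '1/1/17', '1/2/17', '1/2/17'],
    'Value': [2, 3, 4, 5, 6, 7],
})

# Hypothetical target frame: '1/4/17' and 'c' never occur in df,
# and one cell already holds a value before the lookup.
newdf = pd.DataFrame(columns=['a', 'b', 'c'],
                     index=['1/1/17', '1/2/17', '1/3/17', '1/4/17'])
newdf.loc['1/1/17', 'a'] = 99

wide = pd.pivot_table(df, index='Date', columns='ID', values='Value', aggfunc='sum')
combined = newdf.combine_first(wide)

print(wide.shape)      # only the dates and IDs found in df
print(combined.shape)  # union of newdf's labels and wide's labels
print(combined.loc['1/1/17', 'a'])  # the pre-filled 99 is kept, not overwritten by 2
```

If only the labels matter and newdf has no pre-filled values, wide.reindex(index=newdf.index, columns=newdf.columns) would give the same shape; combine_first additionally preserves whatever newdf already contains.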
