3

I want to combine monthly data for multiple attributes into a single row consisting of data for the entire year. It is a similar problem to Combine multiple time-series rows into one row with Pandas. The solution in this question would work for me if the months of my years were complete, but for most of my data months are missing from individual years. If there is only data for December, the solution will fill the first column rather then the column 12.

For the example, I am only using January to June.

Here is an example of my data:

ex_df = pd.DataFrame({'year': [1966, 1966, 1966, 1967, 1967, 1967], 'month': [1, 2, 6, 2, 3, 4], 'A': [10, 5, 0, np.nan, 25, 0], 'B': [-100, -80, 100, -120, 0, 60], 'C': [-1, -18, -2, -11, 10, 6]})

Here is the optimal solution:

sol_df = pd.DataFrame({'year': [1966, 1967], 'A1': [10, np.nan], 'A2': [5, np.nan],'A3': [np.nan, 25],'A4': [np.nan, 0],'A5': [np.nan, np.nan],'A6': [0, np.nan],'B1': [-100, np.nan],  'B2': [-80, -120], 'B3': [np.nan, 0], 'B4': [np.nan, 60], 'B5': [np.nan, np.nan], 'B6': [100, np.nan],'C1': [-1, np.nan],  'C2': [-18, -11], 'C3': [np.nan, 10], 'C4': [np.nan, 6], 'C5': [np.nan, np.nan], 'C6': [-2, np.nan] })

A, B, C are attributes and the number corresponds to the month.

2 Answers 2

5

Use:

df = ex_df.set_index(['year','month']).unstack()
df.columns = ['{}{}'.format(x, y) for x, y in df.columns]
df = df.reset_index()
print (df)
   year    A1   A2    A3   A4   A6     B1     B2   B3    B4     B6   C1    C2  \
0  1966  10.0  5.0   NaN  NaN  0.0 -100.0  -80.0  NaN   NaN  100.0 -1.0 -18.0   
1  1967   NaN  NaN  25.0  0.0  NaN    NaN -120.0  0.0  60.0    NaN  NaN -11.0   

     C3   C4   C6  
0   NaN  NaN -2.0  

Explanation:

  1. First set_index and unstack for reshape
  2. Flatten Multiindex in columns with list comprehension
  3. Create column from index by reset_index
Sign up to request clarification or add additional context in comments.

1 Comment

Thanks for the added explanation -- it makes the code easier for me to understand.
1
a = ex_df.melt(['year', 'month'])
a['Obsevation'] = a.variable + a.month.astype(str)
a.pivot_table('value', index='year', columns='Obsevation')

1 Comment

Thanks for this solution. It is simple and easy to follow.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.