I have a dataframe that I have pivoted, and it looks like this. It shows the count of each item per day.

MY OUTPUT
date        item1   item4   item6
12/12/17    10      1       13
12/13/18    22      5       32
12/14/19    9       3       22

But the final output requested from me should show all the items in the table, whether or not there were results for them that day.

EXPECTED OUTPUT
date        item1   item2   item3   item4   item5   item6
12/12/17    10                      1               13
12/13/18    22                      5               32
12/14/19    9                       3               22

Is there a way in pandas to predefine the headers, which would then be matched against my actual results?

What I tried was to create a separate MySQL table containing the list of items and their sequence, then query it into a dataframe. I then left-merged the item list with the actual data, so I now have a table with both the actual data and the item list. But when I pivot, only the columns with values appear in the result.

SAMPLE SOURCE DATA
item    date        serial_no
item1   12/12/17    001
item1   12/12/17    002
item4   12/13/17    003
item6   12/14/17    004
item4   12/13/17    005
item6   12/14/17    006
item1   12/12/17    007
item1   12/14/17    008

This is how I pivot:
pivot_df = df.pivot_table(
    index = ['date'],
    values = ['serial_no'],
    columns = ['item'],
    aggfunc = [len]
)
Can you share the data you pivoted to get this? Commented Dec 12, 2017 at 16:57

2 Answers

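The simplest way to do this is with reindex:

import re

df = df.set_index('date')

# first and last item numbers, taken from the existing column names
first = int(re.search(r'\d+', df.columns[0]).group(0))
last = int(re.search(r'\d+', df.columns[-1]).group(0))

# add the missing item columns, filled with empty strings
df.reindex(columns=['item{}'.format(n) for n in range(first, last + 1)], fill_value='')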

          item1 item2 item3  item4 item5  item6
date                                           
12/12/17     10                  1           13
12/13/18     22                  5           32
12/14/19      9                  3           22

Although it may be possible to do this during the pivot step itself.
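
As a rough sketch of what that could look like, the reindex can be chained straight onto the pivot. This assumes the full item list is known up front; `all_items` is a hypothetical name built here from a fixed range, but it could equally come from the item table mentioned in the question. The dataframe is the sample source data from the question.

```python
import pandas as pd

# Sample source data from the question.
df = pd.DataFrame({
    "item": ["item1", "item1", "item4", "item6",
             "item4", "item6", "item1", "item1"],
    "date": ["12/12/17", "12/12/17", "12/13/17", "12/14/17",
             "12/13/17", "12/14/17", "12/12/17", "12/14/17"],
    "serial_no": ["001", "002", "003", "004", "005", "006", "007", "008"],
})

# Assumption: the complete list of items is known in advance.
all_items = ["item{}".format(n) for n in range(1, 7)]

pivot_df = (
    # count serial numbers per date and item; 0 where a date has no rows
    df.pivot_table(index="date", columns="item", values="serial_no",
                   aggfunc="count", fill_value=0)
      # add the items that never occur in the data as all-zero columns
      .reindex(columns=all_items, fill_value=0)
)
print(pivot_df)
```

Here the gaps are filled with 0 rather than an empty string so the columns stay numeric; pass `fill_value=''` to reindex instead if blank cells are preferred for display.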

In [41]: n = pd.to_numeric(df.columns.str.extract(r'(\d+)', expand=False), errors='coerce')

In [42]: idx = 'item' + pd.Series(np.arange(n.min(), n.max()+1).astype(int)).astype(str)

In [43]: df.loc[:, ~df.columns.str.startswith('item')] \
           .join(df.filter(regex='^item') \
           .reindex(idx, axis=1))
Out[43]:
       date  item1  item2  item3  item4  item5  item6
0  12/12/17     10    NaN    NaN      1    NaN     13
1  12/13/18     22    NaN    NaN      5    NaN     32
2  12/14/19      9    NaN    NaN      3    NaN     22

NOTE: it might be easier to manipulate your data before pivoting to get the same result.
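
One possible way to do that, sketched under the assumption that the complete item list is known in advance (`all_items` is a hypothetical name), is to turn the item column into a categorical before counting. With `observed=False`, groupby keeps the categories that never occur, so they appear as zero-count columns after unstacking. The dataframe is the sample source data from the question.

```python
import pandas as pd

# Sample source data from the question.
df = pd.DataFrame({
    "item": ["item1", "item1", "item4", "item6",
             "item4", "item6", "item1", "item1"],
    "date": ["12/12/17", "12/12/17", "12/13/17", "12/14/17",
             "12/13/17", "12/14/17", "12/12/17", "12/14/17"],
    "serial_no": ["001", "002", "003", "004", "005", "006", "007", "008"],
})

# Assumption: the complete list of items is known in advance.
all_items = ["item{}".format(n) for n in range(1, 7)]

# Declare every possible item as a category, including the unseen ones.
df["item"] = pd.Categorical(df["item"], categories=all_items)

counts = (
    # observed=False keeps the unused categories as groups of size 0
    df.groupby(["date", "item"], observed=False)
      .size()
      # move the item level into the columns, one column per category
      .unstack("item")
)
print(counts)
```

This uses groupby and unstack in place of pivot_table, so it is a different route to the same table rather than a change to the pivot call itself.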
