how to set custom order of headers in pandas dataframe

Question

I have a dataframe that I have pivoted, and it looks like this. Basically, it shows the item count per day.

MY OUTPUT
date        item1   item4   item6
12/12/17    10      1       13
12/13/18    22      5       32
12/14/19    9       3       22

But the final output requested from me should show all the items: whether or not an item has results for a given day, its column should still appear in the table.

EXPECTED OUTPUT
date        item1   item2   item3   item4   item5   item6
12/12/17    10                      1               13
12/13/18    22                      5               32
12/14/19    9                       3               22

Is there a way with pandas to predefine the headers, which would then be matched against my actual results?

What I tried was to create a separate MySQL table containing the list of items and their sequence, query it, and load it into a dataframe. I then left-merged the item list with the actual data, so I now have a table with both the actual data and the item list. But when I pivot, only the columns with values appear in the result.

SAMPLE SOURCE DATA
item    date        serial_no
item1   12/12/17    001
item1   12/12/17    002
item4   12/13/17    003
item6   12/14/17    004
item4   12/13/17    005
item6   12/14/17    006
item1   12/12/17    007
item1   12/14/17    008

And this is how I pivot:
pivot_df = df.pivot_table(
    index=['date'],
    values=['serial_no'],   # the column name must be a quoted string
    columns=['item'],       # the source column is named 'item'
    aggfunc=[len]
)

Can you share the data you pivoted to get this?

– cs95, Commented Dec 12, 2017 at 16:57

cs95 · Accepted Answer · 2017-12-12 17:02:58Z

4

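The simplest way to do this is with reindex:

import re

df = df.set_index('date')

# Extract the numeric suffix of the first and last item columns.
first = int(re.search(r'\d+', df.columns[0]).group(0))
last = int(re.search(r'\d+', df.columns[-1]).group(0))

# Add the missing item columns in order, filling them with empty strings.
df.reindex(columns=['item{}'.format(n) for n in range(first, last + 1)], fill_value='')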

          item1 item2 item3  item4 item5  item6
date                                           
12/12/17     10                  1           13
12/13/18     22                  5           32
12/14/19      9                  3           22

Although it may be possible to do this during the pivot step itself.
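As a rough sketch of what that could look like, the reindex can be chained straight onto the pivot. This rebuilds the sample source data from the question; the `all_items` list is a hypothetical stand-in for the predefined header order (it could equally be read from the item table), and filling the gaps with 0 is an assumption, since the question shows blanks.

```python
import pandas as pd

# Sample source data from the question (the column there is named 'item').
df = pd.DataFrame({
    "item": ["item1", "item1", "item4", "item6",
             "item4", "item6", "item1", "item1"],
    "date": ["12/12/17", "12/12/17", "12/13/17", "12/14/17",
             "12/13/17", "12/14/17", "12/12/17", "12/14/17"],
    "serial_no": ["001", "002", "003", "004", "005", "006", "007", "008"],
})

# Hypothetical predefined header order (assumption: items run from 1 to 6).
all_items = ["item{}".format(n) for n in range(1, 7)]

pivot_df = (
    df.pivot_table(index="date", columns="item",
                   values="serial_no", aggfunc="count")
      .reindex(columns=all_items)  # add the missing item columns, in order
      .fillna(0)                   # assumption: show 0 where there is no data
      .astype(int)
)

print(pivot_df)
```

If blanks are preferred over zeros for display, `.fillna('')` without the `astype(int)` step should give a result closer to the expected output, at the cost of mixed column dtypes.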


MaxU - stand with Ukraine · Accepted Answer · 2017-12-12 17:03:30Z

2

In [41]: n = pd.to_numeric(df.columns.str.extract(r'(\d+)', expand=False), errors='coerce')

In [42]: idx = 'item' + pd.Series(np.arange(n.min(), n.max()+1).astype(int)).astype(str)

In [43]: df.loc[:, ~df.columns.str.startswith('item')] \
           .join(df.filter(regex='^item') \
           .reindex(idx, axis=1))
Out[43]:
       date  item1  item2  item3  item4  item5  item6
0  12/12/17     10    NaN    NaN      1    NaN     13
1  12/13/18     22    NaN    NaN      5    NaN     32
2  12/14/19      9    NaN    NaN      3    NaN     22

NOTE: it might be easier to manipulate your data before pivoting to get the same result.
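One possible way to do that, sketched here under assumptions rather than taken from the answer itself: convert the item column to a categorical that lists every expected item before counting, so the unobserved items survive the reshape. The `all_items` list and the variable names are hypothetical, and the data is the sample source data from the question.

```python
import pandas as pd

# Sample source data from the question.
df = pd.DataFrame({
    "item": ["item1", "item1", "item4", "item6",
             "item4", "item6", "item1", "item1"],
    "date": ["12/12/17", "12/12/17", "12/13/17", "12/14/17",
             "12/13/17", "12/14/17", "12/12/17", "12/14/17"],
    "serial_no": ["001", "002", "003", "004", "005", "006", "007", "008"],
})

# Hypothetical full item list, in the desired header order.
all_items = ["item{}".format(n) for n in range(1, 7)]

# Declare every expected item as a category before reshaping.
df["item"] = pd.Categorical(df["item"], categories=all_items)

# observed=False keeps categories that never occur in the data;
# size() reports 0 for those date/item combinations.
counts = (
    df.groupby(["date", "item"], observed=False)
      .size()
      .unstack("item", fill_value=0)
)

print(counts)
```

The category order doubles as the column order, so a custom header sequence only needs to be expressed once, in `all_items`.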
