Create a column with particular value in pandas DataFrame

Question

I have a DataFrame with the columns author (the author's name), hour (when the author published the topic) and number_of_topics (how many topics each author published in that hour). Here is an example:

  author hour number_of_topics
0      A  h01                1
1      B  h02                4
2      B  h04                2
3      C  h04                6
4      A  h05                8
5      C  h05                3

My goal is to create six columns (one for each of the first six hours) and fill them with the number of topics. I tried using df.groupby to do this but did not succeed. Desired output:

  author h01 h02 h03 h04 h05 h06
0      A   1   0   0   0   8   0
1      B   0   4   0   2   0   0
2      C   0   0   0   6   3   0

Code to create my DataFrame:

import pandas as pd
df = pd.DataFrame({"author":["A","B", "B","C","A","C"],
                   "hour":["h01","h02","h04","h04","h05","h05"],
                   "number_of_topics":["1","4","2","6","8","3"]})
print(df)

df.pivot_table(columns=['hour'], index=['author'], values=['number_of_topics'], aggfunc='first', fill_value=0)
– rafaelc, Commented Aug 16, 2018 at 14:15

jezrael · Accepted Answer · 2018-08-16 14:18:11Z

Use pivot with reindex to add the missing columns:

cols = ['h{:02d}'.format(x) for x in range(1, 7)]
df = (df.pivot(index='author', columns='hour', values='number_of_topics')
        .fillna(0)
        .reindex(columns=cols, fill_value=0)
        .reset_index()
        .rename_axis(None, axis=1))
print (df)
  author h01 h02  h03 h04 h05  h06
0      A   1   0    0   0   8    0
1      B   0   4    0   2   0    0
2      C   0   0    0   6   3    0

Or set_index with unstack:

cols = ['h{:02d}'.format(x) for x in range(1, 7)]
df = (df.set_index(['author','hour'])['number_of_topics']
        .unstack(fill_value=0)
        .reindex(columns=cols, fill_value=0)
        .reset_index()
        .rename_axis(None, axis=1))
print (df)
  author h01 h02  h03 h04 h05  h06
0      A   1   0    0   0   8    0
1      B   0   4    0   2   0    0
2      C   0   0    0   6   3    0
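One thing worth noting: the question's sample data stores number_of_topics as strings, which is why the filled-in zeros line up differently in the output above. Below is a minimal sketch of the same pivot plus reindex idea, assuming you want integer counts. The astype(int) conversion and the variable name wide are my additions, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({"author": ["A", "B", "B", "C", "A", "C"],
                   "hour": ["h01", "h02", "h04", "h04", "h05", "h05"],
                   "number_of_topics": ["1", "4", "2", "6", "8", "3"]})

# Assumption: the counts should be numeric, so convert the strings first
df["number_of_topics"] = df["number_of_topics"].astype(int)

cols = ['h{:02d}'.format(x) for x in range(1, 7)]
wide = (df.pivot(index='author', columns='hour', values='number_of_topics')
          .reindex(columns=cols)   # adds h03 and h06 as all-NaN columns
          .fillna(0)               # missing author/hour pairs become 0
          .astype(int)             # NaN forced floats; go back to integers
          .reset_index()
          .rename_axis(None, axis=1))
print(wide)
```

With numeric input every hour column ends up with an integer dtype, so later arithmetic (sums, means) should work without further conversion.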

2 Comments

rafaelc Over a year ago

Why was the duplicate removed?

jezrael Over a year ago

@RafaelC - Because of the reindex; check the other answer.

ysearka · Accepted Answer · 2018-08-16 14:16:16Z

What you are looking for can be achieved with the pivot function:

df.pivot(index = 'author',columns = 'hour',values = 'number_of_topics').fillna(0)

hour    h01     h02     h04     h05
author              
A       1       0       0       8
B       0       4       2       0
C       0       0       6       3
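Note that this output only contains the hours present in the data, so h03 and h06 are missing. Also, pivot raises a ValueError if the same author/hour pair appears more than once. Here is a rough sketch of one way to handle both cases, assuming duplicate rows should be summed. The sample rows below are hypothetical (not from the question), and aggfunc='sum' is my assumption about the desired behaviour.

```python
import pandas as pd

# Hypothetical data: author B has two rows for hour h02
df = pd.DataFrame({"author": ["A", "B", "B", "B"],
                   "hour": ["h01", "h02", "h02", "h04"],
                   "number_of_topics": [1, 4, 3, 2]})

cols = ['h{:02d}'.format(x) for x in range(1, 7)]
wide = (df.pivot_table(index='author', columns='hour',
                       values='number_of_topics',
                       aggfunc='sum',      # combine duplicate author/hour rows
                       fill_value=0)       # missing pairs become 0
          .reindex(columns=cols, fill_value=0))  # add hours absent from the data
print(wide)
```

If duplicates should never occur in your data, plain pivot is arguably the safer choice, since the error flags the problem instead of silently aggregating.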
