Pandas: how do I create a column based on data from the same dataframe?

Question

I have a Pandas dataframe similar to

import pandas as pd

d = {'id': [1, 2, 2, 3], 'year': [2010, 2010, 2011, 2010], 'type': ['A', 'B', 'B', 'A'], 'value': [20, 2, 8, 3]}
df = pd.DataFrame(data=d)

That is

   id  year type  value
0   1  2010    A     20
1   2  2010    B      2
2   2  2011    B      8
3   3  2010    A      3

I want to add a new column which contains the value one year later for the same id and type, if it exists in the df (0 otherwise). That is, the expected result is

   id  year type  value  new_value
0   1  2010    A     20          0
1   2  2010    B      2          8
2   2  2011    B      8          0
3   3  2010    A      3          0

I cannot figure out a way of doing that (I have been experimenting mostly with apply). Any suggestions?

Does this answer your question? Pandas conditional creation of a series/dataframe column
– bcosta12, Commented Jan 31, 2020 at 14:53
I guess not: I cannot state the logical conditions a priori, because they depend on the specific row. That is, for each row I have to find another row with the same id and type, if it exists.
– k88074, Commented Jan 31, 2020 at 14:59

Quang Hoang · Accepted Answer · 2020-01-31 14:58:55Z

3

You can try merge:

(df.merge(df.assign(year=df['year']-1)[['id','year','value']],
          on=['id','year'],
          how='left',
          suffixes=['','_y'])
   .fillna(0)
)

Output:

   id  year type  value  value_y
0   1  2010    A     20      0.0
1   2  2010    B      2      8.0
2   2  2011    B      8      0.0
3   3  2010    A      3      0.0
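
Note that the merge above joins on id and year only, while the question asks for the same id and type. Here is a minimal sketch of a variant that also joins on type, names the column new_value, and casts it back to integers. It assumes each (id, type, year) combination appears at most once; otherwise the merge would duplicate rows. The names next_year and result are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 2, 3],
    'year': [2010, 2010, 2011, 2010],
    'type': ['A', 'B', 'B', 'A'],
    'value': [20, 2, 8, 3],
})

# Move every row one year back, so the row for year N+1 lines up with year N.
next_year = (df.assign(year=df['year'] - 1)
               .rename(columns={'value': 'new_value'})
               [['id', 'type', 'year', 'new_value']])

# Left merge keeps every original row; rows without a match get NaN.
result = df.merge(next_year, on=['id', 'type', 'year'], how='left')

# Replace the missing matches with 0 and restore the integer dtype.
result['new_value'] = result['new_value'].fillna(0).astype(int)
print(result)
```

Renaming the column before the merge avoids the suffixes argument, and the final astype(int) removes the floats that fillna leaves behind.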

Mykola Zotko · Accepted Answer · 2020-01-31 15:30:38Z

0

You can use the method shift:

# first, you need to sort by id, type and year (already sorted in example)
df = df.sort_values(['id', 'type', 'year'])
cols = ['id', 'type']

mask = (df[cols].shift(-1, fill_value=False) == df[cols]).all(axis=1)
df['value'].shift(-1, fill_value=0).where(mask, 0)

Output:

0    0
1    8
2    0
3    0
Name: value, dtype: int64
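
The shift above takes the next row with the same id and type, whatever its year, and the result is not stored in the dataframe. Here is a rough sketch of a variant that shifts within each (id, type) group, keeps the shifted value only when the next row is exactly one year later, and assigns it to new_value. It assumes at most one row per (id, type, year); the names grouped, next_value and next_year are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 2, 3],
    'year': [2010, 2010, 2011, 2010],
    'type': ['A', 'B', 'B', 'A'],
    'value': [20, 2, 8, 3],
})

# Sort so that consecutive years of the same id/type are adjacent.
df = df.sort_values(['id', 'type', 'year'])
grouped = df.groupby(['id', 'type'])

# Value and year of the following row inside the same group (NaN at group end).
next_value = grouped['value'].shift(-1)
next_year = grouped['year'].shift(-1)

# Keep the shifted value only if the following row is exactly one year later.
df['new_value'] = next_value.where(next_year == df['year'] + 1, 0).astype(int)
print(df)
```

Because the shift happens inside each group, no separate mask comparing id and type with the next row is needed.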

cph_sto · Accepted Answer · 2020-01-31 16:06:13Z

Here is another solution, involving a dictionary.

# Creating a key column
df['key'] = df[['id','year','type']].astype(str).sum(axis=1)
print(df)
       id  year type  value     key
    0   1  2010    A     20  12010A
    1   2  2010    B      2  22010B
    2   2  2011    B      8  22011B
    3   3  2010    A      3  32010A

Next, create a dictionary that maps each key to its value.

# Creating a dictionary
dict_of_columns = dict(zip(df.key, df.value))
print(dict_of_columns)
    {'12010A': 20, '22010B': 2, '22011B': 8, '32010A': 3}

Now add a next_year column by adding 1 to every year, and build the corresponding lookup key from id, next_year and type. The key is stored in the new_value column for now.

df['next_year']=df['year'] + 1
df['new_value'] = df[['id','next_year','type']].astype(str).sum(axis=1)
print(df)
       id  year type  value     key  next_year new_value
    0   1  2010    A     20  12010A       2011    12011A
    1   2  2010    B      2  22010B       2011    22011B
    2   2  2011    B      8  22011B       2012    22012B
    3   3  2010    A      3  32010A       2011    32011A

Finally, map the keys stored in new_value through the dictionary, fill the entries that have no match with 0, and drop the helper columns.

df['new_value'] = df['new_value'].map(dict_of_columns).fillna(0)
df = df.drop(['key','next_year'],axis=1)
print(df)
       id  year type  value  new_value
    0   1  2010    A     20        0.0
    1   2  2010    B      2        8.0
    2   2  2011    B      8        0.0
    3   3  2010    A      3        0.0
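
Concatenated string keys can in principle collide when different column values produce the same string, and the helper columns have to be dropped afterwards. Here is a small sketch of the same dictionary idea with tuple keys instead, assuming each (id, year, type) combination is unique. The name lookup is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 2, 3],
    'year': [2010, 2010, 2011, 2010],
    'type': ['A', 'B', 'B', 'A'],
    'value': [20, 2, 8, 3],
})

# Dictionary keyed by (id, year, type) tuples rather than concatenated strings.
lookup = dict(zip(zip(df['id'], df['year'], df['type']), df['value']))

# For each row, look up the same id and type one year later; default to 0.
df['new_value'] = [
    lookup.get((i, y + 1, t), 0)
    for i, y, t in zip(df['id'], df['year'], df['type'])
]
print(df)
```

Using dict.get with a default of 0 means no NaN is ever introduced, so no fillna step is required.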
