4

I have a Pandas dataframe similar to

import pandas as pd

d = {'id': [1, 2, 2, 3], 'year': [2010, 2010, 2011, 2010], 'type': ['A', 'B', 'B', 'A'], 'value': [20, 2, 8, 3]}
df = pd.DataFrame(data=d)

That is

   id  year type  value
0   1  2010    A     20
1   2  2010    B      2
2   2  2011    B      8
3   3  2010    A      3

I want to add a new column which contains the value one year later for the same id and type, if it exists in the df (0 otherwise). That is, the expected result is

   id  year type  value  new_value
0   1  2010    A     20          0
1   2  2010    B      2          8
2   2  2011    B      8          0
3   3  2010    A      3          0

I cannot figure out a way of doing that (I have been experimenting mostly with apply). Any suggestions?

2
  • Does this answer your question? Pandas conditional creation of a series/dataframe column Commented Jan 31, 2020 at 14:53
  • I guess not: I cannot state the logical conditions a priori, because they depend on the specific row. That is, for each row I have to find another row with the same id and type, if it exists. Commented Jan 31, 2020 at 14:59

3 Answers

3

You can try merge, joining each row to the row for the following year with the same id and type:

(df.merge(df.assign(year=df['year']-1)[['id','year','type','value']],
          on=['id','year','type'],
          how='left',
          suffixes=('','_y'))
   .fillna(0)
)

Output:

   id  year type  value  value_y
0   1  2010    A     20      0.0
1   2  2010    B      2      8.0
2   2  2011    B      8      0.0
3   3  2010    A      3      0.0
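
The same self-merge idea can be sketched so that the result lands directly in an integer `new_value` column. This is only an illustration: the fifth row (id 2, year 2011, type 'A', value 99) is hypothetical and was added to show why `type` has to be part of the join keys.

```python
import pandas as pd

# Sample data from the question, plus one hypothetical extra row
# (id 2, year 2011, type 'A') so that `type` actually matters.
df = pd.DataFrame({
    'id':    [1, 2, 2, 3, 2],
    'year':  [2010, 2010, 2011, 2010, 2011],
    'type':  ['A', 'B', 'B', 'A', 'A'],
    'value': [20, 2, 8, 3, 99],
})

keys = ['id', 'year', 'type']

# Move every row back one year so that, after the merge, the row for
# year Y lines up with the value recorded for year Y + 1.
next_year = (df.assign(year=df['year'] - 1)[keys + ['value']]
               .rename(columns={'value': 'new_value'}))

result = df.merge(next_year, on=keys, how='left')

# Unmatched rows come back as NaN; replace them with 0 and restore the int dtype.
result['new_value'] = result['new_value'].fillna(0).astype(int)
print(result)
```

With this data, joining on `id` and `year` alone would pair the (2, 2010, 'B') row with both 2011 rows for id 2 and duplicate it in the result.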

0

You can use the method shift:

# first, you need to sort by id, type and year (already sorted in example)
df = df.sort_values(['id', 'type', 'year'])
cols = ['id', 'type']

# the next row must have the same id and type, and be exactly one year later
mask = ((df[cols].shift(-1, fill_value=False) == df[cols]).all(axis=1)
        & (df['year'].shift(-1) == df['year'] + 1))
df['value'].shift(-1, fill_value=0).where(mask, 0)

Output:

0    0
1    8
2    0
3    0
Name: value, dtype: int64
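
A variant of the shift approach, as a rough sketch: shifting inside `groupby` keeps the lookup within each (id, type) group, and an explicit year comparison handles gaps. The data below is hypothetical (id 2 skips 2011, and id 3 has an extra 2011 row) and is only there to exercise both cases.

```python
import pandas as pd

# Hypothetical data: id 2 / type 'B' jumps from 2010 to 2012,
# while id 3 / type 'A' has consecutive years 2010 and 2011.
df = pd.DataFrame({
    'id':    [1, 2, 2, 3, 3],
    'year':  [2010, 2010, 2012, 2010, 2011],
    'type':  ['A', 'B', 'B', 'A', 'A'],
    'value': [20, 2, 8, 3, 7],
})

df = df.sort_values(['id', 'type', 'year'])
grouped = df.groupby(['id', 'type'])

# Within each (id, type) group, look at the following row.
next_year = grouped['year'].shift(-1)
next_value = grouped['value'].shift(-1)

# Keep the following row's value only if it is exactly one year later.
is_consecutive = next_year == df['year'] + 1
df['new_value'] = next_value.where(is_consecutive, 0).astype(int)
print(df)
```

Here only the (3, 2010, 'A') row gets a non-zero `new_value`, because the 2012 row for id 2 is two years after its predecessor.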

0

Here is another solution, using a dictionary.

# Creating a key column
df['key'] = df[['id','year','type']].astype(str).sum(axis=1)
print(df)
       id  year type  value     key
    0   1  2010    A     20  12010A
    1   2  2010    B      2  22010B
    2   2  2011    B      8  22011B
    3   3  2010    A      3  32010A

Next, create a dictionary that maps each key to its value.

# Creating a dictionary
dict_of_columns = dict(zip(df.key, df.value))
print(dict_of_columns)
    {'12010A': 20, '22010B': 2, '22011B': 8, '32010A': 3}

Now create a next_year column by adding 1 to every year, and build the corresponding lookup key from id, next_year and type. The lookup key is stored in the new_value column for the moment.

df['next_year']=df['year'] + 1
df['new_value'] = df[['id','next_year','type']].astype(str).sum(axis=1)
print(df)
       id  year type  value     key  next_year new_value
    0   1  2010    A     20  12010A       2011    12011A
    1   2  2010    B      2  22010B       2011    22011B
    2   2  2011    B      8  22011B       2012    22012B
    3   3  2010    A      3  32010A       2011    32011A

Finally, map the lookup keys in new_value through the dictionary (keys with no match become 0) and drop the helper columns.

df['new_value'] = df['new_value'].map(dict_of_columns).fillna(0)
df = df.drop(['key','next_year'],axis=1)
print(df)
       id  year type  value  new_value
    0   1  2010    A     20        0.0
    1   2  2010    B      2        8.0
    2   2  2011    B      8        0.0
    3   3  2010    A      3        0.0
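
The dictionary lookup above can also be sketched with tuple keys instead of concatenated strings. This is an alternative take rather than the code from the answer: it needs no helper columns and the result stays an integer column. The name `lookup` is just an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({
    'id':    [1, 2, 2, 3],
    'year':  [2010, 2010, 2011, 2010],
    'type':  ['A', 'B', 'B', 'A'],
    'value': [20, 2, 8, 3],
})

# Map each (id, year, type) tuple to its value.
lookup = dict(zip(zip(df['id'], df['year'], df['type']), df['value']))

# For each row, look up the same id and type one year later; default to 0.
df['new_value'] = [
    lookup.get((i, y + 1, t), 0)
    for i, y, t in zip(df['id'], df['year'], df['type'])
]
print(df)
```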
