*see edits below
I have a pandas DataFrame with 6 columns, and I am using pandas and NumPy to edit and work with the data.
id  calv1       calv2       calv3       calv4
1   2006-08-29  2007-08-29  2008-08-29  2009-08-29
2   NaT         NaT         NaT         NaT
3   2006-08-29  NaT         NaT         NaT
4   2006-08-29  2007-08-29  2010-08-29  NaT
5   2006-08-29  2013-08-29  NaT         NaT
I want to create another column that counts the number of non-missing "calv" dates for each id.
id  calv1       calv2       calv3       calv4       no_calv
1   2006-08-29  2007-08-29  2008-08-29  2009-08-29  4
2   NaT         NaT         NaT         NaT         0
3   2006-08-29  NaT         NaT         NaT         1
4   2006-08-29  2007-08-29  2010-08-29  NaT         3
5   2006-08-29  2013-08-29  NaT         NaT         2
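For reference, a minimal sketch of the plain count, assuming the four "calv" columns are stored as datetime64 with NaT for missing values. The sample frame is rebuilt here from the table above; the name `calv_cols` is made up for illustration.

```python
import pandas as pd

# Rebuild the sample data from the table above (None becomes NaT).
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "calv1": ["2006-08-29", None, "2006-08-29", "2006-08-29", "2006-08-29"],
    "calv2": ["2007-08-29", None, None, "2007-08-29", "2013-08-29"],
    "calv3": ["2008-08-29", None, None, "2010-08-29", None],
    "calv4": ["2009-08-29", None, None, None, None],
})

calv_cols = ["calv1", "calv2", "calv3", "calv4"]  # hypothetical helper name
for col in calv_cols:
    df[col] = pd.to_datetime(df[col])

# notna() gives True where a date is present; summing along each row
# counts the True values.
df["no_calv"] = df[calv_cols].notna().sum(axis=1)

print(df["no_calv"].tolist())  # [4, 0, 1, 3, 2]
```

This counts every non-missing date in a row regardless of its position, so it does not distinguish a gap in the middle from trailing missing values.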
Here is my last attempt:
nat = np.datetime64('NaT')
# 0 calvings
df.loc[
    (df["calv1"] == nat) & (df["calv2"] == nat) &
    (df["calv3"] == nat) & (df["calv4"] == nat),
    "no_calv"] = 0
# 1 calving
df.loc[
    (df["calv1"] != nat) & (df["calv2"] == nat) &
    (df["calv3"] == nat) & (df["calv4"] == nat),
    "no_calv"] = 1
# 2 calvings
df.loc[
    (df["calv1"] != nat) & (df["calv2"] != nat) &
    (df["calv3"] == nat) & (df["calv4"] == nat),
    "no_calv"] = 2
# 3 calvings
df.loc[
    (df["calv1"] != nat) & (df["calv2"] != nat) &
    (df["calv3"] != nat) & (df["calv4"] == nat),
    "no_calv"] = 3
# 4 or more calvings
df.loc[
    (df["calv1"] != nat) & (df["calv2"] != nat) &
    (df["calv3"] != nat) & (df["calv4"] != nat),
    "no_calv"] = 4
But the result is that every value in the "no_calv" column is 4.0.
I previously tried things like
..
(df["calv1"] != "NaT")
..
And
..
(df["calv1"] != pd.NaT)
..
And the result was always 4.0 for the whole column or just NaN.
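A likely explanation, sketched under the assumption that the "calv" columns are datetime64: NaT never compares equal to anything, including another NaT. Every `== nat` mask is therefore all False and every `!= nat` mask is all True, so only the final assignment (4) matches any rows, and it matches all of them. `isna()` and `notna()` test for missing values directly. The Series `s` below is made up for illustration.

```python
import numpy as np
import pandas as pd

# One real date and one missing value.
s = pd.Series(pd.to_datetime(["2006-08-29", None]))
nat = np.datetime64("NaT")

eq = (s == nat).tolist()     # equality with NaT is never True
ne = (s != nat).tolist()     # inequality with NaT is always True
missing = s.isna().tolist()  # isna() actually detects the NaT

print(eq)       # [False, False]
print(ne)       # [True, True]
print(missing)  # [False, True]
```

Swapping `== nat` for `.isna()` and `!= nat` for `.notna()` in the masks should make each condition select the intended rows.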
Any tips and tricks for a new Python user?
*Edit: I got a great answer for just counting the non-missing values, but I realize now that I also want to take into account whether there are missing values in between other values (see row 6):
id  calv1       calv2       calv3       calv4       no_calv
1   2006-08-29  2007-08-29  2008-08-29  2009-08-29  4
2   NaT         NaT         NaT         NaT         0
3   2006-08-29  NaT         NaT         NaT         1
4   2006-08-29  2007-08-29  2010-08-29  NaT         3
5   2006-08-29  2013-08-29  NaT         NaT         2
6   2006-08-29  NaT         2013-08-29  2013-08-29  NaN  # or some other value
This is why I was trying to be very clear with the criteria in my original example.
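One possible way to express the gap rule, as a sketch rather than a definitive solution: count the non-missing dates, then keep that count only when all of them sit at the start of the row with no NaT in between. It assumes datetime64 columns in calving order; the names `calv_cols`, `present`, `count` and `leading` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample data from the edited table, including row 6 with a gap.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "calv1": ["2006-08-29", None, "2006-08-29", "2006-08-29",
              "2006-08-29", "2006-08-29"],
    "calv2": ["2007-08-29", None, None, "2007-08-29", "2013-08-29", None],
    "calv3": ["2008-08-29", None, None, "2010-08-29", None, "2013-08-29"],
    "calv4": ["2009-08-29", None, None, None, None, "2013-08-29"],
})

calv_cols = ["calv1", "calv2", "calv3", "calv4"]  # hypothetical helper name
for col in calv_cols:
    df[col] = pd.to_datetime(df[col])

present = df[calv_cols].notna()
count = present.sum(axis=1)

# A row-wise cumulative minimum drops to 0 at the first missing value and
# stays 0, so its sum is the length of the unbroken run starting at calv1.
leading = present.astype(int).cummin(axis=1).sum(axis=1)

# Keep the count only where every present date belongs to that leading run;
# rows with a gap become NaN (the column is float because of the NaN).
df["no_calv"] = count.where(count == leading)

print(df["no_calv"].tolist())  # [4.0, 0.0, 1.0, 3.0, 2.0, nan]
```

To flag gap rows with a sentinel instead of NaN, `count.where(count == leading, other=-1)` would be one option.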