Create column with some values from another columns - Conditional

Question

I have a column Values that contains category labels such as New Va, P Va, B... I need to create one column for each category, holding that category's respective values.

       Date  Column1 Total        Type Values
0       NaN      NaN   NaN       Type1    5.1
1       NaN  Column2   Sum       Type1 New Va
2   04/2019        2   NaN       Type1    NaN
3   05/2019        2   NaN       Type1    NaN
4   06/2019        2     2       Type1     14
5   07/2019        4     4       Type1     16
6       NaN      NaN   NaN  Unnamed: 4    NaN
7       NaN  Column2   Sum  Unnamed: 4   P Va
8   04/2019        2   NaN  Unnamed: 4    NaN
9   05/2019        2   NaN  Unnamed: 4    NaN
10  06/2019        2     2  Unnamed: 4     10
11  07/2019        4     4  Unnamed: 4     15
12      NaN      NaN   NaN  Unnamed: 5    NaN
13      NaN  Column2   Sum  Unnamed: 5      B
14  04/2019        2   NaN  Unnamed: 5    NaN
15  05/2019        2   NaN  Unnamed: 5    NaN
16  06/2019        2     2  Unnamed: 5      8
17  07/2019        4     4  Unnamed: 5      7
18      NaN      NaN   NaN       Type2    4.9

Considering that rows with NaN in the Date column will be removed, the expected result is:

       Date  Column1 Total        Type Values New Va   P Va  B
0       NaN      NaN   NaN       Type1    5.1   
1       NaN  Column2   Sum       Type1      N
2   04/2019        2   NaN       Type1    NaN   0
3   05/2019        2   NaN       Type1    NaN   0
4   06/2019        2     2       Type1     14   14
5   07/2019        4     4       Type1     16   16
6       NaN      NaN   NaN  Unnamed: 4    NaN
7       NaN  Column2   Sum  Unnamed: 4      P
8   04/2019        2   NaN  Unnamed: 4    NaN       0
9   05/2019        2   NaN  Unnamed: 4    NaN       0
10  06/2019        2     2  Unnamed: 4     10       10
11  07/2019        4     4  Unnamed: 4     15       15
12      NaN      NaN   NaN  Unnamed: 5    NaN
13      NaN  Column2   Sum  Unnamed: 5      B            
14  04/2019        2   NaN  Unnamed: 5    NaN              0
15  05/2019        2   NaN  Unnamed: 5    NaN              0
16  06/2019        2     2  Unnamed: 5      8              8
17  07/2019        4     4  Unnamed: 5      7              7
18      NaN      NaN   NaN       Type2    4.9

After that, I will group by Date so that the New Va, P Va, and B values end up in the same row. I'm trying to create new columns identifying the category:

 df['New Va'] = np.where(df['Values'].str.contains('New Va'),'N',np.NaN)

However, all rows other than P and B are NaN, and I don't get the numbers shown in the example above.

piRSquared · Accepted Answer · 2021-03-01 07:53:17Z

2

import re  # Not strictly necessary, but it might speed things up for lots of data

import pandas as pd

pat = re.compile(r"^[a-zA-Z\s]*$")           # raw string; compile is what might speed things up
v = df.Values[df.Column1.notna()].fillna(0)
a = ~v.str.match(pat, na=False)              # mask of things that don't match
keys = pd.unique(v[~a])                      # get unique matches
fill = dict.fromkeys(keys, '')
d = pd.get_dummies(v.mask(a).ffill())[a]
new = d.mul(pd.to_numeric(v[a]), axis=0).where(d == 1, '')[keys]

df.join(new).fillna(fill)

       Date  Column1 Total        Type  Values New Va P Va  B
0       NaN      NaN   NaN       Type1     5.1               
1       NaN  Column2   Sum       Type1  New Va               
2   04/2019        2   NaN       Type1     NaN      0        
3   05/2019        2   NaN       Type1     NaN      0        
4   06/2019        2     2       Type1      14     14        
5   07/2019        4     4       Type1      16     16        
6       NaN      NaN   NaN  Unnamed: 4     NaN               
7       NaN  Column2   Sum  Unnamed: 4    P Va               
8   04/2019        2   NaN  Unnamed: 4     NaN           0   
9   05/2019        2   NaN  Unnamed: 4     NaN           0   
10  06/2019        2     2  Unnamed: 4      10          10   
11  07/2019        4     4  Unnamed: 4      15          15   
12      NaN      NaN   NaN  Unnamed: 5     NaN               
13      NaN  Column2   Sum  Unnamed: 5       B               
14  04/2019        2   NaN  Unnamed: 5     NaN              0
15  05/2019        2   NaN  Unnamed: 5     NaN              0
16  06/2019        2     2  Unnamed: 5       8              8
17  07/2019        4     4  Unnamed: 5       7              7
18      NaN      NaN   NaN       Type2     4.9
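The core of this approach can be illustrated on a tiny, made-up Series: blank out the numeric cells, forward-fill the label above them, then one-hot encode the labels and multiply by the numbers. This is only a sketch; the names values, is_number, labels, one_hot and spread are hypothetical, and dtype=int is passed to get_dummies so the multiplication does not depend on the pandas version's default dummy dtype.

```python
import pandas as pd

# Hypothetical miniature of the Values column after fillna(0).
values = pd.Series(["New Va", 0, 14, "P Va", 0, 10], dtype=object)
# Hypothetical mask: True where the cell is a number, not a label.
is_number = pd.Series([False, True, True, False, True, True])

# Blank out the numeric cells, then carry each label down over its block.
labels = values.mask(is_number).ffill()

# One indicator column per label, restricted to the numeric rows.
numbers = pd.to_numeric(values[is_number])
one_hot = pd.get_dummies(labels[is_number], dtype=int)

# Keep each number only in the column of its own label; NaN elsewhere.
spread = one_hot.mul(numbers, axis=0).where(one_hot == 1)
print(spread)
```

The final where call is what separates a genuine 0 in a category from "this row belongs to another category", which a plain multiplication would both render as 0.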

edited Mar 1, 2021 at 7:53

answered Feb 26, 2021 at 18:42

piRSquared


6 Comments

Shubham Sharma Over a year ago

Nice answer @piRSquared :)

Twwister8889 Over a year ago

Your alternate approach is almost there. How can I check whether a value contains only letters and spaces? I'm trying keys = pd.unique([s for s in s if str(s).isalpha() or str(s).isspace()]), but it isn't working...

piRSquared Over a year ago

@Twwister8889 include it in your example so that I understand

Twwister8889 Over a year ago

@piRSquared I edited the question and included 'New Va' as a category. My code now has: keys = pd.unique([s for s in s if str(s).replace(' ','').isalpha()]). I don't know if it's the best way, but it works. The problem now is here: a = ~v.str.isalpha().fillna(False), because the strings contain spaces.

piRSquared Over a year ago

@Twwister8889 I understand now. I can include that later when I get home


Shubham Sharma · Accepted Answer · 2021-03-01 15:03:18Z

Let us try:

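# Rows whose Values cell is a category label (letters and whitespace only)
m = df['Values'].str.contains(r'(?i)^[A-Z\s]+$', na=False)
# c: the category labels; b: block id that increases at every label row
c, b = list(df.loc[m, 'Values']), m.cumsum()

for _, v in df['Values'].groupby(b):
    if v.iat[0] in c:                      # skip the rows before the first label
        s = v.iloc[1:].fillna(0)           # values below the label, NaN -> 0
        df.loc[s.index, v.iat[0]] = s      # write them to the category's column

# Blank the new columns where Date is missing
df[c] = df[c].mask(df['Date'].isna()).fillna('')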

Details:

Create a boolean mask with str.contains that is True where Values holds a category such as New Va, P Va, or B:

>>> m
0     False
1      True
2     False
3     False
4     False
5     False
6     False
7      True
8     False
9     False
10    False
11    False
12    False
13     True
14    False
15    False
16    False
17    False
18    False
Name: Values, dtype: bool
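To see the mask in isolation, here is a minimal sketch on a made-up Series (the name values and its contents are hypothetical). Non-string cells such as numbers and missing values cannot match the pattern, and na=False turns them into False rather than NaN.

```python
import pandas as pd

# Hypothetical miniature of the Values column: strings, numbers and a missing cell.
values = pd.Series([5.1, "New Va", None, 14, "P Va", 10, "B", 8], dtype=object)

# True only where the whole cell consists of letters and whitespace.
mask = values.str.contains(r"(?i)^[A-Z\s]+$", na=False)

print(values[mask].tolist())  # ['New Va', 'P Va', 'B']
```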

Identify the blocks that start with a category in the Values column:

>>> b

0     0
1     1
2     1
3     1
4     1
5     1
6     1
7     2
8     2
9     2
10    2
11    2
12    2
13    3
14    3
15    3
16    3
17    3
18    3
Name: Values, dtype: int64
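The block ids come from a cumulative sum over the boolean mask. A small sketch with a hypothetical mask (is_label and block_id are illustrative names) shows the idea:

```python
import pandas as pd

# Hypothetical mask: True marks a row that holds a category label.
is_label = pd.Series([False, True, False, False, True, False, True, False])

# Each True bumps the running total, so all rows of a block share one id.
block_id = is_label.cumsum()

print(block_id.tolist())  # [0, 1, 1, 1, 2, 2, 3, 3]
```

Rows before the first label get id 0 and do not start with a category, which is why the loop checks v.iat[0] in c before writing anything.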

Group the Values column by these blocks and, for each group, add or update the category column in the dataframe with the values that follow the category in that block. Finally, mask the values in the newly added columns where Date is NaN:

>>> df

       Date  Column1 Total        Type  Values New Va P Va  B
0       NaN      NaN   NaN       Type1     5.1               
1       NaN  Column2   Sum       Type1  New Va               
2   04/2019        2   NaN       Type1     NaN      0        
3   05/2019        2   NaN       Type1     NaN      0        
4   06/2019        2     2       Type1      14     14        
5   07/2019        4     4       Type1      16     16        
6       NaN      NaN   NaN  Unnamed: 4     NaN               
7       NaN  Column2   Sum  Unnamed: 4    P Va               
8   04/2019        2   NaN  Unnamed: 4     NaN           0   
9   05/2019        2   NaN  Unnamed: 4     NaN           0   
10  06/2019        2     2  Unnamed: 4      10          10   
11  07/2019        4     4  Unnamed: 4      15          15   
12      NaN      NaN   NaN  Unnamed: 5     NaN               
13      NaN  Column2   Sum  Unnamed: 5       B               
14  04/2019        2   NaN  Unnamed: 5     NaN              0
15  05/2019        2   NaN  Unnamed: 5     NaN              0
16  06/2019        2     2  Unnamed: 5       8              8
17  07/2019        4     4  Unnamed: 5       7              7
18      NaN      NaN   NaN       Type2     4.9

The problem is that I will have more rows for other months for the N, P, and B categories, so the result shows only the last occurrence of these values. In my dataset, the first occurrence for N (0, 0, 14, and 16) is missing and only the last values are shown. How can I solve this? Thanks
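One possible way to handle categories that appear in several blocks, and to reach the "group by Date" step mentioned in the question, is to label every row with the nearest category above it and then pivot on Date. This is only a sketch on made-up data, not something either answer proposes: the column names Category and Number are hypothetical, and it assumes that values landing on the same Date and category should be summed.

```python
import pandas as pd

# Hypothetical data: the "New Va" block appears twice, covering different months.
df = pd.DataFrame({
    "Date":   [None, "04/2019", "05/2019", None, "04/2019", None, "06/2019"],
    "Values": ["New Va", None, 14, "P Va", 10, "New Va", 16],
})

# Label rows are the cells made up of letters and whitespace only.
is_label = df["Values"].str.contains(r"(?i)^[A-Z\s]+$", na=False)

# Carry each label down over the rows that follow it.
df["Category"] = df["Values"].where(is_label).ffill()

# Keep the dated data rows and turn their values into numbers (missing -> 0).
rows = df[df["Date"].notna() & ~is_label].copy()
rows["Number"] = pd.to_numeric(rows["Values"]).fillna(0)

# One row per Date, one column per category; repeated blocks are combined.
wide = rows.pivot_table(index="Date", columns="Category",
                        values="Number", aggfunc="sum")
print(wide)
```

Because every row carries its own category label, a second "New Va" block adds rows to the same column instead of replacing the earlier ones. If repeated blocks should not be summed, a different aggfunc (for example "first" or "last") could be substituted.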
