How to create new DataFrame columns with extracted data from existing columns

Question

Hi guys, I have the following DataFrame:

  Index    Numbering           Description
    1          A            Agri. and Forest
    2          1                  Agri.
    3         1.1              -----------
    4         1.2              -----------
    5         1.3              -----------
    6          2                  Forest
    7         2.1              -----------
    8         2.3              -----------
    9         2.4              -----------
   10          B               Manufacturing
   11          3                  Autos
   12         3.1              -----------
   13         3.2              -----------
   14         3.3              -----------

I want to create two new columns with values extracted from the existing columns. I want to achieve the following:

   Index     Numbering       Description         Letter     Number
    1           A           Agri. and Forest        A       
    2           1                 Agri.             A         1
    3          1.1             -----------          A         1
    4          1.2             -----------          A         1
    5          1.3             -----------          A         1
    6           2                 Forest            A         2
    7          2.1             -----------          A         2
    8          2.3             -----------          A         2
    9          2.4             -----------          A         2
   10           B              Manufacturing        B
   11           3                 Autos             B         3
   12          3.1             -----------          B         3
   13          3.2             -----------          B         3
   14          3.3             -----------          B         3

Your ideas are much appreciated.

What is the logic for adding A, B, 1, 2, or 3? Is it based on some other columns or cells?
– Rebin, Commented Apr 22, 2019 at 15:44
The original data comes from Excel cells; the values for the new columns should come from the existing "Numbering" column. In the new "Letter" column, A should appear in every row until B is met in "Numbering"; from there on, B appears. For the numbers in "Numbering", regardless of whether the value is 1, 1.2 or 1.3, the new column should show only 1, i.e. the first digit.
– Martin Yordanov Georgiev, Commented Apr 22, 2019 at 15:58
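
The rule described in the comment above can also be sketched without an explicit loop. This is only an illustration under stated assumptions: the sample frame is hypothetical, "Numbering" is assumed to hold strings, and section letters are assumed never to parse as numbers. Note that letter rows end up with a missing value (`<NA>`) in "Number" rather than the blank shown in the desired output.

```python
import numpy as np
import pandas as pd

# Hypothetical sample mirroring the question's "Numbering" column (as strings).
df = pd.DataFrame({
    "Numbering": ["A", "1", "1.1", "1.2", "2", "2.1", "B", "3", "3.1"],
})

# Values that parse as numbers become floats; letters become NaN.
numeric = pd.to_numeric(df["Numbering"], errors="coerce")
is_letter = numeric.isna()

# Letter: keep the value only on letter rows, then carry it forward.
df["Letter"] = df["Numbering"].where(is_letter).ffill()

# Number: integer part of the numeric value; letter rows stay missing.
df["Number"] = np.floor(numeric).astype("Int64")

print(df)
```

`pd.to_numeric(..., errors="coerce")` does the letter/number split, and `ffill()` implements "repeat A until B is met".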

Rebin · Accepted Answer · 2019-04-22 18:53:40Z

I solved the problem this way (assuming you can save the Excel data as a CSV file).

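import math
import pandas as pd

# Read the two-column CSV export; the file has no header row.
data1 = pd.read_csv('C:/d1', sep=',', header=None, names=['C1', 'C2'])

df1 = pd.DataFrame(data1)
dfNew = pd.DataFrame(columns=['C1', 'C2', 'C3', 'C4'])

(rows, columns) = df1.shape

letter = ''  # most recent section letter seen so far
number = ''  # integer part of the current numbering, blank on letter rows
for index in range(rows):
    if df1.iat[index, 0].isalpha():
        # A letter row starts a new section and resets the number.
        letter = df1.iat[index, 0]
        number = ''
    else:
        # A numeric row such as 1.2 keeps only its integer part.
        number = math.floor(float(df1.iat[index, 0]))
    tempRow = [df1.iat[index, 0], df1.iat[index, 1], letter, number]
    dfNew.loc[len(dfNew)] = tempRow

print(dfNew)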

RESULT

     C1                C2 C3 C4
0     A  Agri. and Forest  A
1     1             Agri.  A  1
2   1.1       -----------  A  1
3   1.2       -----------  A  1
4   1.3       -----------  A  1
5     2            Forest  A  2
6   2.1       -----------  A  2
7   2.3       -----------  A  2
8   2.4       -----------  A  2
9     B     Manufacturing  B
10    3             Autos  B  3
11  3.1       -----------  B  3
12  3.2       -----------  B  3
13  3.3       -----------  B  3
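
The loop above adds one row at a time with `dfNew.loc[len(dfNew)]`, which tends to get slow as the frame grows. A possible variation, sketched here on a small hypothetical in-memory frame instead of the CSV file, collects the rows in a plain list and builds the DataFrame once at the end. It still assumes every C1 value is a string.

```python
import math
import pandas as pd

# Hypothetical in-memory stand-in for the CSV data read above.
df1 = pd.DataFrame({
    "C1": ["A", "1", "1.1", "B", "3"],
    "C2": ["Agri. and Forest", "Agri.", "-----------", "Manufacturing", "Autos"],
})

rows = []
letter, number = "", ""
for c1, c2 in zip(df1["C1"], df1["C2"]):
    if c1.isalpha():
        # Letter row: remember the letter and reset the number.
        letter, number = c1, ""
    else:
        # Numeric row: keep only the integer part.
        number = math.floor(float(c1))
    rows.append([c1, c2, letter, number])

# Build the result in a single step instead of growing it row by row.
dfNew = pd.DataFrame(rows, columns=["C1", "C2", "C3", "C4"])
print(dfNew)
```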

Another way

I am not sure why the previous one is not working for you, but this slight change may work. Check it out.

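import math
import pandas as pd

# Read the two-column CSV export; the file has no header row.
data1 = pd.read_csv('C:/random/d1', sep=',', header=None, names=['C1', 'C2'])

df1 = pd.DataFrame(data1)
dfNew = pd.DataFrame(columns=['C1', 'C2', 'C3', 'C4'])

(rows, columns) = df1.shape

letter = ''  # most recent section letter seen so far
number = ''  # integer part of the current numbering, blank on letter rows
for index in range(rows):
    # Try to read the cell as a number; anything else is treated as a letter.
    try:
        c1 = float(df1.iat[index, 0])
    except (TypeError, ValueError):
        c1 = df1.iat[index, 0]

    if isinstance(c1, float):
        number = math.floor(c1)
    else:
        letter = df1.iat[index, 0]
        number = ''

    tempRow = [df1.iat[index, 0], df1.iat[index, 1], letter, number]
    dfNew.loc[len(dfNew)] = tempRow

print()
print(dfNew)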

RESULT (the same)

     C1                C2 C3 C4
0     A  Agri. and Forest  A
1     1             Agri.  A  1
2   1.1       -----------  A  1
3   1.2       -----------  A  1
4   1.3       -----------  A  1
5     2            Forest  A  2
6   2.1       -----------  A  2
7   2.3       -----------  A  2
8   2.4       -----------  A  2
9     B     Manufacturing  B
10    3             Autos  B  3
11  3.1       -----------  B  3
12  3.2       -----------  B  3
13  3.3       -----------  B  3

edited Apr 22, 2019 at 18:53

answered Apr 22, 2019 at 16:59

Rebin

1 Comment

Martin Yordanov Georgiev Over a year ago

Thanks Rebin, I think the code will work, but I am struggling with the format of the data in column "C1" from the original data I imported. When I run your code I get the following error: "'int' object has no attribute 'isalpha'". It is raised right at the beginning of the if statement, where "isalpha" is called. I checked the dtype of "C1" and it shows as dtype('O'). I converted it to string, but when I run the code again I get this: could not convert string to float: ' (raised at the line after the else clause). Any suggestions as to what type my data in column "C1" should have?
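
The two errors in this comment are consistent with a column that mixes real numbers and strings, which Excel imports often produce, although the actual cause cannot be confirmed from the truncated message. One way to investigate, sketched below on a hypothetical sample, is to normalise every cell to a stripped string and then list the rows that are neither numeric nor purely alphabetic. Truly missing cells (NaN) are not covered here and would need separate handling.

```python
import pandas as pd

# Hypothetical mixed-type column, as might come from an Excel import:
# real ints/floats, strings, stray whitespace, and an empty string.
df = pd.DataFrame({"C1": ["A", 1, 1.1, " 1.2", "", "B", 3, "3.1 "]})

# Normalise every cell to a stripped string so string methods are safe.
as_text = df["C1"].astype(str).str.strip()

# Numeric rows parse; everything else (letters, blanks) becomes NaN.
numeric = pd.to_numeric(as_text, errors="coerce")

# Rows that are neither numeric nor purely alphabetic need a closer look.
is_letter = as_text.str.isalpha()
problem_rows = df[numeric.isna() & ~is_letter]
print(problem_rows)
```

Any rows printed here would be the ones that make `float(...)` fail in the loop-based code.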
