
Hi guys, I have the following DataFrame:

  Index    Numbering           Description
    1          A            Agri. and Forest
    2          1                  Agri.
    3         1.1              -----------
    4         1.2              -----------
    5         1.3              -----------
    6          2                  Forest
    7         2.1              -----------
    8         2.3              -----------
    9         2.4              -----------
   10          B               Manufacturing
   11          3                  Autos
   12         3.1              -----------
   13         3.2              -----------
   14         3.3              -----------
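To make the question reproducible, the table above can be rebuilt in code. This is a sketch that assumes the "Numbering" values are stored as strings, since the column mixes letters ('A') with dotted numbers ('1.1'):

```python
import pandas as pd

# Numbering is kept as strings because it mixes letters ('A') and dotted numbers ('1.1')
df = pd.DataFrame({
    'Numbering': ['A', '1', '1.1', '1.2', '1.3', '2', '2.1', '2.3', '2.4',
                  'B', '3', '3.1', '3.2', '3.3'],
    'Description': ['Agri. and Forest', 'Agri.'] + ['-----------'] * 3
                   + ['Forest'] + ['-----------'] * 3
                   + ['Manufacturing', 'Autos'] + ['-----------'] * 3,
}, index=range(1, 15))
df.index.name = 'Index'
print(df)
```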

I want to create two new columns with values extracted from the existing columns. I want to achieve the following:

   Index     Numbering       Description         Letter     Number
    1           A           Agri. and Forest        A       
    2           1                 Agri.             A         1
    3          1.1             -----------          A         1
    4          1.2             -----------          A         1
    5          1.3             -----------          A         1
    6           2                 Forest            A         2
    7          2.1             -----------          A         2
    8          2.3             -----------          A         2
    9          2.4             -----------          A         2
   10           B              Manufacturing        B
   11           3                 Autos             B         3
   12          3.1             -----------          B         3
   13          3.2             -----------          B         3
   14          3.3             -----------          B         3

Your ideas are much appreciated.

  • What is the logic for adding A, B, 1, 2, or 3? Is it based on some other columns or cells? Commented Apr 22, 2019 at 15:44
  • The original data comes from Excel cells; the values for the new columns should come from the existing column "Numbering". In the new column "Letter", A should appear in each row until B is encountered in "Numbering"; from then on it is B. For the numbers in "Numbering", whether it is 1, 1.2 or 1.3, the new column should show only 1, i.e. the first digit (the integer part). Commented Apr 22, 2019 at 15:58
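The rule described in this comment (carry the latest letter down until the next letter; keep only the integer part of the numbering) can also be written without an explicit loop. This is a minimal sketch, not the accepted answer's method, assuming "Numbering" holds values such as 'A', '1' or '1.2' and that the first row is a letter. It uses `where` plus `ffill` for the letter and `str.split` for the integer part:

```python
import pandas as pd

# Sample 'Numbering' column from the question, stored as strings
df = pd.DataFrame({'Numbering': ['A', '1', '1.1', '1.2', '1.3', '2', '2.1',
                                 '2.3', '2.4', 'B', '3', '3.1', '3.2', '3.3']})

num = df['Numbering'].astype(str).str.strip()
is_letter = num.str.isalpha()

# Letter: keep the value on letter rows only, then forward-fill it down to the next letter
df['Letter'] = num.where(is_letter).ffill()

# Number: the part before the first '.', converted to a nullable integer;
# letter rows cannot be parsed, so they become <NA> instead of a blank
df['Number'] = pd.to_numeric(num.str.split('.').str[0], errors='coerce').astype('Int64')

print(df)
```

The nullable `Int64` dtype lets the column hold integers next to missing values, so letter rows print as `<NA>` rather than the blank shown in the desired output. Any rows above the first letter would get a missing "Letter" value.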

Answer


I solved the problem this way (assuming you can export the data from Excel as CSV):

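import math
import pandas as pd

# Read the two columns (Numbering, Description); the file has no header row
df1 = pd.read_csv('C:/d1', sep=',', header=None, names=['C1', 'C2'])
dfNew = pd.DataFrame(columns=['C1', 'C2', 'C3', 'C4'])

letter = ''
number = ''
(rows, columns) = df1.shape

for index in range(rows):
    # Convert to str so that cells parsed as int or float also support .isalpha()
    value = str(df1.iat[index, 0]).strip()
    if value.isalpha():
        # A new letter section starts; reset the number
        letter = value
        number = ''
    else:
        # Keep only the integer part: '1.2' -> 1
        number = math.floor(float(value))
    tempRow = [df1.iat[index, 0], df1.iat[index, 1], letter, number]
    dfNew.loc[len(dfNew)] = tempRow

print(dfNew)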

RESULT

     C1                C2 C3 C4
0     A  Agri. and Forest  A
1     1             Agri.  A  1
2   1.1       -----------  A  1
3   1.2       -----------  A  1
4   1.3       -----------  A  1
5     2            Forest  A  2
6   2.1       -----------  A  2
7   2.3       -----------  A  2
8   2.4       -----------  A  2
9     B     Manufacturing  B
10    3             Autos  B  3
11  3.1       -----------  B  3
12  3.2       -----------  B  3
13  3.3       -----------  B  3
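To try the loop above without the `C:/d1` file, here is a self-contained variant. It is a sketch that assumes the CSV holds the two columns shown in the question; `CSV_TEXT` and `add_letter_number` are hypothetical names. It reads the data from an in-memory string via `io.StringIO` and collects C3/C4 in lists instead of enlarging `dfNew` with `.loc` on every row, which is generally slower on large frames:

```python
import io
import math

import pandas as pd

# Hypothetical in-memory stand-in for the 'C:/d1' file, so the sketch runs anywhere
CSV_TEXT = """\
A,Agri. and Forest
1,Agri.
1.1,-----------
1.2,-----------
1.3,-----------
2,Forest
2.1,-----------
2.3,-----------
2.4,-----------
B,Manufacturing
3,Autos
3.1,-----------
3.2,-----------
3.3,-----------
"""


def add_letter_number(df1):
    """Return a copy of df1 with C3 (current letter) and C4 (integer part of C1)."""
    letter, number = '', ''
    letters, numbers = [], []
    for value in df1['C1']:
        text = str(value).strip()  # str() copes with cells parsed as int or float
        if text.isalpha():
            letter, number = text, ''  # a new section starts
        else:
            number = math.floor(float(text))  # '2.3' -> 2
        letters.append(letter)
        numbers.append(number)
    out = df1.copy()
    out['C3'] = letters
    out['C4'] = numbers
    return out


df1 = pd.read_csv(io.StringIO(CSV_TEXT), sep=',', header=None, names=['C1', 'C2'])
dfNew = add_letter_number(df1)
print(dfNew)
```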

Another way

I'm not sure why the previous version isn't working for you, but this slight change may work. Check it out.

import math
import pandas as pd

df1 = pd.read_csv('C:/random/d1', sep=',', header=None, names=['C1', 'C2'])
dfNew = pd.DataFrame(columns=['C1', 'C2', 'C3', 'C4'])

letter = ''
number = ''
(rows, columns) = df1.shape

for index in range(rows):
    # Try to read the cell as a number; letters such as 'A' raise ValueError
    try:
        c1 = float(df1.iat[index, 0])
    except (TypeError, ValueError):
        c1 = df1.iat[index, 0]

    if isinstance(c1, float):
        # Numeric row: keep only the integer part ('2.3' -> 2)
        number = math.floor(c1)
    else:
        # Letter row: start a new section and clear the number
        letter = df1.iat[index, 0]
        number = ''

    tempRow = [df1.iat[index, 0], df1.iat[index, 1], letter, number]
    dfNew.loc[len(dfNew)] = tempRow

print(dfNew)

RESULT (the same)

     C1                C2 C3 C4
0     A  Agri. and Forest  A
1     1             Agri.  A  1
2   1.1       -----------  A  1
3   1.2       -----------  A  1
4   1.3       -----------  A  1
5     2            Forest  A  2
6   2.1       -----------  A  2
7   2.3       -----------  A  2
8   2.4       -----------  A  2
9     B     Manufacturing  B
10    3             Autos  B  3
11  3.1       -----------  B  3
12  3.2       -----------  B  3
13  3.3       -----------  B  3

Comment

Thanks Rebin, I think the code will work, but I am struggling with the format of the data in column "C1" from the original data I imported. When I run your code I get the following error: " 'int' object has no attribute 'isalpha' ". It is raised right at the beginning of the if statement, where "isalpha" is. I checked the dtype of "C1" and it shows as dtype('O'). I converted it to string, but when I run the code again I get this: could not convert string to float: ' - this is raised at the line after the else. Any suggestions on what the type of my data in column "C1" should be?
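The two errors in this comment suggest that "C1" mixes ints, floats and strings after the Excel import, and possibly contains stray cells (such as a lone apostrophe or an empty cell) that are neither a letter nor a number. The following is a hedged sketch, assuming such stray cells should simply get no number. It normalises every cell to a string, lists the rows that match neither pattern so they can be checked in the source data, and only then converts. The sample column is made up, and `Series.str.fullmatch` requires pandas 1.1 or later:

```python
import pandas as pd

# Hypothetical mixed column, as it might look after importing from Excel:
# ints, floats and strings, plus a stray "'" cell and an empty cell
df1 = pd.DataFrame({'C1': ['A', 1, 1.1, "'", None, 2, 'B', 3.2]})

# Turn every cell into a clean string; missing cells become '' rather than 'None'/'nan'
c1 = df1['C1'].map(lambda v: '' if pd.isna(v) else str(v)).str.strip()

is_letter = c1.str.fullmatch(r'[A-Za-z]+')
is_number = c1.str.fullmatch(r'\d+(\.\d+)?')

# Rows that are neither a letter code nor a dotted number: check these in the source data
print(df1[~is_letter & ~is_number])

df1['Letter'] = c1.where(is_letter).ffill()
df1['Number'] = pd.to_numeric(c1.str.split('.').str[0].where(is_number),
                              errors='coerce').astype('Int64')
print(df1)
```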
