2

I have a data frame df like this,

    A           length
0   648702831   9
1    26533315   8
2         366   3
3   354701058   9
4    25708239   8
5       70554   5
6     1574512   7
7        3975   4

Now, I want to create a column based on some conditions like this,

if ['length'] == 9 or ['length'] == 5:
   then ['new_col'] = First 5 Characters of ['A']

else if ['length'] == 8 or ['length'] == 4:
   then ['new_col'] = "0" & First 4 Characters of ['A']

else if ['length'] == 7 or ['length'] == 3:
   then ['new_col'] = "00" & First 3 Characters of ['A']

else
   ['new_col'] = ['A']

To implement these conditions, I wrote the following loop (for a file with 10,000 rows, it takes a long time):

for i in df['length']:

    if i == 9 or i == 5:
        df['new_col'] = df['A'].astype(str).str[:5]
    elif i == 8 or i == 4:
        df['new_col'] = "0" + df['A'].astype(str).str[:4]

    elif i == 7 or i == 3:
        df['new_col'] = "00" + df['A'].astype(str).str[:3]

    else:
        df['new_col'] = df['A']

I get the following output,

    A           length   new_col
0   648702831   9        06487
1    26533315   8        02653
2         366   3        0366
3   354701058   9        03547
4    25708239   8        02570
5       70554   5        07055
6     1574512   7        01574
7        3975   4        03975

This is not what I want. It seems to apply only the second condition, which adds "0" in front when the length is 8 or 4.

I need my output like this,

    A           length   new_col
0   648702831   9        64870
1    26533315   8        02653
2         366   3        00366
3   354701058   9        35470
4    25708239   8        02570
5       70554   5        70554
6     1574512   7        00157
7        3975   4        03975

How can I achieve this? If there is a pandas way that takes less time, that would be great. Any suggestions would be appreciated.

4 Answers

3

Use string slicing with zfill. For speed, use a list comprehension.

m = {1: 5, 0: 4, 3: 3}
df['new_col'] = [
    x[:m.get(y % 4, 4)].zfill(5) for x, y in zip(df['A'].astype(str), df['length'])]

df
           A  length new_col
0  648702831       9   64870
1   26533315       8   02653
2        366       3   00366
3  354701058       9   35470
4   25708239       8   02570
5      70554       5   70554
6    1574512       7   00157
7       3975       4   03975

To handle the default case, we can implement a little extra checking when calling zfill:

# DataFrame.append was removed in pandas 2.0; pd.concat does the same job.
df = pd.concat([df, pd.DataFrame([{'A': 50, 'length': 2}])], ignore_index=True)

m = {1: 5, 0: 4, 3: 3}

df['new_col'] = [
    x[:m.get(y % 4, 4)].zfill(5 if y % 4 in m else 0) 
    for x, y in zip(df['A'].astype(str), df['length'])
]

df
           A  length new_col
0  648702831       9   64870
1   26533315       8   02653
2        366       3   00366
3  354701058       9   35470
4   25708239       8   02570
5      70554       5   70554
6    1574512       7   00157
7       3975       4   03975
8         50       2      50   # Default case.
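
One caveat with the modulo shortcut above: it groups lengths by their remainder, so some lengths that should fall through to the "keep A unchanged" rule may be sliced or padded instead. The following is a minimal sketch that compares the shortcut with the question's rules written out one by one. The helper names `by_modulo` and `by_rules` and the sample values are hypothetical, added only for this check.

```python
M = {1: 5, 0: 4, 3: 3}

def by_modulo(x, y):
    # Same expression as in the list comprehension above.
    return x[:M.get(y % 4, 4)].zfill(5 if y % 4 in M else 0)

def by_rules(x, y):
    # The question's conditions, spelled out explicitly.
    if y in (9, 5):
        return x[:5]
    if y in (8, 4):
        return "0" + x[:4]
    if y in (7, 3):
        return "00" + x[:3]
    return x

# Illustrative strings of length 1, 2, 3, 4, 5, 6, 7, 8 and 9.
samples = ["7", "50", "366", "3975", "70554", "123456",
           "1574512", "26533315", "648702831"]

# Collect the inputs where the two approaches disagree.
mismatches = [x for x in samples if by_modulo(x, len(x)) != by_rules(x, len(x))]
print(mismatches)
```

For these samples the two versions agree on lengths 2 to 5 and 7 to 9, which covers the data in the question, and differ on lengths 1 and 6. If such lengths can occur in the real file, an explicit mapping from length to slice width is the safer choice.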

4 Comments

I am getting an AttributeError: 'DataFrame' object has no attribute 'length' for my original dataframe. For my test case with a smaller data frame, this works. I googled it and found I can use df[column].length, but no luck yet! Any idea why?
@user9431057 Can you tell me the output of df.columns?
this is what I get: Index(['A', 'length', 'new_col'], dtype='object')
@user9431057 I've edited. What happens if you try to access using df['length']?
3

You can use a list comprehension with a dictionary. This is perfectly acceptable considering Pandas str methods are not vectorised.

d = {5: 5, 9: 5, 8: 4, 4: 4, 3: 3, 7: 3}

zipper = zip(df['A'].astype(str), df['length'])

df['new_col'] = [A[:d[L]].zfill(5) if L in d else A for A, L in zipper]

print(df)

           A  length new_col
0  648702831       9   64870
1   26533315       8   02653
2        366       3   00366
3  354701058       9   35470
4   25708239       8   02570
5      70554       5   70554
6    1574512       7   00157
7       3975       4   03975
8         12       2      12
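
For comparison, the same mapping can be written with boolean masks and the `.str` accessor instead of a list comprehension. This is only a sketch under the assumption that `A` holds non-negative integers; the `rules` list is a hypothetical name, and no claim is made about which version is faster on a given file.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [648702831, 26533315, 366, 354701058, 25708239, 70554, 1574512, 3975, 12],
    "length": [9, 8, 3, 9, 8, 5, 7, 4, 2],
})

s = df["A"].astype(str)
new_col = s.copy()  # default case: keep A unchanged

# (lengths that share a rule, number of leading characters to keep)
rules = [((9, 5), 5), ((8, 4), 4), ((7, 3), 3)]
for lengths, width in rules:
    mask = df["length"].isin(lengths)
    # Slice the matching rows, then left-pad with zeros to five characters.
    new_col[mask] = s[mask].str[:width].str.zfill(5)

df["new_col"] = new_col
print(df)
```

Padding to a fixed width of five with `zfill` reproduces the "0" and "00" prefixes because the slices are four and three characters long respectively.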

3

Fix your code

df['new_col'] = ''
df['A'] = df['A'].astype(str)  # convert once, outside the loop

for i, j in zip(df['length'], df.index):
    # Assign to a single cell (row j) rather than to the whole column.
    if i == 9 or i == 5:
        df.loc[j, 'new_col'] = df.loc[j, 'A'][:5]
    elif i == 8 or i == 4:
        df.loc[j, 'new_col'] = "0" + df.loc[j, 'A'][:4]
    elif i == 7 or i == 3:
        df.loc[j, 'new_col'] = "00" + df.loc[j, 'A'][:3]
    else:
        df.loc[j, 'new_col'] = df.loc[j, 'A']


df
Out[52]: 
           A  length new_col
0  648702831       9   64870
1   26533315       8   02653
2        366       3   00366
3  354701058       9   35470
4   25708239       8   02570
5      70554       5   70554
6    1574512       7   00157
7       3975       4   03975

2 Comments

Thanks for the post. Why did we zip here? Is it to make it faster than the way I have it?
@user9431057 Your loop assigns to the whole column on every iteration. Here, zip pairs each length with its index label so that each pass sets only one cell. That is also why you got the "wrong" output: the whole column is overwritten each time, so the final new_col equals "0" + df['A'].astype(str).str[:4].
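
The overwrite described in this comment can be reproduced on a small frame. A minimal sketch, using three rows taken from the question and the question's own loop:

```python
import pandas as pd

df = pd.DataFrame({"A": [648702831, 366, 3975], "length": [9, 3, 4]})

for i in df["length"]:
    if i == 9 or i == 5:
        # Each branch assigns to every row, not just the row being inspected.
        df["new_col"] = df["A"].astype(str).str[:5]
    elif i == 8 or i == 4:
        df["new_col"] = "0" + df["A"].astype(str).str[:4]
    elif i == 7 or i == 3:
        df["new_col"] = "00" + df["A"].astype(str).str[:3]
    else:
        df["new_col"] = df["A"]

# The last length seen is 4, so all rows end up with the "0" + first-4 rule.
print(df["new_col"].tolist())  # ['06487', '0366', '03975']
```

Only the final iteration matters, which also explains the run time: the loop rebuilds the full column once per row.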
0

You can do it using a lambda function:

import pandas as pd

df = pd.DataFrame({'A':[298347,9287384, 983, 9283, 894, 1]})
# Left-pad every value with zeros to a fixed width of 8.
df['new_col'] = df['A'].apply(lambda x: '{0:0>8}'.format(x))

         A   new_col
0   298347  00298347
1  9287384  09287384
2      983  00000983
3     9283  00009283
4      894  00000894
5        1  00000001

1 Comment

Thanks for the post, but I need something like what I posted above. I need to be able to add one "0" or two ("00") based on the condition.
