I have a pandas data frame like this:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a1': ['astr1', 'jmtr2', 'astr2', 'mmsk3',
                          'astr6', 'jmtr2', 'astr2', 'mhhk',
                          'astr5', 'mmsk', 'astr6', 'astr1',
                          'mstr1', 'mhhk', 'mstr2', 'mhhk'],
                   'a2': np.random.randn(16)})
df

    a1      a2
0   astr1   -0.490416
1   jmtr2   0.651627
2   astr2   0.784004
3   mmsk3   -1.595870
4   astr6   1.228631
5   jmtr2   -1.644518
6   astr2   -0.311709
7   mhhk    -1.284221
8   astr5   -0.356339
9   mmsk    -0.071046
10  astr6   1.620838
11  astr1   -0.717384
12  mstr1   0.830618
13  mhhk    -0.020226
14  mstr2   -0.056465
15  mhhk    -0.160234

What I want to do now is merge the rows whose a1 values share the same first four letters, and sum the corresponding a2 values.

Like this:

    a1     a2
0   astr   $sum of astr$
1   jmtr   $sum of jmtr$
2   mmsk   $sum of mmsk$
3   mhhk   $sum of mhhk$
4   mstr   $sum of mstr$

1 Answer

I think you need to group by the first 4 characters of a1, taken by indexing with str, and aggregate with sum:

print(df.a1.str[:4])
0     astr
1     jmtr
2     astr
3     mmsk
4     astr
5     jmtr
6     astr
7     mhhk
8     astr
9     mmsk
10    astr
11    astr
12    mstr
13    mhhk
14    mstr
15    mhhk
Name: a1, dtype: object

print(df.a2.groupby(df.a1.str[:4]).sum().reset_index())
     a1        a2
0  astr  1.112200
1  jmtr -1.559358
2  mhhk  1.113222
3  mmsk -0.023918
4  mstr -2.526466
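
Note that groupby sorts the group keys by default, which is why mhhk comes before mmsk above, while the desired output in the question lists the groups in order of first appearance. If that order matters, passing sort=False should keep it. A minimal sketch, assuming a shortened version of the a1 column and small fixed a2 values (made up here so the sums are easy to check, since the original a2 values are random):

```python
import pandas as pd

# Illustrative data only: a shortened a1 column with fixed a2 values
df = pd.DataFrame({
    'a1': ['astr1', 'jmtr2', 'astr2', 'mmsk3', 'mhhk', 'mmsk'],
    'a2': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

# Group key: the first four characters of each a1 value
key = df['a1'].str[:4]

# sort=False keeps the groups in order of first appearance
result = df.groupby(key, sort=False)['a2'].sum().reset_index()
print(result)
```

Here astr sums to 1.0 + 3.0 and mmsk to 4.0 + 6.0, and the rows come out as astr, jmtr, mmsk, mhhk rather than alphabetically.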