How to get the subset dataframe from the dataframe in python?

Question

I have a dataframe(df):

         id  CI    VaR
0        1  0.600  1000
1        1  0.650  1100
2        1  0.700  1200
3        1  0.750  1300
4        2  0.600  2500
5        2  0.650  2600
6        2  0.700  2700
7        2  0.750  2800
8        3  0.600  1500
9        3  0.650  1600
10       3  0.700  1700
11       3  0.750  1800

i have to create a subset dataframe from this dataframe im doing this;

for col in range(1,4):
    df2 = df1.loc[df1["id"]==col]
    print(df2)

output:

         id  CI    VaR
0        1  0.600  1000
1        1  0.650  1100
2        1  0.700  1200
3        1  0.750  1300

and

4        2  0.600  2500
5        2  0.650  2600
6        2  0.700  2700
7        2  0.750  2800

and

8        3  0.600  1500
9        3  0.650  1600
10       3  0.700  1700
11       3  0.750  1800

this will give me separate dataframe for id 1,2,3 Now i want to VaR value of all dataframe(1 ,2 and 3) and add all the value in the order they are and append to the dataframe of respective. like:

obj = 0
for col in range(1,4):
    df2 = df1.loc[df1["id"]==col]
    obj = obj + df1["VaR"] # error is here
    print(df2)

But this is not working for me

i need ouput like ;

         id  CI    VaR   capital
0        1  0.600  1000  5000
1        1  0.650  1100  5300
2        1  0.700  1200  5600
3        1  0.750  1300  5900

capital value 5000 came from adding 1000 + 2500 + 1500 ( these are all first value of respecti id) capital value 5300 came from adding 1100 + 2600 + 1600 ( these are all second value of respecti id) and so on ... and i need for all the id's like;

4        2  0.600  2500   5000
5        2  0.650  2600   5300
6        2  0.700  2700   5600
7        2  0.750  2800   5900

and

8        3  0.600  1500   5000
9        3  0.650  1600   5300
10       3  0.700  1700   5600
11       3  0.750  1800   5900

Thanks for your time :)

Scott Boston · Accepted Answer · 2021-05-27 14:36:14Z

2

Let's use groupby with transform:

df['capital'] = df.groupby(df.groupby('id').cumcount())['VaR'].transform('sum')

Output:

  id    CI   VaR  capital
0    1  0.60  1000     5000
1    1  0.65  1100     5300
2    1  0.70  1200     5600
3    1  0.75  1300     5900
4    2  0.60  2500     5000
5    2  0.65  2600     5300
6    2  0.70  2700     5600
7    2  0.75  2800     5900
8    3  0.60  1500     5000
9    3  0.65  1600     5300
10   3  0.70  1700     5600
11   3  0.75  1800     5900

Details:

First, groupby 'id' and cumcount to get position in each group
Then, groupby "position" and sum with transform
pandas will handle aligning values using indexes

answered May 27, 2021 at 14:36

Scott Boston

154k15 gold badges160 silver badges207 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

Scott Boston Over a year ago

@MichaelScofield You're welcome. Happy coding! Be safe and stay healthy.

Andrej Kesely · Accepted Answer · 2021-05-27 14:14:46Z

2

I hope I've understood your question right. If you need to repeat the firt, second, third... sums of values in each group:

vals = df.groupby(df.groupby("id").cumcount())["VaR"].sum()
df["capital"] = [*vals] * df["id"].nunique()
print(df)

Prints:

    id    CI   VaR  capital
0    1  0.60  1000     5000
1    1  0.65  1100     5300
2    1  0.70  1200     5600
3    1  0.75  1300     5900
4    2  0.60  2500     5000
5    2  0.65  2600     5300
6    2  0.70  2700     5600
7    2  0.75  2800     5900
8    3  0.60  1500     5000
9    3  0.65  1600     5300
10   3  0.70  1700     5600
11   3  0.75  1800     5900

edited May 27, 2021 at 14:14

answered May 27, 2021 at 13:51

Andrej Kesely

196k15 gold badges60 silver badges105 bronze badges

1 Comment

Henry Ecker Over a year ago

You could also do df['id'].nunique() instead of len(df["id"].unique())

Henry Ecker · Accepted Answer · 2021-05-27 14:22:22Z

An option via np.tile and a different way to divide the DataFrame via np.arraysplit:

(Assumption: All id groups are equal length, and the total number of groups is equal to the number of rows per group)

from pprint import pprint

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    'CI': [0.6, 0.65, 0.7, 0.75, 0.6, 0.65, 0.7, 0.75, 0.6, 0.65, 0.7, 0.75],
    'VaR': [1000, 1100, 1200, 1300, 2500, 2600, 2700, 2800, 1500, 1600, 1700,
            1800]
})

unique_count = df['id'].nunique()
df['capital'] = np.tile(
    df.groupby(df.groupby("id").cumcount())["VaR"].sum(),
    unique_count
)

dfs = np.array_split(df, unique_count)

pprint(dfs)

dfs:

[   id    CI   VaR  capital
0   1  0.60  1000     5000
1   1  0.65  1100     5300
2   1  0.70  1200     5600
3   1  0.75  1300     5900,
    id    CI   VaR  capital
4   2  0.60  2500     5000
5   2  0.65  2600     5300
6   2  0.70  2700     5600
7   2  0.75  2800     5900,
     id    CI   VaR  capital
8    3  0.60  1500     5000
9    3  0.65  1600     5300
10   3  0.70  1700     5600
11   3  0.75  1800     5900]

Collectives™ on Stack Overflow

How to get the subset dataframe from the dataframe in python?

3 Answers 3

1 Comment

1 Comment

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

1 Comment

1 Comment

Comments

Your Answer

Sign up or log in

Post as a guest

Related