2

I have data for two stock portfolios that I need to combine into a single portfolio, and I don't know where to start. Below are the dataframe I'm starting with and the one I want to end up with.

Here's the data I already have:

import pandas as pd

rawdata = {'portfolio': ['port1', 'port2', 'port1', 'port2'],
        'portfolioname': ['portfolioone', 'portfoliotwo', 'portfolioone', 'portfoliotwo'],
        'date': ['04/12/2020', '04/12/2020', '04/12/2020', '04/12/2020'],
        'code': ['ABC', 'ABC', 'XYZ', 'XYZ'],
        'quantity': [2, 3, 10, 11],
        'price': [1.5, 1.5, 0.2, 0.2],
        'value': [3, 4.5, 2, 2.2],
        'weight': [.6, .67, .4, .328]}

df1 = pd.DataFrame(rawdata)

Here's the data that I want to create:

finisheddata = {'portfolio': ['port3', 'port3'],
        'portfolioname': ['portfoliothree', 'portfoliothree'],
        'date': ['04/12/2020', '04/12/2020'],
        'code': ['ABC', 'XYZ'],
        'quantity': [5, 21],
        'price': [1.5, 0.2],
        'value': [7.5, 4.2],
        'weight': [.64, .36]}

df2 = pd.DataFrame(finisheddata)

What I'm trying to do is group the two portfolios together by 'code', where 'portfolio' and 'portfolioname' are arbitrary, 'date' is always the same for both portfolios, 'quantity' is a sum, 'price' is taken from either port1 or port2, 'value' is 'price' x 'quantity', and 'weight' is 'value' divided by the total value of the combined portfolio.

Thanks very much.

2
  • How do you decide the name for 'port1' and 'port2' after they group? Similarly for portfolioname? Commented Dec 4, 2020 at 2:03
  • I have updated my answer. The first version hardcodes the portfolio and portfolioname values after aggregation; the second implements the logic port5 + port6 = port11 and portfolioone + portfoliofive = portfoliosix. It currently works only for single digits and their sums, so beware. Commented Dec 4, 2020 at 2:26

3 Answers

1

To keep the descriptive columns when using agg, you can aggregate them with 'first', as shown below:

Code:

import pandas as pd

rawdata = {'portfolio': ['port1', 'port2', 'port1', 'port2'],
        'portfolioname': ['portfolioone', 'portfoliotwo', 'portfolioone', 'portfoliotwo'],
        'date': ['04/12/2020', '04/12/2020', '04/12/2020', '04/12/2020'],
        'code': ['ABC', 'ABC', 'XYZ', 'XYZ'],
        'quantity': [2, 3, 10, 11],
        'price': [1.5, 1.5, 0.2, 0.2],
        'value': [3, 4.5, 2, 2.2],
        'weight': [.6, .67, .4, .328]}

df1 = pd.DataFrame(rawdata)
print(df1, '\n')

finisheddata = {'portfolio': ['port3', 'port3'],
        'portfolioname': ['portfoliothree', 'portfoliothree'],
        'date': ['04/12/2020', '04/12/2020'],
        'code': ['ABC', 'XYZ'],
        'quantity': [5, 21],
        'price': [1.5, 0.2],
        'value': [7.5, 4.2],
        'weight': [.64, .36]}

df2 = pd.DataFrame(finisheddata) # Desired
print(df2, '\n')

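# Keep the first value of the descriptive columns, sum the quantities,
# and average the input weights within each code.
df3 = df1.groupby(['code']).agg({'portfolio': 'first',
                                 'portfolioname': 'first',
                                 'date': 'first',
                                 'quantity': 'sum',
                                 'price': 'first',
                                 'weight': 'mean'}).reset_index()
df3['value'] = df3.price * df3.quantity  # recompute value from the combined quantity
# Restore the original column order.
df3 = df3[['portfolio', 'portfolioname', 'date', 'code', 'quantity', 'price', 'value', 'weight']]
df3['portfolio'] = df3['portfolioname'] = 'combined'  # arbitrary label for the merged portfolio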
print(df3)

Output:

  portfolio portfolioname        date code  quantity  price  value  weight
0     port1  portfolioone  04/12/2020  ABC         2    1.5    3.0   0.600
1     port2  portfoliotwo  04/12/2020  ABC         3    1.5    4.5   0.670
2     port1  portfolioone  04/12/2020  XYZ        10    0.2    2.0   0.400
3     port2  portfoliotwo  04/12/2020  XYZ        11    0.2    2.2   0.328

  portfolio   portfolioname        date code  quantity  price  value  weight
0     port3  portfoliothree  04/12/2020  ABC         5    1.5    7.5    0.64
1     port3  portfoliothree  04/12/2020  XYZ        21    0.2    4.2    0.36

  portfolio portfolioname        date code  quantity  price  value  weight
0  combined      combined  04/12/2020  ABC         5    1.5    7.5   0.635
1  combined      combined  04/12/2020  XYZ        21    0.2    4.2   0.364
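
One caveat about the weight column: averaging the two input weights (0.6 and 0.67 for ABC) gives 0.635, which is only close to the 0.64 requested in the question. Below is a minimal sketch, assuming the weight should be each row's value divided by the total value of the combined portfolio, as the question describes. The name combined is illustrative and not part of the code above.

```python
import pandas as pd

df1 = pd.DataFrame({
    'portfolio': ['port1', 'port2', 'port1', 'port2'],
    'date': ['04/12/2020', '04/12/2020', '04/12/2020', '04/12/2020'],
    'code': ['ABC', 'ABC', 'XYZ', 'XYZ'],
    'quantity': [2, 3, 10, 11],
    'price': [1.5, 1.5, 0.2, 0.2],
})

# Aggregate per code: keep the first date and price, sum the quantities.
combined = (df1.groupby('code', as_index=False)
               .agg({'date': 'first', 'quantity': 'sum', 'price': 'first'}))

# Derive value and weight from the aggregated rows instead of averaging
# the weights of the input portfolios.
combined['value'] = combined['price'] * combined['quantity']
combined['weight'] = combined['value'] / combined['value'].sum()

print(combined.round({'weight': 2}))
```

For the sample data this gives weights of about 0.641 for ABC and 0.359 for XYZ, which round to the 0.64 and 0.36 in the desired frame.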

0

This is a touch inelegant, but it shows how to use groupby to build the rows one group at a time and then move them into a dataframe. Once most of the output is assembled, the weight is worked out from the value column of that dataframe.

data = []
for cname, dfsub in df1.groupby('code'):
    port = 'portx'
    portname = 'portnew'
    code = cname
    quant = dfsub.quantity.sum()
    date = dfsub.date.iloc[0]
    price = dfsub.price.iloc[0]
    value = quant * price
    data.append([port,portname,date,code,quant,price,value])
dfout = pd.DataFrame(data, columns=['portfolio', 'portfolioname', 'date', 'code', 'quantity', 'price', 'value'])
sumval = dfout.value.sum()
dfout['weight'] = dfout['value'] / sumval

The output looks like:

  portfolio portfolioname        date code  quantity  price  value    weight
0     portx       portnew  04/12/2020  ABC         5    1.5    7.5  0.641026
1     portx       portnew  04/12/2020  XYZ        21    0.2    4.2  0.358974

If you want fewer digits in weight, use dfout = dfout.round({'weight': 3}) to round it to 3 decimal places.
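
As a small illustration of the rounding step: DataFrame.round returns a new frame rather than changing the original in place, so the result has to be assigned to a name. The sketch below uses a hypothetical two-row frame holding only the weights from the output above.

```python
import pandas as pd

# Stand-in for the weight column produced above.
dfout = pd.DataFrame({'code': ['ABC', 'XYZ'],
                      'weight': [0.641026, 0.358974]})

# round() returns a new DataFrame; dfout itself keeps its full precision.
rounded = dfout.round({'weight': 3})

print(rounded)
print(dfout)
```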


0

You can simply define a dictionary with columns and corresponding aggregations and use agg() with groupby() to get what you need.

g = {'portfolio':lambda x:'portx',
     'portfolioname':lambda x:'portfoliox',
     'date':'first',
     'quantity':'sum',
     'price':'mean',
     'value':'sum',
     'weight':'mean'}

df1.groupby(['code']).agg(g).reset_index()

  code portfolio portfolioname        date  quantity  price  value  weight
0  ABC     portx    portfoliox  04/12/2020         5    1.5    7.5   0.635
1  XYZ     portx    portfoliox  04/12/2020        21    0.2    4.2   0.364

My confusion is with portx and portfoliox. For now I have hardcoded them, because you mention they are arbitrary. Is there some logic for combining the port1 and port2 strings that you want to implement during aggregation? Let me know and I can update my answer accordingly.


EDIT: Aggregation over the portx and portfoliox

Since I didn't get a response from the OP, here is the code to generate portx and portfoliox from the existing values during aggregation:

word2int = {'one': 1, 
             'two': 2, 
             'three': 3, 
             'four': 4, 
             'five': 5, 
             'six': 6, 
             'seven': 7, 
             'eight': 8, 
             'nine': 9, 
             'zero' : 0}

int2word = {v:k for k,v in word2int.items()}

g = {'portfolio':lambda x: 'port'+str(sum([int(i[-1]) for i in x])),
     'portfolioname':lambda x: 'portfolio'+int2word.get(sum([word2int.get(i[9:]) for i in x])),
     'date':'first',
     'quantity':'sum',
     'price':'mean',
     'value':'sum',
     'weight':'mean'}

df1.groupby(['code']).agg(g).reset_index()

  code portfolio   portfolioname        date  quantity  price  value  weight
0  ABC     port3  portfoliothree  04/12/2020         5    1.5    7.5   0.635
1  XYZ     port3  portfoliothree  04/12/2020        21    0.2    4.2   0.364
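
The suffix arithmetic above reads only the last character of each portfolio label, which is why it is limited to single digits. Here is a hedged sketch of one way to lift that limit for the portfolio column, assuming every label ends in an integer suffix such as port12. The helper name combine_port_labels is hypothetical and not part of the code above. The word-based portfolioname column would need a separate number-to-words mapping and is left out here.

```python
import re
import pandas as pd

def combine_port_labels(labels):
    """Sum the trailing integers of labels like 'port5' and 'port12' into 'port17'."""
    total = 0
    for label in labels:
        match = re.search(r'(\d+)$', label)
        if match is None:
            raise ValueError(f'no numeric suffix in {label!r}')
        total += int(match.group(1))
    return f'port{total}'

# Illustrative data with a multi-digit suffix.
df = pd.DataFrame({'portfolio': ['port5', 'port12', 'port5', 'port12'],
                   'code': ['ABC', 'ABC', 'XYZ', 'XYZ'],
                   'quantity': [2, 3, 10, 11]})

result = (df.groupby('code')
            .agg({'portfolio': combine_port_labels, 'quantity': 'sum'})
            .reset_index())
print(result)
```

Raising on a label without a numeric suffix is a design choice made for this sketch; silently skipping such labels would be an alternative.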
