I am working on an investment app in Django which requires calculating portfolio balances and values over time. The database is currently set up this way:
class Ledger(models.Model):
    asset = models.ForeignKey('Asset', ....)
    amount = models.FloatField(...)
    date = models.DateTimeField(...)
    ...

class HistoricalPrices(models.Model):
    asset = models.ForeignKey('Asset', ....)
    price = models.FloatField(...)
    date = models.DateTimeField(...)
Users enter transactions in the Ledger, and I update prices through APIs.
To calculate the balance for a given day (note that multiple Ledger entries for the same asset can occur on the same day):

from django.db.models import Sum

def balance_date(date):
    return Ledger.objects.filter(date__date__lte=date).values('asset').annotate(total_amount=Sum('amount'))
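For example, with the sample data shown further down, calling it on the last transaction date returns something along these lines (assets shown by name here for readability; the queryset actually contains the asset primary keys):

balance_date(datetime.date(2019, 10, 10))
# <QuerySet [{'asset': 'asset_1', 'total_amount': 25.0},
#            {'asset': 'asset_2', 'total_amount': 18.0}]>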
Getting values for every day between the date of the first Ledger entry and today is more challenging. Currently I am doing it this way, assuming start_date and end_date are datetime.date() objects and tr_dates is a list of the unique dates on which transactions actually occurred (to avoid calculating balances on days where nothing happened):
import pandas as pd

idx = pd.date_range(start_date, end_date)       # every calendar day in the range
main_df = pd.DataFrame(index=tr_dates)           # one row per transaction date
main_df['date_send'] = main_df.index
main_df['balances'] = main_df['date_send'].apply(lambda x: balance_date(x))  # one query per transaction date
main_df = main_df.sort_index()
main_df.index = pd.DatetimeIndex(main_df.index)
main_df = main_df.reindex(idx, method='ffill')   # forward-fill balances onto the non-transaction days
This works, but my issue is performance. It takes at least 150-200 ms to run, and then I still need to get the prices for every date (all of them, not just the transaction dates) and somehow match and multiply them with the correct balances, which brings the total run time to about 800 ms or more.
Given that this is a web app, a view that takes at least 800 ms to compute is hardly scalable, so I was wondering if anyone had a better way to do this?
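For illustration, the match-and-multiply step I mean is along these lines (a simplified sketch, not my exact code; balances_df stands for a hypothetical long-format frame with one row per date and asset):

# Hypothetical long-format balances: columns date, asset, balance
prices_df = pd.DataFrame(list(HistoricalPrices.objects.values('date', 'asset', 'price')))
prices_df['date'] = pd.to_datetime(prices_df['date']).dt.date

# Match each balance with that asset's price on the same day, then multiply
merged = balances_df.merge(prices_df, on=['date', 'asset'], how='left')
merged['value'] = merged['balance'] * merged['price']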
EDIT - Simple example of expected input / output
Ledger entries (JSON format):
[
    {
        "asset": "asset_1",
        "amount": 10,
        "date": "2015-01-01"
    },
    {
        "asset": "asset_2",
        "amount": 15,
        "date": "2017-10-15"
    },
    {
        "asset": "asset_1",
        "amount": -5,
        "date": "2018-02-09"
    },
    {
        "asset": "asset_1",
        "amount": 20,
        "date": "2019-10-10"
    },
    {
        "asset": "asset_2",
        "amount": 3,
        "date": "2019-10-10"
    }
]
Sample prices from HistoricalPrices:
[
    {
        "date": "2015-01-01",
        "asset": "asset_1",
        "price": 5
    },
    {
        "date": "2015-01-01",
        "asset": "asset_2",
        "price": 15
    },
    {
        "date": "2015-01-02",
        "asset": "asset_1",
        "price": 6
    },
    {
        "date": "2015-01-02",
        "asset": "asset_2",
        "price": 11
    },
    ...
    {
        "date": "2017-10-15",
        "asset": "asset_1",
        "price": 20
    },
    {
        "date": "2017-10-15",
        "asset": "asset_2",
        "price": 30
    }
]
In this case:
tr_dates is ['2015-01-01', '2017-10-15', '2018-02-09', '2019-10-10']
date_range is ['2015-01-01', '2015-01-02', '2015-01-03', ..., '2019-12-14', '2019-12-15']
Final output I am after: balances by date, with the price by date and the total value by date:

date        asset    balance  price  value
2015-01-01  asset_1  10       5      50
2015-01-01  asset_2  0        15     0
....        (balances do not change, as there are no new Ledger entries, but prices change)
2015-01-02  asset_1  10       6      60
2015-01-02  asset_2  0        11     0
....        (all dates between 2015-01-02 and 2017-10-15: no change in balances, but prices change)
2017-10-15  asset_1  10       20     200
2017-10-15  asset_2  15       30     450
...         (dates in between)
2018-02-09  asset_1  5        ..     etc. based on price
2018-02-09  asset_2  15       ..     etc. based on price
...         (dates in between)
2019-10-10  asset_1  25       ..     etc. based on price
2019-10-10  asset_2  18       ..     etc. based on price
...         (continues until the end of date_range)
I have managed to get this working, but it takes about a second to compute, and I ideally need it to be at least 10x faster if possible.
EDIT 2 - Following @ac2001's method:
from django.db.models import F, Sum, Window

ledger = (Ledger
          .transaction
          .filter(portfolio=p)
          .annotate(transaction_date=F('date__date'))
          .annotate(transaction_amount=Window(expression=Sum('amount'),
                                              order_by=[F('asset').asc(), F('date').asc()],
                                              partition_by=[F('asset')]))
          .values('asset', 'transaction_date', 'transaction_amount'))

df = pd.DataFrame(list(ledger))
df.transaction_date = pd.to_datetime(df.transaction_date).dt.date
df.set_index('transaction_date', inplace=True)
df.sort_index(inplace=True)
df = df.groupby(by=['asset', 'transaction_date']).sum()
yields the following dataframe (with a multi-index):

                          transaction_amount
asset   transaction_date
asset_1 2015-01-01                      10.0
        2018-02-09                       5.0
        2019-10-10                      25.0
asset_2 2017-10-15                      15.0
        2019-10-10                      18.0
These balances are correct (and also yield correct results on more complex data), but now I need to find a way to ffill them to all the dates in between, as well as from the last date (2019-10-10) up to today (2019-12-15), and I am not sure how that works given the multi-index.
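The trick, as in the final solution below, turned out to be unstacking the asset level so each asset becomes its own column; after that the forward-fill is just a reindex. A minimal sketch, where balances stands for the multi-indexed frame above:

# balances: the multi-indexed frame above (index levels: asset, transaction_date)
wide = balances.unstack('asset')               # one balance column per asset
wide.index = pd.DatetimeIndex(wide.index)
idx = pd.date_range(wide.index.min(), pd.Timestamp('2019-12-15'))
wide = wide.reindex(idx).ffill()               # fill gaps and carry forward to today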
Final solution
Thanks to @ac2001's code and pointers I have come up with the following:
from django.db.models import F, Sum, Window

ledger = (Ledger
          .objects
          .annotate(transaction_date=F('date__date'))
          .annotate(transaction_amount=Window(expression=Sum('amount'),
                                              order_by=[F('asset').asc(), F('date').asc()],
                                              partition_by=[F('asset')]))
          .values('asset', 'transaction_date', 'transaction_amount'))

df = pd.DataFrame(list(ledger))
df.transaction_date = pd.to_datetime(df.transaction_date)
df.set_index('transaction_date', inplace=True)
df.sort_index(inplace=True)

df['date_cast'] = pd.to_datetime(df.index).dt.date
df_grouped = df.groupby(by=['asset', 'date_cast']).last()  # last running balance per asset per day
df_unstacked = df_grouped.unstack(['asset'])               # one balance column per asset
df_unstacked.index = pd.DatetimeIndex(df_unstacked.index)
df_unstacked = df_unstacked.reindex(idx)                   # idx = pd.date_range(start_date, end_date), as above
df_unstacked = df_unstacked.ffill()
This gives me a matrix of balances by asset and date. I then get a matrix of prices by date (from the database) and multiply the two matrices.
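For completeness, the price-matrix step is along these lines (a sketch with a simplified queryset; idx is the same date range as above):

# Build a (date x asset) price matrix aligned with the balance matrix
prices = pd.DataFrame(list(HistoricalPrices.objects.values('date', 'asset', 'price')))
prices['date'] = pd.to_datetime(prices['date']).dt.normalize()   # drop the time component
price_matrix = prices.pivot(index='date', columns='asset', values='price')
price_matrix = price_matrix.reindex(idx).ffill()

# Element-wise multiplication: both matrices share the same date index
# and the same per-asset columns
values = df_unstacked['transaction_amount'] * price_matrix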
Thanks