Definite numerical integration in a python pandas dataframe

Question

I have a pandas dataframe with a variable number of columns. I'd like to numerically integrate each column of the dataframe so that I can evaluate the definite integral from row 0 to row 'n'. I have a function that works on a 1D array, but is there a better way to do this in a pandas dataframe so that I don't have to iterate over columns and cells? I was thinking of using applymap, but I can't see how to make it work.

This is the function that works on a 1D array:

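    import numpy as np

    def findB(x, y):
        # Cumulative trapezoidal integral: y_int[i] is the integral from x[0] to x[i+1].
        y_int = np.zeros(y.size)
        end = y.size - 1

        # First trapezoid.
        y_int[0] = (y[1] + y[0]) / 2 * (x[1] - x[0])

        # Add one trapezoid per step; the last element is never filled and stays 0.
        for i in range(1, end):
            j = i + 1
            y_int[i] = (y[j] + y[i]) / 2 * (x[j] - x[i]) + y_int[i - 1]

        return y_int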

I'd like to replace it with something that calculates multiple columns of a dataframe all at once, something like this:

    B_df = y_df.applymap(integrator)

EDIT:

Starting dataframe dB_df:

        Sample1 1 dB    Sample1 2 dB    Sample1 3 dB    Sample1 4 dB    Sample1 5 dB    Sample1 6 dB
    0   2.472389    6.524537    0.306852    -6.209527   -6.531123   -4.901795
    1   6.982619    -0.534953   -7.537024   8.301643    7.744730    7.962163
    2   -8.038405   -8.888681   6.856490    -0.052084   0.018511    -4.117407
    3   0.040788    5.622489    3.522841    -8.170495   -7.707704   -6.313693
    4   8.512173    1.896649    -8.831261   6.889746    6.960343    8.236696
    5   -6.234313   -9.908385   4.934738    1.595130    3.116842    -2.078000
    6   -1.998620   3.818398    5.444592    -7.503763   -8.727408   -8.117782
    7   7.884663    3.818398    -8.046873   6.223019    4.646397    6.667921
    8   -5.332267   -9.163214   1.993285    2.144201    4.646397    0.000627
    9   -2.783008   2.288842    5.836786    -8.013618   -7.825365   -8.470759

Ending dataframe B_df:

        Sample1 1 B    Sample1 2 B    Sample1 3 B    Sample1 4 B    Sample1 5 B    Sample1 6 B
    0   0.000038    0.000024    -0.000029   0.000008    0.000005    0.000012
    1   0.000034    -0.000014   -0.000032   0.000041    0.000036    0.000028
    2   0.000002    -0.000027   0.000010    0.000008    0.000005    -0.000014
    3   0.000036    0.000003    -0.000011   0.000003    0.000002    -0.000006
    4   0.000045    -0.000029   -0.000027   0.000037    0.000042    0.000018
    5   0.000012    -0.000053   0.000015    0.000014    0.000020    -0.000023
    6   0.000036    -0.000023   0.000004    0.000009    0.000004    -0.000028
    7   0.000046    -0.000044   -0.000020   0.000042    0.000041    -0.000002
    8   0.000013    -0.000071   0.000011    0.000019    0.000028    -0.000036
    9   0.000000    0.000000    0.000000    0.000000    0.000000    0.000000

In the above example,

    (x[j]-x[i]) = 0.000008

Can you give an example of your input DataFrame and your expected output?
– Allen Qin, Commented May 10, 2017 at 19:04
You are probably looking for apply, but this really won't be any more efficient than a loop over the columns.
– juanpa.arrivillaga, Commented May 10, 2017 at 19:06
Where is x coming from? Is it a Series, a numpy ndarray, or something else?
– Mad Physicist, Commented May 10, 2017 at 19:50
x comes from another array, but ultimately (x[j]-x[i]) is a constant value of 0.000008 for all i and j. @Mad Physicist
– capt_proton, Commented May 10, 2017 at 19:57
What is the type of x? That is much more important than the numerical value.
– Mad Physicist, Commented May 10, 2017 at 19:57

Mad Physicist · Accepted Answer · 2017-05-10 20:17:51Z

First of all, you can achieve a similar result using vectorized operations. Each element of the integration is the mean of the current and next y value, scaled by the corresponding difference in x. The final integral is the cumulative sum of these elements. You can do this with something like:

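    def findB(x, y):
        """
        x : pandas.Series
        y : pandas.DataFrame
        """
        # Mean of each row and the next one (the last row has no successor).
        mean_y = (y[:-1] + y.shift(-1)[:-1]) / 2
        # Width of each interval in x.
        delta_x = x.shift(-1)[:-1] - x[:-1]
        # Align delta_x with the rows of mean_y, not with its columns.
        scaled_int = mean_y.multiply(delta_x, axis='index')
        cumulative_int = scaled_int.cumsum(axis='index')
        # Move each partial sum down one row and start from zero.
        return cumulative_int.shift(1).fillna(0)

Here DataFrame.shift and Series.shift are used to match the indices of the "next" elements to the current ones. You have to use DataFrame.multiply with axis='index' rather than the * operator, because * would align the Series with the column labels instead of the rows ('index' vs 'columns'). DataFrame.cumsum provides the final integration step. The last shift(1) moves every partial sum down one row, and DataFrame.fillna replaces the resulting NaN in the first row with zero. As written, the result has one row fewer than y: row i holds the integral from x[0] to x[i], and the last input row has no entry. The advantage of using the native pandas functions is that you can pass in a dataframe with any number of columns and have it operate on all of them simultaneously.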
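As a sanity check, here is a minimal sketch that compares a vectorized cumulative trapezoid against an explicit loop. The name cumtrapz_df and the sample values are made up for illustration, and x is assumed to be a Series sharing the index of y. This variant returns one row per input row, starting at zero.

```python
import numpy as np
import pandas as pd


def cumtrapz_df(x, y):
    """Cumulative trapezoidal integral of every column of y with respect to x.

    x : pandas.Series sharing y's index (assumption)
    y : pandas.DataFrame
    Row k of the result holds the integral from row 0 to row k.
    """
    # Mean of each row and the previous one; the first row becomes NaN.
    mean_y = (y + y.shift(1)) / 2
    # Width of each interval; the first entry becomes NaN.
    delta_x = x.diff()
    # Scale row-wise, then accumulate down the rows.
    areas = mean_y.multiply(delta_x, axis='index')
    # fillna(0) turns the undefined first area into zero
    # (it would also hide NaNs already present in the data).
    return areas.fillna(0).cumsum(axis='index')


# Hypothetical data with the constant spacing mentioned in the question.
x = pd.Series(np.arange(5) * 0.000008)
y = pd.DataFrame({'a': [1.0, 3.0, 2.0, 4.0, 0.0],
                  'b': [0.0, -1.0, 1.0, -1.0, 0.0]})
result = cumtrapz_df(x, y)

# Reference: plain loop over columns and rows.
expected = np.zeros(y.shape)
for c, col in enumerate(y.columns):
    for i in range(1, len(y)):
        area = (y[col].iloc[i] + y[col].iloc[i - 1]) / 2 * (x.iloc[i] - x.iloc[i - 1])
        expected[i, c] = expected[i - 1, c] + area

print(np.allclose(result.to_numpy(), expected))
```

The last row of result is then the definite integral over the whole range for each column.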

Jarekczek · Answer · 2018-10-18 16:56:00Z

Do you really need the numeric values of the integral? Maybe you just need a plot? In that case it is easier to use pyplot.

    import matplotlib.pyplot as plt
    import pandas as pd

    # df is assumed to be a DataFrame with 'volume2' and 'universe' columns.
    # Introduce a column *bin* holding the left limits of our bins.
    df['bin'] = pd.cut(df['volume2'], 50).apply(lambda interval: interval.left)
    # Group by bins and calculate *f*.
    g = df[['bin', 'universe']].groupby('bin').sum()
    # Plot the function using cumulative=True.
    plt.hist(list(g.index), bins=50, weights=list(g['universe']), cumulative=True)
    plt.show()
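If the numbers behind such a cumulative plot are also wanted, one possible sketch is to accumulate the per-bin sums directly. The data below is random and purely illustrative, the bin count of 5 is an arbitrary choice, and only the column names 'volume2' and 'universe' come from the snippet above.

```python
import numpy as np
import pandas as pd

# Hypothetical data; only the column names match the snippet above.
rng = np.random.default_rng(0)
df = pd.DataFrame({'volume2': rng.uniform(0.0, 10.0, 200),
                   'universe': rng.uniform(0.0, 1.0, 200)})

# Assign each row to one of 5 equal-width bins and look up the bin's left edge.
codes, edges = pd.cut(df['volume2'], 5, labels=False, retbins=True)
df['bin'] = edges[codes.to_numpy()]

# Sum 'universe' per bin, then accumulate in order of increasing bin edge.
per_bin = df.groupby('bin')['universe'].sum().sort_index()
cumulative = per_bin.cumsum()
print(cumulative)
```

The last entry of cumulative equals the total of the 'universe' column, which is the height the cumulative histogram reaches in its final bin.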
