
I have a large DataFrame with two columns and a function that takes the values from each row, so I need to iterate over the DataFrame. Below is the head of the DataFrame.

   xG_Team1  xG_Team2
0  1.440539  1.380095
1  2.123673  0.946116
2  1.819697  0.921660
3  1.132676  1.375717
4  1.244837  1.269933

x1, x2 and x3 are constants:
    x1 = [1,0,0]
    x2 = [0,1,0]
    x3 = [0,0,1]

For index 0,
    y = np.array([1-(xG_Team1[0] + xG_Team2[0])/k, xG_Team1[0]/k, xG_Team2[0]/k])
i.e.
    y = np.array([1-(1.440539 + 1.380095)/k, 1.440539/k, 1.380095/k])

For index 1,
    y = np.array([1-(xG_Team1[1] + xG_Team2[1])/k, xG_Team1[1]/k, xG_Team2[1]/k])

where k is total_timeslot, a constant.
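The per-row definition of y above can be sketched as a small helper. The name `make_y` and its default `k=180` are assumptions for illustration only; they are not part of the original code.

```python
import numpy as np

def make_y(xg_team1, xg_team2, k=180):
    """Build y for one row: [no-goal share, team 1 share, team 2 share].

    make_y is a hypothetical helper name; k defaults to total_timeslot (180).
    """
    return np.array([1 - (xg_team1 + xg_team2) / k, xg_team1 / k, xg_team2 / k])

# Row 0 of the DataFrame shown in the question
y_0 = make_y(1.440539, 1.380095)
print(y_0)
```

By construction the three components of each y add up to 1, which is a quick sanity check on any row.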

total_timeslot = 180
Home_Goal = [] # No Goal
Away_Goal = [] # No Goal

def sum_squared_diff(x1, x2, x3, y):
    ssd = []
    for k in range(total_timeslot):
        if k in Home_Goal:
            ssd.append(sum((x2 - y)**2))
        elif k in Away_Goal:
            ssd.append(sum((x3 - y)**2))
        else:
            ssd.append(sum((x1 - y)**2))
    return ssd

y_0 = sum_squared_diff(x1, x2, x3, y)

The plan is to sum the output of sum_squared_diff over all y, i.e. compute sum(y_i) for every row i and then add those values together.

So for i = 0,
    y_0 = sum_squared_diff(x1, x2, x3, y_0)
    len(y_0) = 180
    sum(y_0) = 0.0663099498972334
I will then have n values of sum(y_i), one for each of the n rows of xG.
Using @Dillon's code on the DataFrame above (n = 5):
    sum(results.sum()) = 0.31885730707076826

1 Answer

import numpy as np
import pandas as pd

data = {'xG_Team1': {0: 1.440539, 1: 2.123673, 2: 1.819697, 3: 1.132676, 4: 1.244837},
        'xG_Team2': {0: 1.380095, 1: 0.946116, 2: 0.92166, 3: 1.375717, 4: 1.269933}}

df = pd.DataFrame(data)

x1 = [1,0,0] 
x2 = [0,1,0] 
x3 = [0,0,1]

# Constants
total_timeslot = 180
k = 180

# Measures
Home_Goal = [] # No Goal
Away_Goal = [] # No Goal

def sum_squared_diff(x1, x2, x3, y):
    ssd = []
    for k in range(total_timeslot):  # k will take multiple values
        if k in Home_Goal:
            ssd.append(sum((x2 - y) ** 2))
        elif k in Away_Goal:
            ssd.append(sum((x3 - y) ** 2))
        else:
            ssd.append(sum((x1 - y) ** 2))
    return ssd

def my_function(row):
    xG_Team1 = row.xG_Team1
    xG_Team2 = row.xG_Team2
    return np.array([1-(xG_Team1 + xG_Team2)/k, xG_Team1/k, xG_Team2/k])

# You can use the apply function
results = df.apply(lambda row: sum_squared_diff(x1, x2, x3, my_function(row)), axis=1)

# Each item in results is a 180 item list
results
Out[]: 
0    [0.0003683886105401867, 0.0003683886105401867,...
1    [0.0004576767592872215, 0.0004576767592872215,...
2    [0.00036036396694006056, 0.0003603639669400605...
3    [0.00029220949467635905, 0.0002922094946763590...
4    [0.00029279065228265494, 0.0002927906522826549...

# For each list, calculate the sum
results.map(lambda x: sum(x))
Out[]: 
0    0.066310
1    0.082382
2    0.064866
3    0.052598
4    0.052702

# Get the sum of all these values
results.map(lambda x: sum(x)).sum()
Out[]: 
0.3188573070707662
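The same totals can also be computed without `apply` or a Python loop over timeslots. The following is a minimal sketch using NumPy broadcasting, not the answer's own method; the names `targets`, `Y`, `ssd` and `per_game` are assumptions introduced here, and the goal lists hold timeslot indices as in the code above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'xG_Team1': [1.440539, 2.123673, 1.819697, 1.132676, 1.244837],
    'xG_Team2': [1.380095, 0.946116, 0.92166, 1.375717, 1.269933],
})

total_timeslot = 180
k = 180
Home_Goal = []  # No Goal
Away_Goal = []  # No Goal

# One target vector per timeslot: x1 by default, x3 on away goals, x2 on home goals.
# Home goals are written last so they win ties, like the if/elif order above.
targets = np.tile([1.0, 0.0, 0.0], (total_timeslot, 1))
targets[np.asarray(Away_Goal, dtype=int)] = [0.0, 0.0, 1.0]
targets[np.asarray(Home_Goal, dtype=int)] = [0.0, 1.0, 0.0]

# One y vector per game, shape (n_games, 3)
a = df['xG_Team1'].to_numpy()
b = df['xG_Team2'].to_numpy()
Y = np.column_stack([1 - (a + b) / k, a / k, b / k])

# Squared differences for every (game, timeslot) pair, shape (n_games, total_timeslot)
ssd = ((targets[None, :, :] - Y[:, None, :]) ** 2).sum(axis=2)

per_game = ssd.sum(axis=1)  # one total per game
total = per_game.sum()      # grand total over all games
print(per_game)
print(total)
```

For the five rows above this should agree with the `apply` version up to floating-point rounding. The intermediate array has shape (n_games, total_timeslot, 3), so for a very large DataFrame it may be worth processing the rows in chunks.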

6 Comments

Thank you for taking the time. I want do_something to be y_i, an array, and for each y_i I'm after the output of sum_squared_diff.
Please take a look at my updated answer and update me on what is left to do. I'm not certain what you mean about 'be able to add up the output'. Are you looking for the answer to be a 180 item list?
Yes! 180 is the number of timeslots per game, and for each timeslot I am trying to measure the sum of squared differences against x1, x2, x3. To do this, I first add up the values in each list, i.e. sum(sum_squared_diff(x1, x2, x3, my_function(row))), if I need to test the SSD per game. Second, I sum up the output of 'results' over all xG. Then I try to optimise xG by multiplying it by some arbitrary constant m, i.e. m multiplies every xG in the data.
@A.Abs Please see the changes. Is this what you want?
@A.Abs I just made another update, which lines up with your updated question
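The scaling idea raised in the comments (multiplying every xG by a constant m) could be explored with a helper like the one below. This is only a sketch: `total_ssd` is a hypothetical name, and it assumes the no-goal case from the question, where every timeslot is compared against x1.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'xG_Team1': [1.440539, 2.123673, 1.819697, 1.132676, 1.244837],
    'xG_Team2': [1.380095, 0.946116, 0.92166, 1.375717, 1.269933],
})

total_timeslot = 180
k = 180
x1 = np.array([1.0, 0.0, 0.0])

def total_ssd(m):
    """Grand total SSD when every xG is multiplied by m (no-goal case only)."""
    a = m * df['xG_Team1'].to_numpy()
    b = m * df['xG_Team2'].to_numpy()
    Y = np.column_stack([1 - (a + b) / k, a / k, b / k])
    per_slot = ((x1 - Y) ** 2).sum(axis=1)  # SSD of one timeslot, per game
    # With no goals every timeslot contributes the same value
    return float(total_timeslot * per_slot.sum())

for m in (0.5, 1.0, 1.5):
    print(m, total_ssd(m))
```

With m = 1 this reproduces the grand total from the answer. Note that with no goals recorded the total simply scales with m squared, so a meaningful optimum for m only appears once Home_Goal and Away_Goal contain actual timeslots.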