
I want to convert each row of a data frame into an array and pass that array to a function. The function's result for every row should be stored in a new column.

def harmonicMean(arr):
    total = 0.0
    for item in arr:
        total = total + 1.0 / item
        print("inside " + str(1.0 / item))  # debug output for each item
    print(total)
    return len(arr) / total

The function computes the harmonic mean of a single row. I want these values for every row stored in a new column of the data frame. (The data frame also contains NaN values.)

  • Can you provide more information, such as a data sample (df.head() is enough), what you tried, and your desired output? Commented Apr 8, 2019 at 16:18

2 Answers


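You can compute this without iterating over the rows. The harmonic mean of a row is the number of non-missing values divided by the sum of their reciprocals, and pandas skips NaN when summing: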

df['hmean'] = df.notnull().sum(axis=1)/(1/df).sum(axis=1)

   a    b    c     d   e     hmean
0  4  5.0  2.0   5.0  10  4.000000
1  2  8.0  1.0   8.0   6  2.608696
2  7  NaN  1.0   1.0   8  1.763780
3  7  1.0  9.0   4.0   9  3.095823
4  8  5.0  8.0   NaN   3  5.106383
5  3  8.0  6.0  10.0   6  5.607477
6  3  7.0  3.0   9.0   9  4.846154
7  8  NaN  NaN   NaN   6  6.857143
8  2  4.0  1.0   5.0   2  2.040816
9  5  7.0  5.0   3.0   1  2.664975
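A minimal sketch of the same vectorized idea that also ignores zeros, assuming zeros should be treated like missing values (as one commenter asks). In the one-liner above a zero turns into an infinite reciprocal, which drives that row's result to 0. The helper name harmonic_mean_columns and the sample data are hypothetical, and casting to float first is a precaution so every column uses NumPy float arithmetic.

```python
import numpy as np
import pandas as pd

def harmonic_mean_columns(df, cols):
    """Row-wise harmonic mean over `cols`, ignoring NaN and zero entries (hypothetical helper)."""
    # Cast to float so 1/x follows NumPy float semantics, then mask zeros as NaN.
    vals = df[cols].astype(float)
    vals = vals.where(vals != 0)
    # Harmonic mean = (number of usable values) / (sum of their reciprocals).
    # A row with no usable values gives 0 / 0.0, which pandas returns as NaN.
    return vals.count(axis=1) / (1.0 / vals).sum(axis=1)

df = pd.DataFrame({
    "a": [4, 2, 0, np.nan],
    "b": [4.0, np.nan, 4.0, np.nan],
    "c": [2.0, 8.0, 4.0, np.nan],
})
# Passing the column list explicitly keeps "hmean" out of the calculation on reruns.
df["hmean"] = harmonic_mean_columns(df, ["a", "b", "c"])
print(df)
```

Here the first row gives 3 / (1/4 + 1/4 + 1/2) = 3.0, the zero in the third row is skipped, and the all-NaN row comes out as NaN.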

2 Comments

Hi! Thank you for the answer. I get one error that I do not understand: "Could not operate 1 with block values float division by zero". Do you know what it means?
@JagruthiC I am not quite sure. It may be a divide-by-zero issue, though I can't reproduce it: on my end this seems to handle all-NaN rows as well as 0/NaN and division by zero.

You can use the built-in .iloc indexer and the .to_list() method to get each row as a list and pass it to your function. (.to_list() was added in pandas 0.24; on older versions use .tolist().)

rows = df.shape[0]
for i in range(rows):
    row_lst = df.iloc[i].to_list()
    print(harmonicMean(row_lst))
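To store the results in a new column instead of printing them, a minimal sketch could collect one value per row and assign the list after the loop. It uses .tolist(), which also works on pandas versions older than 0.24. The sample frame is made up, and the function is a compact version of the question's harmonicMean without the debug prints, so it still assumes the rows contain no zeros or NaN.

```python
import pandas as pd

def harmonicMean(arr):
    # Same formula as the question's function: n / sum(1/x), without debug prints.
    total = 0.0
    for item in arr:
        total += 1.0 / item
    return len(arr) / total

df = pd.DataFrame({"a": [4, 2], "b": [4, 8], "c": [2, 8]})

results = []
for i in range(df.shape[0]):
    row_lst = df.iloc[i].tolist()  # .tolist() also exists on older pandas versions
    results.append(harmonicMean(row_lst))

# Assign after the loop so the new column is never read back as part of a row.
df["hmean"] = results
print(df)
```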

4 Comments

df.values gives a NumPy ndarray, which you can iterate over row by row; this is faster.
@nickthefreak Thank you, it worked! However, I get this error: ZeroDivisionError: ('float division by zero', 'occurred at index 0'), because the rows contain zeros as well. Any idea how I can ignore zero and NaN values while computing the harmonic mean?
I cannot tell whether the zero-division error happens when you divide by item or when you divide by the sum; it can happen at either division. You probably need to add an if statement that checks that item and sum are greater than 0 before dividing.
I did try that, and I get this error:
    row_lst = data1.iloc[i].to_list()
  File "C:\Users\Pinky\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\generic.py", line 4376, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'Series' object has no attribute 'to_list'
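Combining the suggestions in these comments, a minimal sketch might skip NaN and non-positive values inside the function (the "greater than 0" checks), iterate over the rows of df.values, and collect the results into the new column. harmonic_mean_skip is a hypothetical name, returning NaN for a row with no usable values is an assumption, and the sample frame is made up. Because it never calls .to_list(), it should not hit the AttributeError above on older pandas.

```python
import math
import pandas as pd

def harmonic_mean_skip(arr):
    """Harmonic mean of arr, skipping NaN and non-positive values (hypothetical helper)."""
    total = 0.0
    count = 0
    for item in arr:
        # Only divide by items that are present and greater than 0.
        if pd.isna(item) or item <= 0:
            continue
        total += 1.0 / item
        count += 1
    # Only divide by the sum if it is greater than 0; otherwise nothing was averaged.
    if total > 0:
        return count / total
    return math.nan

df = pd.DataFrame({"a": [4, 2, 0], "b": [4.0, float("nan"), 0.0], "c": [2, 8, 0]})

# df.values is a 2-D NumPy array; iterating over it yields one row at a time.
df["hmean"] = [harmonic_mean_skip(row) for row in df.values]
print(df)
```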
